I've read some of the discussions here, as well as followed links to other explanations, but I'm still not able to understand the mathematical connection between "changing state" and "not changing state" as it pertains to our functional programming versus non-FP debate.

The concepts of "change" and "state"

*implicitly*involve some notion of time. A mathematical function does not.

Mathematical functions are, however, an incredibly useful concept. So, too, are the concepts of change and state.

There are two trivial solutions: ignore the problem and avoid the problem. We can simply ignore the problem of change and state and not attempt to use mathematical functions to model our program. Or we can avoid the problem of change and state and avoid programming in a way that makes implicit use of time. There are more complex solutions as well.

The "debate", such as it is, is whether the benefits of functional programming are worth the effort of either avoiding change or explicitly modeling it.

Then it get foggy in my mind.

All we need now is to sow discord.

Here's an example. Let's say I want to display closed, block-like polygons on an x-y field. In typical GIS software I understand everything is stored as directed, closed graphs, i.e. a square is four "vectors," their heads and ends connected. The raw data representation is just the individual Cartesian start and end points of each vector. And, of course, there is a "function" in the GIS software that "processes" all these coordinate sets.

Can we avoid overloading the word "function"? It will be confusing. Let's consider function to be a mathematical concept. I'll call the programming structure a subroutine, just to be old fashioned. A program is a collection of subroutines. If we so choose, we can avoid the implicit use of time in our programs and subroutines. If we do that, then it is easy to model our subroutines as mathematical functions, and model our program through functional composition.

So let me rephrase your sentence: And, of course, there is "program" in the GIS software that "processes" all these coordinate sets.

Good. But what if we represent each polygon in a mathematical way, e.g., a rectangle in the positive x, negative y quadrant might be:

Z = {(x,y) | 3 <= x <= 5, -2 <= y <= -1}

So we might have many Z-functions, each one expressing an individual polygon -- and not being a whiz with my matrix math, maybe these "functions" could then be represented as matrices . . . but I digress.

You're making a subtle shift in your point of view, and that may be part of what is confusing you. The earlier paragraph addressed the idea of using functions to model computer subroutines. This paragraph addresses the idea of using computer subroutines to model functions.

These are not equivalent concepts. Imperative programs, for example, can also use subroutines to model functions. But naturally, imperative programmers don't generally want to do this, and people that prefer functional programming tend to do this all the time.

Now if we can use a mathematical function to model a subroutine, then the code itself may be able to use that subroutine to model the mathematical function (these would be first-class procedures. It would be odd, but conceivable to have a functional language without first class procedures.). This is conditional on how we write our code. If we write in imperative style, then we may not be able to model subroutines as a mathematical functions. But if we write in functional style, we get both the ability to model subroutines as functions and to use those same routines to model the functions. You can ignore the distinction between code and the mathematical function it is equivalent to. Functional programmers think this is really cool. Imperative programmers are not so impressed.

So with the usual raw vector-data method, I might have one function in my code that "changes state" as it processes each set of coordinates and then draws each polygon (and then deals with polygons changing), while the one-and-only-one-Z-function-per-polygon method would seem to hold to the "don't change state" rule exactly -- or at least not change memory state all that much. Right? Or am I way off here? It seems like the old-fashioned, one-function-processing-raw-coordinate-data is not mutating the domain-range purity law really. I'm confused....

Slow down a bit. We're talking about a program and we want to analyze it. (By "analyze", I don't mean "gain a deep understanding", I mean "any effort to try to understand anything at all".) So in your hypothetical code you have polygons that are modeled by functional subroutines. That's great. You can model these parts of the code as mathematical functions.

But apparently, there is a subroutine in your program that "changes state". This means that there is an implicit time component, and therefore you cannot trivially model this part of the program with a mathematical function. That's all there is to it.

But the many individual polygon Z-functions method would seem to hold a perfect picture of state in memory -- no?

No. You have shifted to a different abstraction. What you mean by "state" in this case is an abstract set of polygons. Since the polygons are immutable, you could write a program which deals with this set of polygons in a completely functional manner. But there is no requirement that you do so. You could write an imperative program that mutates the data structures that hold the polygons.

Then I was reminded of my days (1980s) in Cartography when an object-oriented GIS package (written in Smalltalk) was being discussed. Everyone dismissed it as ludicrous because it tried to have exactly that: all the cartographic objects (roads, areal, symbols, polygons, etc.) in live memory at once. But isn't this ultimately what FP wants?

This is yet another abstraction. One could implement this imperatively or functionally. It's also a much older idea than the 80s.

You may be confusing yourself by attempting to shoehorn too many ideas into the word "functional".

Another avenue of my inspiration came from reading about a new idea of image processing where instead of slamming racks of pixels, each "frame" would be represented by one big function capable of quasi "gnu-plotting" the whole image: edges, colors, gradients, etc. Now I ask, Is this germane? I guess I'm trying to fathom why I would want to represent, say, a street map of polygons (e.g. city blocks) one way or the other.

You might want to render in different sizes. It is likely that a high resolution bitmap for a large window would take far more memory than the specification of how to generate the bitmap. Polygons carry more information than bitmaps. You might want to do something other than rendering.

I keep hearing functional language advocates dance around the idea that a mathematical function is pure and safe and good and ultimately Utopian, while the non-FP software function is some sort of sloppy kludge holding us back from Borg-like bliss.

I don't hang around the places you do.

But again, more confusing is memory management vis-a-vis FP versus non-FP. What I keep hearing (e.g. parallel programming) is that FP isn't changing a "memory state" as much as, say, a C/C++ program does.

Again, if you program functionally, you avoid introducting implicit time dependencies. This makes it much, much easier to determine where you can safely program in parallel (basically anywhere). You can do it the hard way, too.

Is this like the Google File System (or my Smalltalk GIS example) where literally everything is just sitting out there in a virtual memory pool, rather than being data moved in and out of databases, bzw. memory locations? Somehow I sense all these things are related.

Well, there is a connection. Notice you said "data moved in and out of databases". This has an implicit time component: sometimes the data is in, other times it is out. If we want to model our database with mathematical functions, we'll have a problem. One solution is to simplify the model by abstracting away the actual location of the data, that is, make it appear as if "literally everything is just sitting out there in a virtual memory pool". (Unfortunately this only gets you so far because the data is mutable.)

Therefore, it seems like the perfect FP program is just one single function (possibly made up of many sub-functions like nesting Russian dolls) doing all tasks

Programs of this sort are quite easy to write. I wouldn't call them perfect.

-- although a quick glance at any elisp code seems to be a study of programming schizophrenia on this count.

Nor would I call typical elisp code functional.