Abstract Heresies

Tuesday, April 27, 2010

More on persistence

I haven't forgotten about persistence; I've just been busy.

When I last blogged about it, I discussed the need for an object-map. The object-map maps each logical object id to the physical location where the durable store placed that object. We'll be putting a couple more kinds of things in the durable store, so at this point it is a good idea to add a record abstraction on top of the durable store. It is easier to show this than to describe it. Here are the actual contents of a test durable store:

;;; A persistent store.  Do not edit.
(object 1 929 <string> 1 "zip")
(leaf 1 38)
(object 1 269 <string> 1 "zero?")
(leaf 1 82)
(branch 1 116 0 38)
(object 1 387 <string> 1 "yield-current-thread")
(leaf 1 148)
(branch 1 0 197 82)
(leaf 1 38)
(leaf 1 82)
(branch 1 242 230 148)
(object 1 535 <string> 1 "xsubstring-move!")
(leaf 1 277)
(branch 1 322 0 38)
(branch 1 242 335 148)
(object 1 295 <string> 1 "xsubstring-find-previous-char-in-set")

A ‘record’ is simply a list. The first element is one of object, leaf, or branch. The second element is the version number of the record layout, in case I decide to change how the record is represented. The remaining elements are the contents of the record. For a leaf, it is simply the address of the object that this leaf represents. For a branch, it is the addresses of the left and right children followed by the address of the object the branch represents. (This is using a functional binary tree as detailed in the code.) An object record has four additional components. First is the object-id. Second is the data tag of the object. Third is a version number for the object's serialization method, and fourth is the serialized representation of the object itself. For example, in (object 1 929 <string> 1 "zip"), the object-id is 929, the data tag is <string>, and the serialized representation is "zip".
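To make the layout concrete, here is a minimal Python sketch of a reader for one record line. It is not the code behind the store above; the names `parse_record`, `ObjectRecord`, `LeafRecord`, and `BranchRecord` are made up for illustration, and it assumes one record per line with no escape sequences inside the quoted string.

```python
import re
from typing import NamedTuple, Union

# Tokens: a parenthesis, a double-quoted string, or a run of other characters.
TOKEN = re.compile(r'\(|\)|"(?:[^"\\]|\\.)*"|[^\s()"]+')


class ObjectRecord(NamedTuple):
    version: int                # record layout version
    object_id: int
    tag: str                    # data tag, e.g. "<string>"
    serialization_version: int
    data: str                   # serialized representation


class LeafRecord(NamedTuple):
    version: int
    object_address: int


class BranchRecord(NamedTuple):
    version: int
    left_address: int
    right_address: int
    object_address: int


Record = Union[ObjectRecord, LeafRecord, BranchRecord]


def parse_record(line: str) -> Record:
    """Parse one record line into a typed tuple (hypothetical helper)."""
    tokens = TOKEN.findall(line)
    if len(tokens) < 3 or tokens[0] != "(" or tokens[-1] != ")":
        raise ValueError(f"not a record: {line!r}")
    kind, *fields = tokens[1:-1]
    if kind == "leaf":
        version, address = fields
        return LeafRecord(int(version), int(address))
    if kind == "branch":
        version, left, right, address = fields
        return BranchRecord(int(version), int(left), int(right), int(address))
    if kind == "object":
        version, object_id, tag, ser_version, data = fields
        # Strip the surrounding quotes; escapes are ignored in this sketch.
        return ObjectRecord(int(version), int(object_id), tag,
                            int(ser_version), data[1:-1])
    raise ValueError(f"unknown record type: {kind}")
```

Dispatching on the first element and carrying the layout version along means a later layout change could be handled by branching on `version` inside each case.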

This is clearly not the most parsimonious representation, and it could be made much more robust by including record length information, checksums or CRCs, and by ensuring that user data (such as those strings) cannot spoof a record boundary, but this is demonstration code, not production code.
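As a rough idea of what the more robust variant might look like, the sketch below prefixes each record with its byte length and a CRC-32. The framing format and the names `frame_record` and `read_framed_record` are assumptions invented for this example, not anything the demonstration store does.

```python
import zlib
from typing import Tuple

HEADER_SIZE = 18  # 8 hex digits, a space, 8 hex digits, a newline


def frame_record(payload: str) -> bytes:
    """Wrap a record in a header carrying its byte length and CRC-32."""
    body = payload.encode("utf-8")
    header = f"{len(body):08x} {zlib.crc32(body):08x}\n".encode("ascii")
    return header + body + b"\n"


def read_framed_record(buffer: bytes, offset: int = 0) -> Tuple[str, int]:
    """Return (payload, offset of the next record), verifying the checksum."""
    header_end = offset + HEADER_SIZE
    header = buffer[offset:header_end]
    if len(header) != HEADER_SIZE or header[8:9] != b" " or header[17:18] != b"\n":
        raise ValueError("malformed header")
    length = int(header[0:8].decode("ascii"), 16)
    expected_crc = int(header[9:17].decode("ascii"), 16)
    # The reader trusts the declared length, not the content, to find the end.
    body = buffer[header_end:header_end + length]
    if len(body) != length:
        raise ValueError("truncated record")
    if zlib.crc32(body) != expected_crc:
        raise ValueError("checksum mismatch")
    return body.decode("utf-8"), header_end + length + 1
```

Because the reader advances by the declared length, a string that happens to contain a newline followed by something like (leaf 1 38) is consumed as data and cannot pass for a record boundary.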

In the next installment, we add another record type.

Friday, April 23, 2010

The Education of JRM

I learn so much from the Internet. Yesterday, someone posted a link to my post about Java. Traffic increased by close to three orders of magnitude. Many people commiserated with me, and I thank them. Some, however, went the extra mile:

An ellipsis has three dots.

Mea culpa. I was sloppy. Fortunately, another reader defended me:

An ellipsis at the end of a sentence is three dots, followed by a period. He/she did it right.

Alas, if I did it right it was only by accident.

There should be a space between the ellipsis and the period. Putting four dots ala George Lucas is incorrect....

I'm flattered to be placed in such good company!

One reader pointed out that the problem might not lie in Java, but within me:

This is what reflection is for. If you can't solve this problem with reflection, then you don't really have reflection.

while others offered concrete advice:

Maybe you just need to learn to type faster.

and still others tactfully suggested I broaden my horizons:

ever code in C++? this isn't unique to Java, noob.

I thank you all for the advice and the spirit in which you offered it. I believe I'll reflect upon it this weekend — assuming I really have reflection. I will continue posting in the hope that I can be further edified.

Wednesday, April 21, 2010

Whenever I write code in Java....

Whenever I write code in Java I feel like I'm filling out endless forms in triplicate.

“Ok, sir, I'll just need your type signature here, here, and ... here. Now will this be everything, or...”

‘Well, I might need to raise an exception.’

The compiler purses its lips. “An exception? Hmmm... let's see.... Yes, I think we can do that... I have the form over here... Yes, here it is. Now I need you to list all the exceptions you expect to raise here. Oh, wait, you have other classes? We'll have to file an amendment to them. Just put the type signature here, here, ... yes, copy that list of exceptions....

Student M overhears the argument between Students A and T. “What seems to be the problem?”

Student T explains, ‘We're at an impasse. I want to be able to change the semantics of the language...’

Student A says “... and I want the semantics to be stable.”

Student T says ‘It seems to me that if the program is a tool for solving a problem, then the language itself is just another tool. I should be able to make changes to the language if it helps solve the problem.’

Student A replies “And I have no problem with that in principle, but when I'm writing a program, I need to know what the language semantics are so I can have some idea about whether my program will work. If the semantics change after I write the program, I won't even know whether I have a program. Mangle the semantics all you want, but then tell me what they are and don't change them!”

Student T says ‘I'm not going to “mangle” the semantics. But I don't want to lock myself out of a solution for your convenience. I want to be able to take whatever baseline semantics we have and tweak them as appropriate, but I can't change the semantics before I write the program that changes them!’

Student A says “This is just impossible.”

Student M interrupts “Time out, everyone. Ok, now let me get this straight. You...” and he points to student A, “want the meaning of your code to be stable, and you...” (he points to student T) “want to be able to tailor the semantics to suit you. That's simple! Why are you arguing?”

Students A and T look confused. ‘Simple?!’

Student M says, “Of course. You both agree that changing the language will change the semantics, and conversely changing the semantics changes the language...”

Students A and T nod.

Student M turns to student A “So you write your code in your language (let's call it L), and we'll be sure that L has stable, static semantics that never change.”

Student A says “That's what I'm talking about.”

Student T objects ‘Hold on! You're just completely ignoring me! What if I don't like the semantics?’

Student M turns to student T “I'm not done, yet. You want the freedom to morph language L into language L', which is very much like L, but with maybe a few changes...”

Student T interrupts ‘... or a lot of changes. And not just L', I want L, L', L'', L''', whatever changes I think are necessary and whenever I think they are necessary.’

Student M continues “... or a lot of changes. I get you. Now here's the question: If I have a program in language L, but I have a machine that runs a different language — call it C — what do I do?”

Student T replies ‘Write an interpreter...’

Student A answers “... or a compiler.”

Student M says “Bingo! Problem solved. Simply use language L as the base language for an interpreter or compiler for programs in L' (or L'' or L''', etc.)”

Student A thinks for a moment then says “Works for me. If I need to understand code written in L', I can just compile it down to L.”

Student T looks dubious. ‘Wait a second. You're suggesting I write a compiler? What if I want several different language features to mix and match? I'm supposed to write compilers for each of them?!’

Student M shrugs “Well, you're the one that wants to change the language, you can't expect student A to write them. Besides, it isn't that much work.”

Student T protests ‘Not that much work? It's a compiler! Flow control, analysis, register allocation, code generation! I just want to tweak the language, not invent a whole new one from scratch!’

Student M counters “Who said you need to do all that? A compiler is simply a program that translates code from one language to another, right? If you can express the constructs of language L' (or L'' etc.) in terms of language L, then that's all you need to do. If you're really just tweaking the language, then your ‘compiler’ is mostly the identity function.”

‘And the parts that aren't?’

“Macros.”

Is student M's solution reasonable?
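One way to picture Student M's suggestion is the toy translator below, which models programs as nested Python lists standing in for S-expressions. Everything in it is an assumption made for illustration: the `expand` function, the `MACROS` table, and the choice of `unless` and `when` as the extra forms that L' adds on top of a base language L that already has `if`, `not`, and `begin`.

```python
def expand(form, macros):
    """Translate a form of the extended language L' into the base language L."""
    if not isinstance(form, list) or not form:
        return form  # atoms pass through untouched: the identity part
    head = form[0]
    if isinstance(head, str) and head in macros:
        # Rewrite with the macro, then expand the result again, because
        # the rewritten form may itself contain uses of other macros.
        return expand(macros[head](*form[1:]), macros)
    # Not a macro use: keep the form and translate its subforms.
    return [expand(sub, macros) for sub in form]


MACROS = {
    # (unless test body...)  =>  (if (not test) (begin body...))
    "unless": lambda test, *body: ["if", ["not", test], ["begin", *body]],
    # (when test body...)    =>  (if test (begin body...))
    "when": lambda test, *body: ["if", test, ["begin", *body]],
}

program = ["define", ["f", "x"],
           ["unless", ["zero?", "x"],
            ["display", "x"]]]

translated = expand(program, MACROS)
```

Here `translated` uses only forms from L, and any program that never mentions `unless` or `when` comes back unchanged. The sketch is deliberately naive: it knows nothing about quoting or variable binding, so it would also rewrite a quoted list that merely starts with `unless`.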

Thursday, April 15, 2010

Another wacky idea

Student T exclaims “I have a great idea!

“Sometimes I want to make a computer language that is very similar to an existing language with just a couple of exceptions. Wouldn't it be cool if we could just tweak the semantics of the existing language?”

Student A asks ‘What do you mean? What part of the semantics?’

“Oh, I don't know... Anything!”

‘Anything? Anything at all?’

“Sure. Why limit ourselves?”

‘Like, say, adding or removing special forms?’

“Of course. Anything!”

‘How about changing from call-by-value to call-by-name?’

“That would be harder, but why not? Imagine you're running a program and you realize you need lazy evaluation for some part of it. Well, you just turn on lazy evaluation for that segment of code and turn it off again when it returns! Voila!”

‘What? While the code is running?’ Student A thinks for a moment and says, ‘So in essence you want the language semantics to be a function of time.’

Student T replies “No no no. I just want to be able to change them on the fly if I want.”

Student A says ‘Yes, that's what I mean. At different times you could have different semantics.’

Student T says “Yes, but only if you change them.”

‘And how do I change them?’

“You just call a special primitive or something from your program.”

‘So if the language semantics can change over time, doesn't that imply that the meaning of a program can change over time as well?’

Is Student A right?
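Student A's question can be made concrete with a toy model. The sketch below is only an illustration: the `Interpreter` class, its `set_lazy` method (standing in for Student T's "special primitive"), and the `first` and `program` functions are all hypothetical. Arguments are passed as thunks so the interpreter can decide when, or whether, to evaluate them.

```python
class Interpreter:
    def __init__(self):
        self.lazy = False  # start out call-by-value

    def set_lazy(self, flag):
        """The hypothetical primitive that changes the semantics on the fly."""
        self.lazy = flag

    def call(self, function, *argument_thunks):
        if self.lazy:
            # Call-by-name: hand over the unevaluated thunks.
            return function(*argument_thunks)
        # Call-by-value: evaluate every argument before the call,
        # then wrap each value so the callee sees the same interface.
        values = [thunk() for thunk in argument_thunks]
        return function(*[(lambda v=v: v) for v in values])


def first(a, b):
    """Return the first argument and never look at the second."""
    return a()


def program(interp):
    # The second argument divides by zero if it is ever evaluated.
    return interp.call(first, lambda: 1, lambda: 1 // 0)
```

With a fresh `Interpreter`, `program` raises `ZeroDivisionError`, because call-by-value evaluates the unused argument. After `set_lazy(True)`, the identical `program` returns 1. The text of the program never changed; only the moment at which it ran relative to the switch did.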

Wednesday, April 14, 2010

Let me propose these thoughts: a “semantics” is a way of associating meaning with programs. A “formal semantics” uses well-known mathematical models in an attempt to be thorough, precise, and consistent. An “informal semantics” uses human language, analogies, and examples in an attempt to be more concise and easier to understand.

As a programmer, you need to have some idea of semantics in order to get anything done. Even something as simple as the expression x + 1 presupposes that the symbol + has something to do with addition, the symbol 1 names a literal number, and the symbol x names something that can be understood numerically. Your understanding doesn't have to be complete (how does that supposed addition handle overflow?), and it doesn't have to be correct (maybe x is a string and + performs concatenation), or even consistent, but you can't work with no understanding whatsoever.
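A small Python sketch of how much the expression x + 1 leaves open. The `Text` class and the `wrapping_add_32` function are invented for this example; they stand in for a language where + concatenates and for a machine with 32-bit integers.

```python
def increment(x):
    # What this means depends entirely on what x is and what + does to it.
    return x + 1


class Text(str):
    """Hypothetical string type whose + appends the printed form of the operand."""

    def __add__(self, other):
        return Text(str.__add__(self, str(other)))


def wrapping_add_32(x, y):
    """Addition as a 32-bit two's-complement machine would perform it."""
    total = (x + y) & 0xFFFFFFFF
    return total - 0x100000000 if total & 0x80000000 else total
```

`increment(41)` is 42 and `increment(Text("x"))` is the string "x1", while `wrapping_add_32(2**31 - 1, 1)` wraps around to -2147483648. A reader who assumed plain arithmetic would be right about the first and wrong about the other two, yet could still get useful work done with that assumption.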

Does that seem like a reasonable point of view?

Tuesday, April 13, 2010

It's clear that you don't need to fully understand the formal semantics of a language in order to program in it. (There are a lot of languages for which the formal semantics haven't been specified.) How far could you get without knowing the informal semantics?