Tuesday, March 25, 2025

Vibe Coding in Common Lisp, continued

I unwedged the AI with regard to the package system, so I asked the AI to write the game loop.

Now there are a number of ways to approach a game loop, but there is one strategy that really wins: a nested set of with-... macros. These win because they tie resource management to the dynamic execution scope. You write with-window, and upon entry to the form, a window is allocated and initialized and comes into scope during the body of the code. When the body returns, the window is destroyed and deallocated.

These with-... macros are built around allocation/deallocation pairs of primitives that are called from an unwind-protect. The abstraction is the inverse of a function call: instead of using a function to hide a body of code, you use a function to wrap a body of code. The body of code doesn’t hide in the callee, but is passed as an argument from the caller. One important feature of programming in this way is that resources are never returned from a function, but are only passed downwards from the allocation point. This keeps objects from escaping their dynamic scope.
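As a minimal sketch of the pattern, assuming a made-up allocation/deallocation pair (allocate-window and deallocate-window are hypothetical stand-ins that only record what happened, not any real library's API), a with-window macro might look like this:

```lisp
;; Hypothetical primitives standing in for a real library's
;; allocation/deallocation pair.  They only record what happened.
(defvar *log* '())

(defun allocate-window (title)
  (push (list :allocate title) *log*)
  (list :window title))

(defun deallocate-window (window)
  (push (list :deallocate (second window)) *log*)
  nil)

(defun call-with-window (title receiver)
  "Allocate a window, pass it down to RECEIVER, and deallocate it on
the way out, whether RECEIVER returns normally or unwinds."
  (let ((window nil))
    (unwind-protect
         (progn
           (setq window (allocate-window title))
           (funcall receiver window))
      ;; Cleanup form: runs on normal return and on non-local exit.
      (when window
        (deallocate-window window)))))

(defmacro with-window ((var title) &body body)
  "Wrap BODY in a function and hand it to CALL-WITH-WINDOW."
  `(call-with-window ,title (lambda (,var) ,@body)))

;; Usage: the window exists only while the body runs.
(with-window (w "demo")
  (format nil "drawing in ~a" (second w)))
```

The macro is only syntax; the work is done by call-with-window, which receives the body as a function argument. The window is never returned to the caller, only passed down.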

The entry to the game loop consists of a few nested with-... macros that initialize the library, allocate a window, allocate a drawing context, and enter an event loop. When the event loop exits, the resources are torn down in reverse order of allocation, leaving the system in a clean state.
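The nesting and the reverse-order teardown can be sketched as follows. The with-logged-resource macro is a hypothetical stand-in for the real with-... macros; it records setup and teardown steps instead of talking to a graphics library.

```lisp
(defvar *events* '())   ; hypothetical trace of setup/teardown steps

(defun note (what)
  (push what *events*))

(defmacro with-logged-resource ((var name) &body body)
  "Hypothetical stand-in for with-window and friends: bind VAR to NAME,
record the setup, run BODY, and always record the teardown."
  `(let ((,var ,name))
     (note (list :setup ,var))
     (unwind-protect
          (progn ,@body)
       (note (list :teardown ,var)))))

(defun run-game (handle-events)
  "Set up library, window, and drawing context, then call HANDLE-EVENTS
with all three.  Teardown happens innermost first."
  (with-logged-resource (library :library)
    (with-logged-resource (window :window)
      (with-logged-resource (context :drawing-context)
        (funcall handle-events library window context)))))
```

After a call to run-game, the reversed contents of \*events\* show the three setups followed by the three teardowns in the opposite order. That holds even if handle-events signals an error that is handled further up the stack.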

But the AI did not use with-... macros. The code it generated had subroutines that would create a window or allocate a drawing context, but it would assign the created objects into a global variable. This means that the object is set to escape its dynamic scope when it is created. There is nothing to prevent (or even discourage) access to the object after it has been deallocated. There were no unwind-protects anywhere, so objects, once allocated, were eternal — you could never close a window.

In the large, the code was built to fail. In the small, it immediately failed. Calling conventions were not even followed. Keyword argument functions were called with positional arguments or with an odd number of arguments, irrelevant extra arguments were passed in, and the AI would pass in flags that didn’t exist. We’ll grant that the AI does not ultimately understand what it is doing, but it should at least make the argument lists superficially line up. That doesn’t require AI; a simple pattern match can detect this.
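To give an idea of how little is needed, here is a rough sketch of such a superficial check. The function check-keyword-call and its problem labels are made up for illustration; it covers only functions that take nothing but keyword arguments.

```lisp
(defun check-keyword-call (allowed-keys args)
  "Return a list of problems found in ARGS, the argument list of a call
to a function that accepts only the keywords in ALLOWED-KEYS.
An empty list means the call is superficially well formed."
  (let ((problems '()))
    ;; Keyword arguments come in key/value pairs.
    (when (oddp (length args))
      (push :odd-number-of-arguments problems))
    ;; Look at every other element: each should be a known keyword.
    (loop for key in args by #'cddr
          do (cond ((not (keywordp key))
                    (push (list :not-a-keyword key) problems))
                   ((not (member key allowed-keys))
                    (push (list :unknown-keyword key) problems))))
    (nreverse problems)))

;; A positional call to a keyword-only function is flagged:
(check-keyword-call '(:title :width :height) '("Game" 640 480))
```

This is pattern matching on the shape of the argument list and nothing more. It knows nothing about what the arguments mean.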

The event loop did not load, let alone compile. It referred to symbols that did not exist. Well, we can expect this, but it needs to be corrected. When I pointed out that the symbol didn’t exist, the AI began to thrash. It would try the same symbol, but with asterisks around it. It would try a variant of the same symbol. Then it would go back and try the original symbol again, try the asterisks again, try the same variant name again, etc. There is nothing to be done here but manual intervention.
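Checking whether a symbol exists is something the running Lisp image can answer directly. A small sketch using only standard functions (find-symbol, fboundp, boundp); the helper describe-symbol-status is a made-up name, and it assumes the default upcasing reader:

```lisp
(defun describe-symbol-status (name package)
  "Return a plist saying whether a symbol called NAME exists in PACKAGE
and, if so, whether it is exported and whether it names a function or
a bound variable.  Assumes the standard upcasing readtable."
  (multiple-value-bind (symbol status)
      (find-symbol (string-upcase name) package)
    (if (null status)
        (list :exists nil)
        (list :exists t
              :external (eq status :external)
              :function (and (fboundp symbol) t)
              :variable (and (boundp symbol) t)))))

;; CAR is an exported function, not a variable:
(describe-symbol-status "car" :common-lisp)
```

A made-up name comes back as (:exists nil), which is the answer the AI kept failing to act on.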

There are some macros that set up an event loop, poll for an event, and dispatch to some code for that event while extracting the event particulars. You can roll your own event loop, or you can just use one of the pre-built macros. When the AI began to thrash on the event loop, I intervened, deleted the code it was thrashing on and put in the event loop macro. The AI immediately put back in the code I had removed and started thrashing on it again.
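To illustrate what such a macro does, here is a toy version. It is not the library's macro: do-events, the plist representation of events, and the event types are all assumptions made for this sketch.

```lisp
(defmacro do-events ((poll-form) &body clauses)
  "Toy event loop.  POLL-FORM is evaluated repeatedly and should yield
an event (a plist with a :TYPE entry) or NIL when there are no more.
Each clause is (EVENT-TYPE (FIELD...) BODY...); the named fields are
extracted from the event and bound around BODY.  (RETURN) exits."
  (let ((event (gensym "EVENT")))
    `(loop for ,event = ,poll-form
           while ,event
           do (case (getf ,event :type)
                ,@(loop for (type fields . body) in clauses
                        collect `(,type
                                  (destructuring-bind
                                      (&key ,@fields &allow-other-keys)
                                      ,event
                                    ,@body)))))))

(defun run-events (queue)
  "Drain QUEUE, a list of hypothetical event plists, until a :QUIT
event.  Returns a list of what was seen, in order."
  (let ((seen '()))
    (do-events ((pop queue))
      (:keydown (key) (push (list :key key) seen))
      (:quit () (push :quit seen) (return)))
    (nreverse seen)))
```

Events with no matching clause are ignored, and the loop stops at the first :quit event or when the queue runs dry.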

Again, it is clear that the AI has no knowledge at all of what it is doing. It doesn’t understand syntax or the simplest of semantics. It cannot even tell if a symbol is bound to a value. Even the most junior developer won’t just make up functions that are not in the library. The AI doesn’t consult the documentation to validate if the generated code even remotely passes a sniff test.

You cannot “vibe code” Common Lisp. The AI begins to thrash and you simply must step in to get it unwedged. It doesn’t converge to any solution whatsoever. I suspect that this is because there is simply not enough training data. Common Lisp would appear to need some semantic understanding in order to write plausibly working code. Just mimicking some syntax you found on the web (which is ultimately what the AI is doing) will not get you very far at all.

5 comments:

fadrian said...

You are correct as to the cause of the problem - there simply isn't enough CL code out there to drive an answer through the LLM. I've had the same issue with Clojure. It's good for finding micro-functions that could have been found through web searching, but anything larger than that requires actual understanding - something that LLMs are notoriously bad at.

I think the larger worry is that if vibe coding becomes a thing, this will add even more emphasis to using top-ten languages, to the detriment of lesser-used, but more powerful, languages.

But even at its best, vibe coding requires the prompter to carefully guide the AI to come up with code that is architecturally sound. This has been true when I've used it to generate Python code and I assume it would be true even if some other highly-used language (JavaScript, Java, SQL, etc.) were being generated.

So anyway, given its level of "expertise", which I would describe as "entry-level programmer with access to Stack Overflow", I'd like to say that this method will be little more than a fad. But sadly, much code is written by people who have this level of expertise, so I'm afraid that it's here to stay. Again, this has the probable outcome of making lesser-used languages even more lesser-used and more badly-written code being out there, leading to the detriment of the languages I love and of the industry as a whole. The only good news is once the mess is big enough, there will be good consultancy money in cleaning it up.

Anonymous said...

Today I had success writing Tetris in Scheme (Guile).

Do you think this is much easier for an LLM?

I had an architectural plan myself and worked in small steps, beginning with drawing a rectangle.

Joe Marshall said...

I honestly don't know. I see people vibe coding in JavaScript or Python and I think "I can do that", but when it comes to Lisp, I have little success. It could be me.

Anonymous said...

I'm currently using the latest version of Claude Code to do some Common Lisp programming. It has seemed to take a giant leap in what it understands and the quality of code it produces. Though I am not using it in "vibe coding" (i.e. completely unguided coding other than via a single prompt), I am having a great time with it as an intelligent junior programmer. I'm guiding it through the design of an interpreter for a lisp dialect - what I call Functional Object Lisp or FOL - it's a cross between Common Lisp, Clojure, and Apple's prefix Dylan. All objects are persistent, there are transducers, all objects are immutable, and it has lazy sequences, and a readtable that gives it the same syntax as in Clojure; it keeps the class structure and MOP of CLOS rooting the object hierarchy in a class called which stores the slots in a persistent hash-table; it has the module structure and everything being a class like Dylan. It's written in Common Lisp and I'm finishing up the built-in function and macro coding right now. I was able to add multiple-destructuring-pattern dispatch to functions, generic functions, methods, and macros by describing the destructuring syntax and letting it go. Along the way, it's constructed over 4000 tests which seem to be fairly comprehensive. I'm currently adding cl-ppcre regular expressions and associated functions using relatively complex prompts. It was able to decode what I wanted, and even though it didn't use the algorithm I suggested, it was able to use iterative calls to scan to do what I wanted.

It also wrote about 50 tests and the documentation for the re-find and re-seq functions. So, all-in-all, I've been impressed. Things seem to be progressing quite well on the Lisp-coding front.

fadrian said...

Try Claude Code. I've been using it as a coding assistant to code a Lisp interpreter for something I call Functional Object Lisp, or FOL. It's a cross between Common Lisp, Clojure, and Apple's old prefix Dylan. In it, everything is an object, it has a simplified error-handling structure using try/catch/throw, and I'm using the module system from Dylan; I'm using the object structure and MOP from Common Lisp; and from Clojure, I've stolen transducers, lazy sequences, object persistence, immutability, and a readtable that steals its syntax. I'm finishing up the built-in functions and macros right now. As you probably know, Clojure has multi-arity dispatch, so when I went to add that to the language, I decided why not have full multi-destructuring-pattern dispatch, so I described the destructuring syntax to Claude Code and, using the MOP, it was able to add it to functions, generic functions, methods, and macros without much guidance from me. It also handles the documentation for the system and has written over 4000 tests so far that seem to be relatively comprehensive. The code base is up to about 20,000 LOC - about 1/3 source code, 1/2 tests, and 1/6 documentation. And it's been written in about five weeks. All-in-all, a fairly impressive showing from Claude.