In my experiments with vibe coding, I found that LLMs (Large Language Models) struggle with Lisp code. I think I know why.
Consider some library that exposes some resources to the programmer. It has an AllocFoo function that allocates a Foo object, and a FreeFoo function that frees it. The library has bindings in several languages, so maybe there is a Python binding, a C binding, etc. In these languages, you'll find that functions that call AllocFoo often call FreeFoo within the same function. A lot of libraries work this way, and it is a common pattern.
Documents, such as source code files, can be thought of as “points” in a very high dimensional space. Source code files in a particular language will be somewhat near each other in a region of this space. But within the region of space that contains source code in some language, there will be sub-regions that exhibit particular patterns. There will be a sub-region that contains Alloc/Free pairs. This sub-region will be displaced from the center of the region for the language. But here's the important part: in each language, independent of the particulars of the language, the sub-region that contains Alloc/Free pairs will be displaced in roughly the same direction. This is how the LLM can learn to recognize the pattern of usage across different languages.
When we encounter a new document, we know that if it is going to contain an Alloc/Free pair, it is going to be displaced in the same direction as other documents that contain such pairs. This allows us to pair up Alloc/Free calls in code we have never seen before in languages we have never seen before.
Now consider Lisp. In Lisp, we have a function that allocates a foo object, and a function that frees it. The LLM would have no problem pairing up alloc-foo and free-foo in Lisp. But Lisp programmers don't do that. They write a with-foo macro that contains an unwind-protect that frees the foo when the code is done. The LLM will observe the alloc/free pair in the source code of the macro (it looks like your typical alloc/free pair), but then you use the macro everywhere instead of the explicit calls to alloc-foo and free-foo. The LLM doesn't know this abstraction pattern. People don't write with-foo macros or their equivalents in other languages, so the LLM doesn't have a way to recognize the pattern.
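As a minimal sketch of what such a macro might look like: alloc-foo and free-foo below are hypothetical stand-ins that only count live objects, not the API of any real library, and use-foo is an invented call site.

```lisp
;; Hypothetical stand-ins for the library's allocator and deallocator.
;; They only track how many foos are currently live.
(defvar *live-foos* 0 "Number of foo objects currently allocated.")

(defun alloc-foo ()
  (incf *live-foos*)
  (list :foo))

(defun free-foo (foo)
  (declare (ignore foo))
  (decf *live-foos*))

;; The only place where the alloc/free pair appears together.
(defmacro with-foo ((var) &body body)
  "Bind VAR to a fresh foo, run BODY, and free the foo on any exit."
  `(let ((,var (alloc-foo)))
     (unwind-protect
          (progn ,@body)
       (free-foo ,var))))

;; A call site: neither alloc-foo nor free-foo is visible here.
(defun use-foo ()
  (with-foo (f)
    (length f)))
```

The pairing exists exactly once, inside the macro definition. Every call site shows only with-foo, so the textual alloc/free pattern never appears in the code that uses the resource.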
The LLM is good at recognizing patterns, and source code typically contains a lot of patterns, and these patterns don't hugely vary across curly-brace languages. But when a Lisp programmer sees a pattern, he abstracts it and makes it go away with a macro or a higher-order function. People tend not to do that in other languages (largely because either the language cannot express it or it is insanely cumbersome). The LLM has a much harder time with Lisp because the programmers can easily hide the patterns from it.
I found in my experiments that the LLMs would generate Lisp code that would allocate or initialize a resource and then add deallocation and uninitialization code in every branch of the function. They did not seem to know about the with-… macros that would abstract this away.
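To make the contrast concrete, here is an illustrative sketch, not actual LLM output. The functions classify-manual and classify-with are invented for the example, and alloc-foo, free-foo, and with-foo are the same kind of hypothetical stand-ins a real library would replace.

```lisp
;; Hypothetical resource API that only counts live objects.
(defvar *live-foos* 0)

(defun alloc-foo ()
  (incf *live-foos*)
  (list :foo))

(defun free-foo (foo)
  (declare (ignore foo))
  (decf *live-foos*))

(defmacro with-foo ((var) &body body)
  `(let ((,var (alloc-foo)))
     (unwind-protect
          (progn ,@body)
       (free-foo ,var))))

;; The style described above: cleanup repeated in every branch.
;; A non-local exit between alloc-foo and free-foo would leak the foo.
(defun classify-manual (n)
  (let ((foo (alloc-foo)))
    (cond ((minusp n)
           (free-foo foo)
           :negative)
          ((zerop n)
           (free-foo foo)
           :zero)
          (t
           (free-foo foo)
           :positive))))

;; The idiomatic style: the macro owns the cleanup, so the
;; branches contain only the logic.
(defun classify-with (n)
  (with-foo (foo)
    (cond ((minusp n) :negative)
          ((zerop n) :zero)
          (t :positive))))
```

Both functions return the same values on normal exit, but only the second still frees the foo if the body exits through an error or a throw.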