Wednesday, October 16, 2024

Lisp vs. golang

It's no secret that I'm an aficionado of Lisp. It's my go to language, especially when I don't know what I'm doing. I call it research and prototyping, but it's really just playing around until something works.

We had a need for some auditing of some of our databases at work. They ought to agree with each other and with what GitHub and CircleCI think. It took a couple of weeks part time to prototype a solution in Common Lisp. It showed that the databases were in 99% agreement and found the few points of disagreement and anomalies that we ought to fix or look out for.

I want to integrate this information into a dashboard on one of our tools. I prototyped this by spinning up a Common Lisp microservice that returns the information in JSON format.

But management prefers that new services are written in golang. It would be easier for me to rewrite the service in golang than to try to persuade others to use Common Lisp. It also gives me the opportunity to compare the two languages head to head on a real world problem.

No, this is not a fair comparison. When I wrote the Lisp code I was exploring the problem space and prototyping. I'm much more experienced with Lisp than with golang. The golang version has the advantage that I know what I want to do and how to do it. In theory, I can just translate the Common Lisp code into golang. But then again, this is a “second system” which is not a prototype and has slightly larger scope and fuller requirements. So this cannot be a true head to head comparison.

The first point of comparison is macros (or lack thereof). I generally don't use a lot of macros in Common Lisp, but they come in handy when I do use them. One macro I wrote is called audit-step, which you can wrap around any expresion and it prints out a message before and after the expression is evaluated. The steps are numbered in sequence, and nested steps get nested numbers (like step 2.3.1). If you wrap the major function bodies with this macro, you get a nice trace of the call sequence in the log.

Golang doesn't have macros, but it has first class functions. It's easy enough to write a function that takes a function as an argument and wraps it to output the trace messages. In fact, the macro version in Common Lisp just rewrites the form into such a function call. But the macro version hides a level of indentation and a lambda. In golang, my major functions all start with

func MajorFunction (args) int {
        return AuditStep("MajorFunction", "aux message", func() int {
                // body of MajorFunction
                // Actual code goes here.
        })    
}

The bodies of all my major functions are indented by 16 spaces, which is a little much.

I like higher order functions. I can write one higher order function and parameterize it with functions that handle the specific cases. In my auditing code, one such workhorse function is called collate. It takes a list of objects and creates a table that maps values to all objects in the list that contain that value. To give an example, imaging you have a list of objects that all have a field called foo. The foo field is a string. The collate function can return a table that maps strings to all objects that have that string in the foo field.

collate is very general. It takes a list of objects and four keyword arguments. The :key argument is a function that extracts the value to collate on. The :test argument is a function that compares two keys (it defaults to eql if not specified). The :merger argument is a function to add the mapped object to its appropriate collection in the table (it defaults to adjoin). The :default argument specifies the initial value of a collection in the table (it defaults to nil).

The :merger function is the most interesting. It takes the key and the object and the current value of the table at that key. It returns the new value of the table at that key. The default merger function is adjoin, which adds the object to the collection at the key if it is not already there. But you can specify a different merger function. For example, if you want to count the number of objects at each key, you can specify a merger function that increments a counter.

The functional arguments to the collate function are often the results of other higher order functions. For example, the :key argument is often the result of composing selector functions. The :merger argument is often the result of composing a binary merge function with a unary transformer function. The transformer function is often the result of composing a number of primitive selectors and transformers.

In Common Lisp, it is quite easy to write these higher order functions. We can compose two unary functions with the compose2 function:

(defun compose2 (f g)
  (lambda (x) (funcall f (funcall g x)))

and then compose as many functions as we like by fold-left of compose2 starting with the identity function:

(defun compose (&rest fs)
  (fold-left #'compose2 #'identity fs))

We can compose a binary function with a unary function in three ways: we can pipe the output of the binary function into the unary function, or we can pipe the output of the unary function into one or the other of the inputs of the binary function.

(defun binary-compose-output (f g)
  (lambda (x y) (funcall f (funcall g x y))))

(defun binary-compose-left (f g)
  (lambda (x y) (funcall f (funcall g x) y)))

(defun binary-compose-right (f g)
  (lambda (x y) (funcall f x (funcall g y))))

The collate function can now assume that a lot of the work is done by the :key and :merger functions that are passed in. It simply builds a hash table and fills it:

(defun collate (item &key (key #'identity) (test #'eql) (merger (merge-adjoin #'eql)) (default nil))
  (let ((table (make-hash-table :test test)))
    (dolist (item items table)
      (let ((k (funcall key item)))
        (setf (gethash k table) (funcall merger (gethash k table default) item))))))

(defun merge-adjoin (test)
  (lambda (collection item)
    (adjoin item collection :test test)))

So suppose, for example, that we have a list of records. Each record is a three element list. The third element is a struct that contains a string. We want a table mapping strings to the two element lists you get when you strip out the struct. This is easily done with collate:

(collate records
  :key (compose #'get-string #'third)
  :test #'equal      ; or #'string= if you prefer
  :merger (binary-compose-right (merge-adjoin #'equal) #'butlast))

The audit code reads lists of records from the database and from GitHub and from CircleCI and uses collate to build hash tables we can use to quickly walk and validate the data.

Translating this into golang isn't quite so easy. Golang has first class function, true, but golang is a statically typed language. This causes two problems. First, the signature of the higher order functions includes the types of the arguments and the return value. This means you cannot just slap on the lambda symbol, you have to annotate each argument and the return value. This is far more verbose. Second, higher order functions map onto parameterized (generic) types. Generic type systems come with their own little constraint language so that the computer can figure out what concrete types can correctly match the generic types. This makes higher order functions fairly unweildy.

Consider compose2. The functions f and g each have an input and output type, but the output type of g is the input type of f so only three types are involved

func Compose2[T any, U any, V any](f func(U) V, g func(T) U) func(T) V {
	return func(x T) V {
		return f(g(x))
	}
}

If want to compose three functions, we can write this:

func Compose3[T any, U any, V any, W any](f func(V) W, g func(U) V, h func(T) U) func(T) W {
	return func(x T) W {
		return f(g(h(x)))
	}
}
The generic type specifiers take up as much space as the code itself.

I don't see a way to write an n-ary compose function. It would have to be dynamically parameterized by the intermediate types of all the functions it was composing.

For the collate function, we can write this:

func Collate[R any, K comparable, V any](
	list *Cons[R],
	keyfunc func(R) K,
	merger func(V, R) V,
	defaultValue V) map[K]V {
	answer := make(map[K]V)
	for list != nil {
		key := keyfunc(list.Car)
		probe, ok := answer[key]
		if !ok {
			probe = defaultValue
		}
		answer[key] = merger(probe, list.Car)
		list = list.Cdr
	}
	return answer
}

We have three types to parameterize over: the type of the list elements (i.e. the record type) R, the type of the key K, and the type of the value V. The key type is needs to be constrained to be a valid key in a map, so we use the comparable constraint. Now that we have the types, we can annotate the arguments and return value. The list we are collating is a list of R elements. The key function takes an R and returns a K. The merger takes an existing value of type V and the record of type R and returns a new value of type V.

The magic of type inference means that I do not have to annotate all the variables in the body of the function, but the compiler cannot read my mind and infer the types of the arguments and return value. Golang forces you to think about the types of arguments and return values at every step of the way. Yes, one should be aware of what types are being passed around, but it is a burden to have to formally specify them at every step. I could write the Common Lisp code without worrying too much about types. Of couse the types would have to be consistent at runtime, but I could write the code just by considering what was connected to what. In golang, the types are in your face at every function definition. You not only have to think about what is connected to what, you have to think about what sort of thing is passed through the connection.

I'm sure that many would argue that type safety is worth the trouble of annotation. I don't want to argue that it isn't. But the type system is cumbersome, awkward, and unweildy, especially when you are trying to write higher order functions.

It is taking me longer to write the golang version of the audit service than it did to write the Common Lisp version. There are several reasons. First, I am more experienced with Common Lisp than golang, so the right Common Lisp idioms just come to mind. I have to look up many of the golang idioms. Second, the golang code is trying to do more than the Common Lisp code. But third, golang itself introduces more friction than Common Lisp. Programs have to do more than express the algorithm, they have to satisfy the type system.

There are more points of comparison between the two languages. When I get frustrated enough, I'll probably write another post.

Wednesday, July 31, 2024

Continuation passing style resource management

One good use of continuation passing style is to manage dynamic resources. The resource allocation function is written in continuation passing style and it takes a callback that it invokes once it has allocated and initialized the resource. When the callback exits, the resource is uninitialized and deallocated.

(defun call-with-resource (receiver)
  (let ((resource nil))
    (unwind-protect
        (progn
          (setq resource (allocate-resource))
          (funcall receiver resource))
      (when resource
        (deallocate-resource resource)))))

  ;; example usage:
  (call-with-resource
    (lambda (res)
      (do-something-with res)))

;;; In Lisp, we would provide a convenient WITH- macro
(defmacro with-resource ((resource) &body body)
  ‘(CALL-WITH-RESOURCE (LAMBDA (,resource) ,@body)))

  ;; example usage:
  (with-resource (res)
    (do-something-with res))

This pattern of usage separates and abstracts the resource usage from the resource management. Notice how the unwind-protect is hidden inside call-with-resource so that the user of the resource doesn’t have to remember to deallocate the resource.

The with-resource macro is idiomatic to Lisp. You obviously can’t provide such a macro in a language without macros, but you can still provide the call-with-resource function.

Continuation passing style for resource management can be used in other languages, but it often requires some hairier syntax. Because call-with-resource takes a callback argument, it is actually a higher-order function. The syntax for passing higher-order functions in many languages is quite cumbersome. The return value of the callback becomes the return value of call-with-resource, so the return type of the callback must be compatible with the return type of the function. (Hence the type of call-with-resource is actually parameterized on the return value of the callback.) Languages without sophisticated type inference may balk at this.

Another advantage of the functional call-with-resource pattern is that you can dynamically select the resource allocator. Here is an example. I want to resolve git hashes against a git repository. The git repository is large, so I don’t want to clone it unless I have to. So I write two resource allocators: CallCloningGitRepository, which takes the URL of the repository to clone, and CallOpeningGitRepository which takes the pathname of an already cloned repository. The "cloning" allocator will clone the repository to a temporary directory and delete the repository when it is done. The "opening" allocator will open the repository and close it when it is done. The callback that will be invoked won’t care which allocator was used.

Here is what this looks like in golang:

// Invoke receiver with a temporary directory, removing the directory when receiver returns.
func CallWithTemporaryDirectory(dir string, pattern string, receiver func(dir string) any) any {
	dir, err := os.MkdirTemp(dir, pattern)
	CheckErr(err)
	defer os.RemoveAll(dir)

	return receiver(dir)
}

// Invoke receiver with open git repository.
func CallOpeningGitRepository(repodir string, receiver func(string, *git.Repository) any) any {
	repo, err := git.PlainOpen(repodir)
	if err != nil {
		log.Fatal(err)
	}
	return receiver(repodir, repo)
}

// Invoke receiver with a cloned git repository, removing the repository when receiver returns.
func CallCloningGitRepository(dir string, pattern string, url string, receiver func(tmpdir string, repo *git.Repository) any) any {
	if url == "" {
		return nil
	}
	return CallWithTemporaryDirectory(
		dir,
		pattern,
		func(tempdir string) any {
			log.Print("Cloning " + url + " into " + tempdir)
			repo, err := git.PlainClone(tempdir, true, &git.CloneOptions{
				Auth: &gitHttp.BasicAuth{
					Username: username,
					Password: password,
				},
				URL:      url,
				Progress: os.Stdout,
			})
			CheckErr(err)
			log.Print("Cloned.")
			return receiver(tempdir, repo)
		})
}

You specify a repository either with a URL or a pathname. We select the appropriate resource allocator based on whether the specifier begins with "https".

func RepositoryGetter (specifier string) func (receiver func(_ string, repo *git.Repository) any) any {
	if strings.EqualFold(specifier[0:5], "https") {
                return GetRemoteGitRepository (specifier)
	} else {
                return GetLocalGitRepository (specifier)
	}
}

func GetRemoteGitRepository(url string) func(receiver func(_ string, repo *git.Repository) any) any {
	return func(receiver func(_ string, repo *git.Repository) any) any {
		return CallCloningGitRepository("", "git", url, receiver)
	}
}

func GetLocalGitRepository(repodir string) func(receiver func(_ string, repo *git.Repository) any) any {
	return func(receiver func(_ string, repo *git.Repository) any) any {
		return CallOpeningGitRepository(repodir, receiver)
	}
}

To open a repository, we call RepositoryGetter(specifier) to get a getRepository function. Then we invoke the computed getRepository function on a receiver callback that accepts the local pathname and the repository:

  getRepository := RepositoryGetter(specifier)

  return getRepository(
         func (_ string, repo *git.Repository) any {
                 // resolve git hashes against the repository
                 ....
                 return nil
         })

If given a URL, this code will clone the repo into a temporary directory and open the cloned repo. If given a pathname, it will just open the repo at the pathname. It runs the callback and does the necessary cleanup when the callback returns.

The biggest point of confusion in this code (at least to me) are the type specifiers of the functions that manipulate the resource allocators. Static types don’t seem to mix well with continuation passing style.

Wednesday, July 10, 2024

Monkeys vs. Shakespeare

Google's golang was designed for mediocre programmers that aren't “capable of understanding a brilliant language”. It omits features that are hard to understand or hard to use, like conditional expressions, macros, exceptions, and object systems.

Lisp, on the other hand, was originally designed for smart programmers in order to research artificial intelligence. It contains language constructs like conditional expressions, a powerful macro system, a highly customizable error handling system, and a sophisticated object system.

A million monkeys typing on a million keyboards will eventually produce the works of Shakespeare. Lisp is a language for Shakespeares, while golang is a language for monkeys.

What is the fundamental problem we are solving? If the problem is simply an insufficient amount of software, then we need to write as much software as possible as rapidly as possible. Hire more monkeys. If the problem is gainfully employing all the monkeys quickly, give them crude tools that are easy to use. But if the problem is lack of quality software, then we need the tools to write the best software possible. Hire more Shakespeares and give them powerful tools.

Hiring monkeys is easier and cheaper than hiring Shakespeares. So why not just keep hiring monkeys until you have a Shakespeare equivalent? Experience shows how well this works. Just think of all the projects that were brought in on time and under budget when they doubled the number of monkeys. Just think of any project that was brought in on time and under budget when they doubled the number of monkeys. Surely there must be one?

Thursday, July 4, 2024

A YouTuber's First Look at Lisp

There is a guy who is making a series of videos where he takes a “first look” at various programming languages. I watched his “first look” at Lisp. Now he wasn’t going into it completely cold — he had looked at Scheme previously and had seen some Lisp — but it was still an early experience for him.

He gave it a good go. He didn’t have a pathological hatred for parenthesis and seemed willing to give it a fair shake. He downloaded SBCL and started playing with it.

Since he only had allocated an hour to it, he didn't cover much. But he encountered a few pitfalls that Lisp newbies often fall into. I found myself wanting to talk back at the screen and point out where he was going wrong when simple typos were causing errors.

For example, he was trying to format some output, but he forgot the closing double quote on the format string. The reader kept waiting for him to finish the string and simply gobbled up the format args and closing parens as part of the string. He was confused as to why nothing was happening. When he finally typed Control-C, he was faced with a spew of stack trace and a debugger prompt that was of no help to him.

A second problem he encountered was when he typed (print (first (*mylist*))). Any experienced Lisp hacker will obviously see the problem right away: *mylist* is not a function. But the error message was buried in a 30 level deep stack trace that wound back through the reader and REPL and completely obscured the problem.

The Lisp debugger is amazing, but it can be a bit overwhelming to a newbie. When I was TAing Lisp classes, I would often find that students were intimidated by the debugger. They felt that while they were in the debugger there was some sort of impending doom hanging over them and that they had to exit and get back to the normal REPL as quickly as possible. I think if I were teaching a Lisp course now, I would start off by deliberately causing errors and then showing students how to use the debugger to find the problem and recover from it.

PLT Scheme has an interesting “beginner” mode that disables some of the more advanced features of Lisp. Many typos in Lisp cause errors not because they are invalid, but because they mean something advanced, but unintended. For example, the beginner mode disables first-class functions, so accidentally putting in extra parenthesis becomes a syntax error instead of an attempt to funcall something inappropriate. Perhaps a “training wheels” approach would work with Lisp beginners.

But SBCL generates pretty good error messages. It was a case of TLDR. The reviewer just didn't take the time to read and understand what the error message said. He was too busy trying to figure out how to get back to the REPL.

All in all, the reviewer gave it a good go and came away with a positive impression of Lisp. He also took he time to research the language and provided some examples of where Lisp is used today. The video is at https://www.youtube.com/watch?v=nVhzHu2zSEQ if you are interested.

Wednesday, June 26, 2024

Goto not that harmful

I use goto all the time. When I have a loop, I goto the top of the loop after each iteration. When a function delegates to another function, I just goto the other function.

Gotos that just transfer control have the problem that the context is implicit. It isn’t obvious from the code what parts of the context are expected to persist across the control transfer. But if you explicitly pass arguments along with your control transfer, then you can see exactly what is carried across the control transfer.

A tail call isn’t “optimized” to a goto, it is a goto. It is a goto that passes arguments. Gotos that pass arguments aren’t harmful.

Monday, June 24, 2024

Embrace the Suck

The key point here is our programmers are Googlers, they’re not researchers. They’re typically fairly young, fresh out of school, probably learned Java, maybe learned C or C++, probably learned Python. They’re not capable of understanding a brilliant language but we want to use them to build good software. So, the language that we give them has to be easy for them to understand and easy to adopt.
— Rob Pike

How sad. Google used to hire top-notch programmers. Now it appears that they have hired a bunch of mediocre programmers who can't even learn how to properly use exceptions or ternary conditionals. Programmers that need “training wheels” on their C.

And how insulting to Googlers to be told that they are not capable of understanding a brilliant language. To be seen as merely a fungible resource to be “used” to build software. I'm sure some of them are capable of understanding a brilliant language. I'm sure that some are capable of learning how to use Haskell or OCaml or Clojure (or F# or Rust or Erlang or Scala) whatever language is best for the task.

Does success come to those that strive for excellence or those that embrace mediocrity?

Sunday, June 16, 2024

Decode a Float (Solution)

We can multiply or divide a floating point number by 2 without changing the bits in the mantissa. So we can rescale the number to be in the range [1, 2) by repeatedly multiplying or dividing by 2. The leftmost bit of the mantissa is always 1, but next bit determines whether the number is in the top half of the range or the bottom half. So if the number is equal to or greater than 1.5, the next bit is 1, otherwise it is 0.

If the number is in the range [1, 2), we can subtract 1 from it. The remaining bits will be shifted left until the leftmost bit is 1. If the number is greater than or equal to 1.5, then subtracting 1 will shift the bits left by one. But if the number is in [1, 1.5), we won’t know how many zero bits will be shifted out when we subtract 1. What we’ll do is add .5 to the number to turn on the next bit, then subtract 1 to shift the bits left by one.

Here is the code in Common Lisp:

(defun float->bits (float)
  (cons 1 (read-bits float 0)))

(defun read-bits (float count)
  (cond ((>= count 52) nil)
        ((> float 2.0d0) (read-bits (/ float 2.0d0) count))
        ((< float 1.0d0) (read-bits (* float 2.0d0) count))
        ((>= float 1.5d0) (cons 1 (read-bits (- float 1.0d0) (1+ count))))
        (t (cons 0 (read-bits (- (+ float .5d0) 1.0d0) (1+ count))))))

Note that all these operations are exact and do not cause round off.