Wednesday, June 5, 2024

Multithreading and Immutable Data

I was amusing myself by looking at Lisp tutorials. They used the idea of a Tic-Tac-Toe service as a motivating example. You’d be able to play Tic-Tac-Toe against the computer or another opponent.

My immediate thought went to the issue of multithreading. If you were going to serve hundreds of people at once, you’d need to have a multi-threaded service. Multi-threaded code is hard to write and debug, and it is much better if you have a plan before you start than if you try to retrofit it later (that trick never works).

The magic bullet for multithreading is immutable data. Immutable data is inherently thread-safe. It doesn’t need synchronization or locks. If all your data are immutable, you can pretty much ignore multithreading issues and your code will just work.

Using a 2D array to represent a Tic-Tac-Toe board is the obvious thing that first comes to mind, but not only are arrays mutable, they virtually require mutation to be of any use. The Lisp tutorials I was looking at all used arrays to represent the board. None of them locked the board or used atomic operations to update it, so all of them had the potential for race conditions if two threads tried to update the board at the same time. Arrays that are shared and updated in place are inherently thread-unsafe.

I thought about alternative representations for the board. Different representations are more or less amenable to writing code that avoids mutation. I came up with a few ideas:

  • Use a 2D array, but copy it before each mutation. This is horribly inefficient, but it is simple.
  • Use a 1D array, again copying it before each mutation. This isn’t much different from the 2D array, but iterating over the cells in the board is simpler.
  • Keep a list of moves. Each move is a pair of player and position. To determine the state of the board, you iterate over the list of moves and apply them in order. This is a bit more complicated than the array representations, but it is inherently immutable. It also has the advantage that you can rewind the board to any prior position.
  • Encode the board as a pair of bitmaps, one for each player.
  • Encode the board as a single bitmap, with each cell represented by two bits.
  • There are only 3^9 (19,683) ways to fill out a Tic-Tac-Toe grid, so you could represent the board as an integer.
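As a rough illustration of the integer-based ideas in the list above, here is a minimal Common Lisp sketch of the pair-of-bitmaps board and the base-3 integer board. It is not the sample code mentioned in this post. Every name in it (`*winning-lines*`, `make-board`, `cell-empty-p`, `place-mark`, `winner`, `cell-value`, `set-cell`) is made up for the example, and the cell numbering (0 through 8, row by row from the top left) and the digit assignment (0 = empty, 1 = X, 2 = O) are assumptions.

```lisp
;;; Sketch only: every name here is hypothetical.
;;; Cells are numbered 0-8, row by row from the top left.

;; --- Pair of bitmaps: one 9-bit integer per player ---

(defparameter *winning-lines*
  '(#b000000111 #b000111000 #b111000000   ; rows
    #b001001001 #b010010010 #b100100100   ; columns
    #b100010001 #b001010100)              ; diagonals
  "Bitmasks for the eight winning lines.")

(defun make-board (&optional (x-bits 0) (o-bits 0))
  "Return a board as a cons of X's bitmap and O's bitmap."
  (cons x-bits o-bits))

(defun cell-empty-p (board cell)
  "True if neither player occupies CELL."
  (not (logbitp cell (logior (car board) (cdr board)))))

(defun place-mark (board player cell)
  "Return a NEW board with PLAYER (:x or :o) on CELL.  BOARD is not modified."
  (assert (cell-empty-p board cell) () "Cell ~D is already taken." cell)
  (let ((mask (ash 1 cell)))
    (ecase player
      (:x (make-board (logior (car board) mask) (cdr board)))
      (:o (make-board (car board) (logior (cdr board) mask))))))

(defun winner (board)
  "Return :x or :o if that player has completed a line, else NIL."
  (flet ((wins-p (bits)
           (some (lambda (line) (= line (logand bits line)))
                 *winning-lines*)))
    (cond ((wins-p (car board)) :x)
          ((wins-p (cdr board)) :o)
          (t nil))))

;; --- Single integer: the board as a nine-digit base-3 number ---
;; Digit values (an assumption): 0 = empty, 1 = X, 2 = O.

(defun cell-value (board cell)
  "Return the base-3 digit of BOARD at CELL."
  (mod (floor board (expt 3 cell)) 3))

(defun set-cell (board cell value)
  "Return a new integer equal to BOARD with CELL's digit replaced by VALUE."
  (+ board (* (- value (cell-value board cell)) (expt 3 cell))))
```

In both cases an "update" builds a new value from integers, which cannot be mutated, so a thread holding an older board never sees it change. The bitmap version makes win detection a handful of `logand` tests, while the base-3 version keeps the whole board in one small number at the cost of some arithmetic to read a cell.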

Each one of these representations has pros and cons. I wrote up some sample code for each representation and I found that the representation had a large influence on the character of the code that used that representation. In other words, there wasn’t a single general Tic-Tac-Toe program that ended up being specialized to each representation, but rather there were six different Tic-Tac-Toe programs each derived from its own idiosyncratic representation.
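To give a feel for how the representation shapes the code, here is a small Common Lisp sketch contrasting a copied 1D array with a move list. It is an illustration under assumptions, not the sample code described above: the function names (`array-place`, `array-mark-at`, `add-move`, `mark-at`, `board-cells`, `rewind`) are hypothetical, cells are numbered 0 through 8, and players are the keywords `:x` and `:o`. The move-list reader here searches the list for a cell rather than replaying the moves in order; on a legal game, where each cell is played at most once, the two approaches give the same answer.

```lisp
;;; Sketch only: all names are hypothetical.  Cells are numbered 0-8.

;; --- Copied 1D array: an update is a copy, then a write to the copy ---

(defun array-place (board player cell)
  "Return a fresh copy of the vector BOARD with PLAYER on CELL."
  (let ((new-board (copy-seq board)))
    (setf (aref new-board cell) player)   ; mutates only the private copy
    new-board))

(defun array-mark-at (board cell)
  "Return the contents of CELL: :x, :o, or NIL."
  (aref board cell))

;; --- Move list: an update is a single CONS; reading means searching ---

(defun add-move (moves player cell)
  "Return a new move list, newest move first.  MOVES is shared, not changed."
  (cons (cons player cell) moves))

(defun mark-at (moves cell)
  "Return the player who moved on CELL, or NIL if it is empty."
  (car (find cell moves :key #'cdr)))

(defun board-cells (moves)
  "Return the nine cells as a list of :x, :o, or NIL."
  (loop for cell below 9 collect (mark-at moves cell)))

(defun rewind (moves n)
  "Return the move list as it stood N moves ago."
  (nthcdr n moves))
```

With the array, reading a cell is trivial and updating costs a copy. With the move list, updating is one `cons`, rewinding is just `nthcdr`, and reading a cell takes a search.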

In conclusion, it is a good idea to plan on using immutable data when you might be working with a multi-threaded system, and it is worth brainstorming several different representations of your immutable data rather than choosing the first one that comes to mind.

2 comments:

Scott L. Burson said...

Have you looked at FSet?

It's overkill for this use case, of course, since the board state fits in a fixnum. But just by way of example, you could represent the board as a single FSet seq (like a 1-D array), a seq of seqs (like a 2-D array), or a map keyed by coordinate pairs. FSet will even let you write things like (setf (@ (@ board row) col) 'X) and handle the copying for you — even though that may look like it's mutating a 2-D array in place, it's actually consing an updated copy and assigning it back to 'board'.

David McClain said...

Thinking about your comment regarding 2D arrays, etc.

I wholly agree, in general. But I often violate the precepts, though I do it safely.

For example, when performing a 2D FFT on a large image, I will split the array into logical chunks and hand off each chunk to a thread handling just a portion of the image FFT. First for rows, then for columns.

So none of the threads are stepping on each other's territory. This is safe, and needs no further coordination between the threads.

But in general, probably the only safe way to access a shared vector or array is to make an FPL Tree and use the array indices as keys into the tree.