Thursday, May 29, 2025

Dependency Injection with Thunks vs. Global Variables

Revision 2

A thunk (in MIT parlance) is a function that takes no arguments and returns a value. Thunks are simple, opaque objects. The only thing you can do with a thunk is call it. The only control you can exert over a thunk is whether and when to call it.

A thunk separates the two concerns of what to compute and when to compute it. When a thunk is created, it captures the lexical bindings of its free variables (it is, after all, just a lexical closure). When the thunk is invoked, it uses the captured lexical values to compute the answer.

There are a couple of common ways one might use a thunk. The first is to delay computation. The computation doesn't occur until the thunk is invoked. The second is as a weaker, safer form of a pointer. If the thunk is simply a reference to a lexical variable, then invoking the thunk returns the current value of the variable.

I once saw an article about Python that mentioned that you could create functions of no arguments. It went on to say that such a function had no use because you could always just pass its return value directly. I don't know how common this misconception is, but thunks are a very useful tool in programming.

Here is a use case that came up recently:

In most programs but the smallest, you will break the program into modules that you try to keep independent. The code within a module may be fairly tightly coupled, but you try to keep the dependencies between the modules at a minimum. You don't want, for example, the internals of the keyboard module to affect the internals of the display module. The point of modularizing the code is to reduce the combinatorics of interactions between the various parts of the program.

If you do this, you will find that your program has a “linking” phase at the beginning where you instantiate and initialize the various modules and let them know about each other. This is where you would use a technique like “depenedency injection” to set up the dependencies between the modules.

If you want to share data between modules, you have options. The crudest way to do this is to use global variables. If you're smart, one module will have the privilege of writing to the global variable and the other modules will only be allowed to read it. That way, you won't have two modules fighting over the value. But global variables come with a bit of baggage. Usually, they are globally writable, so nothing is enforcing the rule that only one module is in charge of the value. In addition, anyone can decide to depend on the value. Some junior programmer could add a line of code in the display handler that reads a global value that the keyboard module maintains and suddenly you have a dependency that you didn't plan on.

A better option is to use a thunk that returns the value you want to share. You can use dependency injection to pass the thunk to the modules that need it. When the module needs the value, it invokes the thunk. Modules cannot modify the shared value because the thunk has no way to modify what it returns. Modules cannot accidentally acquire a dependency on the value because they need the thunk to be passed to them explicitly upon initialization. The value that the thunk returns can be initialized after the thunk is created, so you can link up all the dependencies between the modules before you start computing any values.

I used this technique recently in some code. One thread sits in a loop and scrapes data from some web services. It updates a couple dozen variables and keeps them up to date every few hours. Other parts of the code need to read these values, but I didn't want to have a couple dozen global variables just hanging around flapping in the breeze. Instead, I created thunks for the values and injected them into the constructors for the URL handlers that needed them. When a handler gets a request, it invokes its thunks to get the latest values of the relevant variables. The values are private to the module that updates them, so no other modules can modify them, and they aren't global.

I showed this technique to one of our junior programmers the other day. He hadn't seen it before. He just assumed that there was a global variable. He couldn't figure out why the global variables were never updated. He was trying to pass global variables by value at initialization time. The variables are empty at that time, and updates to the variable later on have no effect on value that have already been passed. This vexed him for some time until I showed him that he should be passing a thunk that refers to the value rather than the value itself.

No comments: