Wednesday, August 27, 2025

Thinking About Thinking

The latest LLMs can be run in “thinking” mode where they take an extra processing step before generating output. In this extra step they refine the prompt and generate intermediate information to be used in generating the final output. This supposedly leads to better results, but you have to pay for the extra processing. The question is, is the extra processing worth it?
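Conceptually, the extra step is a two-pass flow: first generate intermediate notes, then condition the final answer on them. Here is a minimal sketch of that idea, where `complete()` is a hypothetical stand-in stub, not any real LLM client API:

```python
def complete(prompt):
    """Stand-in for an LLM call; a real client would go here."""
    return f"<model output for: {prompt!r}>"

def answer(prompt, thinking=False):
    """Generate a final answer, optionally with an intermediate
    'thinking' pass whose output feeds into the final request."""
    if thinking:
        # Pass 1: produce intermediate reasoning (billed as extra tokens).
        notes = complete(f"Think step by step about: {prompt}")
        # Pass 2: generate the final answer, conditioned on the notes.
        return complete(f"Using these notes:\n{notes}\n\nAnswer: {prompt}")
    return complete(prompt)
```

The extra cost comes from pass 1: those intermediate tokens are processed and billed even though they never appear in the final output.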

At our company, we have adopted Cursor as our AI tool. Cursor lets you choose from several different LLMs, and lets you choose whether to enable “thinking” for the model you pick. Cursor's billing plan is a monthly subscription to a certain amount of token processing, and the token cost varies by model and by whether “thinking” is enabled. An older model without thinking is substantially cheaper than a brand new model with thinking. If you choose a new model with thinking and run Cursor as an “agent”, where you give it a task and let it issue prompts automatically, you can easily burn through your monthly allotment of tokens in a day or two. So management has asked us to be mindful of “thinking” and careful with our use of Cursor in “agent” mode.
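The budget math is simple to sketch. All of the numbers below are hypothetical placeholders, not Cursor's actual pricing or allotments; the point is just how much faster an agent loop with “thinking” consumes a fixed token budget than an interactive user does:

```python
# Back-of-the-envelope estimate of how quickly agent-mode usage can
# exhaust a monthly token allotment. All numbers are hypothetical.

MONTHLY_BUDGET_TOKENS = 20_000_000  # hypothetical monthly allotment

def tokens_per_request(prompt_tokens, output_tokens, thinking_tokens=0):
    """Total billable tokens for one request; a 'thinking' model
    bills the intermediate reasoning tokens on top of the output."""
    return prompt_tokens + output_tokens + thinking_tokens

def days_until_exhausted(requests_per_day, tokens_each,
                         budget=MONTHLY_BUDGET_TOKENS):
    """How many days of steady usage the budget covers."""
    return budget / (requests_per_day * tokens_each)

# An interactive user: a few dozen prompts a day, no thinking.
interactive = tokens_per_request(2_000, 1_000)
# An agent loop: hundreds of auto-generated prompts, with thinking.
agent = tokens_per_request(8_000, 2_000, thinking_tokens=10_000)

print(f"interactive: {days_until_exhausted(50, interactive):.0f} days")
print(f"agent:       {days_until_exhausted(500, agent):.0f} days")
```

With these made-up figures, the interactive user's budget lasts months while the agent with “thinking” exhausts it in two days, which matches the shape of the problem management is worried about.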

But in my admittedly limited experience, the results with “thinking” are quite a bit better than without. When generating Lisp code from pseudocode, “thinking” can make the difference between working output and nonsense. With automatic documentation, “thinking” can turn an adequate write-up into a high quality one. In “agent” mode, “thinking” can get the agent unstuck where, without it, the agent spins in a loop. The tools simply work better with “thinking” enabled, and the difference is noticeable.

The newer models are better than the older ones, too. I try to use the cheapest model I can get away with, but I have found that the cheaper models can get confused more easily and can make a real mess out of your source code. I had one experience where an older model started mangling the syntactic structure of the code to the point that it didn't parse anymore. I switched to the newest available model and told it that its older cousin had mangled my code and that it should fix it. The new model successfully unmangled the code.

For me, the extra cost of “thinking” and the newer models is well worth it. I get far better results and I spend less time fixing problems. If it were just me paying for myself, I'd definitely spend the extra money for “thinking” and the newer models. I understand that this doesn't scale for a company with a thousand engineers, however. If you are in the position of making this decision for a large group, I think you should be ready to pay for the premium models with “thinking” enabled and not be stingy. You'll get what you pay for.
