Monday, September 15, 2025

Analyzing Lisp Redux: One Form At a Time

Instead of analyzing an entire Common Lisp file at once, you might want to analyze it one function (or top-level form) at a time. Functions are essentially black-box abstractions: so long as the arguments and return values don’t change (and the side effects are preserved), the implementation can be completely replaced. Working one form at a time constrains the LLM far more. It cannot change the signature of the function or make interprocedural changes to the code. While this rules out a large class of improvements, it also rules out a large class of bugs and greatly narrows the search space of code changes.

We use a specialized READ-ANALYZE-PRINT loop. A special version of read that preserves comments (see yesterday’s post) reads the file one top-level form at a time. Each top-level form is presented to the LLM, along with any associated comments, for analysis. For each form, the LLM is instructed to describe its purpose, identify any potential bugs, check for adherence to best practices, and suggest ways to improve the code.
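
The shape of such a loop can be sketched as follows. This is a minimal sketch, not the code used here: it relies on the standard read, which discards comments, where the loop described above uses the comment-preserving reader. The names map-top-level-forms and analyze are hypothetical; analyze stands in for whatever sends a form to the LLM.

```lisp
;; Sketch only: standard READ drops comments, unlike the
;; comment-preserving reader described in the text.
(defun map-top-level-forms (analyze stream)
  "Call ANALYZE on each top-level form read from STREAM, in order.
Returns the list of results."
  (let ((eof (gensym "EOF"))   ; unique end-of-file marker
        (*read-eval* nil))     ; refuse #. read-time evaluation
    (loop for form = (read stream nil eof)
          until (eq form eof)
          collect (funcall analyze form))))

;; Usage: collect the operator of each top-level form in a string.
(with-input-from-string (in "(in-package \"GEMINI\") (defun f (x) x)")
  (map-top-level-forms #'first in))
```

The usage example returns a list of the two operator symbols, one per top-level form. A real loop would also need to honor each (in-package ...) form so that later symbols are read into the right package; the sketch skips that.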

The system instruction for analysis is as follows:

"You are a world class Common Lisp programmer."
"You will be analyzing a Common Lisp file one top-level form at a time."
"Your analysis should be thorough and insightful, demonstrating
a deep understanding of Common Lisp programming practices."

In addition, we have these directives:

"If there is no package definition, assume an appropriate
one exists elsewhere and do not mention this."

Without this directive, the LLM will complain about files that have no package definition of their own, only an appropriate (in-package ...) form.

"Assume that popular utility packages such as alexandria
and series have been loaded and made available."
"Assume that undefined functions are defined elsewhere and do not mention this."

Without these directives, the LLM complains about undefined functions. We specifically tell it to assume we know what we are doing.

"Do not suggest using defconstant, even if it would be appropriate."

This is an odd one, but the LLM has a strong tendency to suggest defconstant for values that do not seem to change. In many cases, we want to let the user modify the value at runtime or reload the file. Reloading a file with defconstant forms will often cause an error even if the value hasn’t changed, because for values such as strings and lists the reloaded value is equal but not eql to the original value.
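
The equal-but-not-eql problem is easy to see at the REPL. The sketch below is illustrative only: *first-load*, *second-load*, and *model-name* are hypothetical names, standing in for the value a file produces on two successive loads.

```lisp
;; Each load of a file builds a fresh string, so the two values have the
;; same contents but are distinct objects.  COPY-SEQ makes that explicit.
(defparameter *first-load*  (copy-seq "gemini-2.5-flash"))
(defparameter *second-load* (copy-seq "gemini-2.5-flash"))

(equal *first-load* *second-load*)  ; true: same characters
(eql   *first-load* *second-load*)  ; false: different objects

;; DEFCONSTANT expects the new value to be EQL to the old one, so a
;; reload that produces a non-EQL value has undefined consequences
;; (some implementations signal an error).  DEFPARAMETER simply rebinds.
(defparameter *model-name* "gemini-2.5-flash")
(defparameter *model-name* "gemini-2.5-flash")  ; reloading is harmless
```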

Then we direct the LLM to perform the analysis:

"For each top-level form:"
"  * Determine its purpose."
"  * Determine whether a library function might achieve the same thing."
"  * Check that it follows best practices."
"  * Look for typical Common Lisp errors and pitfalls."
"  * Look for bugs."
"Then report on the form:"
"  * Describe the purpose of the form."
"  * If the form is missing a docstring, suggest one."
"  * If the form fails to follow best practices, say so."
"  * If there are bugs or errors, describe them."
"  * If improvements can be made, give up to three concrete, actionable suggestions for improvement."
"  * For each suggestion, indicate whether it is a critical improvement or merely nice to have."
"  * For each suggestion, describe your reasoning."
"  * If the form is error free and does not need improvement, say so."

The prompt for each top-level form recapitulates the form-specific section of the system instruction. This is followed by the form itself.
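
Assembling such a prompt might look something like this. It is a minimal sketch under stated assumptions: make-form-prompt is a hypothetical helper, and the instruction text is passed in as a plain string rather than taken from the actual prompt variables.

```lisp
(defun make-form-prompt (instructions form-text)
  "Return a prompt string: INSTRUCTIONS, a blank line, then FORM-TEXT.
FORM-TEXT is the source text of one top-level form, comments included."
  (format nil "~a~%~%~a" instructions form-text))

;; Usage: one instruction line followed by the form to analyze.
(make-form-prompt "  * Determine its purpose."
                  "(in-package \"GEMINI\")")
```

Keeping the form as source text rather than as a Lisp object means its comments and original formatting reach the LLM unchanged.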

Example Analysis

Here is an example of the analysis of the analyze.lisp file.

;; Invoking Gemini API model `gemini-2.5-flash`...
;; Gemini API finished in 9.23 seconds.
;; Prompt Tokens:          7,160
;; Thoughts Tokens:          766
;; Candidate Tokens:          94
;; Accumulated Prompt Tokens:      8,276,974
;; Accumulated Response Tokens:      341,259

;; Analyzing form:

;;; -*- Lisp -*-

(in-package "GEMINI")

The purpose of the form (in-package "GEMINI") is to set the current package for the subsequent forms in the file to "GEMINI". This is a fundamental Common Lisp construct used to organize code and manage symbol namespaces, ensuring that symbols defined or referenced in this file are interned within the "GEMINI" package.

This form is error-free and does not need improvement. It follows best practices for Common Lisp file organization.

;; Invoking Gemini API model `gemini-2.5-flash`...
;; Gemini API finished in 8.84 seconds.
;; Prompt Tokens:          7,577
;; Thoughts Tokens:          652
;; Candidate Tokens:         140
;; Accumulated Prompt Tokens:      8,284,551
;; Accumulated Response Tokens:      342,051

;; Analyzing form:

(defparameter +analyze-source-system-instruction+
  "You are a world class Common Lisp programmer.  You will be analyzing files from the '~a' system.  Your analysis should be thorough and insightful, demonstrating a deep understanding of Common Lisp programming practices."
  "LLM system instruction to analyze a Common Lisp ASDF system.")

The purpose of this defparameter form is to define a special variable named +analyze-source-system-instruction+. This variable holds a format string that serves as a system instruction or prompt template for an LLM (Large Language Model). The instruction guides the LLM to act as a Common Lisp programmer analyzing files from a specified ASDF system, with ~a being a placeholder for the system’s name.

The form is error-free and does not need improvement. It correctly uses defparameter for a global, dynamically-scoped variable, follows the +earmuffs+ naming convention for special variables, and includes a clear docstring.

;; Invoking Gemini API model `gemini-2.5-flash`...
;; Gemini API finished in 6.94 seconds.
;; Prompt Tokens:          8,069
;; Thoughts Tokens:          523
;; Candidate Tokens:         163
;; Accumulated Prompt Tokens:      8,292,620
;; Accumulated Response Tokens:      342,737

;; Analyzing form:

(defparameter +analyze-system-definition-prompt+
  (str:join
   #\Space
   (list
    "The following is the definition of the `~a` system."
    "Analyze the system definition, checking for best practices."
    "Make sure there is a primary package definition component that has no *other internal ASDF dependencies* and is depended upon by all other components."))
  "LLM prompt to analyze a Common Lisp ASDF system definition.")

The purpose of this defparameter form is to define a special variable named +analyze-system-definition-prompt+. This variable holds a string that serves as a prompt template for an LLM. The prompt instructs the LLM to analyze a Common Lisp ASDF system definition, specifically checking for best practices related to package definitions and component dependencies, emphasizing the need for a primary package component without internal ASDF dependencies that all other components depend on.

The form is error-free and does not need improvement. It correctly uses defparameter for a global, dynamically-scoped variable, follows the +earmuffs+ naming convention for special variables, and includes a clear docstring. The use of str:join to construct the prompt string is idiomatic and readable.

;; Invoking Gemini API model `gemini-2.5-flash`...
;; Gemini API finished in 21.61 seconds.
;; Prompt Tokens:          8,824
;; Thoughts Tokens:        1,703
;; Candidate Tokens:         557
;; Accumulated Prompt Tokens:      8,301,444
;; Accumulated Response Tokens:      344,997

;; Analyzing form:

(defun analyze-system-definition (system-name)
  "Analyzes the ASDF system definition file for the specified SYSTEM-NAME.
   It checks for best practices, especially concerning the primary package
   definition component and its internal ASDF dependencies, by sending
   the system definition to an LLM for detailed analysis."
  (check-type system-name (or string symbol))
  (let ((asdf-system (asdf:find-system system-name)))
    (unless asdf-system
      (error "ANALYZE-SYSTEM-DEFINITION: ASDF system '~a' not found." system-name))
    (let ((source-file (asdf:system-source-file asdf-system)))
      (unless (and source-file (probe-file source-file))
        (error "ANALYZE-SYSTEM-DEFINITION: Source file for system '~a' not found." system-name))

      (let ((*system-instruction*
              (content
               :parts
               (list
                (part (format nil +analyze-source-system-instruction+ system-name)))
               :role "system")))
        (invoke-gemini
         (list
          (part (format nil +analyze-system-definition-prompt+ system-name))
          (part (handler-case (uiop:read-file-string source-file)
                  (error (c)
                    (error "ANALYZE-SYSTEM-DEFINITION: Failed to read source file '~a': ~a"
                           source-file c))))))))))

The purpose of the analyze-system-definition function is to retrieve the ASDF system definition file for a given system-name, read its contents, and then send this content along with a specific prompt and system instruction to an LLM (Large Language Model) for analysis. The LLM is instructed to act as a Common Lisp programmer and analyze the system definition for best practices, particularly concerning the primary package definition component and its dependencies.

The function has a good docstring that clearly explains its purpose, arguments, and the process it undertakes.

The form is largely error-free and follows best practices, including robust error handling for missing systems, non-existent source files, and file reading failures, as well as check-type for argument validation.

Here are some minor suggestions for improvement:

  1. Minor Improvement: Refine error message consistency.
    • Reasoning: While the current error messages are informative, adopting a consistent prefix for all errors originating from this function (e.g., "ANALYZE-SYSTEM-DEFINITION Error: ...") can improve readability in logs and make it easier to identify the source of an error.
    • Actionable Suggestion: Modify the error messages to include a consistent prefix, for example:
      • (error "ANALYZE-SYSTEM-DEFINITION Error: ASDF system '~a' not found." system-name)
      • (error "ANALYZE-SYSTEM-DEFINITION Error: Source file for system '~a' not found." system-name)
      • (error "ANALYZE-SYSTEM-DEFINITION Error: Failed to read source file '~a': ~a" source-file c)
  2. Barely worth mentioning: Consider defgeneric and methods for invoke-gemini context.
    • Reasoning: The use of a special variable *system-instruction* to pass context to invoke-gemini is a valid Common Lisp pattern. However, for more complex scenarios or if invoke-gemini were to have different ways of handling system instructions based on the context, a defgeneric with methods could offer a more extensible and object-oriented approach. For this specific use case, the special variable is perfectly adequate and simpler.
    • Actionable Suggestion: (No concrete action needed for this specific form, as the current approach is fine.) If the LLM interaction logic becomes more intricate, consider defining invoke-gemini as a generic function and dispatching on context objects rather than relying solely on special variables.

The form is otherwise error-free and well-written.

;; Invoking Gemini API model `gemini-2.5-flash`...
;; Gemini API finished in 6.14 seconds.
;; Prompt Tokens:          9,796
;; Thoughts Tokens:          345
;; Candidate Tokens:         183
;; Accumulated Prompt Tokens:      8,311,240
;; Accumulated Response Tokens:      345,525

;; Analyzing form:

(defparameter +analyze-source-file-prompt+
  "Given the following ASDF system definition for '~a' and the source code for component '~a', analyze the component's code and provide a detailed summary of its functionality, purpose, and any notable features or patterns you observe.  In addition, note where the code is not following best practices, where there are potential bugs, and where the code can be improved.  Give up to three concrete, actionable suggestions for how the code could be improved or refactored.  If there are no suggestions for improvement, say so explicitly.  Indicate for each suggestion whether the change is a critical improvement or merely nice to have. Be sure to include your reasoning for each suggestion."
  "LLM prompt to analyze a Common Lisp source file.")

The purpose of this defparameter form is to define a special variable named +analyze-source-file-prompt+. This variable stores a detailed prompt template for an LLM (Large Language Model). The prompt instructs the LLM to analyze a Common Lisp source file, given the context of its ASDF system definition. The LLM is expected to provide a summary of the component’s functionality, identify best practice deviations, potential bugs, and offer up to three concrete, actionable suggestions for improvement, including reasoning and an indication of criticality.

The form is error-free and does not need improvement. It correctly uses defparameter for a global, dynamically-scoped variable, adheres to the +earmuffs+ naming convention for special variables, and includes a clear and concise docstring. The prompt itself is well-crafted to elicit a comprehensive analysis from the LLM.

;; Invoking Gemini API model `gemini-2.5-flash`...
;; Gemini API finished in 5.62 seconds.
;; Prompt Tokens:         10,450
;; Thoughts Tokens:          273
;; Candidate Tokens:         311
;; Accumulated Prompt Tokens:      8,321,690
;; Accumulated Response Tokens:      346,109

;; Analyzing form:

(defun get-system-and-component (system-name component-name)
  "Retrieves the ASDF system object and the specified component object
   for a given component in a specified system. Returns two values:
   the ASDF system object and the component object. Signals an error
   if the system or component is not found."
  (check-type system-name (or string symbol))
  (check-type component-name (or string symbol))
  (let ((asdf-system (asdf:find-system system-name)))
    (unless asdf-system
      (error "GET-SYSTEM-AND-COMPONENT: ASDF system '~a' not found." system-name))
    (let ((component (asdf:find-component asdf-system component-name)))
      (unless component
        (error "GET-SYSTEM-AND-COMPONENT: Component '~a' not found in system '~a'." component-name system-name))
      (values asdf-system component))))

The purpose of the get-system-and-component function is to locate and return both the ASDF system object and a specific component object within that system, given the system’s name and the component’s name. It returns these two objects as multiple values. The function includes robust error handling, signaling an error if either the system or the component cannot be found.

The function has an excellent docstring that clearly describes its purpose, arguments, return values, and error conditions.

The form is error-free and follows best practices, including check-type for argument validation and clear, specific error messages.

Here is a minor suggestion for improvement:

  1. Minor Improvement: Refine error message consistency.
    • Reasoning: While the current error messages are informative, adopting a consistent prefix for all errors originating from this function (e.g., "GET-SYSTEM-AND-COMPONENT Error: ...") can improve readability in logs and make it easier to identify the source of an error.
    • Actionable Suggestion: Modify the error messages to include a consistent prefix, for example:
      • (error "GET-SYSTEM-AND-COMPONENT Error: ASDF system '~a' not found." system-name)
      • (error "GET-SYSTEM-AND-COMPONENT Error: Component '~a' not found in system '~a'." component-name system-name)

The form is otherwise error-free and well-written.

;; Invoking Gemini API model `gemini-2.5-flash`...
;; Gemini API finished in 17.45 seconds.
;; Prompt Tokens:         11,416
;; Thoughts Tokens:          757
;; Candidate Tokens:       1,018
;; Accumulated Prompt Tokens:      8,333,106
;; Accumulated Response Tokens:      347,884

;; Analyzing form:

(defun get-system-and-component-contents (system-name component-name)
  "Retrieves the ASDF system object, its definition file content,
   the ASDF component object, and its source file content.
   Returns four values:
      1. The ASDF system object.
      2. A string containing the system definition file's content.
      3. The ASDF component object.
      4. A string containing the component's source file content.
   Signals an error if the system, component, or their respective
   source files are not found."
  (multiple-value-bind (system component) (get-system-and-component system-name component-name)
    (let ((system-pathname (asdf:system-source-file system))
          (component-pathname (asdf:component-pathname component)))
      (unless (and system-pathname (probe-file system-pathname))
        (error "GET-SYSTEM-AND-COMPONENT-CONTENTS: Source file for system '~a' not found." system-name))
      (unless (and component-pathname (probe-file component-pathname))
        (error "GET-SYSTEM-AND-COMPONENT-CONTENTS: Source file for component '~a' not found." component-name))
      (values system (handler-case (uiop:read-file-string system-pathname)
                       (error (c)
                         (error "GET-SYSTEM-AND-COMPONENT-CONTENTS: Failed to read system definition file '~a': ~a"
                                system-pathname c)))
              component (handler-case (uiop:read-file-string component-pathname)
                            (error (c)
                                (error "GET-SYSTEM-AND-COMPONENT-CONTENTS: Failed to read component source file '~a': ~a"
                                     component-pathname c)))))))

The purpose of the get-system-and-component-contents function is to retrieve comprehensive information about a specific ASDF system and one of its components. It returns four values: the ASDF system object, the content of the system’s definition file as a string, the ASDF component object, and the content of the component’s source file as a string. The function ensures robustness by signaling errors if the system, component, or their respective source files cannot be found or read.

The function has an excellent docstring that clearly describes its purpose, the four values it returns, and its error conditions.

The form is error-free and follows best practices, including the use of multiple-value-bind for handling multiple return values, probe-file for checking file existence, and handler-case for robust file reading with informative error messages.

Here are two suggestions for improvement:

  1. Minor Improvement: Refine error message consistency.
    • Reasoning: While the current error messages are informative, adopting a consistent prefix for all errors originating from this function (e.g., "GET-SYSTEM-AND-COMPONENT-CONTENTS Error: ...") can improve readability in logs and make it easier to identify the source of an error.
    • Actionable Suggestion: Modify the error messages to include a consistent prefix, for example:
      • (error "GET-SYSTEM-AND-COMPONENT-CONTENTS Error: Source file for system '~a' not found." system-name)
      • (error "GET-SYSTEM-AND-COMPONENT-CONTENTS Error: Source file for component '~a' not found." component-name)
      • (error "GET-SYSTEM-AND-COMPONENT-CONTENTS Error: Failed to read system definition file '~a': ~a" system-pathname c)
      • (error "GET-SYSTEM-AND-COMPONENT-CONTENTS Error: Failed to read component source file '~a': ~a" component-pathname c)
  2. Minor Improvement: Extract a helper function for safe file reading.
    • Reasoning: The pattern of reading a file string with handler-case to catch and re-signal errors is repeated twice within this function. Extracting this logic into a small helper function would improve code conciseness, adhere to the DRY (Don’t Repeat Yourself) principle, and make the main function body cleaner.
    • Actionable Suggestion: Define a local helper function (e.g., read-file-string-safely) that takes a pathname and an error message format string, and then use this helper for both system-pathname and component-pathname.
              ```lisp
              (defun get-system-and-component-contents (system-name component-name)
                "Retrieves the ASDF system object, its definition file content,
                 the ASDF component object, and its source file content.
                 Returns four values:
                    1. The ASDF system object.
                    2. A string containing the system definition file's content.
                    3. The ASDF component object.
                    4. A string containing the component's source file content.
                 Signals an error if the system, component, or their respective
                 source files are not found."
                (multiple-value-bind (system component) (get-system-and-component system-name component-name)
                  (let ((system-pathname (asdf:system-source-file system))
                        (component-pathname (asdf:component-pathname component)))
                    (unless (and system-pathname (probe-file system-pathname))
                      (error "GET-SYSTEM-AND-COMPONENT-CONTENTS Error: Source file for system '~a' not found." system-name))
                    (unless (and component-pathname (probe-file component-pathname))
                      (error "GET-SYSTEM-AND-COMPONENT-CONTENTS Error: Source file for component '~a' not found." component-name))
      
                    (flet ((read-file-string-safely (pathname error-format-string &rest args)
                             (handler-case (uiop:read-file-string pathname)
                               (error (c)
                                 (apply #'error error-format-string pathname c args)))))
                      (values system (read-file-string-safely
                                      system-pathname
                                      "GET-SYSTEM-AND-COMPONENT-CONTENTS Error: Failed to read system definition file '~a': ~a")
                              component (read-file-string-safely
                                         component-pathname
                                         "GET-SYSTEM-AND-COMPONENT-CONTENTS Error: Failed to read component source file '~a': ~a"))))))
              ```
              
;; Invoking Gemini API model `gemini-2.5-flash`...
;; Gemini API finished in 5.99 seconds.
;; Prompt Tokens:         12,744
;; Thoughts Tokens:          426
;; Candidate Tokens:         174
;; Accumulated Prompt Tokens:      8,345,850
;; Accumulated Response Tokens:      348,484

;; Analyzing form:

(defparameter +system-definition-context-instruction+
  "What follows is the definition of the `~a` system. Do not analyze this file, it is here so that you understand the dependencies within the system."
  "LLM instruction to provide system definition as context without analysis.")

The purpose of this defparameter form is to define a special variable named +system-definition-context-instruction+. This variable stores a format string that serves as an instruction for an LLM (Large Language Model). The instruction explicitly tells the LLM that a subsequent piece of text is an ASDF system definition, but it should be used only for contextual understanding of system dependencies and not for analysis itself. The ~a placeholder is intended for the system’s name.

The form is error-free and does not need improvement. It correctly uses defparameter for a global, dynamically-scoped variable, adheres to the +earmuffs+ naming convention for special variables, and includes a clear and concise docstring. The instruction itself is well-phrased to guide the LLM’s behavior.

;; Invoking Gemini API model `gemini-2.5-flash`...
GC 3521
;; Gemini API finished in 11.46 seconds.
;; Prompt Tokens:         13,433
;; Thoughts Tokens:          652
;; Candidate Tokens:         457
;; Accumulated Prompt Tokens:      8,359,283
;; Accumulated Response Tokens:      349,593

;; Analyzing form:

(defun analyze-component (system-name component-name)
  "Analyzes the source code of a specific ASDF component within a given system
   using an LLM. It provides the LLM with the system definition for context
   and the component's source code. The LLM is prompted to summarize
   functionality, purpose, and features, identify deviations from best practices,
   potential bugs, and suggest up to three concrete improvements with reasoning
   and severity."
  (multiple-value-bind (system system-contents component component-contents)
      (get-system-and-component-contents system-name component-name)
    (declare (ignore system component))
    (let ((*system-instruction*
            (content
             :parts
             (list
              (part (format nil +analyze-source-system-instruction+ system-name)))
             :role "system")))
      (invoke-gemini
       (list
        (part (format nil +system-definition-context-instruction+ system-name))
        (part system-contents)
        (part (format nil +analyze-source-file-prompt+ system-name component-name))
        (part component-contents))))))

The purpose of the analyze-component function is to orchestrate the analysis of a specific ASDF component’s source code by an LLM (Large Language Model). It first retrieves the ASDF system object, its definition file content, the ASDF component object, and its source file content using get-system-and-component-contents. It then sets up a system instruction for the LLM, providing context about the system being analyzed. Finally, it invokes the LLM with a carefully constructed prompt that includes the system definition (for context on dependencies), the component’s source code, and instructions for the LLM to summarize functionality, identify issues, and suggest improvements.

The function has an excellent docstring that clearly describes its purpose, arguments, and the detailed analysis expected from the LLM.

The form is error-free and follows best practices. It leverages the get-system-and-component-contents function for robust data retrieval and error handling, uses multiple-value-bind effectively, and correctly declares ignored variables. The construction of the LLM prompt and system instruction using predefined templates and format is also a good practice for consistency.

Here is one minor suggestion for improvement:

  1. Minor Improvement: Explicitly name the parts for invoke-gemini for enhanced readability.
    • Reasoning: While the current list of part forms is functional, explicitly naming the role of each part within the invoke-gemini call could improve the readability and maintainability of the code, especially as the complexity of LLM interactions grows. This makes it immediately clear what each piece of content represents in the overall prompt structure.
    • Actionable Suggestion: Consider using a more descriptive structure for the parts list, perhaps by introducing helper functions or a more structured way to represent the different segments of the LLM input. For example, if invoke-gemini supported keyword arguments for different content types, it would be more explicit. As it stands, a comment or a local variable for each part could also serve this purpose, but that’s less integrated. For now, the current approach is acceptable.

;; Invoking Gemini API model `gemini-2.5-flash`...
;; Gemini API finished in 6.49 seconds.
;; Prompt Tokens:         14,452
;; Thoughts Tokens:          313
;; Candidate Tokens:         221
;; Accumulated Prompt Tokens:      8,373,735
;; Accumulated Response Tokens:      350,127

;; Analyzing form:

(defparameter +analyze-form-prompt+
  (str:join
   #\Newline
   (list
    "  * Use the preceding forms as context for analysis."
    "  * Determine the purpose of the form."
    "  * Determine whether a library function might achieve the same thing."
    "  * Check that it follows best practices."
    "  * Look for opportunities to simplify or improve the code."
    "  * Look for typical Common Lisp errors and pitfalls."
    "  * Look for bugs."
    "Then report on the form:"
    "  * Describe the purpose of the form."
    "  * Suggest a docstring if one is needed, or if it can be improved, otherwise don't mention it."
    "  * If the form fails to follow best practices, say so."
    "  * If there are bugs or errors, describe them."
    "  * If improvements can be made, give up to three concrete, actionable suggestions for improvement."
    "  * For each suggestion, indicate whether it is a critical improvement, major improvement, minor improvement, merely nice to have, or barely worth mentioning."
    "  * For each suggestion, describe your reasoning."
    "  * If the form is error free and does not need improvement, say so."))
  "LLM prompt detailing the analysis criteria and reporting format for a single Common Lisp top-level form.")

The purpose of this defparameter form is to define a special variable named +analyze-form-prompt+. This variable stores a multi-line string that serves as a comprehensive prompt template for an LLM (Large Language Model). The prompt meticulously outlines the criteria for analyzing a single Common Lisp top-level form, including what aspects to consider (context, purpose, library functions, best practices, simplification, errors, bugs) and the exact format for reporting the analysis (description, docstring suggestions, best practice adherence, bug descriptions, and up to three concrete, actionable suggestions with criticality and reasoning).

The form is error-free and does not need improvement. It correctly uses defparameter for a global, dynamically-scoped variable, adheres to the +earmuffs+ naming convention for special variables, and includes a clear and concise docstring. The use of str:join with #\Newline is an idiomatic and readable way to construct a multi-line string from a list of lines, making the prompt definition clear and easy to manage.

;; Invoking Gemini API model `gemini-2.5-flash`...
;; Gemini API finished in 17.41 seconds.
;; Prompt Tokens:         15,190
;; Thoughts Tokens:        1,246
;; Candidate Tokens:         738
;; Accumulated Prompt Tokens:      8,388,925
;; Accumulated Response Tokens:      352,111

;; Analyzing form:

(defparameter +analyze-file-system-instruction+
  (str:join #\newline
            (list
             (str:join #\Space
                       (list
                        "You are a world class Common Lisp programmer."
                        "You will be analyzing a Common Lisp file one top-level form at a time."
                        "Your analysis should be thorough and insightful, demonstrating a deep understanding of Common Lisp programming practices."
                        "If there is no package definition, assume an appropriate one exists elsewhere and do not mention this."
                        "Assume that popular utility packages such as `alexandria` and `series` have been loaded and made available."
                        "Assume that undefined functions are defined elsewhere and do not mention this."
                        "Assume variables decorated with *earmuffs* have been declared as special variables."
                        "Implicit dependencies are expected, so do not mention them."
                        "Do not suggest using defconstant, even if it would be appropriate or a best practice."
                        "Prefer `check-type` over type declarations for robustness."))
             "For each top-level form:"
             +analyze-form-prompt+))
  "LLM system instruction to analyze a Common Lisp file one top-level form at a time.")

The purpose of this defparameter form is to define a special variable named +analyze-file-system-instruction+. This variable holds a comprehensive multi-line string that serves as the primary "system instruction" for an LLM (Large Language Model) when it is tasked with analyzing Common Lisp files. This instruction sets the LLM’s persona as a "world class Common Lisp programmer" and provides detailed guidelines, assumptions (e.g., package existence, utility libraries, special variables), and constraints (e.g., no defconstant suggestions, prefer check-type). It also incorporates the previously defined +analyze-form-prompt+ to specify the detailed reporting format for each top-level form analyzed.

The form is error-free and follows best practices. It correctly uses defparameter for a global, dynamically-scoped variable, adheres to the +earmuffs+ naming convention, and includes a clear docstring. The use of str:join to construct the multi-line string from a list of components, including another defparameter (+analyze-form-prompt+), is idiomatic and promotes readability and modularity.

Here is one suggestion for improvement:

  1. Minor Improvement: Extract the initial general instructions into a separate defparameter.
    • Reasoning: The first str:join block, which defines the LLM’s persona, assumptions, and general constraints, is a distinct logical unit. Separating this into its own defparameter (e.g., +llm-lisp-programmer-persona-instruction+) would further modularize the overall system instruction. This would make +analyze-file-system-instruction+ a cleaner composition of two other named instruction blocks, potentially improving readability and making it easier to modify specific parts of the instruction without affecting the overall structure.
    • Actionable Suggestion:
              ```lisp
              (defparameter +llm-lisp-programmer-persona-instruction+
                (str:join #\Space
                          (list
                           "You are a world class Common Lisp programmer."
                           "You will be analyzing a Common Lisp file one top-level form at a time."
                           "Your analysis should be thorough and insightful, demonstrating a deep understanding of Common Lisp programming practices."
                           "If there is no package definition, assume an appropriate one exists elsewhere and do not mention this."
                           "Assume that popular utility packages such as `alexandria` and `series` have been loaded and made available."
                           "Assume that undefined functions are defined elsewhere and do not mention this."
                           "Assume variables decorated with *earmuffs* have been declared as special variables."
                           "Implicit dependencies are expected, so do not mention them."
                           "Do not suggest using defconstant, even if it would be appropriate or a best practice."
                           "Prefer `check-type` over type declarations for robustness."))
                "LLM system instruction defining the persona and general guidelines for Common Lisp analysis.")
      
              (defparameter +analyze-file-system-instruction+
                (str:join #\newline
                          (list
                           +llm-lisp-programmer-persona-instruction+
                           "For each top-level form:"
                           +analyze-form-prompt+))
                "LLM system instruction to analyze a Common Lisp file one top-level form at a time.")
              ```
      
;; Invoking Gemini API model `gemini-2.5-flash`...
;; Gemini API finished in 7.22 seconds.
;; Prompt Tokens:         16,245
;; Thoughts Tokens:          463
;; Candidate Tokens:         182
;; Accumulated Prompt Tokens:      8,405,170
;; Accumulated Response Tokens:      352,756

;; Analyzing form:

(defparameter +analyze-file-form-prompt+
  (str:join #\Newline
            (list "Analyze the following top-level form in a Common Lisp file." +analyze-form-prompt+))
  "LLM prompt to analyze a single top-level form in a Common Lisp file.")

The purpose of this defparameter form is to define a special variable named +analyze-file-form-prompt+. This variable stores a string that serves as a prompt template for an LLM (Large Language Model). The prompt instructs the LLM to analyze a single top-level form in a Common Lisp file, incorporating the detailed analysis criteria and reporting format defined in the +analyze-form-prompt+ variable.

The form is error-free and does not need improvement. It correctly uses defparameter for a global, dynamically-scoped variable, adheres to the +earmuffs+ naming convention for special variables, and includes a clear and concise docstring. The use of str:join to combine a specific instruction with a more general prompt template (+analyze-form-prompt+) is a good practice for modularity and readability.

;; Invoking Gemini API model `gemini-2.5-flash`...
;; Gemini API finished in 23.46 seconds.
;; Prompt Tokens:         17,199
;; Thoughts Tokens:        1,842
;; Candidate Tokens:         556
;; Accumulated Prompt Tokens:      8,422,369
;; Accumulated Response Tokens:      355,154

;; Analyzing form:

(defun analyze-file (filename &key (temperature 0.01) (verbose t))
  "Analyze a Common Lisp file one top-level form at a time."
  (check-type filename (or pathname string))
  (let ((forms (file-forms filename))
        (*temperature* temperature)
        (*system-instruction*
          (content :parts (list (part +analyze-file-system-instruction+))
                   :role "system"))
        (start-prompt-tokens *accumulated-prompt-tokens*)
        (start-response-tokens *accumulated-response-tokens*)
        (start-time (get-universal-time))
        (aborted t))
    (unless (consp forms)
      (error "No top-level forms found in file '~a'." filename))
    (unwind-protect
         (prog1 (let iter ((current-form (car forms))
                           (analysis (invoke-gemini
                                      (list (part +analyze-file-form-prompt+)
                                            (part (car forms)))))
                           (results '())
                           (more (cdr forms)))
                  (when verbose
                    (format t "~&~%;; Analyzing form:~%~%~a~%--------~%~a~%~%" current-form analysis))
                  (if (null more)
                      (reverse (acons current-form analysis results))
                      (iter (car more)
                            (continue-gemini
                             (list (part +analyze-file-form-prompt+)
                                   (part (car more))))
                            (acons current-form analysis results)
                            (cdr more))))
           (setq aborted nil))
      (when verbose
        (let ((total-prompt-tokens (- *accumulated-prompt-tokens* start-prompt-tokens))
              (total-response-tokens (- *accumulated-response-tokens* start-response-tokens))
              (elapsed-time (- (get-universal-time) start-time)))
          (format t "~&;; Analysis ~:[complete in~;aborted at~] ~d seconds.~%" aborted elapsed-time)
          (format t ";; Prompt tokens:   ~7,d~%" total-prompt-tokens)
          (format t ";; Response tokens: ~7,d~%" total-response-tokens))))))

The purpose of the analyze-file function is to read a specified Common Lisp file, extract its top-level forms, and then sequentially send each form to an LLM (Large Language Model) for detailed analysis. It sets up the LLM’s system instruction and a specific prompt for analyzing individual forms, dynamically adjusting the LLM’s temperature. The function also tracks and reports the total prompt and response tokens used, as well as the elapsed time for the analysis, even if an error occurs.

The docstring is concise but could be improved to explicitly state the return value.

The form follows best practices, including check-type for argument validation, let for dynamic scope of special variables (*temperature*, *system-instruction*), unwind-protect for guaranteed cleanup and reporting, and a let iter construct for robust iteration over the file’s forms. Error handling for files with no top-level forms is also present.

Here are two concrete, actionable suggestions for improvement:

  1. Minor Improvement: Enhance the docstring to include the return value.
    • Reasoning: While the docstring describes the function’s action, it doesn’t explicitly state what the function returns. Knowing the exact structure of the return value is crucial for users of the function to correctly process its output.
    • Actionable Suggestion: Update the docstring to:
              ```lisp
              "Analyze a Common Lisp file one top-level form at a time.
               Returns a list of cons cells, where each cons cell is `(top-level-form . llm-analysis-result-string)`."
              ```
       
  2. Minor Improvement: Add a check-type for the forms returned by file-forms.
    • Reasoning: The file-forms function is an external dependency. While it’s assumed to return a list of Lisp forms (s-expressions), explicitly checking that the result is indeed a list ensures robustness. If file-forms were to return something unexpected (e.g., a single atom or NIL when forms are present but not a list), this check-type would catch it early and provide a clearer error message than a subsequent car or cdr error.
    • Actionable Suggestion: After the line (let ((forms (file-forms filename)) ...)), add:
              ```lisp
              (check-type forms list)
              ```
      

The form is otherwise error-free and well-written.

;; Analysis complete in 150 seconds.
;; Prompt tokens:    152555
;; Response tokens:   14755
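
The `aborted` flag in analyze-file is what lets the final report distinguish a normal finish from a non-local exit. Here is that idiom in isolation, as a minimal sketch; `call-with-report` is a hypothetical helper written for illustration and is not part of the analyzer.

```lisp
;; Hypothetical helper illustrating the ABORTED-flag idiom used in ANALYZE-FILE.
(defun call-with-report (thunk &optional (stream *standard-output*))
  "Call THUNK, then report on STREAM whether it returned normally or was aborted."
  (let ((aborted t))                       ; assume the worst until THUNK returns
    (unwind-protect
         (prog1 (funcall thunk)
           (setq aborted nil))             ; reached only on a normal return
      ;; The cleanup form runs on both normal return and non-local exit.
      (format stream "~&;; Run ~:[complete~;aborted~].~%" aborted))))
```

Calling `(call-with-report (lambda () 42))` returns the thunk's value and reports completion. If the thunk signals an error and control unwinds past the helper, the same cleanup form reports an abort instead.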

Conclusion

This approach works quite well. The LLM’s analysis is generally accurate and insightful. It often finds ways to improve the code without introducing bugs. It also often finds potential bugs that I had not considered. The analysis is not perfect, but it is a useful tool for improving the quality of Common Lisp code.

An important thing to keep in mind is that the tool makes an effort to find something to say about each top-level form. If you follow its suggestions, you could endlessly “improve” your code. It also suggests “improvements” where you have deliberately decided to do things in a non-standard way. You need to exercise judgment in deciding which suggested improvements are worth making.
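
Since analyze-file returns an alist of (form . analysis) pairs, a first pass at that triage can be mechanical. The sketch below assumes the analysis text marks each suggestion with the word "Improvement:", as the transcript above happens to do. `*sample-results*` is made-up stand-in data and `forms-with-suggestions` is a hypothetical helper; neither is part of the analyzer.

```lisp
;; Hypothetical stand-in for the value returned by ANALYZE-FILE:
;; an alist of (top-level-form . analysis-string) pairs.
(defparameter *sample-results*
  (list (cons '(defun foo (x) (1+ x))
              "The form is error-free and does not need improvement.")
        (cons '(defun bar (x) (car x))
              "Minor Improvement: add a check-type for x.")))

(defun forms-with-suggestions (results &optional (marker "Improvement:"))
  "Return the forms in RESULTS whose analysis string contains MARKER.
SEARCH is case-sensitive here, so prose such as 'does not need improvement.'
does not match the default marker."
  (loop for (form . analysis) in results
        when (search marker analysis)
          collect form))
```

This only narrows down which forms to look at. Deciding whether a given suggestion is worth taking remains a manual step.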

The analysis is too slow to run as part of an automated tool chain (the run above took 150 seconds), but it is fast enough that you can use it regularly. It does consume tokens fairly rapidly (about 150,000 prompt tokens for that run), so the cost of frequent analysis will add up.
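
The per-call numbers in the log show prompt tokens rising from 16,245 to 17,199 between two consecutive forms. That is consistent with each continue-gemini call resending the conversation so far, although the post does not confirm it. Assuming that is what happens, a rough model of total prompt usage looks like the sketch below; `estimate-total-prompt-tokens` is a hypothetical helper, and the model ignores any caching the API might do.

```lisp
;; Rough cost model, ASSUMING every call resends the whole conversation history.
(defun estimate-total-prompt-tokens (base-tokens per-form-tokens)
  "Estimate the total prompt tokens billed over a run.
BASE-TOKENS is the fixed cost of the system instruction.  PER-FORM-TOKENS is a
list of the tokens each turn (form plus its analysis) adds to the history."
  (loop with history = base-tokens
        for added in per-form-tokens
        do (incf history added)   ; the history grows with every turn
        sum history))             ; each call is billed for the history so far
```

Under this model the total grows roughly with the square of the number of forms, so splitting a long file into smaller files may cost fewer tokens than analyzing it in one run.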
