Friday, November 24, 2023

GitHub Co-pilot Review

I recently tried out GitHub CoPilot. It is a system that uses generative AI to help you write code.

The tool interfaces to your IDE — I used VSCode — and acts as an autocomplete on steroids … or acid. Suggested comments and code appear as you move the cursor and you can often choose from a couple of different completions. The way to get it to write code was to simply document what you wanted it to write in a comment. (There is a chat interface where you can give it more directions, but I did not play with that.)

I decided to give it my standard interview question: write a simple TicTacToe class, include a method to detect a winner. The tool spit out a method that checked an array for three in a row horizontally, vertically, and along the two diagonals. Almost correct. While it would detect three ‘X’s or ‘O’s, it also would detect three nulls in a row and declare null the winner.

I went into the class definition and simply typed a comment character. It suggested an __init__ method. It decided on a board representation of a 1-dimensional array of 9 characters, ‘X’ or ‘O’ (or null), and a character that determined whose turn it was. Simply by moving the cursor down I was able to get it to suggest methods to return the board array, return the current turn, list the valid moves, and make a move. The suggested code was straightforward and didn’t have bugs.

I then decided to try it out on something more realistic. I have a linear fractional transform library I wrote in Common Lisp and I tried porting it to Python. Co-pilot made numerous suggestions as I was porting, to various degrees of success. It was able to complete the equations for a 2x2 matrix multiply, but it got hopelessly confused on higher order matrices. For the print method of a linear fractional transform, it produced many lines of plausible looking code. Unfortunately, the code has to be better than “plausible looking” in order to run.

As a completion tool, co-pilot muddled its way along. Occasionally, it would get a completion impressively right, but just as frequently — or more often — it would get the completion wrong, either grossly or subtly. It is the latter that made me nervous. Co-pilot would produce code that looked plausible, but it required a careful reading to determine if it was correct. It would be all too easy to be careless and accept buggy code.

The code Co-Pilot produced was serviceable and pedestrian, but often not what I would have written. I consider myself a “mostly functional” programmer. I use mutation sparingly, and prefer to code by specifying mappings and transformations rather than sequential steps. Co-pilot, drawing from a large amount of code written by a variety of authors, seems to prefer to program sequentially and imperatively. This isn’t surprising, but it isn’t helpful, either.

Co-pilot is not going to put any programmers out of work. It simply isn’t anywhere near good enough. It doesn’t understand what you are attempting to accomplish with your program, it just pattern matches against other code. A fair amount of code is full of patterns and the pattern matching does a fair job. But exceptions are the norm, and Co-pilot won’t handle edge cases unless the edge case is extremely common.

I found myself accepting Co-pilot’s suggestions on occasion. Often I’d accept an obviously wrong suggestion because it was close enough and the editing seemed less. But I always had to guard against code that seemed plausible but was not correct. I found that I spent a lot of time reading and considering the code suggestions. Any time savings from generating these suggestions was used up in vetting the suggestions.

One danger of Co-pilot is using it as a coding standard. It produces “lowest common denominator” code — code that an undergraduate that hadn’t completed the course might produce. For those of us that think the current standard of coding is woefully inadequate, Co-pilot just reinforces this style of coding.

Co-pilot is kind of fun to use, but I don’t think it helps me be more productive. It is a bit quicker than looking things up on stackoverflow, but its results have less context. You wouldn’t go to stackoverflow and just copy code blindly. Co-pilot isn’t quite that — it will at least rename the variables — but it produces code that is more likely buggy than not.