Anecdotes are not data.
You cannot extrapolate trends from anecdotes. A sample size of one is rarely significant. You cannot derive general conclusions based on a single data point.
Yet, a single anecdote can disprove a categorical. You only need one counterexample to disprove a universal claim. And an anecdote can establish a possibility. If you run a benchmark once and it takes one second, you have at least established that the benchmark can complete in one second, as well as established that the benchmark can take as long as one second. You can also make some educated guesses about the likely range of times the benchmark might take, probably within a couple of orders of magnitude more or less than the one second anecdotal result. It probably won't be as fast as a microsecond nor as slow as a day.
An anecdote won't tell you what is typical or what to expect in general, but that doesn't mean it is completely worthless. And while one anecdote is not data, enough anecdotes can be.
Here are a couple of AI success story anecdotes. They don't necessarily show what is typical, but they do show what is possible.
I was working on a feature request for a tool that I did not author and had never used. The feature request was vague. It involved saving time by feeding back some data from one part of the tool to an earlier stage so that subsequent runs of the same tool would bypass redundant computation. The concept was straightforward, but the details were not. What exactly needed to be fed back? Where exactly in the workflow did this data appear? Where exactly should it be fed back to? How exactly should the tool be modified to do this?
I browsed the code, but it was complex enough that it was not obvious where the code surgery should be done. So I loaded the project into an AI coding assistant and gave it the JIRA request. My intent was get some ideas on how to proceed. The AI assistant understood the problem — it was able to describe it back to me in detail better than the engineer who requested the feature. It suggested that an additional API endpoint would solve the problem. I was unwilling to let it go to town on the codebase. Instead, I asked it to suggest the steps I should take to implement the feature. In particular, I asked it exactly how I should direct Copilot to carry out the changes one at a time. So I had a daisy chain of interactions: me to the high-level AI assistant, which returned to me the detailed instructions for each change. I vetted the instructions and then fed them along to Copilot to make the actual code changes. When it had finished, I also asked Copilot to generate unit tests for the new functionality.
The two AIs were given different system instructions. The high-level AI was instructed to look at the big picture and design a series of effective steps while the low-level AI was instructed to ensure that the steps were precise and correct. This approach of cascading the AI tools worked well. The high-level AI assistant was able to understand the problem and break it down into manageable steps. The low-level AI was able to understand each step individually and carry out the necessary code changes without the common problem of the goals of one step interfering with goals of other steps. It is an approach that I will consider using in the future.
The second anecdote was concerning a user interface that a colleague was designing. He had mocked up a wire-frame of the UI and sent me a screenshot as a .png file to get my feedback. Out of curiousity, I fed the screenshot to the AI coding tool and asked what it made of the .png file. The tool correctly identified the screenshot as a user interface wire-frame. It then went on to suggest a couple of improvements to the workflow that the UI was trying to implement. The suggestions were good ones, and I passed them along to my colleague. I had expected the AI to recognize that the image was a screenshot, and maybe even identify it as a UI wire-frame, but I had not expected it to analyze the workflow and make useful suggestions for improvement.
These anecdotes provide two situations where the AI tools provided successful results. They do not establish that such success is common or typical, but they do establish that such success is possible. They also establish that it is worthwhile to throw random crap at the AI to see what happens. I will be doing this more frequently in the future.
No comments:
Post a Comment