Experience with AI Coding Assistants

Experience report on using different AI coding assistants for private projects.

06.09.2024

Coding Assistants

Hello everyone,

Of course, the AI hype hasn't passed me by, and naturally I wanted to know whether software developers could be replaced by AI software developers like Devin 😉 in the near future. Since I work on private software projects in my free time, I thought I could see how well the currently available tools support me in those projects.

A Brief Background

I work in the software industry and also enjoy developing software projects in my free time, realizing various ideas in the form of mobile apps or web apps. I enjoy programming, and when it works, it makes me even happier. When I heard the statement from the Nvidia CEO that we won't need to learn programming in the near future, I perked up and had to try out the AI coding assistants.

I'd like to share my experiences with the currently available LLM tools. It's not scientific and definitely not conclusive or fair, as I tested the free version of some tools and the premium version of others! Feel free to gather your own experiences and share them in the comments.

I wanted to get answers to the following questions:

  • How can LLM tools help me with software development?
    • Implementing UI (Complementing / complete Flutter Widgets)
    • Implementing logic (Improving / completely AI-generated)
    • Adapting functions
    • Explaining functions
    • Generating complete AI features

I tried out the following tools:

  • Tabnine
  • ChatGPT Web
  • GitHub Copilot
  • Claude AI Web
  • Claude AI API plus Claude Dev

Tabnine

I integrated Tabnine as a plugin into my IDE last year and had to adapt my way of working to the tool. At that time, Tabnine only provided suggestions for how I could implement a function. I had to get used to the fact that a suggestion would appear shortly, which meant my typing speed dropped while I waited for it, and sometimes the suggestion took quite a while to arrive.

Note: Now Tabnine can do more, and you can chat with the Tabnine assistant just like with GitHub Copilot. However, I haven't tested this functionality.

ChatGPT Web (Free)

ChatGPT initially delivered very impressive results. However, you had to describe exactly what the current situation is, what your goal is, and what the result should look like (the prompt). Only then did you get a good result. Writing the prompt takes time, though, and sometimes I felt I would have implemented the change faster myself than writing the prompt first and then waiting for the result.
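The situation/goal/result structure I describe above can be sketched as a small helper. The field names and example wording are my own illustration, not an official or recommended template:

```python
# Illustrative sketch of the prompt structure described above: current
# situation, goal, desired result. The section labels and example text
# are made up for this post, not taken from any tool's documentation.

def build_prompt(situation: str, goal: str, result: str) -> str:
    """Assemble the three-part prompt as one text block."""
    return (
        f"Current situation:\n{situation}\n\n"
        f"Goal:\n{goal}\n\n"
        f"Desired result:\n{result}\n"
    )

print(build_prompt(
    "A Flutter login screen with email and password fields.",
    "Add client-side validation for the email field.",
    "A complete build() method I can paste into my widget.",
))
```

Writing out all three parts is exactly the overhead I mention: for small changes, typing the code yourself can be faster than describing it.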

What limited me, however, was the restriction of the context window. It works wonderfully for simple functions, but more demanding tasks that involve, for example, several functions, classes, or files become difficult. Just copying the required content from the IDE into the chat window is very cumbersome, and I also reached the limit of how much I could upload. Examples like Gemini 1.5 Pro (1 million tokens) show, however, that more is possible here.
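To get a feeling for when a project stops fitting into a chat, you can do a rough back-of-the-envelope check. The ~4 characters per token ratio is only a common rule of thumb for English text and code, not an exact tokenizer, and the window size below is an illustrative example:

```python
# Rough check of whether a set of source files likely fits into a model's
# context window. The ~4 chars/token ratio is a heuristic, not a real
# tokenizer; the 8,000-token window is just an example value.

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: about 4 characters per token."""
    return max(1, len(text) // 4)

def fits_in_context(files: dict[str, str], window_tokens: int) -> bool:
    """True if the combined file contents probably fit in the window."""
    total = sum(estimate_tokens(content) for content in files.values())
    return total <= window_tokens

files = {
    "main.dart": "void main() {}\n" * 200,
    "widget.dart": "class MyWidget {}\n" * 300,
}
print(fits_in_context(files, window_tokens=8_000))  # prints True
```

With a handful of files this still works out, but the estimate makes it obvious why whole projects quickly blow past the limits I ran into.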

In summary, it wasn't a great improvement in development speed for me: the development flow suffered from copying back and forth, the limited context window got in the way, and since the results often didn't work directly (single shot), I still had to chat back and forth to get a suitable result.

GitHub Copilot (Models: GPT-3.5 Turbo & GPT-4o)

After the ChatGPT web experiment, I thought it made sense to try out Copilot. It integrates easily into the IDE. Similar to Tabnine, Copilot provided suggestions for what a function could look like. Furthermore, I could have Copilot check and improve code that the compiler rejected. This is, of course, very helpful when switching between languages while you still have the syntax of the previous one in your head.

Thanks to faster suggestions than Tabnine and the integration into the IDE, the development flow wasn't as disrupted. Nevertheless, I was slower because I was waiting for the results. You also have the problem of a limited context window here: it happened that Copilot suggested new functions that already existed elsewhere in the project but that it hadn't seen due to the context window limitations.

With the IDE integration and faster suggestions, it was acceptable. When I got stuck somewhere, the chat function was helpful too. However, I have to admit that I rarely got a suitable answer right away and usually needed a longer back-and-forth.

Claude.ai Web (Model: Sonnet-3.5)

The appearance of the Sonnet-3.5 model directed me to Anthropic's Claude.ai. I found the Artifacts feature quite interesting and wanted to try it out too. I also found it exciting that even in the free version you could upload an image of the desired UI and receive matching code. Of course, I tried that out and was initially positively surprised by the result. I immediately received usable code that correctly implemented 90% of the UI. Adjusting the remaining 10% was then child's play.

Since it worked so well at the beginning, I soon ran into the daily request limit. As the results of the free version had convinced me, I decided to try out the premium features. Here you have the possibility to create projects, and in these projects you can store 'Project Knowledge'. These are simply files in text format, which can therefore also be code.

Here I was interested to see whether I could simply give it screenshots of a web app and the LLM would deliver a suitable result. Unfortunately, this didn't work, because you're limited to 5 files per chat when uploading images, so you can't cover an entire app. What did work well was going screenshot by screenshot: the UI could be realized wonderfully this way (90% generated, 10% own adjustments). However, you have to make sure that the requirement is neatly encapsulated and complete. If you expect it to build on existing functions, you'll be disappointed.

I also got into a situation where it couldn't satisfactorily improve its own code. Here I had to intervene and tell it which approach to take to achieve the desired result. After my hint, a clean result was delivered. This means you have to guide the models in the right direction now and then; otherwise the model endlessly tries to improve a wrong approach without realizing that the approach itself is wrong.

I have to say, this was the best model I've tried so far from my point of view and experience. But I also have to admit that after a while I got the feeling that the model had gotten worse. The results were sometimes worse than the initial 90% correctness.

Claude.ai API + Claude Dev (Model: Sonnet-3.5)

Based on the good experiences with Claude.ai, I activated API access and integrated it into the IDE via the Claude Dev plugin. My hope was that it would now have multi-file support and could thus make adjustments across multiple files. Unfortunately, I was disappointed on various levels. I ran into all sorts of limits, be it requests per minute or tokens per day. This led to it getting stuck in the middle of a task: out of 13 files, 8 were completed and the rest were not, which of course led to frustration and a poor rating on my part.
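Rate limits like the ones I hit are usually softened client-side by retrying with exponential backoff. Below is a minimal, hypothetical sketch: `RateLimitError` and `call_model` are stand-ins for whatever your API client actually raises and exposes, and real SDKs often return a retry-after hint that should take precedence over a fixed schedule:

```python
# Minimal sketch of retrying a rate-limited API call with exponential
# backoff. RateLimitError and call_model are hypothetical stand-ins,
# not names from a real SDK.

import time

class RateLimitError(Exception):
    """Raised when the API reports a requests- or tokens-per-minute limit."""

def with_backoff(fn, max_retries=5, base_delay=1.0):
    """Call fn(), retrying on RateLimitError with exponentially growing waits."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # give up after the last allowed attempt
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...

# Example: a fake call that fails twice before succeeding.
attempts = {"n": 0}
def call_model():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError("429: too many requests")
    return "done"

print(with_backoff(call_model, base_delay=0.01))  # prints "done" after retries
```

Backoff only helps with per-minute limits, though; a daily token budget that is simply exhausted, as in my case, leaves the task half finished no matter how patiently you retry.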

Summary

Depending on what requirements you have for a tool, you get the appropriate support or not. I wanted to see if it could create complete applications based on screenshots and function descriptions. I couldn't realize this with the mentioned tools. This doesn't mean that it won't be possible in the future, it just means that currently, it didn't work for me 😕.

From my point of view, the results from Claude.ai were very good and usable with few adjustments. Here you have to keep in mind that the field of work of a developer could/will change, as I wrote less code myself, but rather checked and adjusted the generated code. My development speed only visibly improved when using Claude.ai, although I used the web version. If it were possible to use the model as an IDE copilot, the development speed would probably be even better.

Answers to the questions asked at the beginning

  • How can LLM tools help me with software development?
    • Implementing UI (Complementing / complete Flutter Widgets)
      • I could achieve this best with Claude.ai Web.
    • Implementing logic (Improving / completely AI-generated)
      • From my point of view, all of them could do this quite well.
    • Adapting functions
      • This worked well with all of them.
    • Explaining functions
      • This worked well with all of them.
    • Generating complete AI features
      • None of the models could fulfill this to an acceptable degree for me.

Personal Opinion

I can't answer whether we should still learn programming or not. For my part, I enjoy programming. But I'm also open to tools that make my work easier and thus bring me away from syntax towards functions and benefits for the user.

Even if the models currently don't deliver 100% correct results, we need to deal with this topic and see how such tools can be used meaningfully and what effects the use of AI coding assistants can have.

I'm curious about what the future will bring us and how quickly the results of the models will improve with a focus on programming.