Software Development with AI Support (Pre-Agentic-Coding)
My experiences using AI in software development
12/02/2025
Introduction
After OpenAI released ChatGPT on November 30, 2022, artificial intelligence reached the mainstream. Since then, several providers of large language models have emerged, each with its own strengths. Besides OpenAI, there is Anthropic with its models Claude Opus, Claude Sonnet, and Claude Haiku, as well as the Llama models from Meta and the DeepSeek R models from China.
As the models evolved, I naturally started testing their suitability for software development, trying out different use cases along the way. In this blog post, I would like to share my findings with you.
My Use Cases
Let's start with my use cases. These naturally grew more extensive over time, mainly because the models developed quickly and delivered good results for simple tasks fairly early on. I wanted to test how far I could push them before they stopped delivering useful output. So I regularly tried the following use cases:
- Explaining functions – Super helpful for diving into a completely unfamiliar codebase.
- Developing functions – Quickly have a function written that performs a specific task.
- Developing app features – After successfully creating a function, I wanted to know if I could have a complete feature written. This would include multiple classes and functions.
- Writing tests – Tests are super important but also quite time-consuming to write. Here I wanted to know if the models could help me.
- Checking applications for bugs and performance issues – After developing a feature or function, I like to have it checked for bugs and performance issues.
- Checking applications for compliance with best practices and project architectures – For this, I like to get suggestions on how to improve my code: Where do I deviate from best practices and how can I improve it?
- Creating documentation for existing code – Even if you start a project alone, sooner or later one or more developers will join. To make the onboarding process as smooth as possible, good documentation is super important but also quite time-consuming to create. Here I wanted to know if the models could process a complete codebase and then create correct documentation.
- Creating flowcharts for data flows in an unknown software project – Flowcharts that explain the data flow in an application are another thing that can be super helpful when starting on a new project.
- Vibe factor – One final point that is difficult to pin down but which I would like to include. How well can you work with the assistant? Is it convenient? Does it deliver good results? Do the results match my style, or do I constantly have to tell it that it made a mistake? All of these points feed into the vibe factor.
First Steps
When ChatGPT was released, developers naturally tried to get code out of the chat tool and integrate it into their programs, or to have the LLM complete their configurations. However, this was usually only a small help and quite cumbersome. You had to leave your IDE to chat on the OpenAI website, copy the results back to the right place in the program, and then check whether the generated code was even correct. Initially, you could only have smaller functions developed. Things developed quickly, though: coding assistants were built and deployed, and in parallel new players like Claude from Anthropic entered the market. You can read about my experiences with various tools here: Experience with AI Coding Assistants.
Fulfillment of my use cases with ChatGPT (web version, Model: GPT-3.5):
Claude Sonnet 3.5
At some point, I learned about Claude, and from various subreddits that Claude Sonnet is considered better suited for programming. Naturally, I had to try it out.
First Attempt
Initially, just like with ChatGPT, I used the chat on the claude.ai website. Here, too, it must be said: even though the model was better and the results more often matched what I wanted, the back and forth of copying individual functions was still quite annoying. But Claude quickly expanded its context window, so you could paste in entire classes and have them improved.
Fulfillment of my use cases:
Second Attempt
Eventually, Claude's context window grew larger still, and you could add more information to the prompt. That meant copying all the necessary classes from the source code and pasting them into the prompt. The copying became too tedious for me, so I thought about how to get more context into the request on the website. I ended up developing a helper tool that turns my codebase into a single Markdown file with all the necessary information. You can find this tool here: GitHub Link
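The tool itself is linked above; as a rough illustration of the idea, here is a minimal sketch of such a bundler. The file-extension filter and output layout are my assumptions here, not necessarily what the linked tool does:

```python
import argparse
from pathlib import Path

# Assumed set of source-file extensions -- adjust to your project.
SOURCE_EXTENSIONS = {".py", ".js", ".ts", ".java", ".cs"}

def codebase_to_markdown(root: Path, output: Path) -> None:
    """Collect all source files under `root` into a single Markdown file."""
    with output.open("w", encoding="utf-8") as md:
        md.write(f"# Codebase: {root.name}\n\n")
        for path in sorted(root.rglob("*")):
            if not path.is_file() or path.suffix not in SOURCE_EXTENSIONS:
                continue
            rel = path.relative_to(root)
            # One section per file, fenced so the LLM can tell code from prose.
            md.write(f"## {rel}\n\n")
            md.write(f"```{path.suffix.lstrip('.')}\n")
            md.write(path.read_text(encoding="utf-8", errors="replace"))
            md.write("\n```\n\n")

if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Bundle a codebase into one Markdown file for an LLM prompt."
    )
    parser.add_argument("root", type=Path, help="Project root directory")
    parser.add_argument("-o", "--output", type=Path, default=Path("codebase.md"))
    args = parser.parse_args()
    codebase_to_markdown(args.root, args.output)
```

The resulting Markdown file can then be pasted (or uploaded) into the chat in one go, instead of copying classes one by one.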
Fulfillment of my use cases:
Copilot Improvement
As described in the article Experience with AI Coding Assistants, I wasn't particularly happy with Copilot at first. I didn't find the auto-completion especially exciting, and the generated functions weren't particularly helpful either; you could tell that Copilot lacked the context of the problem to develop a well-fitting function. I personally didn't find OpenAI's models very helpful here either. However, once Copilot also integrated Anthropic's Claude Sonnet 3.7 model, the results got better in my opinion, and I used Copilot more and more. Once you could also pass files as context along with a question or task, I liked the results even more, and it sped up my work process.
Fulfillment of my use cases:
Conclusion
In summary, the development of LLMs and their integration into my work process helped me little at first. Over time, the models got better and the context windows larger, so the results improved as well. Particularly helpful was the ability to pass files as context: the assistant could understand the code much better and suggest fitting solutions.

Still, I wasn't completely happy with the AI assistants. There were always situations where the assistant misread the context and suggested inappropriate solutions, and even as the models improved, I sometimes felt they didn't really understand what I wanted. The constant back and forth of copying code also kept me out of any flow state and distracted me instead. Discussions with the assistant were often frustrating when you had to tell it for the umpteenth time that its solution didn't fit. If you gave it the right hint, e.g., the right concept for how a function or feature should be implemented, it usually understood and could suggest fitting solutions. But by then, I had usually already hit the provider's usage limits.