Software Development with AI Support (Agentic-Coding) Part 2
My experiences using AI in software development
12/18/2025
7 min read

In the post Software Development with AI Support (Pre-Agentic-Coding), I described my experiences with various LLMs and coding assistants - without agentic-coding functionality.
In this post, I want to share my experiences with the newer agentic-coding approaches, which enable AI models to take on more complex tasks in software development by autonomously executing multiple steps and decisions and writing code directly into your classes.
Disclaimer!
The experiences described in this post are based on my personal tests and observations with various AI models and coding assistants. The results do not follow any specific scientific or statistical standard and are therefore subjective.
Introduction
I still have a great interest in improving and accelerating my development process with AI support, without losing control over the code or accepting lower quality. So I keep trying to hand over larger task blocks to the AI to see how good the models have become at successfully completing more complex tasks. The "Agentic Coding" feature, which is available in both Copilot and Claude Code, particularly interested me because it promises that the AI can autonomously execute multiple steps and write code directly into the corresponding classes.
Development Languages & Frameworks, Prompt and Output Language
The languages I mainly use for development are TypeScript, Python, and Dart. Therefore, my experiences mainly relate to these languages.
The languages I mainly use for analysis, documentation, and diagram creation are English and German. Therefore, my experiences mainly relate to these two languages.
The frameworks I mainly use for development are Nuxt 3/4, Vue 3, FastAPI, and Flutter. Therefore, my experiences mainly relate to these frameworks.
My Use Cases
My expectations of the results have increased, but the use cases have essentially remained the same:
- Explaining functions – Super helpful for getting started with a completely unfamiliar codebase. (No longer considered in this comparison.)
- Developing functions – Quickly having a function written that performs a specific task.
- Developing app features – After successfully creating a function, I wanted to know if I could have a complete feature written. This would include multiple classes and functions.
- Writing tests – Tests are super important, but also quite time-consuming to write. Here I wanted to know if the models could help me.
- Checking applications for bugs and performance issues – After I've developed a feature or function, I like to have it checked for bugs and performance issues.
- Checking applications for compliance with best practices and project architectures – For this, I like to get suggestions on how I can improve my code: Where do I deviate from best practices and how can I improve that?
- Creating documentation for existing code – Even if you start a project alone, sooner or later one or more developers will join. To make the entry into the project as smooth as possible, good documentation is super important, but also quite time-consuming to create. Here I wanted to know if the models can process a complete codebase and then create correct documentation.
- Creating flowcharts for data flows in an unknown software project – Another point that can be super helpful when starting a new project is flowcharts that explain the data flow in the application.
- Vibe factor – How well can you work with the assistant? Is it convenient? Does it deliver good results? Do the results match my style, or do I constantly have to point out that it made a mistake? All of these points feed into the vibe factor.
- Improving app features - Here I wanted to know if the models can help me optimize and improve existing features.
NOTE: My experiences with the various models and tools are not listed in any particular order. This is mainly because I tried the tools and models in parallel rather than one after another: whichever tool had a promising feature at the time, I tried it.
Claude Code (Agentic, Model: Claude Sonnet 4.5)
I only started using Claude Code once I could integrate it into my IDE as a plugin, so I got to enjoy it a bit later than others. By the time I started, the '/init' command already existed, which enables Claude Code to grasp the context of the codebase much better. I no longer had to manually paste all classes and functions into the prompt, but could simply let the tool analyze the code, which simplified and accelerated the process considerably. Through this command, Claude Code also records the project structure and the peculiarities of the code in a CLAUDE.md file, which I could then keep expanding to reflect the latest developments in the codebase.
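For illustration, here is a minimal sketch of what such a CLAUDE.md might contain. The sections and project details are my own hypothetical example (loosely based on my Nuxt/FastAPI stack), not actual output of '/init':

```markdown
# CLAUDE.md (hypothetical example)

## Project overview
Nuxt 3 frontend with a FastAPI backend; TypeScript on the client, Python on the server.

## Structure
- components/    Vue 3 single-file components (script setup, Composition API)
- server/api/    Nuxt server routes that proxy to the backend
- backend/app/   FastAPI routers, Pydantic models, and services
- backend/tests/ pytest suites for the API

## Conventions
- Composition API only, no Options API.
- Every new backend endpoint needs a pytest test in backend/tests/.
- Naming: kebab-case for components, snake_case in Python.
```

The point is less the exact content and more that the file gives the agent a persistent, editable summary of the codebase that it can rely on across sessions.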
Fulfillment of my use cases:
I find the results from Claude Code very good. There are occasional glitches where you temporarily can't communicate with the model, or where you have to authenticate several times per session, but all of that is still acceptable. What I find really annoying, however, are the ever-decreasing limits. I have the feeling that I hit them more and more often and then have to wait before I can continue working. Sometimes that happens in the middle of a task, and you have to wait until the next day before you can continue. Simply terrible.
Codex
Fulfillment of my use cases:
Basically, Codex fulfills my use cases quite well, although I have the feeling that my prompts don't land as well here as they do with the Anthropic models.
Conclusion
The same points apply here as with using GitHub Copilot as an IDE plugin.
So what, then, is the difference between the two approaches? Honestly, it fluctuates which tool I prefer to work with and where the vibe factor is higher. I have the impression that I can delegate larger tasks, be it 'checking applications for bugs and performance issues' or 'checking applications for compliance with best practices and project architectures', better to Claude Code or Codex; there the results are often somewhat better than with Copilot. With Copilot, on the other hand, I have the feeling that I get smaller tasks like 'developing functions' or 'writing tests' done faster; the vibe factor is higher and I reach a good result more quickly. What is problematic, of course, are the limits that Claude Code keeps lowering. I hit them more and more often and then have to wait before I can continue working. Sometimes that happens in the middle of a task, and you're basically stuck: the AI has already completed several points, adjusted 5-10 files for them, and still needs to complete further points with more adjustments across various files. This has ruined my evening more than once.
In both comparisons – the post Software Development with AI Support (Agentic-Coding) and this one – all of my tasks are completed quite well. With one model you need one more iteration, with the other one fewer, but overall the results are good. What I do notice, however, is that you come to prefer one model over another, or simply get used to it. I've read several Reddit posts that swear by Codex and the GPT models; I, on the other hand, find the Anthropic models better, and as a result I use Codex and the GPT models less. Now I ask myself: do the systems perhaps get to know the user and adapt to their preferences and style? Or have you unconsciously adopted a prompt style that happens to fit one model better than another? I don't know.