Software Development with GitHub Copilot (Agentic-Coding) Part 1

My experiences using AI in software development

12/14/2025

7 min read

In the post "Software Development with AI Support (Pre-Agentic-Coding)", I described my experiences with various LLMs and coding assistants that did not yet offer agentic-coding functionality.

In this post, I want to share my experiences with the newer agentic-coding approaches. These approaches let AI models take on more complex software development tasks by autonomously executing multiple steps, making decisions, and writing code directly into your classes.

Disclaimer

The experiences described in this post are based on my personal tests and observations with various AI models and coding assistants. The results do not follow any specific scientific or statistical standard and are therefore subjective.

Introduction

I still have a strong interest in improving and accelerating my development process with AI support, without losing control over the code or accepting lower quality. So I keep handing larger blocks of work to the AI to see how well the models can now complete more complex tasks. The "Agentic Coding" feature, available in both Copilot and Claude Code, particularly interested me. It promises that the AI can autonomously execute multiple steps and write code directly into the corresponding classes.

Development Languages & Frameworks, Prompt and Output Languages

The programming languages I mainly use for development are TypeScript, Python, and Dart, so my experiences mainly refer to these three languages.

The natural languages I mainly use for analysis, documentation, and diagram creation are English and German, so my experiences mainly refer to these two languages.

The frameworks I mainly use for development are Nuxt 3/4, Vue 3, FastAPI, and Flutter, so my experiences mainly refer to these frameworks.

My Use Cases

My expectations of the results have increased, but the use cases have essentially remained the same:

Explain functions – Super helpful for diving into a completely unfamiliar codebase.
Develop functions – Quickly have a function written that fulfills a specific task.
Develop app features – After successfully creating a function, I wanted to know if I could have a complete feature written. This would include multiple classes and functions.
Write tests – Tests are super important, but also quite time-consuming to write. Here I wanted to know if the models could help me.
Check applications for bugs and performance issues – After I've developed a feature or function, I like to have it checked for bugs and performance issues.
Check applications for compliance with best practices and project architectures – Here I like to get suggestions for improving my code: Where do I deviate from best practices, and how can I fix that?
Create documentation for existing code – Even if you start a project alone, sooner or later one or more developers will join. Good documentation makes joining the project as smooth as possible, which makes it super important, but it is also quite time-consuming to create. Here I wanted to know if the models could process a complete codebase and then create correct documentation.
Create flowcharts for data flows in an unknown software project – Flowcharts that explain the data flow in an application are another thing that can be super helpful when joining a new project.
Vibe factor – How well can you work with the assistant? Is it convenient? Does the assistant deliver good results? Do the results match my style, or do I constantly have to point out its mistakes? All these points flow into the vibe factor.
Improve app features – Here I wanted to know if the models could help me optimize and improve existing features.

NOTE: My experiences with the various models and tools are not listed in any particular order. This is mainly because I tried out the tools and models in parallel rather than sequentially: whenever a tool offered a promising feature, I tried it out.

GitHub Copilot

In previous posts, I've already shared my experiences with Copilot. At first, I wasn't really convinced by the tool. Naturally, the introduction of Agentic-Coding in Copilot piqued my interest right away: How well would this feature support my development process?

I must admit that I was pleasantly surprised by how well Copilot with Agentic-Coding handled my requests. The ability to autonomously execute multiple steps and write code directly into the corresponding classes has significantly accelerated my workflow. In terms of content, the generated functions matched the existing code. They were also often well structured, but not always, which disappointed me a bit. I could now hand over a lot to Copilot, and the results were mostly good. Even so, the code frequently didn't meet my expectations, and I often had to improve it manually. Only after copilot-instructions.md and Agents.md were introduced could I give Copilot more precise instructions on how to structure the code, which significantly improved the results.

Copilot with Model: GPT-4.1

Initially, just like with ChatGPT, I used the chat on the claude.ai website. Here too, I have to say: the model was better and its results more often matched what I wanted, but copying individual functions back and forth was quite annoying. Claude quickly expanded its context window, though, so I could paste in entire classes and have them corrected.

Fulfillment of my use cases:


Copilot with Model: Claude Sonnet-4.5

Fulfillment of my use cases:


Conclusion

In summary, the introduction of Agentic-Coding in tools like GitHub Copilot and Claude Code has significantly improved my development process. The models can now autonomously handle more complex tasks and write code directly into the corresponding classes, and that has saved me a lot of time.

Nevertheless, there is still room for improvement, especially when it comes to shaping the code exactly the way I want it. I regularly found that the model drew up a plan for solving the task it was given but did not always implement that plan correctly or complete every planned step. I also ran into problems when the model was supposed to improve or extend existing functions. I explicitly told it to reuse existing functions as much as possible and to add new classes, data models, and functions only when absolutely necessary. However, the model often ignored the existing functions completely and wrote new ones. As a result, functionality ended up spread across many different classes, which made my code unnecessarily complex. I then had to reduce and simplify it again, partly by hand and partly with AI assistance.
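To make this failure mode concrete, here is a deliberately simplified Python sketch. It is not code from my projects. `format_price`, `format_cart_total`, and `format_cart_total_duplicated` are hypothetical names invented for illustration. The duplicated variant copies logic that already exists in the codebase. The other variant is what an instruction to reuse existing functions is meant to produce.

```python
from decimal import Decimal, ROUND_HALF_UP


# --- Existing helper (hypothetical) that already lives in the codebase ---
def format_price(amount: Decimal, currency: str = "EUR") -> str:
    """Round half-up to two decimal places and append the currency code."""
    rounded = amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return f"{rounded} {currency}"


# --- Anti-pattern: the new feature re-implements logic that already exists ---
def format_cart_total_duplicated(prices: list[Decimal]) -> str:
    total = sum(prices, Decimal("0"))
    # Same rounding and formatting as format_price, copied inline.
    # If format_price changes later, this copy silently drifts apart.
    rounded = total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return f"{rounded} EUR"


# --- Preferred: the new feature adds only what is new and reuses the rest ---
def format_cart_total(prices: list[Decimal], currency: str = "EUR") -> str:
    total = sum(prices, Decimal("0"))
    return format_price(total, currency)


if __name__ == "__main__":
    cart = [Decimal("9.995"), Decimal("0.01")]
    print(format_cart_total(cart))  # prints "10.01 EUR"
```

Both variants return the same string today, which is why this kind of duplication is easy to miss in a review. The difference only surfaces later, when `format_price` changes and the copy does not.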

In my own work, I've noticed that I no longer think in terms of individual class functions but in terms of features, use cases, and their concepts. The results are good, but they have to be checked carefully every time to make sure they meet the requirements and contain no errors. This brings us to the first major challenge. Depending on the size of the feature, the model can generate a lot of code, all of which should ideally be reviewed, and that quickly becomes tedious. As humans, we tend to stop checking generated code carefully and simply accept it. This inevitably means overlooking errors and problems, and at some point we lose track of our own application.

I notice more and more that my way of working is changing. I write less code myself. Instead, I increasingly define the requirements and concepts for features and then have the code generated. I then have to check this code, either through tests or manual inspection. As a result, I approach the tasks assigned to me at a completely different level of abstraction: I think more in concepts and features and less in individual functions and classes.
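One way to check generated code through tests, without reading every line, is to derive the tests from the requirement rather than from the code itself. The following Python sketch illustrates this idea under simple assumptions. `paginate` stands in for a hypothetically generated function, and the requirement it implements is invented for this example. The test functions follow pytest's `test_*` naming convention, but the block also runs on its own without pytest.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


# Hypothetical generated function for the invented requirement:
# "Split items into pages of at most page_size elements; reject page_size < 1."
def paginate(items: Sequence[T], page_size: int) -> list[list[T]]:
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    return [list(items[i : i + page_size]) for i in range(0, len(items), page_size)]


# Each test maps to one point of the requirement, not to implementation details.
def test_splits_into_full_and_partial_pages() -> None:
    assert paginate([1, 2, 3, 4, 5], 2) == [[1, 2], [3, 4], [5]]


def test_empty_input_yields_no_pages() -> None:
    assert paginate([], 3) == []


def test_rejects_invalid_page_size() -> None:
    try:
        paginate([1, 2], 0)
    except ValueError:
        return
    raise AssertionError("expected ValueError for page_size=0")


if __name__ == "__main__":
    # Run the checks directly; pytest would also discover the test_* functions.
    test_splits_into_full_and_partial_pages()
    test_empty_input_yields_no_pages()
    test_rejects_invalid_page_size()
    print("all checks passed")
```

These tests encode what was asked for, so they stay valid even if the assistant later rewrites `paginate` completely. They do not replace a manual look at the code, but they flag a regenerated version that drifts from the requirement.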

However, these problems can be cleverly addressed with custom agents. I will describe how to define and use custom agents in a future post.