The Workflow in Action – From Vague Wish to Finished Commit

A concrete walkthrough of all five phases of the agent team – from the first vague feature wish to a tested commit. Including the real handoffs, the checkpoints, and the moments where I had to correct the agents.

06/15/2026

•16 min read

Part 5 of 9 of the "Agentic Coding" series

Note on the Content

This post series describes my insights from using AI in the software development process. These are personal experiences and findings from which I derived measures that worked for my projects. This series does not claim to be a comprehensive guide or universally applicable, but rather an inspiration for your own projects.

Over the last posts I introduced my agent team piece by piece: the Business Analyst and the Solution Architect – the two strategists who write not a single line of production code but lay the foundation for everything that follows. The Implementation Plan Agent and the Developer Agent – the doers that turn this foundation into a finished implementation. And in the previous post the E2E test agent – the verifier that proves, at the end, that the result actually does what was required at the start.

All of these posts pursued the same goal: explaining why this approach works, which problems it solves, and where its limits lie. The theory is in place. Now it's time to pull back the curtain.

This post shows what the workflow looks like in practice – concrete, step by step, with real artifacts, real corrections, and what comes out at the end.


The complete run of the search feature – five phases, four corrections, eight commits plus an E2E test.

The Starting Point: A Vague Wish

I wanted full-text search for this blog. Not because I had a concrete plan – but because it bothered me that you could only find a particular post through the navigation or tags. The initial idea was about this precise:

"I need a search function. The user types a term, the matching posts appear."

That is a typical starting point: understandable, but full of implicit assumptions. What does "search function" mean in a statically generated Nuxt blog? Search across title and description, or the entire content? Live during input or after pressing Enter? With debouncing? For both languages? On its own page or as a dropdown in the header?

Whoever prompts directly hands these decisions to the agent. It makes them – and you usually only find out which ones when you read the finished code.

I called the Business Analyst instead.

Phase 1: The Business Analyst – From Wish to Use Case

The Business Analyst has exactly one task: turning a vague wish into a precise, structured use case document. Before it does that, it asks follow-up questions.

In this case, four:

Should the search react live during input or only after an explicit submit?
Which content should be searched – title and description, or also the full text of the posts?
Should the search work cross-language or separately per locale?
What happens with an empty search term – an empty results list or all posts?

These are not questions a Coding Agent asks. It simply builds – and makes four implicit assumptions in doing so, which I would have only discovered in the finished components.

I answered: live search with debouncing, only title and description, separate per locale, empty term shows all posts. 90 seconds of effort on my part.

The result is a document in docs/usecases/UC-search.md. Its essential parts look approximately like this:

## Main Flow
1. User enters search term in the search bar
2. System waits 300ms (debounce)
3. System filters posts by title and description (case-insensitive)
4. Search results appear as a dropdown below the search bar
5. User selects an entry → navigates to the post

## Alternative Flow A1: No Results
- After step 3: no results found
- System shows "No results" in the dropdown

## Alternative Flow A2: Empty Search Term
- User clears the field
- Dropdown closes, all posts remain visible unchanged

## Error Cases
- E1: Content query fails → error message in dropdown, no app crash

Added to this are a sequence diagram and a user journey – both generated automatically in the use case document:


Sequence diagram of the search function, as the Business Analyst stores it in the use case.


User journey with happy path, alternative flows, and error case.

That was checkpoint 1.

I read the document. Not scanned – read. Two places bothered me. First, the agent had added an alternative flow for "search within the current post" that I had never mentioned. Second, it had assumed the search results appear on their own /search page.

Neither was correct. I wanted a dropdown in the header, not a separate page. The agent had made a plausible but wrong architecture assumption – and anchored it deep in the use case. I corrected it in two sentences, and the agent revised the document.

Two minutes of effort. Not the two hours it would have cost to fix the same thing in the finished code.

Phase 2: The Solution Architect – From Use Case to Solution Design

The corrected use case document goes to the Solution Architect. Its task: analyze the existing codebase, design the target state, and store a technical solution design under docs/loesungskonzept/.

The agent reads the use case document. It reads the existing composables. It looks at the content schema. And it finds that the useContent() composable already has an internal filter method that filters by title – and that it could be extended with minimal effort to cover the description as well, rather than building a new search method.

That is exactly the duplication problem from the first post in this series. Here it is avoided not through an instruction but through the step itself: the Architect searches for existing logic before designing new logic.

The solution design contains a tabular action catalog with file path, type of change, and effort estimate. In this case six measures:

| Measure | File | Type of change | Effort |
|---|---|---|---|
| 1 | `composables/useContent.ts` | Extend search method | S |
| 2 | `components/ui/SearchBar.vue` | New component | M |
| 3 | `components/ui/SearchDropdown.vue` | New component | M |
| 4 | `components/AppHeader.vue` | Integrate SearchBar | S |
| 5 | `i18n/locales/de.json` | Translation keys | XS |
| 6 | `i18n/locales/en.json` | Translation keys | XS |

Added to this are a class diagram, a sequence diagram for the technical flow, and explicit hints on existing patterns to be reused.

That was checkpoint 2.

I read the concept. One thing I changed: the Architect had placed the debounce logic in the composable. I wanted it in the component – there it's easier to test and, if needed, reusable for other scenarios without loading the composable with UI-specific logic. One piece of feedback, one revision, the concept approved.

Important: at this point I had not read a single line of code. The concept was at an abstraction level I could fully judge – structure, dependencies, reuse – without having to look into useContent.ts.

Phase 3: The Implementation Plan Agent – From Concept to Battle Plan

The approved solution design goes to the Implementation Plan Agent. It takes the six measures and breaks them into eight concrete implementation steps under docs/plan/plan-search.md.

Each step has a fixed structure. An excerpt from the finished plan:

## Step 2: Create SearchBar.vue

**File:** `components/ui/SearchBar.vue`

### Tasks
- [ ] Create new component with input field
- [ ] Implement debounce logic with 300ms delay in the component
- [ ] `modelValue` prop for v-model compatibility
- [ ] Populate placeholder text via i18n key `search.placeholder`
- [ ] Focus styles according to existing design system

### Validation
`npm run typecheck` → must be green

### Manual Verification
| Action | Expected Result |
|---|---|
| Start dev server | No build errors |
| Render SearchBar in isolation | Input field with correct placeholder visible |

### Risk Assessment
None – new component without existing dependencies

The decisive property of this plan: after each completed step the application is in a runnable state. That is not a wish – it is an explicit requirement on the agent that it must adhere to when creating the plan. No step breaks the build. No step leaves half-finished dependencies.

That was checkpoint 3.

I read the plan. Ten minutes. In step 5 the agent had mis-ordered a dependency: it assumed SearchBar.vue was already embedded in the header – but that was only planned for step 6. With the two steps in the wrong order, the build would have been broken between steps 5 and 6.

I pointed this out, the agent swapped the steps, and then I approved the plan.
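The rule behind that correction can be written down mechanically. The following is a minimal sketch, not part of the actual plan or agent setup: `PlanStep`, `findOrderingViolations`, and the artifact labels are hypothetical names chosen for illustration. It walks the steps in order and reports every step that relies on something no earlier step has produced.

```typescript
// Hypothetical model of a plan step: what it produces and what it assumes.
interface PlanStep {
  id: number;
  creates: string[];  // artifacts that exist once the step is done
  requires: string[]; // artifacts the step assumes are already there
}

interface Violation {
  stepId: number;
  missing: string;
}

// Walk the plan in order; a step may only require what the codebase
// already had or what an earlier step created.
function findOrderingViolations(
  steps: PlanStep[],
  preexisting: string[] = [],
): Violation[] {
  const available = new Set(preexisting);
  const violations: Violation[] = [];
  for (const step of steps) {
    for (const needed of step.requires) {
      if (!available.has(needed)) {
        violations.push({ stepId: step.id, missing: needed });
      }
    }
    for (const artifact of step.creates) {
      available.add(artifact);
    }
  }
  return violations;
}

// Illustrative data only: the real steps contain more than this.
const swappedSteps: PlanStep[] = [
  { id: 5, creates: [], requires: ['SearchBar embedded in AppHeader'] },
  { id: 6, creates: ['SearchBar embedded in AppHeader'], requires: [] },
];
console.log(findOrderingViolations(swappedSteps));
```

With the two steps in this order the check flags step 5. With their positions exchanged it returns an empty list.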

Phase 4: The Developer Agent – From Plan to Code

The Developer Agent does not receive the solution design. It does not receive the plan as a whole. It receives one step.

Step 1: extend useContent.ts.

The agent reads the file, extends the search method, runs npm run typecheck, reports: green. Done.

I look at the changes. One file, two methods, one new optional parameter. Everything as planned. No additional state, no new dependency, no initiative outside the task.
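To make the shape of such a change tangible, here is a minimal sketch. It is not the actual content of `useContent.ts`, which this post does not show; `Post`, `SearchOptions`, `searchPosts`, and `includeDescription` are hypothetical names. It illustrates how one optional parameter can widen an existing title filter to the description, while callers that do not pass it keep their previous behavior.

```typescript
// Hypothetical post shape; the real content schema is not shown in this post.
interface Post {
  title: string;
  description: string;
  locale: string;
  path: string;
}

interface SearchOptions {
  // New optional flag: existing callers omit it and still get title-only search.
  includeDescription?: boolean;
}

function searchPosts(
  posts: Post[],
  term: string,
  locale: string,
  options: SearchOptions = {},
): Post[] {
  // Search is separate per locale.
  const inLocale = posts.filter((post) => post.locale === locale);

  // An empty term leaves all posts of the locale visible.
  const needle = term.trim().toLowerCase();
  if (needle === '') return inLocale;

  // Case-insensitive match on title, optionally also on description.
  return inLocale.filter((post) => {
    if (post.title.toLowerCase().includes(needle)) return true;
    return (
      options.includeDescription === true &&
      post.description.toLowerCase().includes(needle)
    );
  });
}

// Illustrative data only.
const demoPosts: Post[] = [
  { title: 'Agent teams', description: 'Roles and handoffs', locale: 'en', path: '/en/teams' },
  { title: 'Checkpoints', description: 'Correcting agents early', locale: 'en', path: '/en/checkpoints' },
];
console.log(searchPosts(demoPosts, 'agent', 'en', { includeDescription: true }).length);
```

The point of the optional flag is that nothing else in the codebase has to change for the step to leave the build green.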

Then step 2. Then step 3.
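Step 2 carries the debounce task from the plan. As a rough illustration of what "debounce logic with 300ms delay in the component" can mean, here is a framework-free sketch. The names `debounce` and `Scheduler` are assumptions, not taken from the real `SearchBar.vue`, and the injectable scheduler exists only so the helper can be checked without waiting on real timers.

```typescript
// Hypothetical timer abstraction so the helper can be tested with fake timers.
interface Scheduler {
  set(fn: () => void, ms: number): unknown;
  clear(handle: unknown): void;
}

const realTimers: Scheduler = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

// Returns a wrapper that delays `fn` until no further call has arrived
// for `delayMs`; only the arguments of the last call are used.
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  delayMs = 300,
  timers: Scheduler = realTimers,
): (...args: A) => void {
  let pending: unknown;
  let hasPending = false;
  return (...args: A) => {
    if (hasPending) timers.clear(pending); // drop the previous, now outdated call
    hasPending = true;
    pending = timers.set(() => {
      hasPending = false;
      fn(...args);
    }, delayMs);
  };
}

// Usage idea: in a component, the wrapped function would forward the term
// to whatever triggers the search (for example an emitted update event).
const onInput = debounce((term: string) => {
  console.log('search for', term);
});
onInput('a');
onInput('ag');
onInput('agents'); // only this call triggers a search, 300 ms after the last keystroke
```

Because the helper knows nothing about posts or content queries, it fits the decision from checkpoint 2: the composable stays free of UI timing concerns.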

After step 4 I briefly intervened: the agent had implemented the dropdown with position: absolute in inline styles, even though the plan explicitly specified the Tailwind pattern relative/absolute. No functional error – but exactly the kind of small interpretation that accumulates over eight steps into an inconsistency you can no longer trace later. I corrected it, the agent re-executed the step.

Steps 5, 6, 7, 8. After each step: look at the changes, build green, continue.

The result: a working search bar in the header, live with debouncing, with correct dropdown, without a separate page, without duplication of existing composable logic. Eight clean, focused commits in the log.

Does it actually work, though? The build is green – but green only means "compiles," not "does what the use case said." That last question is answered by the fifth agent.

Phase 5: The E2E Test Agent – From Code to Verified Feature

The E2E test agent I introduced in the previous post now gets two things: the original use case document from docs/usecases/UC-search.md and the running application. Not the plan, not the code, not the implementation details – only the requirement in its approved form and a browser.

It translates the structure of the use case directly into Playwright test cases: the main flow into the central scenario, the two alternative flows (no results, empty term) into one test each, and error case E1 into a test that forces the failing content query. Then it starts the dev server, drives the browser, and runs the suite.

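import { test, expect } from '@playwright/test'

// Main flow: typing a term opens the dropdown with at least one matching post
test('Main flow: matches appear in the dropdown', async ({ page }) => {
  await page.goto('/')
  await page.getByRole('searchbox').fill('agents')
  const dropdown = page.getByTestId('search-dropdown')
  await expect(dropdown).toBeVisible()
  await expect(dropdown.getByRole('link')).not.toHaveCount(0)
})

// Alternative flow A2: clearing the field closes the dropdown again
test('A2: empty term closes the dropdown', async ({ page }) => {
  await page.goto('/')
  const box = page.getByRole('searchbox')
  const dropdown = page.getByTestId('search-dropdown')
  await box.fill('agents')
  // confirm the dropdown was open first, otherwise toBeHidden() passes trivially
  await expect(dropdown).toBeVisible()
  await box.fill('')
  await expect(dropdown).toBeHidden()
})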

That was checkpoint 5. And here is the point where I look most closely – not at the green checkmark, but at the test code itself.

That paid off. The first attempt for error case E1 looked like this:

// too weak – runs green but asserts nothing solid
test('E1: error on content query', async ({ page }) => {
  await page.goto('/')
  await page.getByRole('searchbox').fill('xyz')
  await expect(page.getByTestId('search-dropdown')).toBeVisible()
})

The test didn't force the error at all – it just typed a term and checked that some dropdown is visible. Green, but worthless: exactly the false sense of security I warned about in the Human-in-the-Loop (HITL) post. I instructed the agent to actually make the content query fail via a route mock and to assert on the defined error message, not on mere visibility. Three minutes, one revision – after that the test really checked the error behavior described in the use case.
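The difference between the weak and the revised test is easiest to see away from the browser. The sketch below is not the Playwright test the agent wrote and uses no route mock; it shows the same principle at unit level under assumed names. `resolveDropdown`, `DropdownState`, and the error wording in `QUERY_FAILED` are hypothetical; only the "No results" text comes from the use case. The idea: inject a query that is guaranteed to fail, then check for the specific error state instead of "something is visible".

```typescript
interface Post {
  title: string;
  description: string;
  path: string;
}

// One state per branch of the use case.
type DropdownState =
  | { kind: 'closed' }                      // A2: empty search term
  | { kind: 'results'; posts: Post[] }      // main flow
  | { kind: 'empty'; message: string }      // A1: no results
  | { kind: 'error'; message: string };     // E1: content query fails

const NO_RESULTS = 'No results';
const QUERY_FAILED = 'Search is currently unavailable'; // assumed wording

async function resolveDropdown(
  term: string,
  query: (term: string) => Promise<Post[]>,
): Promise<DropdownState> {
  if (term.trim() === '') return { kind: 'closed' };
  try {
    const posts = await query(term);
    return posts.length > 0
      ? { kind: 'results', posts }
      : { kind: 'empty', message: NO_RESULTS };
  } catch {
    // The failure is turned into a visible state instead of crashing the app.
    return { kind: 'error', message: QUERY_FAILED };
  }
}

// Forcing the failure: the injected query always rejects.
const failingQuery = async (): Promise<Post[]> => {
  throw new Error('content query failed');
};

resolveDropdown('xyz', failingQuery).then((state) => {
  // A check on "some state exists" would pass for every branch above;
  // checking kind and message only passes for E1.
  console.log(state.kind === 'error' && state.message === QUERY_FAILED);
});
```

A visibility-only assertion corresponds to accepting any of the four states. Tying the assertion to the error message is what makes the test fail when the error path is broken.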

Four test cases, all green, all meaningful. They move into the repository and run on every future change from now on. The one-time "it works" has become a lasting "it still works."

What This Walkthrough Really Shows

I actually intervened at four of five checkpoints. That sounds like a lot of correction effort.

In practice it was the opposite. Every intervention took between two and ten minutes. I corrected a misassumption (phase 1), adjusted an architecture decision (phase 2), smoothed out a dependency ordering (phase 3), and sharpened a too-weak test assertion (phase 5); phase 4 ran through apart from one tiny style correction. The first three interventions required not a single glance at the final code – they took place at levels I could judge without a code dive: requirement, concept, plan. The fourth happened at the test level, but even there I didn't read the implementation, only what the test asserts – and so it stayed just as much a matter of minutes.

And that is the point I had described theoretically in the HITL post, but could now see in practice: it is not about finding errors. It is about holding the course. The agents made no gross technical errors. They made implicit decisions – small, plausible, but wrong. And the moment I saw these decisions was always when they still cost nothing.

Where the Overhead Really Lies

Yes, this workflow took longer than a direct prompt – with all course corrections a good hour instead of maybe 20 minutes. I pay that time deliberately, because it produces exactly what a direct prompt does not: safety and quality. Each of the four corrections I made at the requirement, concept, plan, and test level would have turned into hours of refactoring, a broken build, or creeping duplication in the finished code – or worse, only in production.

For small, isolated changes this workflow is overkill. For anything that affects multiple layers and must be integrated into a grown codebase, the overhead pays off – not immediately, but consistently.

Conclusion

This walkthrough was not a showcase. It was an honest look at a process that does exactly what it is supposed to – and still needs human interventions. At four points. With a total of around 30 minutes of review effort, embedded in a feature that took a good hour from first requirement to tested commit.

What I have at the end: no surprises, no backtracking, no "I should have known that earlier." The corrections that were needed cost me minutes – not hours. The artifacts from each phase were already in the project and transparently captured why each decision was made the way it was.

And that is exactly what makes this middle ground appealing to me. When I line up the three approaches side by side, the picture becomes clear:

Vibe Coding is fast. One prompt, a few minutes, a runnable result. But no concept, no architecture plan, no traceable decision path. In a grown codebase, this approach produces problems you pay for dearly later – I described that in detail in post 1.
Traditional software development with real Product Owners, Business Analysts, Solution Architects, planning and grooming sessions, and a subsequent implementation phase, on the other hand, produces very reliable results – but stretches out over days or weeks for a single feature. That is the right and necessary choice for many projects. For a smaller initiative or an individual, it is simply not feasible.
Agentic + Human-in-the-Loop sits in between. The entire process – requirements analysis, concept, plan, implementation, and verification – fits into a good hour because the agents do the legwork. The human decides where it matters: at the transitions between requirement, architecture, plan, code, and test. The structural discipline of classical development is preserved, just in a form that scales appropriately for a single feature.

From my perspective this middle ground is defensible – more than that: it is exactly the point at which AI-assisted development starts to become seriously usable in the engineering sense. Not as fast as a single prompt, but more reliable. Not as thorough as a classical project setup, but thorough enough for most tasks.

So far I've shown what the five agents do and how they work together in practice. In the next post it gets hands-on: how you set up such an agent team in the first place – with ready-made templates as a shortcut or scaffolded from scratch yourself.