The Spec-First Workflow
You’ve got your tools set up and you’ve taken the Claude Code in Action course. Now comes the thing that actually makes agents useful: giving them good work.
This is the workflow you’ll use every time you work with an agent. Six steps. Whether it’s a quick bug fix or a multi-day feature, the shape is the same. The steps themselves aren’t complicated — the discipline of following them is what separates good agent output from confident garbage.
The six steps
1. UNDERSTAND → What's being asked? Why?
2. SPECIFY → Write down what "done" looks like
3. DECOMPOSE → Break it into agent-sized pieces
4. DELEGATE → Hand it to an agent (or do it yourself)
5. REVIEW → Check the output against the spec
6. MERGE & LEARN → Ship it, then improve the system

Let’s walk through each one.
Step 1: Understand
Before you touch any tool, make sure you actually understand what needs to be built. This step hasn’t changed from pre-agent work — but it matters more now. A misunderstanding that you’d catch while coding gets baked into the spec and multiplied by the agent. The agent won’t come over and ask you what you meant. It’ll just build what you wrote.
Do this:
- Read the ticket fully, including comments and linked issues.
- Check if there are existing patterns in the codebase for similar work.
- Ask clarifying questions to the PO or client before starting. Not after the agent has already written 500 lines.
- For client work, check the client-specific CLAUDE.md and any architecture docs.
Not this:
- Starting to spec before you understand the “why” behind the request.
- Assuming the ticket is complete. Tickets written for human devs often leave context implicit. Agents need it made explicit.
Step 2: Specify
This is where most of your time goes now. If that feels backwards (“wait, shouldn’t I be coding?”), go back to the time-split in the first chapter. A good spec is the single biggest predictor of whether agent output will be useful or useless.
You’ve already seen the gold-standard ticket template in the first chapter. That template covers what the requester needs to provide. The spec you write here is the developer’s layer on top — the technical approach.
```markdown
## Task: [Short title]

### What
[One paragraph: what needs to happen and why.]

### Acceptance criteria
- [ ] [Specific, testable condition]
- [ ] [Specific, testable condition]
- [ ] [Specific, testable condition]

### Technical approach
- Files to modify: [list specific files or directories]
- Pattern to follow: [link to or describe an existing similar implementation]
- Key constraints: [e.g., "must use existing Repository pattern", "no new NuGet packages"]

### Out of scope
- [Explicitly list what should NOT be changed]
- [e.g., "Do not modify the database schema", "Do not change the API contract"]

### Tests
- [What tests should be written or updated]
- [Reference existing test patterns if applicable]
```

The “out of scope” section is easy to skip, but it’s one of the most valuable parts. Without it, agents will helpfully refactor things you didn’t ask them to touch. A clear “do NOT modify X” saves you from reviewing unnecessary changes.
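To make the template concrete, here’s a hypothetical filled-in spec for a small task. The file names and constraints are illustrative, reusing the profile-editing example from later in this chapter, not taken from a real ticket:

```markdown
## Task: Add email uniqueness check to profile updates

### What
Profile edits currently accept any email, so two users can end up with the
same address. Add a uniqueness check to the update path.

### Acceptance criteria
- [ ] Updating a profile with an email already used by another user returns a validation error
- [ ] Updating a profile with the user's own current email succeeds
- [ ] Existing profile update tests still pass

### Technical approach
- Files to modify: src/Commands/Handlers/UpdateProfileCommandHandler.cs
- Pattern to follow: the duplicate check in CreateUserCommand
- Key constraints: use the existing IUserRepository; no new NuGet packages

### Out of scope
- Do not modify the User entity or the database schema
- Do not change the API contract

### Tests
- Add a unit test for the duplicate-email case in tests/Unit/Commands/UpdateProfileCommandTests.cs
```

Notice how little room this leaves for the agent to guess: one file to change, one pattern to copy, and an explicit list of what to leave alone.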
GitHub’s Spec Kit takes this idea further. It uses a “constitution” file (your project’s coding standards and principles) that feeds into specs, plans, and task breakdowns (GitHub blog). You don’t need to adopt Spec Kit, but the principle is the same: write the spec before any code gets generated.
Addy Osmani calls a similar approach “waterfall in 15 minutes” — brainstorm a spec covering requirements, architecture, and testing strategy, feed it to a reasoning model to generate a project plan, then execute one task at a time (addyosmani.com). The name is tongue-in-cheek, but the point is real. The upfront work that used to feel like overhead is now what makes everything downstream work.
If you’re new to writing specs: Nobody writes great specs on day one. The template above is a starting point, not a test. You’ll get a feel for how much detail is “enough” after a few rounds of seeing what agents do with different levels of specificity. If the output is wrong, the spec is usually the first place to look.
Step 3: Decompose
Break the spec into tasks that are small enough for a single agent session. A good target is under 300 lines of change per task. Each task should be completable without needing to understand what the other tasks did.
GitHub’s WRAP principle captures this well:
- Write effective issues — the issue IS the prompt. Quality in, quality out.
- Refine instructions — update CLAUDE.md when you learn what agents get wrong.
- Atomic tasks — one concern per task, under 300 lines, clear test criteria.
- Pair with review — agents are async collaborators, not autonomous decision-makers.
Good decomposition:
```
Ticket: "Add user profile editing to the dashboard"
    ↓
Task 1: Add UpdateProfileCommand and handler (CQRS pattern)
Task 2: Add PUT /api/profile endpoint in ProfileController
Task 3: Add ProfileEditForm React component with validation
Task 4: Add integration tests for profile update flow
Task 5: Update API documentation
```

Bad decomposition:

```
Task 1: "Implement user profile editing"  ← too broad, too ambiguous
```

The good version has five tasks that can each be done independently. The bad version is one task that would take the agent through dozens of files with no clear stopping point. It would probably produce something, but reviewing it would be a nightmare.
Think of decomposition as parallelisable work — because with agents, you can literally run tasks 1–5 in parallel on separate branches (more on this in the Orchestration chapter).
Step 4: Delegate (or do it yourself)
Not everything should go to an agent. Some tasks are a clean hand-off, some need a detailed spec, and some need you at the keyboard. This table is a starting point — you’ll develop your own instincts over time.
| Task type | Delegate? | Why |
|---|---|---|
| CRUD operations, boilerplate | Yes | Well-defined patterns, easy to verify |
| Documentation updates | Yes | Low risk, easy to review |
| Bug fix with clear repro steps | Yes | Scoped, testable |
| New feature with clear spec and existing patterns | Yes | If the codebase has examples to follow |
| Architecture decisions | No | Requires judgment and context the agent doesn’t have |
| Test generation | Carefully | Repetitive and pattern-based, but agents tend to test implementation rather than intent — they’ll write tests that pass without catching real bugs. Specify what to test, not just “add tests” |
| Security-sensitive code | Carefully | Delegate but review with extra scrutiny |
| Complex business logic | Do it yourself or pair | Too much implicit domain knowledge |
| Performance optimisation | Do it yourself | Requires profiling, measurement, judgment |
| CMS content model design | Design yourself (for now…), delegate implementation | Content type design needs business context; creating the code is mechanical |
| Unfamiliar third-party integrations | Do it yourself | Agent may hallucinate APIs that don’t exist |
When you delegate with Claude Code, be specific:
> I need you to implement the UpdateProfileCommand handler.
>
> Here's the spec:
> - Create UpdateProfileCommand in src/Commands/UpdateProfileCommand.cs
> - Create UpdateProfileCommandHandler in src/Commands/Handlers/
> - Follow the same pattern as CreateUserCommand (look at that file for reference)
> - Validate that email is unique (excluding the current user)
> - Use the existing IUserRepository
> - Write unit tests in tests/Unit/Commands/UpdateProfileCommandTests.cs
>
> Do NOT modify the User entity or the database schema.
> Run dotnet build and dotnet test when done.

Notice the structure: what to do, what pattern to follow, what constraints apply, what NOT to do, and how to verify. That’s a spec-in-miniature. The more of these elements you include, the better the output.
Background delegation — assigning work to agents that run while you do something else — is where things really scale. You can assign a GitHub issue to @copilot and it’ll create a branch, implement the change, and open a draft PR. Or you can trigger the Claude Code GitHub Action to do the same. We’ll set up the CI integration later. For now, interactive Claude Code sessions are where you’ll build the muscle memory.
Step 5: Review
This gets its own full chapter (Quality Gates) because it’s that important. The short version: AI-generated code needs harder review, not easier review.
The key habit is plan-then-execute. Ask the agent to explain its plan before it starts writing code. It’s much easier to catch a wrong approach at the plan stage than to untangle 500 lines of generated code.
Boris Cherny, who created Claude Code, works exactly this way. He starts in Plan Mode, goes back and forth on the plan until he’s happy with it, then lets Claude execute the whole thing (InfoQ).
In Claude Code, this looks like:
```
# Ask for a plan first
> Before you make any changes, explain your plan for implementing
> the UpdateProfileCommand. List the files you'll create or modify,
> the approach you'll take, and any assumptions you're making.

# Review the plan — does it match what you expect?
# Then give the go-ahead
> That looks right. Go ahead and implement it.
```

Jellyfish recommends a minimum 30-minute review for AI-generated PRs (Jellyfish). That might sound like a lot, but the standard is simple: you should be able to explain every line in a PR you approve. If you can’t, it’s not ready to merge.
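Part of that review can be made mechanical. Here’s a minimal sketch of a drift check: it compares the files an agent changed against the spec’s “files to modify” list. The file lists are hard-coded so the snippet runs anywhere; in a real repo the changed set would come from `git diff --name-only main...HEAD`:

```shell
# Files the spec said to create or modify (its "Technical approach" section)
spec_files="src/Commands/Handlers/UpdateProfileCommandHandler.cs
src/Commands/UpdateProfileCommand.cs
tests/Unit/Commands/UpdateProfileCommandTests.cs"

# What the agent actually changed (hard-coded for the sketch;
# in a real repo: changed=$(git diff --name-only main...HEAD))
changed="src/Commands/UpdateProfileCommand.cs
src/Entities/User.cs"

# comm -13 prints lines that appear only in the second list:
# anything the spec never mentioned is drift worth questioning in review
comm -13 <(echo "$spec_files" | sort) <(echo "$changed" | sort)
# → src/Entities/User.cs
```

Here the check prints `src/Entities/User.cs`: the spec explicitly said not to touch the User entity, so that file is the first thing to ask about in review.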
Step 6: Merge and learn
When the PR is good, merge it. Then close the loop:
- If the agent made a mistake it could avoid with better instructions, update the project’s CLAUDE.md. Boris Cherny’s team at Anthropic does this routinely — when a colleague’s PR reveals an AI pitfall, they tag it with `@.claude` to capture the learning (InfoQ).
- If the spec was ambiguous and caused the wrong implementation, improve the spec or your decomposition. The template isn’t sacred — adjust it to fit how you work.
- If the agent got it right first try, notice what made that task work well. Was the spec especially clear? Was there a good existing pattern to follow? Those are your best candidates for future delegation.
This is how the system gets better over time. Every merged PR is a chance to teach the agent something. The CLAUDE.md is a living document, not a write-once config file — and the Context Files chapter goes deep on how to structure it.
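What that closing of the loop looks like in practice: a short, hypothetical CLAUDE.md fragment capturing rules learned from past reviews (the specific rules are illustrative, drawn from this chapter’s examples, not prescriptions):

```markdown
## Lessons from reviews

- Follow the CQRS pattern in src/Commands/ (see CreateUserCommand for reference).
- Use the existing IUserRepository; do not query the database directly from handlers.
- Do not add new NuGet packages without asking first.
- Run `dotnet build` and `dotnet test` before reporting a task as done.
```

Each rule earns its place because an agent once got it wrong. That’s also what keeps the file short and relevant.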
Running tasks in parallel
Once the basic workflow feels natural, the next multiplier is running more than one task at a time. Each task runs in its own git branch so they don’t conflict. Claude Code’s --worktree flag handles this for you — it creates an isolated working directory with its own branch, no manual git commands needed.
```shell
# Start a session in its own worktree
claude --worktree feature-auth

# Start another, in parallel
claude --worktree bugfix-123
```

Boris Cherny runs five local sessions plus five to ten cloud sessions simultaneously (InfoQ). You don’t need to start there. Even two or three parallel sessions is a significant jump. The Orchestration chapter covers the full model — worktrees, Desktop sessions, remote execution, and how to monitor multiple streams without losing track.
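Under the hood, `--worktree` is automating plain git worktrees. A rough hand-rolled equivalent looks like this (a throwaway repo is created here just so the commands run end-to-end):

```shell
set -e
# Throwaway repo and scratch directory, so the sketch runs anywhere
repo=$(mktemp -d); worktrees=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=dev -c user.email=dev@example.com commit -q --allow-empty -m "init"

# One worktree per task: an isolated working directory on its own branch,
# so parallel sessions never step on each other's files
git worktree add -q "$worktrees/feature-auth" -b feature-auth
git worktree add -q "$worktrees/bugfix-123" -b bugfix-123

# Lists three entries: the main checkout plus one per task
git worktree list
```

Each session then runs inside its own directory, and merging a finished task back is an ordinary branch merge.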
Exercise 02 — Write your first spec (frontend)
Your first time putting the full workflow into practice. The goal: write a real spec, break it down, hand one piece to the agent, and review what comes back.
- What to do: Pick a small feature or bug from the Next.js practice repo’s open issues. Write a spec using the template above. Decompose it into 2–3 agent-ready tasks. Delegate one task to Claude Code. Review the output against your spec.
- What to look for: How does the quality of the output relate to the quality of your spec? Did the agent touch files you didn’t mention? Did the “out of scope” section actually prevent drift? On the frontend side specifically: did it respect your component patterns? Did it add unexpected dependencies? Did it generate sensible markup and accessibility attributes, or did you need to specify those?
- Common pitfalls: Specs that are too vague (“make the page faster”) or too prescriptive (dictating every line). The sweet spot is clear intent with room for the agent to choose an approach. Frontend-specific traps to watch for: forgetting to specify which component library or styling approach to use (agents will happily install something new), and not mentioning responsive behaviour or accessibility (the agent won’t think about these unless you do).
- Reflection prompt: If you were going to delegate this same task again tomorrow, what would you change about the spec? Frontend specs often need more “out of scope” items than you’d expect — don’t change the layout, don’t add packages, don’t refactor the routing. Did yours have enough?
Exercise 02b — Write your first spec (backend)
Same workflow, different codebase. If your day-to-day is backend work, do this exercise instead of (or in addition to) Exercise 02.
- What to do: Pick a small feature or bug from the Alloy CMS practice repo’s open issues. Write a spec using the template above. Decompose it into 2–3 agent-ready tasks. Delegate one task to Claude Code. Review the output against your spec.
- What to look for: Same as Exercise 02 — plus: did the agent follow existing architectural patterns (e.g., Repository, CQRS)? Did it modify things you didn’t ask it to, like the database schema or API contracts?
- Common pitfalls: Assuming the agent knows your project’s conventions. Backend codebases often have stricter architectural patterns, and if you don’t point the agent at an existing example to follow, it’ll invent its own approach. Also worth specifying: which test framework and assertion style to use — the agent won’t guess right consistently.
- Reflection prompt: If you were going to delegate this same task again tomorrow, what would you change about the spec?
Putting it together
The whole workflow comes down to one idea: tell the agent exactly what you want before it starts working. The six steps are how that plays out in practice. You’ll get faster at all of them, especially spec-writing. The template helps at first. Then it becomes second nature and you won’t need it.
Next, we’ll look at the files that tell the agent how to work in your codebase: CLAUDE.md, the three-level hierarchy, and how to keep them useful without letting them bloat.