
Spec-Driven Development in Practice #3: Implementation — Orchestrate to a PR with spec-implement
Introduction
This is the final installment of the agent-skills practical guide. Part 1 (Setup) generated the project's foundation documents with spec-workflow-init and spec-rules-init. Part 2 (Spec Authoring) used spec-generator / spec-inspect / spec-to-issue to turn intent into a structured GitHub Issue.
Part 3 takes that Issue and runs it all the way to a PR, with four skills that split orchestration, implementation, review, and testing.
Key Takeaways
- spec-implement is a pure orchestrator (it does not write code or perform reviews itself). It delegates to worker skills and drives the pipeline.
- Eight phases: Load Context → Issue Analysis → mandatory feature branch → Task Loop ([code] / [orchestrator] / Review Gate -R) → Final Quality Gate → PR creation
- spec-code runs in Phase A (full context) or Phase B (--feedback with minimal context) and addresses review/test findings via --feedback {file}
- spec-review runs a rule × file matrix against review_rules.md and coding-rules.md, classifies findings as Critical / Improvement / Minor, and writes review-{task-id}.md
- spec-test derives test cases from the task's completion criteria, auto-detects the test framework and patterns, and writes test-{task-id}.md
- Dispatch modes: Codex sub-agents / Claude Code agent team / cmux dispatch / single-agent sequential, selected automatically based on environment
- Everything Part 1 and Part 2 produced (issue-to-pr-workflow.md, coding-rules.md, review_rules.md, .specs/{feature}/) feeds directly into this pipeline
The Four Skills and Their Division of Labor
The implementation set is "one conductor + three workers."
| Skill | Role | Writes / Reads |
|---|---|---|
| spec-implement | Orchestrator: invokes workers, aggregates results, checks off tasks.md, opens the PR | Does not write code or perform reviews itself |
| spec-code | Implements a single task; also handles fixes via --feedback | Writes implementation code, commits |
| spec-review | Rule × file matrix review | Writes the review result file |
| spec-test | Generates and runs tests from completion criteria | Writes test code, runs the test command |
spec-implement's SKILL.md is explicit: "🚨 BLOCKING — orchestrator only." If a worker is missing, it stops and tells you to install it — it never falls back to doing the work itself.
spec-implement: The Orchestration Loop
spec-implement drives Issue → PR through eight phases.
npx skills add anyoneanderson/agent-skills --skill spec-implement -g -y
Implement from spec --issue 42
# or, using the Japanese trigger phrase (it means "implement from spec")
仕様書から実装 --issue 42 --spec .specs/auth-feature/
# or
Implement from spec --resume # resume from last unchecked task
Eight Phases
| Phase | Action |
|---|---|
| 1–3 | Load context (workflow, coding rules, project instructions, all spec files) |
| 4 | Issue analysis (gh issue view {N} for title, body, labels) |
| 5 | 🚨 Mandatory feature branch feature/issue-{N}-{brief} — blocks main / master / develop |
| 6 | Task Loop (the core, covered below) |
| 7 | Final quality gate (run test / lint / typecheck; verify all tasks.md checkboxes ticked) |
| 8 | PR creation (gh pr create with workflow template; does not open a PR if tests fail) |
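The Phase 5 branch guard can be pictured roughly as follows. This is a minimal Python sketch, not the skill's actual implementation: the helper names `feature_branch_name` and `is_protected` and the slug rules are assumptions made for illustration. Only the feature/issue-{N}-{brief} pattern and the three blocked branch names come from the table above.

```python
import re

# Branch names the table above lists as blocked for direct work.
PROTECTED_BRANCHES = {"main", "master", "develop"}


def feature_branch_name(issue_number: int, title: str, max_words: int = 4) -> str:
    """Build feature/issue-{N}-{brief} from an issue title (slug rules are assumed)."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    brief = "-".join(words[:max_words]) or "work"
    return f"feature/issue-{issue_number}-{brief}"


def is_protected(branch: str) -> bool:
    """True when the branch is one the orchestrator must not work on directly."""
    return branch in PROTECTED_BRANCHES
```

A caller would check `is_protected` against the current branch and, if it returns True, create the feature branch before any task runs.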
Phase 6: Task Loop and Role Tags
Each Phase in tasks.md carries a role tag. spec-implement branches its behavior accordingly.
- [code]: delegate implementation to spec-code
- [orchestrator]: spec-implement runs commands directly (no file mutation)
- -R suffix (e.g., Phase 2-R: Review Gate [orchestrator]): a Review Gate that runs spec-review + spec-test against every task in the preceding [code] phase
Phases without a tag are treated as [code] for backward compatibility with pre-v3 specs.
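To make the tag rules concrete, here is a small Python sketch of how a phase heading could be classified. The heading shape ("Phase 2-R: Review Gate [orchestrator]") follows the example above; the `PhaseInfo` type and `parse_phase_heading` function are hypothetical names, and the real skill may parse tasks.md differently.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class PhaseInfo:
    role: str             # "code" or "orchestrator"
    is_review_gate: bool  # True for "-R" phases


def parse_phase_heading(heading: str) -> PhaseInfo:
    """Classify a tasks.md phase heading by its role tag and -R suffix."""
    tag = re.search(r"\[(code|orchestrator)\]", heading)
    # Untagged phases fall back to [code] for backward compatibility.
    role = tag.group(1) if tag else "code"
    # A Review Gate is marked by "-R" right after the phase number, e.g. "Phase 2-R:".
    is_gate = re.search(r"\bPhase\s+\d+-R\b", heading) is not None
    return PhaseInfo(role=role, is_review_gate=is_gate)
```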
The Review Gate Fix Loop
Inside a Review Gate phase, each preceding [code] task goes through:
spec-review → Critical found → spec-code --feedback {review.md}
→ re-spec-review → ... up to 3 iterations
spec-test → FAIL → spec-code --feedback {test.md}
→ re-spec-test
review PASS AND test PASS → tick the Review Gate task checkbox
After three failed iterations, the skill asks via AskUserQuestion. This is a deliberate safety stop.
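The control flow of the review half of this loop can be sketched in a few lines of Python. The three callables are stand-ins for spec-review, spec-code --feedback, and AskUserQuestion; they are hypothetical parameters, not real APIs. How the three iterations are counted is also an assumption: the sketch allows three fix attempts after the initial review.

```python
from typing import Callable

MAX_ITERATIONS = 3  # iteration cap stated in the article


def run_review_fix_loop(
    review: Callable[[], bool],    # returns True when the review gate passes
    fix: Callable[[], None],       # stands in for spec-code --feedback {review.md}
    ask_user: Callable[[], None],  # stands in for AskUserQuestion
) -> bool:
    """Review, fix, and re-review until PASS or the iteration cap is reached."""
    if review():
        return True
    for _ in range(MAX_ITERATIONS):
        fix()
        if review():
            return True
    ask_user()  # safety stop: hand the decision back to a human
    return False
```

The test half would follow the same shape, with spec-test in place of the review callable.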
Dispatch Modes
How workers get invoked depends on the environment. Four options, auto-selected:
| Mode | Condition | How |
|---|---|---|
| Codex sub-agents | Running in Codex + .codex/agents/workflow-*.toml present | Spawn the custom agent and have it run /spec-code, etc. |
| Claude Code agent team | Running in Claude Code + .claude/agents/workflow-*.md present + CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 | Create teammates and assign roles |
| cmux dispatch | CMUX_SOCKET_PATH set + workflow explicitly selects cmux | cmux-delegate dispatches to another pane |
| Single agent | None of the above | Run sequentially in the current session |
Priority: Codex sub-agents → Claude Code agent team → cmux dispatch → single agent. The role definitions you generated in Part 1 (workflow-implementer.md etc.) are what enable the runtime-native modes.
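The priority order can be read as a chain of checks where the first match wins. The Python sketch below illustrates that reading under stated assumptions: the `runtime` labels, the `workflow_selects_cmux` flag, and the returned mode names are invented for the example, while the file patterns and environment variables are the ones listed in the table.

```python
from fnmatch import fnmatchcase
from typing import Iterable, Mapping


def select_dispatch_mode(
    runtime: str,                  # "codex", "claude-code", or anything else (assumed labels)
    env: Mapping[str, str],
    files: Iterable[str],          # repository-relative paths
    workflow_selects_cmux: bool = False,
) -> str:
    """Return the first dispatch mode whose conditions hold, in priority order."""
    paths = list(files)

    def has(pattern: str) -> bool:
        return any(fnmatchcase(p, pattern) for p in paths)

    if runtime == "codex" and has(".codex/agents/workflow-*.toml"):
        return "codex-sub-agents"
    if (
        runtime == "claude-code"
        and has(".claude/agents/workflow-*.md")
        and env.get("CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS") == "1"
    ):
        return "claude-code-agent-team"
    if env.get("CMUX_SOCKET_PATH") and workflow_selects_cmux:
        return "cmux"
    return "single-agent"  # nothing else matched: run sequentially
```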
spec-code: Implement a Single Task
spec-code implements one task per invocation. It does not check off tasks.md — that is the orchestrator's job.
npx skills add anyoneanderson/agent-skills --skill spec-code -g -y
/spec-code --issue 42 --task T-007 --spec .specs/auth-feature/
Phase A / Phase B Context Loading
Two context modes:
Phase A (first invocation, full context):
- Locate the workflow file, take the implementer role
- gh issue view {N} if --issue is provided
- Read requirement.md / design.md / tasks.md in --spec
- Read coding-rules.md / CLAUDE.md / AGENTS.md
Phase B (--feedback re-invocation, minimal context):
- The feedback file (review or test results)
- The target task description from tasks.md
- The relevant design.md section
- The files changed in the previous implementation
The split keeps fix iterations focused and cheap.
--feedback Auto-Detection
The feedback file's type: header drives behavior:
- type: review: read ## Findings; fix Critical findings at the specified file:line first, then Improvements
- type: test: read ## Test Cases failures and ## Completion Criteria Coverage uncovered items; modify implementation to pass
A standing constraint: do not touch code outside what the findings reference.
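As an illustration of the auto-detection described above, the following Python sketch reads the type: header and maps it to the sections spec-code is described as consulting. The function names and error handling are assumptions; the skill itself is driven by its SKILL.md instructions rather than by code like this.

```python
from typing import List


def detect_feedback_type(text: str) -> str:
    """Return "review" or "test" from the first type: line of a feedback file."""
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("type:"):
            value = stripped[len("type:"):].strip()
            if value in ("review", "test"):
                return value
            raise ValueError(f"unsupported feedback type: {value!r}")
    raise ValueError("no type: header found")


def sections_to_read(feedback_type: str) -> List[str]:
    """Map the feedback type to the headings that hold the actionable findings."""
    return {
        "review": ["## Findings"],
        "test": ["## Test Cases", "## Completion Criteria Coverage"],
    }[feedback_type]
```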
Commit
When done, commit. The format comes from coding-rules.md / CLAUDE.md (default feat(scope): {task-id} — {brief}). Stage only implementation files, not tasks.md — the orchestrator updates that later.
spec-review: Rule × File Matrix Review
spec-review enforces a rule × file matrix so every applicable rule is checked against every changed file.
npx skills add anyoneanderson/agent-skills --skill spec-review -g -y
/spec-review --task T-007 --base-commit abc1234 --spec .specs/auth-feature/
# or, standalone on the current staged diff:
/spec-review
Step 1: Collect Rules
Read review_rules.md (generated in Part 1) and coding-rules.md and parse into a structured list:
rule_list: [
  { id: "RR-001", severity: "Critical", description: "No SQL injection", category: "security" },
  { id: "CR-MUST-001", severity: "MUST", description: "Use strict TypeScript", category: "typescript" },
  ...
]
If no rules files exist, fall back to minimal defaults (security / correctness / style).
Step 2: Acquire the Diff
| Context | Diff command |
|---|---|
| --task + --base-commit | git diff {sha}...HEAD |
| --task alone | Auto-detect task-start commit; if ambiguous, require --base-commit |
| --diff {file} | Read the diff file |
| No options | git diff --cached first; if empty, git diff |
| PR context | git diff {base}...HEAD |
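The git-based rows of the table reduce to a small decision function. This Python sketch is illustrative only: the parameter names are hypothetical, the precedence between a base commit and a PR base is an assumption, and the --diff {file} and task-start auto-detection cases are left out.

```python
from typing import List, Optional


def diff_command(
    base_commit: Optional[str] = None,
    pr_base: Optional[str] = None,
    staged_is_empty: bool = False,
) -> List[str]:
    """Pick the git diff invocation for a review context."""
    if base_commit:                      # --task with --base-commit
        return ["git", "diff", f"{base_commit}...HEAD"]
    if pr_base:                          # PR context
        return ["git", "diff", f"{pr_base}...HEAD"]
    if staged_is_empty:                  # no options and nothing staged
        return ["git", "diff"]
    return ["git", "diff", "--cached"]   # no options, staged changes present
```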
Step 3: The Matrix (core)
for each rule in rule_list:
  for each file in changed_files:
    if rule.category is relevant to this file type:
      check if any added/modified line violates this rule
      if violation: record { rule.id, file.path, line_number, description, severity }
Category-to-filetype relevance:
- security → all files
- typescript → .ts / .tsx
- test → *.test.* / *.spec.*
- style → all source files
- api → controller / route files
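The nested loop above translates almost directly into code. In the Python sketch below, each rule carries a hypothetical `check` callable that inspects one added line; in the real skill that judgment is made by the reviewing agent, so treat this as a model of the iteration order rather than of the checking itself. The glob patterns in `RELEVANCE` are assumptions derived from the category list.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Callable, Optional


@dataclass
class Rule:
    id: str
    severity: str
    category: str
    # Hypothetical checker: returns a description if the line violates the rule.
    check: Callable[[str], Optional[str]]


# Category-to-filetype relevance; categories not listed here apply to every file.
RELEVANCE = {
    "security": ["*"],
    "typescript": ["*.ts", "*.tsx"],
    "test": ["*.test.*", "*.spec.*"],
}


def review_matrix(rules, changed):
    """changed maps file path -> list of (line_number, added_line) pairs."""
    findings = []
    for rule in rules:
        patterns = RELEVANCE.get(rule.category, ["*"])
        for path, lines in changed.items():
            if not any(fnmatchcase(path, p) for p in patterns):
                continue  # rule category does not apply to this file type
            for number, text in lines:
                problem = rule.check(text)
                if problem:
                    findings.append((rule.id, path, number, problem, rule.severity))
    return findings
```

Iterating rules in the outer loop means no rule is silently skipped for a file, which is the point of the matrix.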
Step 4: Design Consistency Check
If --spec is provided, read the relevant design.md section and compare the implementation against it — interfaces present, data model matches, architectural decisions honored. Deviations get recorded at Improvement severity.
Step 5: Write the Review File
Output goes to .specs/{feature}/review-{task-id}.md:
# Review: T-007
type: review
## Meta
- Reviewer: spec-review
- Date: ...
- Iteration: 1
- Rules checked: 36 rules across 5 files
- Diff basis: git diff abc1234...HEAD
## Findings
### Critical
- [ ] **RR-001** `src/auth/service.ts:48` — SQL injection: raw string concatenation in query
### Improvement
- [ ] **CR-SHOULD-002** `src/auth/service.ts:62` — explicit return type recommended
## Summary
- Critical: 1 | Improvement: 1 | Minor: 0
- Gate: FAIL
Gate logic: any Critical → FAIL / only Improvement+Minor → PASS (with warnings) / nothing → PASS. The file feeds straight into spec-code --feedback.
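The gate rule is simple enough to express as a function over the review file. The Python sketch below counts unchecked findings under the three severity headings shown in the sample and applies the gate logic. The function name and the parsing approach are assumptions, not part of spec-review.

```python
import re


def review_gate(review_text: str) -> str:
    """Derive the gate from unchecked findings under ### Critical / Improvement / Minor."""
    counts = {"Critical": 0, "Improvement": 0, "Minor": 0}
    current = None
    for line in review_text.splitlines():
        heading = re.match(r"###\s+(Critical|Improvement|Minor)\s*$", line.strip())
        if heading:
            current = heading.group(1)
        elif line.startswith("## "):
            current = None  # left the severity subsections (e.g. reached ## Summary)
        elif current and line.lstrip().startswith("- [ ]"):
            counts[current] += 1
    if counts["Critical"]:
        return "FAIL"
    if counts["Improvement"] or counts["Minor"]:
        return "PASS (with warnings)"
    return "PASS"
```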
spec-test: Tests from Completion Criteria
spec-test builds tests from the completion criteria in tasks.md for the target task.
npx skills add anyoneanderson/agent-skills --skill spec-test -g -y
/spec-test --task T-007 --spec .specs/auth-feature/
Steps 1–2: Extract Criteria + Detect Patterns
- From tasks.md, pull the target task's completion criteria, target files, and requirement ID
- Scan the project for existing test conventions:
  - Test files: *.test.*, *.spec.*, __tests__/, test/, tests/
  - Framework: Jest / Vitest / Mocha / pytest / Go test / Rust test
  - Patterns: AAA, describe/it, fixtures
  - Commands: from package.json scripts, Makefile, CLAUDE.md
Defaults kick in if no existing tests are found.
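As a rough picture of the test-file scan, the Python sketch below checks a path against the file and directory conventions listed above. It is an assumption-laden illustration: `is_test_file` is a hypothetical helper, and framework, pattern, and command detection are not modeled.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

TEST_NAME_PATTERNS = ("*.test.*", "*.spec.*")
TEST_DIR_NAMES = {"__tests__", "test", "tests"}


def is_test_file(path: str) -> bool:
    """True if the path matches one of the test-file conventions listed above."""
    p = PurePosixPath(path)
    if any(fnmatchcase(p.name, pattern) for pattern in TEST_NAME_PATTERNS):
        return True
    # Otherwise, look for a conventional test directory among the parent folders.
    return any(part in TEST_DIR_NAMES for part in p.parts[:-1])
```

If no path in the repository satisfies a check like this, the skill's defaults would apply.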
Steps 3–4: Design Cases and Write
spec-test designs three categories of tests from the completion criteria and the design:
- Happy path: at least one test per completion criterion
- Edge cases: empty inputs, boundaries, error conditions
- Negative tests: invalid inputs, unauthorized access, missing data
Tests follow the detected conventions — naming, placement, AAA structure.
Steps 5–6: Run and Write Results
Use the detected command (npm test / pytest / go test / ...). New tests run first, then the full suite. Results land in .specs/{feature}/test-{task-id}.md:
# Test: T-007
type: test
## Meta
- Tester: spec-test
- Date: ...
- Command: npm test -- --coverage
- Framework: Vitest
## Results
- Tests: 8/9 passed
- Coverage: 87%
- Duration: 1.2s
## Test Cases
- [x] auth-service: returns token on successful login
- [ ] auth-service: rejects bad password — FAILED: expected 401 got 500
## Completion Criteria Coverage
| Criterion | Test | Status |
|---|---|---|
| Login success creates session | login-success | PASS |
| Wrong password returns 401 | login-fail | FAIL |
## Gate: FAIL
Gate: all pass → PASS, any fail → FAIL. Failures route into spec-code --feedback; spec-test does not edit implementation code (separation of concerns).
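To show what a consumer of this file might extract, here is a Python sketch that pulls the failing rows out of the Completion Criteria Coverage table. The function name is hypothetical and the parser assumes the three-column layout from the sample above.

```python
from typing import List


def failing_criteria(test_report: str) -> List[str]:
    """Return criteria whose Status column is not PASS in the coverage table."""
    in_table = False
    failing = []
    for line in test_report.splitlines():
        if line.startswith("## "):
            in_table = line.strip() == "## Completion Criteria Coverage"
            continue
        if not (in_table and line.strip().startswith("|")):
            continue
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 3 or cells[0] == "Criterion" or set(cells[0]) <= {"-"}:
            continue  # skip the header row and the |---| separator row
        if cells[2] != "PASS":
            failing.append(cells[0])
    return failing
```

For the sample report, this would surface "Wrong password returns 401" as the criterion still to be fixed.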
How Parts 1 and 2 Pay Off Here
Everything you set up in Parts 1 and 2 plugs into this pipeline.
| Document | Generated by (Part) | Used in Part 3 |
|---|---|---|
| issue-to-pr-workflow.md | spec-workflow-init (Part 1) | spec-implement reads it during Phase 1–3: base branch, naming, test commands, dispatch strategy |
| coding-rules.md | spec-rules-init (Part 1) | spec-code consults during implementation; spec-review parses [MUST] / [SHOULD] style severities into its rule list |
| review_rules.md | spec-rules-init (Part 1) | spec-review uses it as the primary rule list |
| .claude/agents/workflow-*.md / .codex/agents/workflow-*.toml | spec-workflow-init (Part 1) | spec-implement chooses its dispatch mode based on which files are present |
| requirement.md / design.md / tasks.md | spec-generator (Part 2) | spec-code reads context, spec-review verifies design fit, spec-test extracts completion criteria |
| GitHub Issue | spec-to-issue (Part 2) | Becomes spec-implement --issue {N} input; PR body uses closes #{N} |
The end-to-end picture:
spec-workflow-init / spec-rules-init ← Setup (Part 1)
↓ docs/issue-to-pr-workflow.md / docs/coding-rules.md / docs/review_rules.md
spec-generator → spec-inspect → spec-to-issue ← Spec authoring (Part 2)
↓ .specs/{feature}/ + GitHub Issue #N
spec-implement(--issue N) ← Implementation (Part 3)
├─ spec-code
├─ spec-review
└─ spec-test
↓
PR
FAQ
Q. Does spec-implement really never write code itself?
A. Correct. The SKILL.md flags this as "🚨 BLOCKING — orchestrator only". If a worker is missing, it stops and points you to the install command rather than silently falling back. This keeps the conductor from stepping outside its role and breaking the contract between the skills.
Q. How many fix iterations does --feedback run?
A. Up to three per Review Gate task. If the gate still fails after three iterations, the skill asks for your decision via AskUserQuestion. Test failures follow the same fix → retest pattern.
Q. What if my old spec files have no [code] / [orchestrator] tags?
A. Untagged phases default to [code] for backward compatibility with pre-v3 specs.
Q. Which runtime should I pick — Codex, Claude Code, or cmux?
A. Depends on your environment, but if you generated the role definitions in Part 1, spec-implement will prefer runtime-native multi-agent (Codex sub-agents or Claude Code agent team). cmux dispatch only kicks in when the workflow explicitly selects it.
Q. What happens if tests fail at PR time?
A. No PR. Phase 7's final quality gate (test / lint / typecheck) must pass first. Failures get pushed back through spec-code --feedback; if that still fails, you're prompted to decide.
Q. How does --resume work?
A. It picks up at the first unchecked task in tasks.md. Phase 6 is built to be idempotent — interrupt, then run spec-implement --resume to continue.
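A resume point of this kind can be located with a single scan of tasks.md. The Python sketch below assumes task lines shaped like "- [ ] T-007: description", following the T-007 examples in this article. The actual tasks.md layout and the way spec-implement reads it may differ.

```python
import re
from typing import Optional

# Assumed task-line shape: "- [ ] T-007: description" or "- [x] T-007: description".
TASK_LINE = re.compile(r"^\s*- \[( |x|X)\]\s+(T-\d+)\b")


def first_unchecked_task(tasks_md: str) -> Optional[str]:
    """Return the ID of the first task whose checkbox is still empty, or None."""
    for line in tasks_md.splitlines():
        m = TASK_LINE.match(line)
        if m and m.group(1) == " ":
            return m.group(2)
    return None
```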
Summary
spec-implement / spec-code / spec-review / spec-test split orchestration, implementation, review, and testing into four distinct skills. The orchestrator never writes code, the reviewer and tester never fix implementations, and the workers participate in the fix loop only through --feedback. That role separation is what lets the AI drive itself responsibly.
Across the three articles, the nine spec skills add up to a flow where three requests ("create the full spec," "turn it into an issue," and "start implementation") are enough to reach a PR. At ZenChAIne we are running this on real projects and refining the operational details as we go.
That closes the "Spec-Driven Development in Practice" series. Parts 1, 2, and 3 cover the nine spec skills end-to-end. The real value shows up when you start growing your own coding-rules.md, review_rules.md, and agent definitions for your team — that's where the loop tightens.
References
- agent-skills - GitHub
- spec-implement SKILL.md
- spec-code SKILL.md
- spec-review SKILL.md
- spec-test SKILL.md
- Spec-Driven Development in Practice #1: Setup - ZenChAIne
- Spec-Driven Development in Practice #2: Spec Authoring - ZenChAIne
- What Are Agent Skills - ZenChAIne
- agent-skills open-source release - ZenChAIne
- skills.sh