Spec-Driven Development in Practice #3: Implementation — Orchestrate to a PR with spec-implement

ZenChAIne
Agent Skills · Spec-Driven Development · Claude Code

Introduction

This is the final installment of the agent-skills practical guide. Part 1 (Setup) generated the project's foundation documents with spec-workflow-init and spec-rules-init. Part 2 (Spec Authoring) used spec-generator / spec-inspect / spec-to-issue to turn intent into a structured GitHub Issue.

Part 3 takes that Issue and runs it all the way to a PR, with four skills that split orchestration, implementation, review, and testing.

Key Takeaways

  • spec-implement is a pure orchestrator (it does not write code or perform reviews itself). It delegates to worker skills and drives the pipeline.
  • Eight phases: Load Context → Issue Analysis → mandatory feature branch → Task Loop ([code] / [orchestrator] / Review Gate -R) → Final Quality Gate → PR creation
  • spec-code runs with full context on its first invocation (Phase A) and with minimal context when re-invoked with --feedback {file} to address review/test findings (Phase B)
  • spec-review runs a rule × file matrix against review_rules.md and coding-rules.md, classifies findings as Critical / Improvement / Minor, and writes review-{task-id}.md
  • spec-test derives test cases from the task's completion criteria, auto-detects the test framework and patterns, and writes test-{task-id}.md
  • Dispatch modes: Codex sub-agents / Claude Code agent team / cmux dispatch / single-agent sequential — selected automatically based on environment
  • Everything Part 1 and Part 2 produced (issue-to-pr-workflow.md, coding-rules.md, review_rules.md, .specs/{feature}/) feeds directly into this pipeline

The Four Skills and Their Division of Labor

The implementation set is "one conductor + three workers."

| Skill | Role | Writes / Reads |
|---|---|---|
| spec-implement | Orchestrator: invokes workers, aggregates results, checks off tasks.md, opens the PR | Does not write code or perform reviews itself |
| spec-code | Implements a single task; also handles fixes via --feedback | Writes implementation code, commits |
| spec-review | Rule × file matrix review | Writes the review result file |
| spec-test | Generates and runs tests from completion criteria | Writes test code, runs the test command |

spec-implement's SKILL.md is explicit: "🚨 BLOCKING — orchestrator only." If a worker is missing, it stops and tells you to install it — it never falls back to doing the work itself.

spec-implement: The Orchestration Loop

spec-implement drives Issue → PR through eight phases.

bash
npx skills add anyoneanderson/agent-skills --skill spec-implement -g -y
Implement from spec --issue 42
# or, with the Japanese trigger phrase ("implement from spec"):
仕様書から実装 --issue 42 --spec .specs/auth-feature/
# or
Implement from spec --resume   # resume from last unchecked task

Eight Phases

| Phase | Action |
|---|---|
| 1–3 | Load context (workflow, coding rules, project instructions, all spec files) |
| 4 | Issue analysis (gh issue view {N} for title, body, labels) |
| 5 | 🚨 Mandatory feature branch feature/issue-{N}-{brief}; blocks main / master / develop |
| 6 | Task Loop (the core, covered below) |
| 7 | Final quality gate (run test / lint / typecheck; verify all tasks.md checkboxes ticked) |
| 8 | PR creation (gh pr create with workflow template; does not open a PR if tests fail) |
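
The Phase 5 guard can be pictured with a small sketch. The protected branch names and the feature/issue-{N}-{brief} pattern come from the table above; the slug rule (lowercase words joined by hyphens, capped at four) and the function names `feature_branch_name` and `is_protected` are assumptions made for illustration, not the skill's documented behavior.

```python
import re

# Branches Phase 5 refuses to implement on, per the phase table.
PROTECTED_BRANCHES = {"main", "master", "develop"}


def feature_branch_name(issue_number: int, title: str, max_words: int = 4) -> str:
    """Build feature/issue-{N}-{brief}. The slug rule is an assumption."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    brief = "-".join(words[:max_words]) or "task"
    return f"feature/issue-{issue_number}-{brief}"


def is_protected(branch: str) -> bool:
    """True when implementation must not proceed on this branch."""
    return branch in PROTECTED_BRANCHES
```

With these assumptions, an issue titled "Add OAuth login flow!" on issue 42 would map to feature/issue-42-add-oauth-login-flow.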

Phase 6: Task Loop and Role Tags

Each phase in tasks.md (a group of tasks, not one of the eight pipeline phases above) carries a role tag. spec-implement branches its behavior accordingly.

  • [code]: delegate implementation to spec-code
  • [orchestrator]: spec-implement runs commands directly (no file mutation)
  • -R suffix (e.g., Phase 2-R: Review Gate [orchestrator]): a Review Gate that runs spec-review + spec-test against every task in the preceding [code] phase

Phases without a tag are treated as [code] for backward compatibility with pre-v3 specs.
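
As a rough illustration of the tag rules above, the sketch below classifies a tasks.md phase heading. It assumes headings shaped like the article's example ("Phase 2-R: Review Gate [orchestrator]"); the regexes, `PhaseInfo`, and `classify_phase` are hypothetical names, and the real skill reads the file as an agent rather than through a parser.

```python
import re
from typing import NamedTuple


class PhaseInfo(NamedTuple):
    phase_id: str         # e.g. "2" or "2-R"
    role: str             # "code" or "orchestrator"
    is_review_gate: bool  # True for the -R suffix


# Assumed heading shape: "Phase 2-R: Review Gate [orchestrator]"
_HEADING = re.compile(r"Phase\s+(?P<id>\d+(?:-R)?)\s*:\s*(?P<rest>.*)")
_TAG = re.compile(r"\[(code|orchestrator)\]")


def classify_phase(heading: str) -> PhaseInfo:
    match = _HEADING.search(heading)
    if match is None:
        raise ValueError(f"not a phase heading: {heading!r}")
    tag = _TAG.search(match["rest"])
    # Untagged phases default to [code] for pre-v3 specs.
    role = tag.group(1) if tag else "code"
    return PhaseInfo(match["id"], role, match["id"].endswith("-R"))
```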

The Review Gate Fix Loop

Inside a Review Gate phase, each preceding [code] task goes through:

text
spec-review → Critical found → spec-code --feedback {review.md}
  → re-spec-review → ... up to 3 iterations
spec-test  → FAIL → spec-code --feedback {test.md}
  → re-spec-test
review PASS AND test PASS → tick the Review Gate task checkbox

After three failed iterations, the skill stops and asks you how to proceed via AskUserQuestion. This is a deliberate safety stop.
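
The bounded fix loop can be sketched as a small control-flow helper. This is a sketch under assumptions: `check` stands in for a spec-review or spec-test run, `fix` stands in for spec-code --feedback, and "three iterations" is interpreted here as three fix attempts after the initial check. `run_gate` is a hypothetical name.

```python
from typing import Callable

MAX_ITERATIONS = 3  # the limit before escalating to the user


def run_gate(check: Callable[[], bool], fix: Callable[[], None],
             max_iterations: int = MAX_ITERATIONS) -> bool:
    """Run check; on failure apply fix and re-check, up to max_iterations fixes.

    Returns True on PASS, False when the caller should escalate to the user.
    """
    if check():
        return True
    for _ in range(max_iterations):
        fix()        # stands in for: spec-code --feedback {file}
        if check():  # stands in for: re-run spec-review or spec-test
            return True
    return False
```

A False return is the point where the orchestrator would hand the decision back to a human instead of looping further.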

Dispatch Modes

How workers get invoked depends on the environment. Four options, auto-selected:

| Mode | Condition | How |
|---|---|---|
| Codex sub-agents | Running in Codex + .codex/agents/workflow-*.toml present | Spawn the custom agent and have it run /spec-code, etc. |
| Claude Code agent team | Running in Claude Code + .claude/agents/workflow-*.md present + CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 | Create teammates and assign roles |
| cmux dispatch | CMUX_SOCKET_PATH set + workflow explicitly selects cmux | cmux-delegate dispatches to another pane |
| Single agent | None of the above | Run sequentially in the current session |

Priority: Codex sub-agents → Claude Code agent team → cmux dispatch → single agent. The role definitions you generated in Part 1 (workflow-implementer.md etc.) are what enable the runtime-native modes.
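
The priority order reads naturally as a chain of checks. The sketch below models the conditions from the table as plain booleans; the `Environment` dataclass, its field names, and the returned mode labels are all hypothetical, chosen only to make the ordering concrete.

```python
from dataclasses import dataclass


@dataclass
class Environment:
    runtime: str                        # "codex", "claude-code", or anything else
    has_codex_agents: bool = False      # .codex/agents/workflow-*.toml present
    has_claude_agents: bool = False     # .claude/agents/workflow-*.md present
    agent_teams_enabled: bool = False   # CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
    cmux_socket_set: bool = False       # CMUX_SOCKET_PATH is set
    workflow_selects_cmux: bool = False  # workflow file explicitly picks cmux


def select_dispatch_mode(env: Environment) -> str:
    """Return the first mode whose conditions hold, in priority order."""
    if env.runtime == "codex" and env.has_codex_agents:
        return "codex-sub-agents"
    if (env.runtime == "claude-code" and env.has_claude_agents
            and env.agent_teams_enabled):
        return "claude-code-agent-team"
    if env.cmux_socket_set and env.workflow_selects_cmux:
        return "cmux-dispatch"
    return "single-agent"
```

Note that a Claude Code session with agent files but without the experimental flag falls through to the later checks, which matches the three-part condition in the table.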

spec-code: Implement a Single Task

spec-code implements one task per invocation. It does not check off tasks.md — that is the orchestrator's job.

bash
npx skills add anyoneanderson/agent-skills --skill spec-code -g -y
/spec-code --issue 42 --task T-007 --spec .specs/auth-feature/

Phase A / Phase B Context Loading

Two context modes:

Phase A (first invocation, full context):

  1. Locate the workflow file, take the implementer role
  2. gh issue view {N} if --issue is provided
  3. Read requirement.md / design.md / tasks.md in --spec
  4. Read coding-rules.md / CLAUDE.md / AGENTS.md

Phase B (--feedback re-invocation, minimal context):

  1. The feedback file (review or test results)
  2. The target task description from tasks.md
  3. The relevant design.md section
  4. The files changed in the previous implementation

The split keeps fix iterations focused and cheap.

--feedback Auto-Detection

The feedback file's type: header drives behavior:

  • type: review: read ## Findings; fix Critical findings at the specified file:line first, then Improvements
  • type: test: read ## Test Cases failures and ## Completion Criteria Coverage uncovered items; modify implementation to pass

A standing constraint: do not touch code outside what the findings reference.
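
To make the auto-detection concrete, here is a minimal sketch that reads the type: header from a feedback file. It assumes the header sits above the first "## " section, as in the sample files later in this article; `feedback_type` is a hypothetical helper, not part of the skill.

```python
def feedback_type(text: str) -> str:
    """Return "review" or "test" from the type: header of a feedback file."""
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            break  # assumed: the header lives above the first section
        if stripped.startswith("type:"):
            value = stripped[len("type:"):].strip()
            if value in ("review", "test"):
                return value
            raise ValueError(f"unknown feedback type: {value!r}")
    raise ValueError("feedback file has no type: header")
```

Failing loudly on a missing or unknown type is a design choice of this sketch: it avoids silently applying review-style handling to a test report.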

Commit

When done, commit. The format comes from coding-rules.md / CLAUDE.md (default feat(scope): {task-id} — {brief}). Stage only implementation files, not tasks.md — the orchestrator updates that later.

spec-review: Rule × File Matrix Review

spec-review enforces a rule × file matrix so every applicable rule is checked against every changed file.

bash
npx skills add anyoneanderson/agent-skills --skill spec-review -g -y
/spec-review --task T-007 --base-commit abc1234 --spec .specs/auth-feature/
# or, standalone on the current staged diff:
/spec-review

Step 1: Collect Rules

Read review_rules.md (generated in Part 1) and coding-rules.md, and parse them into a structured list:

text
rule_list: [
  { id: "RR-001", severity: "Critical", description: "No SQL injection", category: "security" },
  { id: "CR-MUST-001", severity: "MUST", description: "Use strict TypeScript", category: "typescript" },
  ...
]

If no rules files exist, fall back to minimal defaults (security / correctness / style).
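
For a sense of what the parsing step might involve, the sketch below turns [MUST] / [SHOULD] bullets into rule entries. The line shape ("- [MUST] Use strict TypeScript") and the per-severity numbering behind IDs such as CR-MUST-001 are guesses based on the examples in this article; the actual file layout produced by spec-rules-init may differ, and `parse_coding_rules` is a hypothetical name.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Assumed line shape in coding-rules.md: "- [MUST] Use strict TypeScript"
_RULE_LINE = re.compile(r"^\s*[-*]\s+\[(MUST|SHOULD)\]\s+(.+?)\s*$")


def parse_coding_rules(markdown: str) -> List[Dict[str, str]]:
    counters: Dict[str, int] = defaultdict(int)
    rules = []
    for line in markdown.splitlines():
        match = _RULE_LINE.match(line)
        if not match:
            continue  # headings, prose, and untagged bullets are skipped
        severity, description = match.groups()
        counters[severity] += 1
        rules.append({
            # The numbering scheme is a guess, not the skill's documented format.
            "id": f"CR-{severity}-{counters[severity]:03d}",
            "severity": severity,
            "description": description,
        })
    return rules
```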

Step 2: Acquire the Diff

| Context | Diff command |
|---|---|
| --task + --base-commit | git diff {sha}...HEAD |
| --task alone | Auto-detect task-start commit; if ambiguous, require --base-commit |
| --diff {file} | Read the diff file |
| No options | git diff --cached first; if empty, git diff |
| PR context | git diff {base}...HEAD |
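
Two rows of this table, the commit-range case and the staged-then-unstaged fallback, can be sketched as follows. The command runner is injected so the logic stays testable; `acquire_diff` and `Runner` are hypothetical names, and the task-start auto-detection and --diff {file} rows are left out.

```python
from typing import Callable, Optional, Sequence

Runner = Callable[[Sequence[str]], str]  # runs a command, returns its stdout


def acquire_diff(run: Runner, base_commit: Optional[str] = None) -> str:
    """Commit range when a base is known, else staged changes, else unstaged."""
    if base_commit:
        return run(["git", "diff", f"{base_commit}...HEAD"])
    staged = run(["git", "diff", "--cached"])
    if staged.strip():
        return staged
    return run(["git", "diff"])
```

In a real repository, `run` could be something like `lambda argv: subprocess.run(argv, capture_output=True, text=True, check=True).stdout`.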

Step 3: The Matrix (core)

text
for each rule in rule_list:
  for each file in changed_files:
    if rule.category is relevant to this file type:
      check if any added/modified line violates this rule
      if violation: record { rule.id, file.path, line_number, description, severity }

Category-to-filetype relevance:

  • security → all files
  • typescript → .ts / .tsx
  • test → *.test.* / *.spec.*
  • style → all source files
  • api → controller / route files
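
A runnable version of the matrix might look like the sketch below. In the real skill an agent judges each rule × file cell; here the `checkers` callables are stand-ins for that judgment. The glob patterns for the api category, and the use of "*" for "all source files", are assumptions, as are the names `RELEVANCE` and `run_matrix`.

```python
from fnmatch import fnmatch
from typing import Callable, Dict, List, Tuple

# Category -> filename globs, following the relevance list above.
# The api globs are an assumption ("controller / route files").
RELEVANCE: Dict[str, List[str]] = {
    "security": ["*"],
    "typescript": ["*.ts", "*.tsx"],
    "test": ["*.test.*", "*.spec.*"],
    "style": ["*"],
    "api": ["*controller*", "*route*"],
}

Checker = Callable[[str], bool]  # True when an added line violates the rule


def run_matrix(rules: List[dict],
               added_lines: Dict[str, List[Tuple[int, str]]],
               checkers: Dict[str, Checker]) -> List[dict]:
    findings = []
    for rule in rules:
        globs = RELEVANCE.get(rule["category"], [])
        for path, lines in added_lines.items():
            if not any(fnmatch(path, g) for g in globs):
                continue  # rule category does not apply to this file type
            for number, text in lines:
                if checkers[rule["id"]](text):
                    findings.append({"rule": rule["id"], "file": path,
                                     "line": number, "severity": rule["severity"]})
    return findings
```

Iterating rules in the outer loop means no rule is skipped just because an earlier rule already flagged a file, which is the point of the matrix.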

Step 4: Design Consistency Check

If --spec is provided, read the relevant design.md section and compare the implementation against it — interfaces present, data model matches, architectural decisions honored. Deviations get recorded at Improvement severity.

Step 5: Write the Review File

Output goes to .specs/{feature}/review-{task-id}.md:

markdown
# Review: T-007
type: review
 
## Meta
- Reviewer: spec-review
- Date: ...
- Iteration: 1
- Rules checked: 36 rules across 5 files
- Diff basis: git diff abc1234...HEAD
 
## Findings
 
### Critical
- [ ] **RR-001** `src/auth/service.ts:48` — SQL injection: raw string concatenation in query
 
### Improvement
- [ ] **CR-SHOULD-002** `src/auth/service.ts:62` — explicit return type recommended
 
## Summary
- Critical: 1 | Improvement: 1 | Minor: 0
- Gate: FAIL

Gate logic: any Critical → FAIL / only Improvement+Minor → PASS (with warnings) / nothing → PASS. The file feeds straight into spec-code --feedback.
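
The gate rule is simple enough to express directly. The sketch below counts findings per severity section in a review file shaped like the sample above and derives the gate; counting both checked and unchecked items is an assumption, and `count_findings` and `review_gate` are hypothetical names.

```python
import re
from collections import Counter

_SECTION = re.compile(r"^###\s+(Critical|Improvement|Minor)\s*$")
_FINDING = re.compile(r"^- \[[ x]\] ")


def count_findings(review_md: str) -> Counter:
    counts: Counter = Counter()
    current = None
    for line in review_md.splitlines():
        section = _SECTION.match(line)
        if section:
            current = section.group(1)
        elif line.startswith("## "):
            current = None  # left the findings area, e.g. entered ## Summary
        elif current and _FINDING.match(line):
            counts[current] += 1
    return counts


def review_gate(counts: Counter) -> str:
    if counts["Critical"] > 0:
        return "FAIL"
    if counts["Improvement"] or counts["Minor"]:
        return "PASS (with warnings)"
    return "PASS"
```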

spec-test: Tests from Completion Criteria

spec-test builds tests from the completion criteria in tasks.md for the target task.

bash
npx skills add anyoneanderson/agent-skills --skill spec-test -g -y
/spec-test --task T-007 --spec .specs/auth-feature/

Steps 1–2: Extract Criteria + Detect Patterns

  1. From tasks.md, pull the target task's completion criteria, target files, and requirement ID
  2. Scan the project for existing test conventions:
    • Test files: *.test.*, *.spec.*, __tests__/, test/, tests/
    • Framework: Jest / Vitest / Mocha / pytest / Go test / Rust test
    • Patterns: AAA, describe/it, fixtures
    • Commands: from package.json scripts, Makefile, CLAUDE.md

Defaults kick in if no existing tests are found.
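
The test-file part of that scan can be approximated with filename globs and directory names taken from the list above. This is only a sketch of the file-matching idea; `is_test_file` is a hypothetical helper, and framework or command detection is not covered.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

TEST_FILE_GLOBS = ("*.test.*", "*.spec.*")
TEST_DIRS = {"__tests__", "test", "tests"}


def is_test_file(path: str) -> bool:
    """True when the file name or any parent directory marks it as a test."""
    p = PurePosixPath(path)
    if any(fnmatch(p.name, g) for g in TEST_FILE_GLOBS):
        return True
    return any(part in TEST_DIRS for part in p.parts[:-1])
```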

Steps 3–4: Design Cases and Write

Three categories of test from criteria + design:

  • Happy path: at least one test per completion criterion
  • Edge cases: empty inputs, boundaries, error conditions
  • Negative tests: invalid inputs, unauthorized access, missing data

Tests follow the detected conventions — naming, placement, AAA structure.

Steps 5–6: Run and Write Results

Use the detected command (npm test / pytest / go test / ...). New tests run first, then the full suite. Results land in .specs/{feature}/test-{task-id}.md:

markdown
# Test: T-007
type: test
 
## Meta
- Tester: spec-test
- Date: ...
- Command: npm test -- --coverage
- Framework: Vitest
 
## Results
- Tests: 8/9 passed
- Coverage: 87%
- Duration: 1.2s
 
## Test Cases
- [x] auth-service: returns token on successful login
- [ ] auth-service: rejects bad password — FAILED: expected 401 got 500
 
## Completion Criteria Coverage
| Criterion | Test | Status |
|---|---|---|
| Login success creates session | login-success | PASS |
| Wrong password returns 401 | login-fail | FAIL |
 
## Gate: FAIL

Gate: all pass → PASS, any fail → FAIL. Failures route into spec-code --feedback; spec-test does not edit implementation code (separation of concerns).
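
On the consuming side, the failures spec-code needs can be pulled from the ## Test Cases section. The sketch below assumes the checkbox layout of the sample file above (unchecked means failed); `failed_cases` and `gate_for_results` are hypothetical names.

```python
import re
from typing import List

_CASE = re.compile(r"^- \[(?P<mark>[ x])\] (?P<body>.+)$")


def failed_cases(test_md: str) -> List[str]:
    """Return the text of every unchecked item under ## Test Cases."""
    failures: List[str] = []
    in_cases = False
    for line in test_md.splitlines():
        if line.startswith("## "):
            in_cases = line.strip() == "## Test Cases"
            continue
        case = _CASE.match(line) if in_cases else None
        if case and case["mark"] == " ":
            failures.append(case["body"])
    return failures


def gate_for_results(test_md: str) -> str:
    """All pass -> PASS, any fail -> FAIL."""
    return "FAIL" if failed_cases(test_md) else "PASS"
```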

How Parts 1 and 2 Pay Off Here

Everything you set up in Parts 1 and 2 plugs into this pipeline.

| Document | Generated by (Part) | Used in Part 3 |
|---|---|---|
| issue-to-pr-workflow.md | spec-workflow-init (Part 1) | spec-implement reads it during Phase 1–3: base branch, naming, test commands, dispatch strategy |
| coding-rules.md | spec-rules-init (Part 1) | spec-code consults during implementation; spec-review parses [MUST] / [SHOULD] style severities into its rule list |
| review_rules.md | spec-rules-init (Part 1) | spec-review uses it as the primary rule list |
| .claude/agents/workflow-*.md / .codex/agents/workflow-*.toml | spec-workflow-init (Part 1) | spec-implement chooses its dispatch mode based on which files are present |
| requirement.md / design.md / tasks.md | spec-generator (Part 2) | spec-code reads context, spec-review verifies design fit, spec-test extracts completion criteria |
| GitHub Issue | spec-to-issue (Part 2) | Becomes spec-implement --issue {N} input; PR body uses closes #{N} |

The end-to-end picture:

text
spec-workflow-init / spec-rules-init  ← Setup (Part 1)
  ↓ docs/issue-to-pr-workflow.md / docs/coding-rules.md / docs/review_rules.md
spec-generator → spec-inspect → spec-to-issue  ← Spec authoring (Part 2)
  ↓ .specs/{feature}/ + GitHub Issue #N
spec-implement(--issue N)  ← Implementation (Part 3)
  ├─ spec-code
  ├─ spec-review
  └─ spec-test
  ↓
PR

FAQ

Q. Does spec-implement really never write code itself?

A. Correct. The SKILL.md flags this as "🚨 BLOCKING — orchestrator only". If a worker is missing, it stops and points you to the install command rather than silently falling back. That keeps the conductor in its role, so the contract between orchestrator and workers stays intact.

Q. How many fix iterations does --feedback run?

A. Up to three per Review Gate task. If the gate still fails after three iterations, the skill asks for your decision via AskUserQuestion. Test failures follow the same fix → retest pattern.

Q. What if my old spec files have no [code] / [orchestrator] tags?

A. Untagged phases default to [code] for backward compatibility with pre-v3 specs.

Q. Which runtime should I pick — Codex, Claude Code, or cmux?

A. Depends on your environment, but if you generated the role definitions in Part 1, spec-implement will prefer runtime-native multi-agent (Codex sub-agents or Claude Code agent team). cmux dispatch only kicks in when the workflow explicitly selects it.

Q. What happens if tests fail at PR time?

A. No PR. Phase 7's final quality gate (test / lint / typecheck) must pass first. Failures get pushed back through spec-code --feedback; if that still fails, you're prompted to decide.

Q. How does --resume work?

A. It picks up at the first unchecked task in tasks.md. Phase 6 is built to be idempotent — interrupt, then run spec-implement --resume to continue.
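
The resume lookup amounts to finding the first unchecked checkbox. The sketch below assumes task lines shaped like "- [ ] T-007 Implement login endpoint"; that layout and the name `first_unchecked_task` are assumptions, since the article does not show a tasks.md excerpt.

```python
import re
from typing import Optional

# Assumed checkbox shape: "- [ ] T-007 Implement login endpoint"
_TASK = re.compile(r"^\s*- \[(?P<mark>[ xX])\] (?P<id>T-\d+)\b")


def first_unchecked_task(tasks_md: str) -> Optional[str]:
    """Return the id of the first unchecked task, or None when all are done."""
    for line in tasks_md.splitlines():
        task = _TASK.match(line)
        if task and task["mark"] == " ":
            return task["id"]
    return None
```

Because the checkboxes in tasks.md are the only state consulted, re-running after an interruption lands on the same task, which is what makes the loop safe to resume.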

Summary

spec-implement / spec-code / spec-review / spec-test split orchestration, implementation, review, and testing into four distinct skills. The orchestrator never writes code, the reviewer and tester never fix implementations, and the workers participate in the fix loop only through --feedback. That role separation is what lets the AI drive itself responsibly.

Across the three articles, the nine spec skills add up to a flow where "create the full spec," "turn it into an issue," "start implementation" is enough to reach a PR. At ZenChAIne we are running this on real projects and refining the operational details as we go.

That closes the "Spec-Driven Development in Practice" series. The real value shows up when you start growing your own coding-rules.md, review_rules.md, and agent definitions for your team. That's where the loop tightens.
