Spec-Driven Development in Practice #3: Implementation — Orchestrate to a PR with spec-implement

ZenChAIne·June 2, 2026

Agent Skills · Spec-Driven Development · Claude Code

Introduction

This is the final installment of the agent-skills practical guide. Part 1 (Setup) generated the project's foundation documents with spec-workflow-init and spec-rules-init. Part 2 (Spec Authoring) used spec-generator / spec-inspect / spec-to-issue to turn intent into a structured GitHub Issue.

Part 3 takes that Issue and runs it all the way to a PR, with four skills that split orchestration, implementation, review, and testing.

Key Takeaways

spec-implement is a pure orchestrator (it does not write code or perform reviews itself). It delegates to worker skills and drives the pipeline.
Eight phases: Load Context (1–3) → Issue Analysis (4) → mandatory feature branch (5) → Task Loop (6: [code] / [orchestrator] / Review Gate -R) → Final Quality Gate (7) → PR creation (8)
spec-code runs in Phase A (full context) or Phase B (--feedback with minimal context) and addresses review/test findings via --feedback {file}
spec-review runs a rule × file matrix against review_rules.md and coding-rules.md, classifies findings as Critical / Improvement / Minor, and writes review-{task-id}.md
spec-test derives test cases from the task's completion criteria, auto-detects the test framework and patterns, and writes test-{task-id}.md
Dispatch modes: Codex sub-agents / Claude Code agent team / cmux dispatch / single-agent sequential — selected automatically based on environment
Everything Part 1 and Part 2 produced (issue-to-pr-workflow.md, coding-rules.md, review_rules.md, .specs/{feature}/) feeds directly into this pipeline

The Four Skills and Their Division of Labor

The implementation set is "one conductor + three workers."

Skill	Role	Writes / Reads
`spec-implement`	Orchestrator: invokes workers, aggregates results, checks off tasks.md, opens the PR	Does not write code or perform reviews itself
`spec-code`	Implements a single task; also handles fixes via `--feedback`	Writes implementation code, commits
`spec-review`	Rule × file matrix review	Writes the review result file
`spec-test`	Generates and runs tests from completion criteria	Writes test code, runs the test command

spec-implement's SKILL.md is explicit: "🚨 BLOCKING — orchestrator only." If a worker is missing, it stops and tells you to install it — it never falls back to doing the work itself.

spec-implement: The Orchestration Loop

spec-implement drives Issue → PR through eight phases.

bash

npx skills add anyoneanderson/agent-skills --skill spec-implement -g -y

Implement from spec --issue 42
# or
仕様書から実装 --issue 42 --spec .specs/auth-feature/   # Japanese trigger phrase: "implement from spec"
# or
Implement from spec --resume   # resume from last unchecked task

Eight Phases

Phase	Action
1–3	Load context (workflow, coding rules, project instructions, all spec files)
4	Issue analysis (`gh issue view {N}` for title, body, labels)
5	🚨 Mandatory feature branch `feature/issue-{N}-{brief}` — blocks main / master / develop
6	Task Loop (the core, covered below)
7	Final quality gate (run test / lint / typecheck; verify all tasks.md checkboxes ticked)
8	PR creation (`gh pr create` with workflow template; does not open a PR if tests fail)
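
As a rough illustration of the Phase 5 guard, the sketch below builds a branch name in the `feature/issue-{N}-{brief}` shape and refuses the three protected branches. The function names and the way `{brief}` is derived from the issue title are assumptions made for this example, not something the skill documents.

```python
import re

# Branches named in the Phase 5 row above; working directly on them is blocked.
PROTECTED_BRANCHES = {"main", "master", "develop"}


def feature_branch_name(issue_number: int, title: str, max_words: int = 4) -> str:
    """Build feature/issue-{N}-{brief}; the slug rule for {brief} is an assumption."""
    words = re.findall(r"[a-z0-9]+", title.lower())[:max_words]
    brief = "-".join(words) or "task"
    return f"feature/issue-{issue_number}-{brief}"


def is_protected(branch: str) -> bool:
    """True when the branch is one the orchestrator refuses to work on."""
    return branch in PROTECTED_BRANCHES


print(feature_branch_name(42, "Add login endpoint"))  # feature/issue-42-add-login-endpoint
```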

Phase 6: Task Loop and Role Tags

Each phase in tasks.md carries a role tag (these are the spec's own phases, separate from the eight skill phases above). spec-implement branches its behavior accordingly.

[code]: delegate implementation to spec-code
[orchestrator]: spec-implement runs commands directly (no file mutation)
-R suffix (e.g., Phase 2-R: Review Gate [orchestrator]): a Review Gate that runs spec-review + spec-test against every task in the preceding [code] phase

Phases without a tag are treated as [code] for backward compatibility with pre-v3 specs.
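
The tag rules above could be expressed as a small classifier. This is a sketch only: it assumes phase headings in tasks.md look like the `Phase 2-R: Review Gate [orchestrator]` example, and `PhaseInfo` and `classify_phase` are hypothetical names.

```python
import re
from typing import NamedTuple


class PhaseInfo(NamedTuple):
    label: str            # e.g. "2" or "2-R"
    role: str             # "code" or "orchestrator"
    is_review_gate: bool  # True when the label ends with -R


# Assumed heading shape: "Phase 2-R: Review Gate [orchestrator]"
PHASE_RE = re.compile(r"Phase\s+(?P<label>\d+(?:-R)?)\s*:\s*(?P<title>.*)$")
TAG_RE = re.compile(r"\[(code|orchestrator)\]\s*$")


def classify_phase(heading: str) -> PhaseInfo:
    match = PHASE_RE.search(heading)
    if match is None:
        raise ValueError(f"not a phase heading: {heading!r}")
    tag = TAG_RE.search(match.group("title"))
    # Untagged phases fall back to [code] (pre-v3 compatibility).
    role = tag.group(1) if tag else "code"
    label = match.group("label")
    return PhaseInfo(label, role, label.endswith("-R"))
```

With this shape, `classify_phase("Phase 1: Data model")` yields the `code` role even though the heading carries no tag.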

The Review Gate Fix Loop

Inside a Review Gate phase, each preceding [code] task goes through:

text

spec-review → Critical found → spec-code --feedback {review.md}
  → re-spec-review → ... up to 3 iterations
spec-test  → FAIL → spec-code --feedback {test.md}
  → re-spec-test
review PASS AND test PASS → tick the Review Gate task checkbox

After three failed iterations, the skill asks you how to proceed via AskUserQuestion. This is a deliberate safety stop.
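
One way to picture the bounded loop is the sketch below. The three callables are hypothetical stand-ins for invoking spec-review or spec-test, invoking `spec-code --feedback`, and asking the user; none of them is an API the skills expose.

```python
from typing import Callable

MAX_ITERATIONS = 3  # the limit stated above


def run_gate_step(
    check: Callable[[], tuple[bool, str]],  # runs a review or test; returns (passed, result file)
    fix: Callable[[str], None],             # stands in for spec-code --feedback {file}
    escalate: Callable[[str], None],        # hands the decision to the user
) -> bool:
    """Check, fix, and re-check until the gate passes or the fix budget is spent."""
    passed, result_file = check()
    for _ in range(MAX_ITERATIONS):
        if passed:
            return True
        fix(result_file)
        passed, result_file = check()
    if not passed:
        escalate(result_file)
    return passed
```

The loop never attempts a fourth fix: once the budget is spent and the last check still fails, control goes to `escalate`.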

Dispatch Modes

How workers get invoked depends on the environment. Four options, auto-selected:

Mode	Condition	How
Codex sub-agents	Running in Codex + `.codex/agents/workflow-*.toml` present	Spawn the custom agent and have it run `/spec-code`, etc.
Claude Code agent team	Running in Claude Code + `.claude/agents/workflow-*.md` present + `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1`	Create teammates and assign roles
cmux dispatch	`CMUX_SOCKET_PATH` set + workflow explicitly selects cmux	`cmux-delegate` dispatches to another pane
Single agent	None of the above	Run sequentially in the current session

Priority: Codex sub-agents → Claude Code agent team → cmux dispatch → single agent. The role definitions you generated in Part 1 (workflow-implementer.md etc.) are what enables the runtime-native modes.
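
The priority order can be read as a chain of checks. The sketch below is illustrative: the `runtime` labels, the return values, and the `workflow_selects_cmux` flag are assumptions, since the article does not say how the skill detects its runtime or reads the workflow's cmux setting.

```python
from collections.abc import Mapping, Sequence
from fnmatch import fnmatch


def select_dispatch_mode(
    runtime: str,                 # "codex", "claude-code", or anything else (hypothetical labels)
    env: Mapping[str, str],       # e.g. os.environ
    files: Sequence[str],         # repo-relative paths, "/"-separated
    workflow_selects_cmux: bool,  # whether the workflow file opts in to cmux
) -> str:
    def has(pattern: str) -> bool:
        return any(fnmatch(path, pattern) for path in files)

    # Checks run in the stated priority order; the first match wins.
    if runtime == "codex" and has(".codex/agents/workflow-*.toml"):
        return "codex-sub-agents"
    if (
        runtime == "claude-code"
        and has(".claude/agents/workflow-*.md")
        and env.get("CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS") == "1"
    ):
        return "claude-code-agent-team"
    if env.get("CMUX_SOCKET_PATH") and workflow_selects_cmux:
        return "cmux-dispatch"
    return "single-agent"
```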

spec-code: Implement a Single Task

spec-code implements one task per invocation. It does not check off tasks.md — that is the orchestrator's job.

bash

npx skills add anyoneanderson/agent-skills --skill spec-code -g -y

/spec-code --issue 42 --task T-007 --spec .specs/auth-feature/

Phase A / Phase B Context Loading

Two context modes:

Phase A (first invocation, full context):

Locate the workflow file, take the implementer role
gh issue view {N} if --issue is provided
Read requirement.md / design.md / tasks.md in --spec
Read coding-rules.md / CLAUDE.md / AGENTS.md

Phase B (--feedback re-invocation, minimal context):

The feedback file (review or test results)
The target task description from tasks.md
The relevant design.md section
The files changed in the previous implementation

The split keeps fix iterations focused and cheap.

`--feedback` Auto-Detection

The feedback file's type: header drives behavior:

type: review: read ## Findings; fix Critical findings at the specified file:line first, then Improvements
type: test: read ## Test Cases failures and ## Completion Criteria Coverage uncovered items; modify implementation to pass

A standing constraint: do not touch code outside what the findings reference.
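
To make the header-driven behavior concrete, here is a minimal sketch of reading the `type:` line and pulling out the Critical items to fix first. The helper names are hypothetical, and the parsing assumes the `### Critical` heading and `- [ ]` items used in the sample review file in this article.

```python
import re

# Sections to read for each feedback type, in the order described above.
SECTIONS_BY_TYPE = {
    "review": ["## Findings"],
    "test": ["## Test Cases", "## Completion Criteria Coverage"],
}


def detect_feedback_type(text: str) -> str:
    """Return the value of the first 'type:' header line in a feedback file."""
    match = re.search(r"^type:\s*(\w+)\s*$", text, flags=re.MULTILINE)
    if match is None or match.group(1) not in SECTIONS_BY_TYPE:
        raise ValueError("feedback file has no recognized 'type:' header")
    return match.group(1)


def critical_findings(review_text: str) -> list[str]:
    """Collect unchecked items under '### Critical' so they can be fixed first."""
    items, in_critical = [], False
    for line in review_text.splitlines():
        if line.startswith("#"):
            in_critical = line.strip() == "### Critical"
        elif in_critical and line.startswith("- [ ]"):
            items.append(line[len("- [ ]"):].strip())
    return items
```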

Commit

When done, commit. The format comes from coding-rules.md / CLAUDE.md (default feat(scope): {task-id} — {brief}). Stage only implementation files, not tasks.md — the orchestrator updates that later.

spec-review: Rule × File Matrix Review

spec-review enforces a rule × file matrix so every applicable rule is checked against every changed file.

bash

npx skills add anyoneanderson/agent-skills --skill spec-review -g -y

/spec-review --task T-007 --base-commit abc1234 --spec .specs/auth-feature/
# or, standalone on the current staged diff:
/spec-review

Step 1: Collect Rules

Read review_rules.md (generated in Part 1) and coding-rules.md and parse into a structured list:

text

rule_list: [
  { id: "RR-001", severity: "Critical", description: "No SQL injection", category: "security" },
  { id: "CR-MUST-001", severity: "MUST", description: "Use strict TypeScript", category: "typescript" },
  ...
]

If no rules files exist, fall back to minimal defaults (security / correctness / style).

Step 2: Acquire the Diff

Context	Diff command
`--task` + `--base-commit`	`git diff {sha}...HEAD`
`--task` alone	Auto-detect task-start commit; if ambiguous, require `--base-commit`
`--diff {file}`	Read the diff file
No options	`git diff --cached` first; if empty, `git diff`
PR context	`git diff {base}...HEAD`
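
The table can be sketched as a function that returns the git commands to try. This is an assumption-laden illustration: `diff_plan` and its parameters are hypothetical, the precedence between rows is guessed, and the `--diff {file}` row is left out because it is a plain file read.

```python
from typing import Optional


def diff_plan(
    task: Optional[str] = None,
    base_commit: Optional[str] = None,
    detected_start: Optional[str] = None,  # result of auto-detection, if any (hypothetical)
    pr_base: Optional[str] = None,
) -> list[list[str]]:
    """Return git commands to try in order; the caller stops at the first non-empty diff."""
    if task:
        start = base_commit or detected_start
        if start is None:
            raise ValueError("task-start commit is ambiguous: pass --base-commit")
        return [["git", "diff", f"{start}...HEAD"]]
    if pr_base:
        return [["git", "diff", f"{pr_base}...HEAD"]]
    # No options: staged changes first, then the working tree if that is empty.
    return [["git", "diff", "--cached"], ["git", "diff"]]
```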

Step 3: The Matrix (core)

text

for each rule in rule_list:
  for each file in changed_files:
    if rule.category is relevant to this file type:
      check if any added/modified line violates this rule
      if violation: record { rule.id, file.path, line_number, description, severity }

Category-to-filetype relevance:

security → all files
typescript → .ts / .tsx
test → *.test.* / *.spec.*
style → all source files
api → controller / route files
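
The matrix pseudocode translates fairly directly into Python. In the sketch below, the `violates` callable stands in for the reviewer's judgment, and the glob spellings in `RELEVANCE` (especially for `api`, and `*` for "all source files") are assumptions rather than the skill's actual matching rules.

```python
from collections.abc import Callable, Mapping, Sequence
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass(frozen=True)
class Rule:
    id: str
    severity: str
    description: str
    category: str


# Category-to-filetype relevance from the list above, spelled as globs (assumed).
RELEVANCE = {
    "security": ["*"],
    "typescript": ["*.ts", "*.tsx"],
    "test": ["*.test.*", "*.spec.*"],
    "style": ["*"],
    "api": ["*controller*", "*route*"],
}


def is_relevant(category: str, path: str) -> bool:
    return any(fnmatch(path, pattern) for pattern in RELEVANCE.get(category, []))


def review_matrix(
    rules: Sequence[Rule],
    changed_lines: Mapping[str, Sequence[tuple[int, str]]],  # path -> (line number, text)
    violates: Callable[[Rule, str], bool],
) -> list[dict]:
    """Check every rule against every changed file it is relevant to."""
    findings = []
    for rule in rules:
        for path, lines in changed_lines.items():
            if not is_relevant(rule.category, path):
                continue
            for number, text in lines:
                if violates(rule, text):
                    findings.append({
                        "rule": rule.id, "file": path, "line": number,
                        "description": rule.description, "severity": rule.severity,
                    })
    return findings
```

Iterating rule-first, as in the pseudocode, makes it harder to skip a rule silently: every rule visits every relevant file.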

Step 4: Design Consistency Check

If --spec is provided, read the relevant design.md section and compare the implementation against it — interfaces present, data model matches, architectural decisions honored. Deviations get recorded at Improvement severity.

Step 5: Write the Review File

Output goes to .specs/{feature}/review-{task-id}.md:

markdown

# Review: T-007
type: review
 
## Meta
- Reviewer: spec-review
- Date: ...
- Iteration: 1
- Rules checked: 36 rules across 5 files
- Diff basis: git diff abc1234...HEAD
 
## Findings
 
### Critical
- [ ] **RR-001** `src/auth/service.ts:48` — SQL injection: raw string concatenation in query
 
### Improvement
- [ ] **CR-SHOULD-002** `src/auth/service.ts:62` — explicit return type recommended
 
## Summary
- Critical: 1 | Improvement: 1 | Minor: 0
- Gate: FAIL

Gate logic: any Critical → FAIL / only Improvement+Minor → PASS (with warnings) / nothing → PASS. The file feeds straight into spec-code --feedback.
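
The gate rule is small enough to state as code. The sketch below assumes the `- Critical: 1 | Improvement: 1 | Minor: 0` summary line shown above; `parse_summary` and `review_gate` are hypothetical names.

```python
import re


def parse_summary(line: str) -> dict[str, int]:
    """Read the counts from a '- Critical: 1 | Improvement: 1 | Minor: 0' line."""
    pairs = re.findall(r"(Critical|Improvement|Minor):\s*(\d+)", line)
    return {name: int(count) for name, count in pairs}


def review_gate(critical: int, improvement: int, minor: int) -> str:
    if critical > 0:
        return "FAIL"
    if improvement or minor:
        return "PASS (with warnings)"
    return "PASS"


counts = parse_summary("- Critical: 1 | Improvement: 1 | Minor: 0")
print(review_gate(counts["Critical"], counts["Improvement"], counts["Minor"]))  # FAIL
```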

spec-test: Tests from Completion Criteria

spec-test builds tests from the completion criteria in tasks.md for the target task.

bash

npx skills add anyoneanderson/agent-skills --skill spec-test -g -y

/spec-test --task T-007 --spec .specs/auth-feature/

Steps 1–2: Extract Criteria + Detect Patterns

From tasks.md, pull the target task's completion criteria, target files, and requirement ID
Scan the project for existing test conventions:
- Test files: *.test.*, *.spec.*, __tests__/, test/, tests/
- Framework: Jest / Vitest / Mocha / pytest / Go test / Rust test
- Patterns: AAA, describe/it, fixtures
- Commands: from package.json scripts, Makefile, CLAUDE.md

Defaults kick in if no existing tests are found.
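
The test-file patterns listed above could be matched along these lines. This is a hedged sketch of the file-detection step only: `is_test_file` is a hypothetical helper, and framework and command detection are not modeled.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

TEST_FILE_GLOBS = ("*.test.*", "*.spec.*")
TEST_DIRS = {"__tests__", "test", "tests"}


def is_test_file(path: str) -> bool:
    """True if the file name or any parent directory matches a listed test pattern."""
    parsed = PurePosixPath(path)
    name_matches = any(fnmatch(parsed.name, glob) for glob in TEST_FILE_GLOBS)
    in_test_dir = bool(TEST_DIRS & set(parsed.parts[:-1]))
    return name_matches or in_test_dir
```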

Steps 3–4: Design Cases and Write

Three categories of test from criteria + design:

Happy path: at least one test per completion criterion
Edge cases: empty inputs, boundaries, error conditions
Negative tests: invalid inputs, unauthorized access, missing data

Tests follow the detected conventions — naming, placement, AAA structure.

Steps 5–6: Run and Write Results

Use the detected command (npm test / pytest / go test / ...). New tests run first, then the full suite. Results land in .specs/{feature}/test-{task-id}.md:

markdown

# Test: T-007
type: test
 
## Meta
- Tester: spec-test
- Date: ...
- Command: npm test -- --coverage
- Framework: Vitest
 
## Results
- Tests: 8/9 passed
- Coverage: 87%
- Duration: 1.2s
 
## Test Cases
- [x] auth-service: returns token on successful login
- [ ] auth-service: rejects bad password — FAILED: expected 401 got 500
 
## Completion Criteria Coverage
| Criterion | Test | Status |
|---|---|---|
| Login success creates session | login-success | PASS |
| Wrong password returns 401 | login-fail | FAIL |
 
## Gate: FAIL

Gate: all pass → PASS, any fail → FAIL. Failures route into spec-code --feedback; spec-test does not edit implementation code (separation of concerns).
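
For a sense of how a consumer might read the coverage table, the sketch below parses the `## Completion Criteria Coverage` section of the sample file and lists criteria that are not yet passing. The function names are hypothetical, and treating any non-PASS status as unmet is an assumption.

```python
def criteria_status(test_report: str) -> dict[str, str]:
    """Map each criterion in '## Completion Criteria Coverage' to its Status cell."""
    statuses, in_table = {}, False
    for line in test_report.splitlines():
        if line.startswith("## "):
            in_table = line.strip() == "## Completion Criteria Coverage"
            continue
        if not (in_table and line.startswith("|")):
            continue
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != 3 or cells[0] == "Criterion" or set(cells[0]) <= {"-"}:
            continue  # skip the header and separator rows
        statuses[cells[0]] = cells[2]
    return statuses


def unmet_criteria(test_report: str) -> list[str]:
    """Criteria whose status is anything other than PASS."""
    return [name for name, status in criteria_status(test_report).items() if status != "PASS"]
```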

How Parts 1 and 2 Pay Off Here

Everything you set up in Parts 1 and 2 plugs into this pipeline.

Document	Generated by (Part)	Used in Part 3
`issue-to-pr-workflow.md`	spec-workflow-init (Part 1)	spec-implement reads it during Phase 1–3: base branch, naming, test commands, dispatch strategy
`coding-rules.md`	spec-rules-init (Part 1)	spec-code consults during implementation; spec-review parses `[MUST]` / `[SHOULD]` style severities into its rule list
`review_rules.md`	spec-rules-init (Part 1)	spec-review uses it as the primary rule list
`.claude/agents/workflow-*.md` / `.codex/agents/workflow-*.toml`	spec-workflow-init (Part 1)	spec-implement chooses its dispatch mode based on which files are present
`requirement.md` / `design.md` / `tasks.md`	spec-generator (Part 2)	spec-code reads context, spec-review verifies design fit, spec-test extracts completion criteria
GitHub Issue	spec-to-issue (Part 2)	Becomes `spec-implement --issue {N}` input; PR body uses `closes #{N}`

The end-to-end picture:

text

spec-workflow-init / spec-rules-init  ← Setup (Part 1)
  ↓ docs/issue-to-pr-workflow.md / docs/coding-rules.md / docs/review_rules.md
spec-generator → spec-inspect → spec-to-issue  ← Spec authoring (Part 2)
  ↓ .specs/{feature}/ + GitHub Issue #N
spec-implement(--issue N)  ← Implementation (Part 3)
  ├─ spec-code
  ├─ spec-review
  └─ spec-test
  ↓
PR

FAQ

Q. Does `spec-implement` really never write code itself?

A. Correct. The SKILL.md flags this as "🚨 BLOCKING — orchestrator only". If a worker is missing, it stops and points you to the install command rather than silently falling back. This keeps the conductor inside its role and the contract between the skills intact.

Q. How many fix iterations does `--feedback` run?

A. Up to three per Review Gate task. If the gate still fails after three iterations, the skill asks for your decision via AskUserQuestion. Test failures follow the same fix → retest pattern.

Q. What if my old spec files have no `[code]` / `[orchestrator]` tags?

A. Untagged phases default to [code] for backward compatibility with pre-v3 specs.

Q. Which runtime should I pick — Codex, Claude Code, or cmux?

A. Depends on your environment, but if you generated the role definitions in Part 1, spec-implement will prefer runtime-native multi-agent (Codex sub-agents or Claude Code agent team). cmux dispatch only kicks in when the workflow explicitly selects it.

Q. What happens if tests fail at PR time?

A. No PR. Phase 7's final quality gate (test / lint / typecheck) must pass first. Failures get pushed back through spec-code --feedback; if that still fails, you're prompted to decide.

Q. How does `--resume` work?

A. It picks up at the first unchecked task in tasks.md. Phase 6 is built to be idempotent — interrupt, then run spec-implement --resume to continue.
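
The resume lookup amounts to finding the first unticked checkbox. A minimal sketch, assuming tasks.md uses `- [ ]` / `- [x]` checkbox lines; `first_unchecked_task` is a hypothetical helper, not the skill's own code.

```python
import re
from typing import Optional

CHECKBOX_RE = re.compile(r"^\s*- \[(?P<mark>[ xX])\]\s+(?P<title>.+)$")


def first_unchecked_task(tasks_md: str) -> Optional[str]:
    """Return the text of the first '- [ ]' item, or None if every task is ticked."""
    for line in tasks_md.splitlines():
        match = CHECKBOX_RE.match(line)
        if match and match.group("mark") == " ":
            return match.group("title")
    return None
```

Because the lookup depends only on the checkbox state in tasks.md, running it again after an interruption lands on the same task.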

Summary

spec-implement / spec-code / spec-review / spec-test split orchestration, implementation, review, and testing into four distinct skills. The orchestrator never writes code, the reviewer and tester never fix implementations, and the workers participate in the fix loop only through --feedback. That role separation is what lets the AI drive itself responsibly.

Across the three articles, the nine spec skills add up to a flow in which three requests ("create the full spec," "turn it into an issue," and "start implementation") are enough to reach a PR. At ZenChAIne we are running this on real projects and refining the operational details as we go.

That closes the "Spec-Driven Development in Practice" series. The real value shows up when you start growing your own coding-rules.md, review_rules.md, and agent definitions for your team; that is where the loop tightens.
