AI-Augmented Development Factory Workflow

Sabbatical Independent Builder 2025–2026

Situation

AI-assisted development tools can generate enormous amounts of code. The problem is not generation speed — it’s quality control, consistency, and the compounding cost of skipping steps. Without discipline, AI-assisted development produces code that passes a quick read but fails under review, breaks in CI, or doesn’t match what the product actually needs.

During my sabbatical, I was building across 100+ repositories. I needed a workflow that would let me ship fast with AI assistance while maintaining the quality standards I’d expect from a team I managed.

Decision

I codified a repeatable development loop as a set of slash commands in Claude Code. The workflow started simple and evolved through iteration across multiple projects into a structured pipeline with review gates: each step’s output feeds the next, and human judgment is concentrated where it matters most. The approach is grounded in test-driven development (TDD: every feature starts with a failing test) and domain-driven design (DDD: domain models, services, and bounded contexts drive the architecture). In its most mature form, on the community portal project where I shipped 29 PRDs, it became a 9-step development workflow:

/intake → /slice → /plan-story → /implement-story → /verify → /demo → /pr → /merged → /handoff

Each step enforces a specific gate:

/intake: Idea → PRD-lite + first story slice. Consults Codex via MCP for codebase and research context before writing anything.
/slice: PRD → 3–7 progressive vertical slices. Codex recommends slice strategy based on the PRD and codebase. Each slice = one PR.
/plan-story: Produces implementation and test plans before any code is written. Codex proposes architecture strategy. Prior product callouts are consulted to avoid repeating past issues.
/implement-story: TDD-first implementation. Delegates to Codex with write access — Codex writes the failing test first, then the implementation. Demo scripts auto-generated.
/verify: Dual review gate — a code-reviewer agent and Codex run independently in parallel, catching disjoint issue classes. Design review enforced if views changed (semantic classes, dark mode, theme tokens). Fix all issues before CI runs.
/demo: Product acceptance gate. Execute the demo script, take screenshots in light/dark/mobile. Codex independently verifies implementation against acceptance criteria. Record observations in product callouts.
/pr: Draft PR with summary, test description, and risk notes. Codex summarizes the diff. Pre-check warns if /verify hasn’t passed.
/merged: Post-merge documentation. Updates the project wiki with PRD summary, diagrams, and retrospective. Records build metrics. Feeds recommendations into the next PRD’s planning.
/handoff: Session notes + docs/now.md update for continuity across sessions.
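
The step order and the /pr pre-check described above can be sketched in a few lines. This is a minimal illustration, not the actual slash-command implementation: the `StoryState` class and `precheck` function are hypothetical names, and the sketch assumes the pre-check warns about every earlier step that has not passed, whereas the workflow only states that /pr warns when /verify has not passed.

```python
from dataclasses import dataclass, field

# Step order taken from the workflow above.
PIPELINE = [
    "/intake", "/slice", "/plan-story", "/implement-story",
    "/verify", "/demo", "/pr", "/merged", "/handoff",
]


@dataclass
class StoryState:
    """Hypothetical per-story record of which steps have passed."""
    passed: set[str] = field(default_factory=set)

    def mark_passed(self, step: str) -> None:
        if step not in PIPELINE:
            raise ValueError(f"unknown step: {step}")
        self.passed.add(step)


def precheck(state: StoryState, step: str) -> list[str]:
    """Return one warning per earlier step that has not passed yet."""
    earlier_steps = PIPELINE[:PIPELINE.index(step)]
    return [f"{s} has not passed" for s in earlier_steps if s not in state.passed]


state = StoryState()
for done in ["/intake", "/slice", "/plan-story", "/implement-story"]:
    state.mark_passed(done)

print(precheck(state, "/pr"))
# ['/verify has not passed', '/demo has not passed']
```

The pre-check returns warnings rather than raising, which matches the "warns" wording above: the human decides whether to proceed.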

Quality constraints enforced throughout:

CRAP score < 8 (Change Risk Anti-Patterns: a per-method metric that combines cyclomatic complexity with test coverage, with coverage measured via SimpleCov)
Every test must justify its carrying cost
No unrelated refactors in a story PR
Every issue caught before merge: 100% pre-merge catch rate across 29 PRDs, with zero issues reaching CI
Build metrics tracked at the story level: tests added, total tests, verify issues found, issues fixed
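
For reference, the CRAP threshold can be checked with the standard published formula, CRAP(m) = comp(m)² × (1 − cov(m))³ + comp(m), where comp is a method's cyclomatic complexity and cov is its coverage as a fraction. The sketch below is illustrative only: the function names are hypothetical, and how the complexity and coverage numbers were gathered in the actual project is not shown here.

```python
CRAP_THRESHOLD = 8  # constraint from the list above: CRAP score < 8


def crap_score(complexity: int, coverage: float) -> float:
    """Standard CRAP formula: comp^2 * (1 - cov)^3 + comp, with cov in [0, 1]."""
    if complexity < 1:
        raise ValueError("cyclomatic complexity must be at least 1")
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be a fraction between 0 and 1")
    return complexity ** 2 * (1.0 - coverage) ** 3 + complexity


def passes_gate(complexity: int, coverage: float) -> bool:
    """True when the method stays strictly under the threshold."""
    return crap_score(complexity, coverage) < CRAP_THRESHOLD


print(crap_score(5, 1.0))   # 5.0   (fully covered: score equals complexity)
print(crap_score(5, 0.5))   # 8.125 (half covered: fails the gate)
print(passes_gate(7, 1.0))  # True
print(passes_gate(8, 1.0))  # False
```

One consequence of the formula: because the score never drops below the complexity itself, a threshold of 8 caps any single method at a cyclomatic complexity of 7, even at full coverage.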

A key evolution: Codex collaborates at every step via MCP, not just as a code generator. Codex researches context during intake, recommends slicing strategies, proposes architecture, implements code, reviews independently, and verifies demos. The workflow treats Codex as a collaborator with specific roles at each gate, not a general-purpose assistant.
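
The independent review role can be sketched as two reviewers running in parallel on the same diff, with their findings merged afterwards. This is a rough sketch under stated assumptions: `agent_review` and `codex_review` are hypothetical stubs returning canned findings, standing in for the real code-reviewer agent and the Codex call over MCP, neither of which is reproduced here.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    reviewer: str
    issue_class: str
    message: str


def agent_review(diff: str) -> list[Finding]:
    """Hypothetical stub for the code-reviewer agent."""
    return [Finding("agent", "style", "illustrative finding from the agent")]


def codex_review(diff: str) -> list[Finding]:
    """Hypothetical stub for the independent Codex review."""
    return [Finding("codex", "edge-case", "illustrative finding from Codex")]


def dual_review(diff: str) -> tuple[list[Finding], set[str]]:
    """Run both reviewers in parallel; return all findings and shared issue classes."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(agent_review, diff), pool.submit(codex_review, diff)]
        agent_findings, codex_findings = (f.result() for f in futures)
    shared = ({f.issue_class for f in agent_findings}
              & {f.issue_class for f in codex_findings})
    return agent_findings + codex_findings, shared


def gate_passes(findings: list[Finding]) -> bool:
    """The gate passes only when no findings remain open."""
    return not findings


findings, shared = dual_review("example diff text")
print(len(findings), shared, gate_passes(findings))
# 2 set() False
```

An empty `shared` set is what "disjoint issue classes" looks like in this sketch: neither reviewer duplicates the other, so removing one would lose findings.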

The project wiki is continuously updated as part of the process. Each completed PRD generates a wiki summary with diagrams and a retrospective. Recommendations from retrospectives feed into the next PRD’s planning — creating a learning loop across the entire project.

Risk

The risk was over-process. A solo developer adding nine mandatory steps to every feature sounds like bureaucracy. The workflow could slow me down more than it helped, especially for small changes.

I accepted this risk because the alternative — shipping fast without gates — had already failed. Early in the sabbatical, I shipped features that looked correct but had subtle bugs, missing edge cases, or product mismatches that required rework. The rework cost more than the gates would have.

The other risk: the workflow was designed for me, working with AI. It might not transfer to a team context. I accepted this because the principles (plan before code, test before implementation, review before CI, demo before merge, document what you learned) are universal — only the tooling is specific.

Change

The workflow evolved across projects. The community portal project — a multi-tenant SaaS for Florida condo associations — was the most complete expression: 29 PRDs, 150+ stories, 2,359 tests, with story-level build metrics tracked throughout.

Specific outcomes:

Zero rollbacks on any project using the full workflow
100% pre-merge issue catch rate — dual reviewers (agent + Codex) caught disjoint issue classes
Test suite grew roughly 8.5x over the project (278 → 2,359 tests)
Demo documentation and product callouts created a reviewable product record, not just a code record
Retrospectives per PRD created a learning loop — each PRD’s process improved from the last
The /verify gate consistently caught issues that would have failed CI, saving time and maintaining flow
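
The story-level metrics behind these outcomes can be modeled with a small record type. The sketch below is an assumption about shape, not the project's actual tracking code: the field names mirror the metrics named earlier (tests added, total tests, verify issues found, issues fixed), and the two sample stories are invented purely for illustration. Only the 278 and 2,359 test counts come from the project.

```python
from dataclasses import dataclass


@dataclass
class StoryMetrics:
    story: str
    tests_added: int
    total_tests: int
    verify_issues_found: int
    issues_fixed: int


def growth_factor(start_tests: int, end_tests: int) -> float:
    """Ratio of final to initial test count."""
    return end_tests / start_tests


def fix_rate(stories: list[StoryMetrics]) -> float:
    """Share of issues found at /verify that were fixed before merge."""
    found = sum(s.verify_issues_found for s in stories)
    fixed = sum(s.issues_fixed for s in stories)
    return 1.0 if found == 0 else fixed / found


# Invented sample rows, for illustration only.
stories = [
    StoryMetrics("story-a", tests_added=12, total_tests=290,
                 verify_issues_found=3, issues_fixed=3),
    StoryMetrics("story-b", tests_added=9, total_tests=299,
                 verify_issues_found=1, issues_fixed=1),
]

print(f"{growth_factor(278, 2359):.1f}x")  # 8.5x
print(fix_rate(stories))                   # 1.0
```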

The workflow also produced transferable judgment about AI-assisted development: the AI is excellent at generating code from a clear plan and test spec, but poor at deciding what to build, what to skip, and when the product is actually done. The human gates (plan, review, demo) are where the judgment lives. Codex is powerful as a collaborator with defined roles — not as an autonomous agent.

What This Demonstrates

Process design for AI-augmented work: Not “use AI to go faster” but “design a workflow that makes AI-assisted development reliable and measurable.”
Codex as a structured collaborator: Defined roles at each gate (researcher, implementer, reviewer, verifier) — not a general-purpose chatbot.
Quality gates that earn their cost: Each gate exists because skipping it produced a measurable failure. No gate was added speculatively.
Velocity with evidence: 29 PRDs with story-level metrics, dual review, and continuous documentation. The speed is supported by artifacts.
Transferable operating system: The specific tools are mine, but the pattern (intake → slice → plan → implement → review → accept → ship → document) applies to any team adopting AI-assisted development.