AI‑Spec‑Driven Development with DevLoop: a practical working model
Introduction
Most teams trying to “use AI in development” bolt a code generator onto a traditional process and hope for speed. They usually get fragments: faster stubs, but the same misunderstandings, drift, and rework. Here agents.md (https://agents.md/) is a definite improvement: it is a solid public standard for AI code generation, supported by a growing number of coding agents (OpenAI’s Codex, GitHub Copilot, etc.). Its purpose is to give coding agents a consistent way to read intent and emit code and tests. On its own, though, it improves local throughput (scaffolds, stubs, small features) without fixing the end-to-end bottlenecks of the Software Delivery Life Cycle. To get the systemic gains teams expect, you need AI-Spec-Driven Development with the DevLoop: a project "way of working" that provides a "rhythm" connecting the spec to generation and to proof, every day, across the Software Delivery Life Cycle.
In this approach, the Executable Product Spec (XPS) is written as agents.md, but its role is elevated: it’s the single, living contract for humans, agents, and CI. Humans write intent and user-visible behavior in plain language; agents consume the same file to generate code, tests, and scaffolding; CI interprets a few spec-level claims (oracles) and returns evidence. Work advances in one tight loop—Specify → Generate → Prove → Refine—applied at every level (product, feature, module). When evidence disagrees with the spec, you either update the spec (intent changed) or the code (implementation wrong) and you loop again. This is what turns agents.md from a code-gen convenience into a delivery mechanism that raises quality and speed across the whole lifecycle.
Two properties make it practical. First, the spec is human-first: it opens with purpose and user-visible behavior so reviewers get clarity before they see wiring. Second, it is agent-readable in the same place: it names inputs/outputs, oracles for CI, and minimal architecture/layout rules, so generators can scaffold safely and pipelines can assert facts (e.g., “artifact exists and is non-empty; exit code is zero”). There’s no parallel source of truth to reconcile; the XPS is the product’s heartbeat.
A working tour of the app (before the XPS)
To make the DevLoop concrete, we’ll use a real, running module: text2audio. It converts a short Markdown/Text script into a spoken audio file, optionally translates first, and streams synthesis directly to disk. Repository: https://github.com/soyrochus/text2audio/
From a user’s point of view, the app is a small, predictable CLI. You point it at a text file and an output path, choose a voice/model/format, and it produces an audio artifact while showing progress. When it finishes, it prints the absolute path to the file and a one-line summary (format, voice). If something is off—missing key, invalid voice/model—the command stops with a clear message and a non-zero exit code. That’s it: simple to run, easy to reason about, and testable.
Minimal run (happy path)
Create a tiny prompt file:
printf "Hello, this is text2audio.\n" > examples/hello.md
Run the tool:
export OPENAI_API_KEY=sk-...redacted...
uv sync
uv run python -m text2audio \
--prompt-file examples/hello.md \
--audio-file out/hello.mp3 \
--audio-format mp3 \
--language english \
--voice alloy \
--tts-model tts-1-hd
What you see while it runs is intentionally boring and informative; the last lines include something like:
AudioGenerated: out/hello.mp3 (mp3, voice=alloy, model=tts-1-hd, lang=en, ~2.1s)
DONE out/hello.mp3
There is no giant in-memory buffer; audio bytes are streamed to disk. A quick test -s out/hello.mp3 (or ffprobe) confirms the file exists and isn’t empty.
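For example, a minimal post-run check (ffprobe is optional; the second command assumes it is installed) could look like this:
test -s out/hello.mp3 && echo "artifact exists and is non-empty"
ffprobe -v error -show_entries format=duration -of default=nw=1 out/hello.mp3
# prints something like duration=2.112000 when the file is a readable audio container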
Two small variations (behavior, not ceremony)
Skip translation and keep the source language:
uv run python -m text2audio \
--prompt-file examples/hello.md \
--audio-file out/hello_native.mp3 \
--no-translate \
--voice alloy --tts-model tts-1-hd
Model-specific instructions (only for models that support them, e.g., gpt-4o-mini-tts):
uv run python -m text2audio \
--prompt-file examples/hello.md \
--audio-file out/hello_instruct.mp3 \
--tts-model gpt-4o-mini-tts \
--voice nova \
--instructions "warm, friendly narrator"
When a model ignores --instructions by design, the run still succeeds—just without applying the extra guidance. That distinction is deliberate and verifiable later in CI.
Utilities you actually use
You don’t guess voice names; you ask the app:
uv run python -m text2audio --list-voices
And when you want to check which voices really work in your environment, you probe them with ~1-second clips and get a crisp summary:
uv run python -m text2audio --probe-voices --tts-model tts-1-hd --audio-format mp3
# ...
# VoiceProbeCompleted: working=[alloy, verse, ...], failed=[...]
Both utilities exit cleanly after printing—useful in local dev and in CI diagnostics.
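As a rough sketch, a CI diagnostics step can simply assert the clean exit:
uv run python -m text2audio --list-voices; echo "exit=$?"
# expect exit=0; the same applies to --probe-voices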
Negative path (by design)
If the API key is missing, the command fails fast, prints a clear explanation (with the key redacted, never echoed), creates no file, and returns a specific exit code. That behavior is intentional, repeatable, and checked later by CI.
unset OPENAI_API_KEY
uv run python -m text2audio \
--prompt-file examples/hello.md \
--audio-file out/should_not_exist.mp3 || echo "exit=$?"
# -> error: OPENAI_API_KEY missing
# -> exit=2
What’s happening under the hood (only what you need to know now)
The CLI layer handles arguments, environment checks, and dispatch. A small “model” layer coordinates optional translation and TTS; it streams bytes to disk and reports progress. “Views” are just the textual progress and summaries you saw above. The app emits a few named events (e.g., AudioGenerated) so logs and dashboards speak the same language as the spec. Secrets are redacted centrally. Re-running with the same inputs is safe and simply overwrites the target file.
Why this narrative matters to the DevLoop
Everything you just did is straightforward to Specify in a short agents.md (the XPS), trivial to Generate with an agent (argument parsing, guards, progress output), and easy to Prove in CI (happy path, missing-key, model-semantics). When behavior changes, you adjust the spec—often a single sentence—regenerate, and rerun. No ceremony; just incremental edits that keep code, tests, and expectations aligned. The full XPS for this module captures exactly that, section by section.
The model in practice
Think of the XPS as the contract of reality. It is short, human-first, and specific enough for agents and CI to act on. The order of sections matters because it lines up with how people think and how machines build:
Purpose & Promise
User-visible Behavior
Inputs & Outputs at a glance
Oracles
Quality, Safety & Policy
Domain Snapshot
Architecture & Layout
How to Run
Boundaries & Non-goals
This removes drift. There’s one page where decisions live and one loop that keeps them honest.
A Deep Dive into the Executable Product Specification as an agents.md for text2audio: a practical example
Using the reference application at https://github.com/soyrochus/text2audio/, we will build up the XPS in order, with the real module text2audio as our running example. The purpose of the example is not to exhaust every detail, but to show how each section earns its keep and how you evolve it incrementally.
Purpose & Promise — say what it does and why it matters
You open agents.md by explaining the transformation and the value, in a few lines. This anchors everyone before any code is written.
text2audio turns a short Markdown/Text script into a spoken audio file you can publish. It can translate to a target language, then synthesize with an OpenAI TTS model, streaming to disk to avoid large memory buffers. It shows clear progress, offers voice utilities, and can play the result locally. Outputs include mp3, wav, opus, and aac.
Why this section exists: it aligns expectations in seconds and already encodes a non-functional promise (streaming).
How it evolves: if you later decide to chunk long inputs by headings, you add one sentence here. That single sentence drives the next loop of code and CI.
User-visible Behavior — describe views and actions literally
Keep it in natural language: what screens/commands exist, what they accept, and what must be true before and after.
Run accepts: prompt_file, audio_file, audio_format, language, tts_model, voice, speed, optional instructions, optional no_translate. It runs only if OPENAI_API_KEY is set and prompt_file exists. On success, it prints the final audio_file path and emits AudioGenerated { audio_file, format, language, tts_model, voice, ~duration_s }. If a voice/model is rejected, keep the view and print a hint. Utilities: VoiceList (list voices), VoiceProbe (short per-voice clips and a summary).
Why this section exists: it doubles as acceptance criteria and a contract for scaffolding.
How it evolves: adding --no-translate is one line here. The agent updates argument parsing and branching; CI gains a semantic check.
Inputs & Outputs at a glance — parameters + tangible result
A tiny table is enough for agents to wire parsing and for CI to assert artifacts.
Outcome: audio_file exists and is non-empty. Evidence: an AudioGenerated { ... } event in logs/telemetry.
Why this section exists: parameters stop being vague; the outcome becomes checkable.
How it evolves: need loudness normalization? Add normalize_lufs and extend “Outcome” with a LUFS range. One line in the spec; one more CI assertion.
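As a sketch of what the current outcome check can look like in CI (capturing stdout to a log file is an assumption; the event name follows the spec above):
uv run python -m text2audio --prompt-file examples/hello.md --audio-file out/hello.mp3 | tee run.log
test -s out/hello.mp3 || exit 1             # outcome: artifact exists and is non-empty
grep -q "AudioGenerated" run.log || exit 1  # evidence: the named event appears in the output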
Oracles — what CI must prove this iteration
Two to four claims. If they hold, the module is “good” for now.
Why this section exists: oracles are how the spec becomes executable.
How it evolves: when you add normalize_lufs, add S2: “Given --normalize-lufs -16, measured loudness is −16 LUFS ±1.” CI grows a new job; the model adds a normalization step.
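A rough sketch of that future S2 job, assuming ffmpeg is available in CI and using the hypothetical --normalize-lufs flag from the evolution above:
uv run python -m text2audio --prompt-file examples/hello.md --audio-file out/hello_norm.mp3 --normalize-lufs -16
ffmpeg -hide_banner -nostats -i out/hello_norm.mp3 -af ebur128 -f null - 2>&1 | grep "I:"
# CI parses the reported integrated loudness and asserts it lies within -16 ±1 LUFS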
Quality, Safety & Policy — name the rules before wiring
Security and operability shouldn’t be inferred. Write them down and let tooling enforce them.
Stream to disk; never buffer full audio in memory.
Never print OPENAI_API_KEY (redact on error).
Pragmatic idempotency: same inputs may overwrite the same audio_file.
Exit codes: MISSING_API_KEY=2, FILE_NOT_FOUND=3, VOICE_OR_MODEL_REJECTED=4, unexpected=1.
Logging: concise INFO, timings at DEBUG, no secrets.
Accessibility: always print the final audio_file path and a one-line summary.
Why this section exists: it lets generators synthesize guard code and lets reviews be objective.
How it evolves: forbid network calls in views? Add a line; import-graph checks can fail non-compliant PRs.
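A minimal sketch of a CI guard for the negative-path policies above (sending both output streams to one log file, and the sk- prefix check, are assumptions):
unset OPENAI_API_KEY
uv run python -m text2audio --prompt-file examples/hello.md --audio-file out/x.mp3 > run.log 2>&1
code=$?
[ "$code" -eq 2 ] || { echo "expected MISSING_API_KEY=2, got $code"; exit 1; }
[ ! -e out/x.mp3 ] || { echo "no artifact should be created on failure"; exit 1; }
grep -q "sk-" run.log && { echo "a key-like string leaked into the output"; exit 1; }
echo "negative-path guard passed"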
Domain Snapshot — stabilize names for artifacts and events
This keeps your vocabulary consistent across code, logs, and dashboards.
Why this section exists: names become reusable handles in telemetry and tests.
How it evolves: new utilities introduce new events; you append them here first.
Architecture & Layout — boundaries and import rules
Give the minimum structure agents and reviewers need; keep it enforceable.
cli (entrypoint, args, env; no business logic)
model (translation + TTS orchestration; streams to disk; functions as pure as practical)
views (progress/tables/summaries; call the model, not the APIs)
Allowed imports: cli→model, cli→views, views→model. Forbidden: model importing cli or views.
Testing: unit for pure functions; short end-to-end; golden tests for views.
Why this section exists: it prevents spaghetti and enables safe scaffolding.
How it evolves: if you add events.py/errors.py, add the rule “helpers import nothing app-specific” and enforce it with a simple import-graph check.
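One way to sketch that import-graph check with plain grep (the src/text2audio/ layout and module file names are assumptions):
# fail the build if the model layer reaches into cli or views
if grep -RInE "^(from|import) +text2audio\.(cli|views)" src/text2audio/model*; then
  echo "forbidden import: model must not depend on cli or views"
  exit 1
fi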
How to Run — one happy path for humans and CI
A newcomer should be able to copy-paste and get a real artifact; CI should do the same.
uv sync
export OPENAI_API_KEY=sk-...redacted...
uv run python -m text2audio \
--prompt-file examples/description-text2audio.md \
--audio-file out/description-text2audio.mp3 \
--audio-format mp3 \
--language english \
--voice alloy \
--tts-model tts-1-hd
Utilities stay discoverable but optional:
uv run python -m text2audio --list-voices
uv run python -m text2audio --probe-voices --tts-model tts-1-hd --audio-format mp3
Why this section exists: it shortens onboarding and removes guesswork from CI.
How it evolves: extend it only when a new capability becomes part of the happy path.
Boundaries & Non-goals — deliberate scope cuts
State what is out of scope for this iteration so decisions are visible and test plans stay lean.
No DB or cache. No .srt subtitles. No segmentation by headings (single output file per run). No Windows playback guarantee.
Why this section exists: it prevents accidental scope creep and misfiled “bugs”.
How it evolves: when a boundary moves, change it here first, then add/adjust oracles and regenerate code/tests.
What changes when you work this way
This is the difference between “using AI to write code” and AI-Spec-Driven Development with DevLoop. The first accelerates typing. The second turns a short, human-first specification into a delivery engine that scales across the lifecycle.
Again, the example module is here (working code): https://github.com/soyrochus/text2audio/ and the file convention is documented here: https://agents.md/