Spec-driven development in 2026: main tools overview

spec-kit, OpenSpec, BMAD, GSD, Kiro, Tessl - what each really is, and how to pick one, or none.

If you write code with AI and you have a nagging sense that you are behind, you are not wrong, and you are not alone. New spec-driven tools keep appearing, every one of them has a few thousand GitHub stars, and every comparison post ends with the same shrug of a chart. This is the guide I wish existed: read it once, understand the entire category, and leave with a decision instead of more open tabs.

I will be concrete and I will have opinions. I run one of these tools daily on a real product, so where I have a view I will tell you, and where the thing is genuinely a coin flip I will tell you that too.

Why this category exists at all

The code is the cheap part now. You can describe a feature to a capable model and watch it produce something that runs in seconds. So the expensive part moved. It moved upstream, to the brief: what you are actually asking for, what the rules are, what must never happen. Get the brief right and the AI is a fast, tireless builder. Get it vague and the model fills the gaps with confident guesses, and you discover which guesses were wrong three features later, after you have built on top of them.

Spec-driven development (SDD) is the discipline of taking that brief seriously. You write down what you want as a structured spec, the AI builds against it, and the spec - not a Slack thread, not your memory of last Tuesday's standup - is the artifact of record. The only genuinely new idea is that the brief is now executable: the model reads it and writes the code.

Here is where spec-driven tooling sits in the wider world of AI development, so you know what we are zooming into:

The frameworks in this guide are not apps. They are conventions plus a little tooling that ride on top of the agentic coding tool you already use, living in your repo as files. App builders (prompt-to-app) and spec platforms (which we will touch at the end) are separate categories with their own trade-offs.

What every spec-driven tool is actually doing

Strip away the branding and all of them run the same loop. Knowing the shared skeleton makes the differences easy to see.

A "spec," at its smallest, is just requirements written so a machine and a human read them the same way. Most of these tools converge on something close to this shape:

Requirement: Password reset
  WHEN a user requests a reset for a registered email
  THE SYSTEM SHALL send a single-use link valid for 30 minutes.

  WHEN a user opens an expired or already-used link
  THE SYSTEM SHALL reject it and offer to send a new one.

That is it. The frameworks differ in how much process surrounds that core, and in whether the spec survives after the code ships. Which brings us to the only mental model you need.

The mental model: two dials

Every one of these tools is a position on two dials. Fix where you want to sit, and the tool almost picks itself.

Ceremony per task. How much process does one change cost you? At the low end, one markdown file you edit by hand. At the high end, a dozen AI agents role-playing a product team, each handing a document to the next. More ceremony buys a paper trail and forces decisions to be explicit. It also slows you down and hands you more documents to review.
The role of the spec. Is the spec the source of truth you keep and regenerate from, or a planning artifact you write once and discard once the code exists? A throwaway spec is just a better prompt. A durable spec is documentation that cannot drift, because the code is derived from it - more powerful, more expensive, and it binds you harder to the tool that maintains it.

Plot the field on those two dials and the picture explains itself:

Notice OpenSpec sitting almost alone in the top-left: durable spec, low ceremony. That quadrant is the whole reason it is my default, and it is the position the popular roundups tend to flatten. Notice too that the lightest option of all - plan mode plus a single file - is on the board, because the first real question is whether you need a framework at all.

Question zero: do you even need one?

If your scope is small and you already know roughly what you are building, you do not need a framework. You need a habit. Turn on your agent's plan mode, make it write the plan in plain language before it touches code, read the plan, approve it, then let it build. Keep one markdown file per chunk of work as the running spec.

# spec/checkout-tax.md
Goal: add VAT to checkout for EU addresses.
Rules:
  - rates come from config, never hardcoded
  - prices stored as integer cents
Done when:
  - cart total shows tax line for EU, hidden elsewhere
  - unit tests cover DE, FR, and a non-EU address

That is spec-driven development with zero install. For one engineer moving fast it beats every tool below. The frameworks earn their keep when the work gets big enough, or the team gets large enough, that "a habit" stops being reliable. The more common mistake runs the other way: one developer drowning in a twelve-agent ceremony built for a team they do not have.
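If you adopt the one-file habit, a small check can keep those files honest. The sketch below assumes the three section labels used in the example spec above (Goal:, Rules:, Done when:). Those labels are this article's convention rather than a standard, and the function name `missing_sections` is hypothetical, not part of any tool covered here.

```python
# Section labels copied from the example spec; an assumption, not a standard.
REQUIRED_SECTIONS = ("Goal:", "Rules:", "Done when:")


def missing_sections(spec_text: str) -> list[str]:
    """Return the required labels that do not start any line of the spec."""
    lines = [line.strip() for line in spec_text.splitlines()]
    return [
        label
        for label in REQUIRED_SECTIONS
        if not any(line.startswith(label) for line in lines)
    ]


example = """# spec/checkout-tax.md
Goal: add VAT to checkout for EU addresses.
Rules:
  - rates come from config, never hardcoded
Done when:
  - cart total shows tax line for EU, hidden elsewhere
"""

print(missing_sections(example))          # complete spec: nothing reported
print(missing_sections("Goal: ship it"))  # reports the two absent labels
```

Run something like this in a pre-commit hook and a spec with no "Done when" never reaches the agent. It checks only that the sections exist, not that they say anything useful, so reading the plan is still your job.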

With that established, here are the four frameworks, each as an answer to "how much ceremony, and is the spec permanent."

One thing to get straight first, because it trips people up. The slash commands you are about to see - /speckit.specify, /opsx:propose, /gsd-plan-phase - are not shell commands you type in a terminal. When you set a framework up, it installs them into your AI coding agent as slash commands, and on agents like Claude Code as skills too. spec-kit and OpenSpec generate that integration for twenty to thirty different agents (Claude Code, Cursor, Copilot, Gemini CLI, and more), so the same command does the same thing whichever you use; only the way you trigger it changes. GSD is the exception - it is built for Claude Code only. None of this is tied to a specific IDE. What you need is an agent that can read the framework's command files out of your repo.

spec-kit (GitHub): the constitution

GitHub's spec-kit is the most adopted by a wide margin - around 106,000 stars as I write this.¹ Its signature idea is the constitution: before any feature spec, you write your project's non-negotiable principles into one file, and every later step is meant to honor them.

# .specify/memory/constitution.md
1. Every feature ships with tests. No exceptions.
2. No new runtime dependency without an ADR.
3. All money values are integer cents, never floats.
4. Public API changes require a version bump.

From there each feature flows through a fixed four-command pipeline: specify, plan, tasks, implement. The first three each produce a structured document, and implement builds from them:

specs/
  001-password-reset/
    spec.md      # what + acceptance criteria
    plan.md      # architecture + tech choices
    tasks.md     # ordered, parallel-safe checklist

This is the framework for teams that want process to be explicit and legible. The four phases are easy to explain to a stakeholder, and the constitution genuinely captures the rules that usually live only in a senior engineer's head. The cost is volume: you generate and review a lot of markdown. There used to be a real governance gap here too - the implement step did not re-read the constitution, so the rules you wrote down could quietly fail to reach the code. GitHub closed that in May 2026, and implement now loads the constitution when it exists, though the pattern still leans on the agent actually honoring what it reads. spec-kit also leans greenfield: clean for a new project, heavy for a one-line fix.

Lives at: high ceremony, spec as a planning artifact you advance through phases.

BMAD-METHOD: the simulated team

BMAD² is the ambitious one. Instead of a single assistant it gives you twelve or more specialized agents, each playing a role from a real software team, handing artifacts down a chain:

Each role is a skill you invoke; it reads the artifact the previous role produced and writes the next one, all as plain files in your repo:

analyst    -> docs/product-brief.md
pm         -> docs/prd.md
architect  -> docs/architecture.md      # ADRs live here
sm         -> docs/stories/story-login.md
dev        -> writes the code, updates sprint-status.yaml
qa         -> review + edge-case pass

You do not have to run the whole troupe. For a small change you might invoke only the PM and dev roles; the method scales down as well as up. At version 6.8 and around 48,000 stars it is mature and active, with features like a parallel "edge case hunter" review pass.

When it fits, it fits well: a genuinely complex, multi-stakeholder build where you want the AI to surface the architecture debates and corner cases a lone prompt skips. When it does not, it is a sledgehammer. For a solo MVP, simulating an eight-person agile ceremony is pure overhead, and it is the most token-hungry of the four, which makes it the most expensive to run - 2026 comparisons consistently rank it the highest-cost option. Reach for it only when the complexity is real.

Lives at: the highest ceremony, specs as a chain of formal artifacts.

OpenSpec (Fission-AI): the lightweight source of truth

OpenSpec³ is the one I run daily, on the CRM I am building for my own practice, so I will be plain about why. It keeps the lightest footprint of the four. You describe a change and it scaffolds a folder for just that change; you work propose, then apply, then archive.

Inside that change folder, everything is a plain markdown file:

openspec/changes/add-password-reset/
  proposal.md          # why, and what changes
  design.md            # how (optional)
  tasks.md             # the checklist /opsx:apply ticks off
  specs/auth/spec.md   # the delta: ADDED / MODIFIED / REMOVED

The trick is in that last step. You do not rewrite the whole spec each time; you write only the delta - what this change adds, modifies, or removes - and on archive a deterministic command-line tool merges that delta into a permanent canonical spec that grows with the project.

## ADDED Requirements
### Requirement: Password reset
The system SHALL let a registered user reset a forgotten password.

#### Scenario: valid request
- WHEN a user requests a reset for a registered email
- THEN a single-use link valid for 30 minutes is sent

## MODIFIED Requirements
### Requirement: Session timeout
The system SHALL expire sessions after 30 minutes of inactivity.
(Previously: 60 minutes)

## REMOVED Requirements
### Requirement: Security questions
(Deprecated in favor of email-based reset)
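The merge-on-archive idea can be sketched in a few lines. This is a minimal illustration, not OpenSpec's actual implementation: the `Delta` class, the `apply_delta` function, and the decision to raise an error on conflicts are all assumptions made for the example, with the canonical spec modeled as a plain name-to-text dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Delta:
    """One change's delta: requirement name -> text (hypothetical model)."""
    added: dict[str, str] = field(default_factory=dict)
    modified: dict[str, str] = field(default_factory=dict)
    removed: list[str] = field(default_factory=list)


def apply_delta(canonical: dict[str, str], delta: Delta) -> dict[str, str]:
    """Return a new canonical spec with the delta merged in; input is not mutated."""
    merged = dict(canonical)
    for name, text in delta.added.items():
        if name in merged:
            raise ValueError(f"ADDED requirement already exists: {name}")
        merged[name] = text
    for name, text in delta.modified.items():
        if name not in merged:
            raise ValueError(f"MODIFIED requirement not found: {name}")
        merged[name] = text
    for name in delta.removed:
        if name not in merged:
            raise ValueError(f"REMOVED requirement not found: {name}")
        del merged[name]
    return merged


# Starting spec; the security-questions wording is invented for illustration.
canonical = {
    "Session timeout": "The system SHALL expire sessions after 60 minutes of inactivity.",
    "Security questions": "The system SHALL verify identity with security questions.",
}
delta = Delta(
    added={"Password reset": "The system SHALL let a registered user reset a forgotten password."},
    modified={"Session timeout": "The system SHALL expire sessions after 30 minutes of inactivity."},
    removed=["Security questions"],
)
merged = apply_delta(canonical, delta)
print(sorted(merged))
```

No model is involved anywhere in that function. The same delta applied to the same spec gives the same result every time, and a delta that contradicts the spec fails loudly instead of being quietly reinterpreted.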

Two things make it my default. It is deliberately stack-neutral: it ships knowing nothing about your language or framework, and your opinions live in one config file you control, so it does not fight your stack. And it draws a clean line between a deterministic engine and a creative worker - the CLI does the bookkeeping, parsing, and validation with no model involved, while the AI only ever fills content inside guardrails the CLI enforces. That separation is exactly where I want reliability and creativity split. It calls itself "fluid not rigid": unlike the phase-locked tools you can edit any artifact in any order, which matches how building actually goes. At around 51,000 stars and version 1.3 it is rough in spots, and where spec-kit leans greenfield it leans brownfield: it assumes the architecture already exists. That is the gap I had to build my own workflow on top of it to fill, which is a separate story.

Lives at: low ceremony, spec as a genuine, maintained source of truth.

GSD: the Claude Code power tool (but mind the warning below)

GSD⁴ shot to roughly 64,000 stars in about six months and is the most opinionated of the set. It is built specifically for Claude Code and is less a spec format than a context-engineering system. Its core move is a direct answer to the way long AI sessions degrade: it splits work into phases and hands each phase to a fresh sub-agent with a clean 200,000-token window, clearing context between them so task fifty is as sharp as task one.

What makes the fresh-context trick work is that nothing important lives in the chat. GSD writes everything to a .planning/ folder that survives each /clear, so every new sub-agent rehydrates from files instead of conversation history:

.planning/
  PROJECT.md        # vision and context
  REQUIREMENTS.md   # scoped, with IDs
  ROADMAP.md        # phases and status
  STATE.md          # decisions, blockers, session memory
  phases/
    01-auth/
      01-01-PLAN.md      # one atomic execution plan
      01-01-SUMMARY.md   # what actually happened
      VERIFICATION.md    # automated checks after execute
      UAT.md             # manual run-through vs requirements
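To make "rehydrates from files" concrete, here is a rough sketch of assembling a fresh session's starting context from that folder. It is not GSD's code: the `build_context` function, the choice of which files to load, and the heading format are assumptions for illustration only.

```python
import tempfile
from pathlib import Path

# Top-level files from the layout above; loading exactly these is an assumption.
CONTEXT_FILES = ("PROJECT.md", "REQUIREMENTS.md", "ROADMAP.md", "STATE.md")


def build_context(planning_dir: Path, plan_file: str) -> str:
    """Concatenate the planning files that exist, plus one phase plan, into one prompt."""
    parts = []
    for name in CONTEXT_FILES:
        path = planning_dir / name
        if path.exists():  # skip files the project has not written yet
            body = path.read_text(encoding="utf-8").strip()
            parts.append(f"## {name}\n{body}")
    plan_body = (planning_dir / "phases" / plan_file).read_text(encoding="utf-8").strip()
    parts.append(f"## {plan_file}\n{plan_body}")
    return "\n\n".join(parts)


# Demo against a throwaway folder so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / ".planning"
    (root / "phases" / "01-auth").mkdir(parents=True)
    (root / "PROJECT.md").write_text("Vision: a small CRM.\n", encoding="utf-8")
    (root / "STATE.md").write_text("Decision: email-based reset.\n", encoding="utf-8")
    (root / "phases" / "01-auth" / "01-01-PLAN.md").write_text(
        "Task: add reset endpoint.\n", encoding="utf-8"
    )
    context = build_context(root, "01-auth/01-01-PLAN.md")

print(context)
```

Whatever the real mechanics are, the point survives the simplification: if a decision is not written into one of those files, the next sub-agent never sees it.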

You move through it with /gsd-new-project, then per phase /gsd-discuss-phase, /gsd-plan-phase, and /gsd-execute-phase. Then /gsd-verify-work checks the built phase against its own requirements and records the result in UAT.md, and /gsd-ship opens a pull request with a generated summary. Context is cleared between steps, which is why the file trail matters: it is the only thing that carries over. For a heavy Claude Code user it is genuinely powerful, with a minimal mode that shrinks its own prompt by ninety-four percent for cheaper or local models.

Here is the warning, and it is the most important part of this section. In May 2026 the original maintainer went silent and the project moved to a community-run fork, after what the new maintainers describe as trust and ownership concerns including a meme-coin rug-pull tied to the original.

And here is the part that should give you pause. Those 64,000 stars are on the original repo, which the new maintainers no longer treat as the trusted home; it still receives commits, but from an account whose control the fork team says it cannot vouch for. The community-governed fork that is actually maintained has closer to 1,300 stars. The popularity is real, and it is pointing at the version that is no longer the official one. That is not a reason to avoid GSD outright, but it is a reason to know exactly what you are standing on before you build your workflow on it. Sixty-four thousand stars aimed at the wrong repo is not a safe bet, and telling those two things apart is the judgment this whole field demands.

Lives at: medium ceremony, spec as a planning artifact, best context handling, most governance risk.

The other category: platforms that own the spec

The four above are frameworks - they ride on the tools you already use and leave your files in your repo. There is a second category that gets conflated with them. Platforms make the spec the canonical source in a stronger sense, but inside their own environment.

Kiro (AWS)⁵ is a standalone spec-driven IDE built around a Requirements - Design - Tasks flow, with requirements written in EARS notation:

WHEN a user submits a form with invalid data
THE SYSTEM SHALL display validation errors next to the relevant fields
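Part of the appeal of this notation is that it is regular enough for a machine to take apart. The sketch below parses only the event-driven WHEN ... THE SYSTEM SHALL ... form shown here; EARS defines other patterns that it ignores. The `Requirement` class and `parse_ears` function are hypothetical names, not Kiro's API.

```python
import re
from dataclasses import dataclass


@dataclass
class Requirement:
    trigger: str   # the text after WHEN
    response: str  # the text after THE SYSTEM SHALL


# Matches the event-driven form only; other EARS patterns are out of scope here.
EARS_PATTERN = re.compile(
    r"WHEN\s+(?P<trigger>.+?)\s+THE SYSTEM SHALL\s+(?P<response>.+)",
    re.DOTALL,
)


def _squash(text: str) -> str:
    """Collapse runs of whitespace, including line breaks, into single spaces."""
    return " ".join(text.split())


def parse_ears(text: str) -> Requirement:
    """Split one WHEN / THE SYSTEM SHALL requirement into trigger and response."""
    match = EARS_PATTERN.search(text)
    if match is None:
        raise ValueError("not an event-driven EARS requirement")
    return Requirement(_squash(match["trigger"]), _squash(match["response"]))


req = parse_ears(
    """WHEN a user submits a form with invalid data
THE SYSTEM SHALL display validation errors next to the relevant fields"""
)
print(req.trigger)
print(req.response)
```

Once a requirement is a trigger and a response, each one maps naturally onto a test case: set up the trigger, assert the response.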

Tessl goes furthest: it treats the spec as the only thing you maintain and marks the generated code "do not edit," regenerating it from the spec. It is still in beta.

The trade is clean. Frameworks ask less of you and let you leave. Platforms enforce more discipline and bind you tighter. For most teams building from scratch, a framework that lives in your own repository is the lower-risk place to start; you can graduate to a platform later if the discipline proves worth the lock-in.

The comparison, at a glance

	spec-kit	BMAD	OpenSpec	GSD
In one line	Constitution + 4-phase pipeline	A simulated 12-agent team	Lightweight delta specs	Context engineering for Claude Code
Ceremony	High	Highest	Low	Medium
Spec role	Planning artifact	Formal artifact chain	Maintained source of truth	Planning artifact
Runs on	Any agent (Copilot-native)	Any agent	20+ agents, stack-neutral	Claude Code only
Sweet spot	New projects, explicit governance	Complex, multi-role builds	Solo / small teams, ongoing products	Heavy Claude Code users
Watch out for	Markdown volume; agent-dependent governance	Cost and overhead	Brownfield-leaning; younger polish	Maintainer unreachable; community fork
Maturity	106k stars, most adopted	~48k stars, v6.8	~51k stars, v1.3	64k stars (original, no longer official); fork ~1.3k

All four: free, MIT-licensed, specs live in your git repo.

Which one: a decision tree

This is the part that turns reading into a decision. Follow the branches.

In plain words: if it is solo and small, no framework. If you want a living source of truth without the ceremony, OpenSpec. If you need the rules legible to non-engineers, spec-kit. If the build is genuinely complex and funded, BMAD, eyes open on cost. If you live in Claude Code, GSD, on the maintained fork.
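The same branches can be written down as a function, if you find that easier to argue with than a paragraph. This is only a sketch: the `Situation` fields and `pick_tool` name are made up for the example, and both the order in which the branches are checked and the fallback at the end are assumptions, since the plain-words summary does not rank them.

```python
from dataclasses import dataclass


@dataclass
class Situation:
    """Hypothetical yes/no answers matching the branches in plain words."""
    solo_and_small: bool = False
    wants_living_spec: bool = False
    rules_for_non_engineers: bool = False
    complex_and_funded: bool = False
    lives_in_claude_code: bool = False


def pick_tool(s: Situation) -> str:
    """Walk the branches in the order the summary lists them (an assumed priority)."""
    if s.solo_and_small:
        return "no framework"
    if s.wants_living_spec:
        return "OpenSpec"
    if s.rules_for_non_engineers:
        return "spec-kit"
    if s.complex_and_funded:
        return "BMAD"
    if s.lives_in_claude_code:
        return "GSD (maintained fork)"
    # Assumed fallback: with no strong signal, start with the lightest option.
    return "no framework"


print(pick_tool(Situation(wants_living_spec=True)))
print(pick_tool(Situation(complex_and_funded=True, lives_in_claude_code=True)))
```

The second call shows why the order matters: a team that is both complex and Claude Code-native gets a different answer depending on which question you ask first, and that ordering is a judgment call rather than something a function can settle.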

The honest part: where it still breaks

Spec-driven development is the real deal, and it is also over-sold. Hold two critiques in your head.

Martin Fowler's team⁶ tested the leading tools and came back with a complaint that will sound familiar the moment you try one: at some point you are reviewing pages of generated markdown, and "I'd rather review code than all these markdown files." They also found that no single tool works for both a tiny bug fix and a greenfield build - what is right for one is wrong for the other. The team at Marmelab⁷ went further, arguing that SDD quietly revives waterfall's oldest mistake: the belief that a detailed enough document up front removes the uncertainty. It does not. Software is non-deterministic, the model does not always follow the spec it was given, and a beautiful spec can give you false confidence in code that does not match it.

Both critiques are right and neither is fatal. The lesson is not "do not use these." It is "match the weight of the process to the weight of the decision," which is the same judgment call as everything else in building with AI. A spec is a tool for thinking and a contract for the AI. It is not a guarantee, and the moment you treat it as one it starts lying to you in the most convincing way possible.

The scarce thing in 2026 is not the ability to generate specs or code. Everyone has that. It is the judgment to know how much process a decision deserves, and which tool is actually carrying its weight. The frameworks are good. Knowing when to reach for one, which one, and when to reach for none, is the part that is still yours.

Your next step, by path

You do not have to commit. Each of these is a fifteen-minute experiment on a throwaway branch:

OpenSpec: npm install -g @fission-ai/openspec@latest, then run openspec init in a repo and ask your agent to /opsx:propose a small change you have been putting off. Watch what lands in openspec/changes/.
spec-kit: run uvx --from git+https://github.com/github/spec-kit.git specify init demo (needs uv), then /speckit.constitution and write four rules, then /speckit.specify one feature. You will feel the ceremony immediately - that tells you if it is your speed.
BMAD: run npx bmad-method install in a sandbox project, then run just the analyst and PM agents on a fake product idea. If the handoffs feel like genuine help rather than theater, it is for you.
GSD: if you live in Claude Code, install the maintained fork with npx @opengsd/get-shit-done-redux@latest (not the starred-but-no-longer-official original at gsd-build/get-shit-done). Run one project through plan, execute, and verify.

Pick the one your situation pointed to in the tree, spend the fifteen minutes, and you will know more than any roundup could tell you.

And if you would rather make that call with someone who has shipped on these tools, that is the conversation I have with founders and engineering leads every week.

1. GitHub, spec-kit (toolkit and spec-driven.md). https://github.com/github/spec-kit
2. BMAD-METHOD. https://github.com/bmad-code-org/BMAD-METHOD and https://docs.bmad-method.org
3. Fission-AI, OpenSpec. https://github.com/Fission-AI/OpenSpec
4. GSD (Get Shit Done), original repo and the maintained community fork. https://github.com/gsd-build/get-shit-done and https://github.com/open-gsd/get-shit-done-redux
5. AWS, Kiro (EARS-based spec IDE). https://kiro.dev and Tessl. https://tessl.io
6. Birgitta Böckeler (martinfowler.com), Understanding Spec-Driven-Development: Kiro, spec-kit, and Tessl. https://martinfowler.com/articles/exploring-gen-ai/sdd-3-tools.html
7. Marmelab, Spec-Driven Development: The Waterfall Strikes Back (Nov 2025). https://marmelab.com/blog/2025/11/12/spec-driven-development-waterfall-strikes-back.html