Gherkin and BDD: Specifications That Execute
From Dan North's frustration with TDD to a language spoken in 70+ human languages — the history, syntax, philosophy, ecosystem, and hard-earned lessons of Behavior-Driven Development.
- DATE: APR.05.2026
- READ: 22 MIN
The frustration that started everything
In 2003, Dan North was coaching development teams on Test-Driven Development. The practice worked, but the vocabulary kept getting in the way. Teams consistently struggled with three questions: where to start testing, what to test, and how much to test.
North noticed something. When he replaced the word “test” with “behaviour” — asking “what should this class do?” instead of “how do I test this?” — a whole category of confusion evaporated. Tests named testOrderTotal became shouldCalculateOrderTotalWithTax. The shift was not cosmetic. It changed what people wrote.
He registered jbehave.org on Christmas Eve 2003 and started building the first BDD framework. JBehave reimagined JUnit from the ground up: instead of test cases, you wrote behaviour specifications. Instead of assertions, you expressed expectations.
A year later, working with Chris Matts on Feature Injection, North developed the format that would define BDD for the next two decades: Given some initial context, When some event occurs, Then ensure some outcome.
In March 2006, North published “Introducing BDD” in Better Software magazine. The article laid out the entire philosophy: start from the outside, work inward, describe behaviour in the language of the business, and let those descriptions drive implementation.
The article is twenty years old. The problems it describes are still the problems most teams have.
From JBehave to Cucumber
JBehave proved the concept, but it was a Java-only tool in an era when the Ruby community was leading the charge on agile practices. The evolution happened in three steps:
RBehave (2006) — North built a Ruby equivalent that expressed story-level behaviours as Given/When/Then steps. It was minimal and focused.
RSpec Story Runner (2007) — David Chelimsky, who had taken over leadership of RSpec, integrated RBehave directly into RSpec. Now Ruby developers could write plain-text stories alongside their RSpec examples.
Cucumber (2008) — Aslak Hellesøy, who had contributed to RSpec since its earliest days, saw the limitations of the Story Runner and started fresh. Cucumber was designed from the ground up to parse plain-text feature files, match steps to code via patterns, and produce human-readable output.
The name was deliberate whimsy in the Ruby tradition of food-themed tools. The scenario language needed its own name too. Hellesøy asked his then-fiancée for a suggestion. She said “Gherkin” — a small, pickled cucumber. The language is what you put inside a Cucumber. The pun stuck, and so did the name.
The Gherkin language
A .feature file is plain text with a line-oriented structure. Here is a complete example that demonstrates every major construct:
# language: en
@billing @regression
Feature: Account billing
  As a customer
  I want to be billed correctly
  So that I trust the service

  Background:
    Given I am logged in as a verified customer

  Rule: VAT is applied for UK customers

    Scenario: UK customer sees VAT-inclusive price
      Given I am located in the United Kingdom
      When I view the product priced at £100
      Then I should see the total as £120
      And the VAT breakdown should show £20

    Scenario Outline: Prices for different regions
      Given I am located in <country>
      When I view the product priced at £100
      Then I should see the total as <total>

      Examples:
        | country        | total |
        | United Kingdom | £120  |
        | France         | €119  |
        | United States  | $100  |

    Scenario: Uploading a payment configuration
      Given I upload the following JSON config:
        """json
        {
          "currency": "GBP",
          "vatRate": 0.20
        }
        """
      Then the billing engine should accept the configuration
Keywords
+-------------------+-----------------------------------+----------------------------------------+
| Keyword           | Purpose                           | Notes                                  |
+-------------------+-----------------------------------+----------------------------------------+
| Feature:          | Top-level grouping                | First keyword in any .feature file.    |
|                   |                                   | Free-form description follows.         |
+-------------------+-----------------------------------+----------------------------------------+
| Rule:             | Groups scenarios by business rule | Added in Gherkin v6. Represents one    |
|                   |                                   | constraint.                            |
+-------------------+-----------------------------------+----------------------------------------+
| Background:       | Shared preconditions              | Steps run before every Scenario in the |
|                   |                                   | Feature or Rule.                       |
+-------------------+-----------------------------------+----------------------------------------+
| Scenario:         | A concrete example                | Alias: Example:                        |
+-------------------+-----------------------------------+----------------------------------------+
| Scenario Outline: | Parameterised scenario            | Runs once per row in Examples:. Alias: |
|                   |                                   | Scenario Template:                     |
+-------------------+-----------------------------------+----------------------------------------+
| Examples:         | Data table for Outline            | Alias: Scenarios:                      |
+-------------------+-----------------------------------+----------------------------------------+
| Given             | Initial context                   | "The world is in this state"           |
+-------------------+-----------------------------------+----------------------------------------+
| When              | Action or event                   | "Something happens"                    |
+-------------------+-----------------------------------+----------------------------------------+
| Then              | Expected outcome                  | "This should be observable"            |
+-------------------+-----------------------------------+----------------------------------------+
| And / But         | Continues previous step           | But emphasises a negative or contrast  |
+-------------------+-----------------------------------+----------------------------------------+
| * (asterisk)      | Generic step                      | When none of the above read naturally  |
+-------------------+-----------------------------------+----------------------------------------+
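A Scenario Outline runs once per row in its Examples table, with each <placeholder> replaced by the value in the matching column. The plain-Java sketch below illustrates that expansion for the outline in the example feature. OutlineExpander and its methods are hypothetical names chosen for illustration, not Cucumber classes, and the sketch ignores details such as escaped pipes inside cells.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: OutlineExpander is a hypothetical name, not part of Cucumber.
final class OutlineExpander {

    // Splits a row such as "| a | b |" into trimmed cells ["a", "b"].
    static List<String> cells(String row) {
        String trimmed = row.trim();
        // Drop the outer pipes, then split on the inner ones.
        String inner = trimmed.substring(1, trimmed.length() - 1);
        List<String> out = new ArrayList<>();
        for (String cell : inner.split("\\|", -1)) {
            out.add(cell.trim());
        }
        return out;
    }

    // Returns one concrete list of steps per data row in the Examples table.
    static List<List<String>> expand(List<String> steps, List<String> examples) {
        List<String> headers = cells(examples.get(0));
        List<List<String>> scenarios = new ArrayList<>();
        for (String row : examples.subList(1, examples.size())) {
            List<String> values = cells(row);
            List<String> concrete = new ArrayList<>();
            for (String step : steps) {
                String text = step;
                for (int i = 0; i < headers.size(); i++) {
                    // Literal replacement, so values like "$100" need no escaping.
                    text = text.replace("<" + headers.get(i) + ">", values.get(i));
                }
                concrete.add(text);
            }
            scenarios.add(concrete);
        }
        return scenarios;
    }

    public static void main(String[] args) {
        List<String> steps = List.of(
            "Given I am located in <country>",
            "When I view the product priced at £100",
            "Then I should see the total as <total>");
        List<String> examples = List.of(
            "| country        | total |",
            "| United Kingdom | £120  |",
            "| United States  | $100  |");
        // Prints each expanded scenario, separated by a blank line.
        for (List<String> scenario : expand(steps, examples)) {
            System.out.println(String.join("\n", scenario));
            System.out.println();
        }
    }
}
```

Each expanded scenario is then reported and can fail independently, which is why an outline with three rows shows up as three results in a report.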
Three additional constructs round out the syntax. Doc Strings and Tags appear in the example above; Data Tables do not:
Doc Strings: multi-line text arguments delimited by """ or ``` (three backticks), passed as the last argument to a step. Content type annotations ("""json, """xml) are supported.
Data Tables: pipe-delimited tabular data attached to a step. The whole table is passed to the step definition as its last argument, where it can be read row by row or converted to lists or maps.
Tags: @tagName prefixed to Feature, Rule, Scenario, or Examples blocks. Tags filter execution (--tags @regression), categorise tests, control hooks, and pass metadata to CI systems.
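Tags placed on a Feature are inherited by the scenarios inside it, which is why filtering on @regression would select every scenario in the example feature. The sketch below illustrates that inheritance and a single-tag filter in plain Java. TagFilterSketch is a hypothetical name and @smoke is a made-up tag; real tag expressions also support and, or, and not, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: TagFilterSketch is a hypothetical name, not Cucumber's implementation.
final class TagFilterSketch {

    // A scenario's effective tags are the Feature's tags plus its own.
    static Set<String> effectiveTags(Set<String> featureTags, Set<String> ownTags) {
        Set<String> all = new LinkedHashSet<>(featureTags);
        all.addAll(ownTags);
        return all;
    }

    // Returns the names of scenarios whose effective tags contain the wanted tag.
    static List<String> select(Set<String> featureTags,
                               Map<String, Set<String>> scenarioTags,
                               String wanted) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : scenarioTags.entrySet()) {
            if (effectiveTags(featureTags, entry.getValue()).contains(wanted)) {
                selected.add(entry.getKey());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        Set<String> featureTags = Set.of("@billing", "@regression");
        Map<String, Set<String>> scenarios = new LinkedHashMap<>();
        // "@smoke" is a made-up tag for illustration; the example feature does not use it.
        scenarios.put("UK customer sees VAT-inclusive price", Set.of("@smoke"));
        scenarios.put("Uploading a payment configuration", Set.of());
        System.out.println(select(featureTags, scenarios, "@regression"));
        System.out.println(select(featureTags, scenarios, "@smoke"));
    }
}
```

Filtering on the inherited tag selects both scenarios, while filtering on the scenario-level tag selects only the one that carries it.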
i18n: Gherkin speaks 70+ human languages
Gherkin has been translated into over 70 languages. A French feature file uses Fonctionnalité:, Scénario:, Étant donné, Quand, Alors. A Japanese file uses フィーチャ, シナリオ, 前提, もし, ならば. There is even an emoji dialect.
Declare the language with a comment directive at the top of the file:
# language: fr
Fonctionnalité: Facturation du compte
  Scénario: Client français voit le prix TTC
    Étant donné je suis localisé en France
    Quand je consulte le produit à 100€
    Alors je devrais voir le total à 120€
The full language reference lists every supported locale and its keyword translations.
How step definitions work
Gherkin scenarios are inert text until connected to code. When Cucumber encounters a step like Given I have 3 items in my cart, it searches registered step definitions for a matching pattern. Two matching systems exist:
Regular expressions
The traditional approach. A regex anchored with ^ and $, where capture groups define arguments:
// In a Java string literal the regex backslash must be doubled: \\d, not \d.
@Given("^I have (\\d+) items in my cart$")
public void iHaveItemsInCart(int count) {
    cart.addItems(count);
}
Cucumber Expressions
Introduced as a more human-friendly alternative. No anchors, and escaping is needed only for literal characters that the syntax reserves, such as ( and {. Curly-brace parameter types handle matching and type conversion:
@Given("I have {int} items in my cart")
public void iHaveItemsInCart(int count) {
    cart.addItems(count);
}
Built-in parameter types: {int}, {float}, {string} (quoted strings), {word} (single word), {bigdecimal}, {} (anonymous, matches anything). You can register custom parameter types to match domain objects directly.
You cannot mix Cucumber Expression syntax with regex syntax in the same expression. Pick one per step definition.
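To make the matching step less abstract, here is a minimal plain-Java sketch of how an expression such as "I have {int} items in my cart" could be turned into an anchored regex and matched against step text (the text after the Given/When/Then keyword). StepMatcherSketch is a hypothetical name, and only {int} and {word} are handled; the real Cucumber Expressions library supports far more, including custom parameter types and type conversion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: StepMatcherSketch is a hypothetical name, not the Cucumber library.
final class StepMatcherSketch {

    // Recognises the two placeholder types this sketch supports.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(int|word)\\}");

    // Translates a tiny subset of Cucumber Expression syntax into an anchored regex.
    static Pattern compile(String expression) {
        StringBuilder regex = new StringBuilder("^");
        Matcher m = PLACEHOLDER.matcher(expression);
        int last = 0;
        while (m.find()) {
            // Literal text between placeholders is quoted so it matches verbatim.
            regex.append(Pattern.quote(expression.substring(last, m.start())));
            regex.append(m.group(1).equals("int") ? "(-?\\d+)" : "(\\S+)");
            last = m.end();
        }
        regex.append(Pattern.quote(expression.substring(last)));
        regex.append("$");
        return Pattern.compile(regex.toString());
    }

    // Returns the captured arguments if the step text matches, otherwise empty.
    static Optional<List<String>> match(String expression, String stepText) {
        Matcher m = compile(expression).matcher(stepText);
        if (!m.matches()) {
            return Optional.empty();
        }
        List<String> args = new ArrayList<>();
        for (int i = 1; i <= m.groupCount(); i++) {
            args.add(m.group(i));
        }
        return Optional.of(args);
    }

    public static void main(String[] args) {
        String expression = "I have {int} items in my cart";
        System.out.println(match(expression, "I have 3 items in my cart"));
        System.out.println(match(expression, "I have three items in my cart"));
    }
}
```

The captured strings are what a framework would then convert to the parameter types of the step definition method, for example parsing "3" into the int count argument.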
The philosophy: Discovery, Formulation, Automation
BDD is not a testing technique. It is a collaborative practice with three phases, formalized by Seb Rose and Gáspár Nagy in the BDD Books series and adopted as the canonical BDD lifecycle by the Cucumber team.
Discovery — structured conversations with concrete examples
Before writing a line of code or Gherkin, the team holds structured conversations to explore, discover, and agree on the details of upcoming behaviour. The goal is a shared understanding, not a document.
The primary vehicle is the Three Amigos session — a meeting bringing together three perspectives:
- Business (Product Owner / BA) — defines what and why
- Development (engineer) — identifies technical constraints and edge cases
- QA (tester) — asks “what could go wrong?” and surfaces scenarios nobody else considered
The Three Amigos session is not a review meeting. It is a discovery meeting. The output is a set of concrete examples that illustrate the expected behaviour, not a formal specification. See the Cucumber guide on roles.
Example Mapping — the Discovery workshop format
Matt Wynne developed Example Mapping while training a team in St. Louis. It is a structured, time-boxed (25-30 minute) workshop that uses four colors of index cards:
- Yellow — the user story being discussed
- Blue — business rules (constraints the story must obey)
- Green — concrete examples that illustrate each rule
- Red — questions (unresolved uncertainties)
The shape of the map gives the team a visual signal:
- Many red cards? The story is not ready to develop.
- Many green cards per blue card? The rule is complex — consider splitting.
- Few cards overall? The story is well-understood — proceed.
The canonical article on the Cucumber blog walks through the technique in detail. Wynne’s original Medium post provides additional context.
Formulation — writing Gherkin from agreed examples
The agreed examples are documented in business-readable Gherkin so they can be automated. The key constraint: the language must be understood by all stakeholders, not just developers and testers.
Formulation is a translation exercise, not a creative one. The examples already exist from Discovery. Formulation puts them into a structured format that Cucumber can parse and execute.
Automation — making the specs executable
Step definitions are written to connect Gherkin steps to production code. When all scenarios pass, the feature files become living documentation — specifications that are automatically verified to match actual system behaviour. Unlike static wiki pages or Word documents, living documentation cannot become stale without causing a test failure.
This is the core promise of BDD done correctly: your specifications are always exactly as accurate as your last CI run.
The timeline
+------+------------------------------------------------------+
| Year | Event                                                |
+------+------------------------------------------------------+
| 2003 | Dan North coins "BDD," registers jbehave.org on      |
|      | Christmas Eve                                        |
+------+------------------------------------------------------+
| 2004 | North + Chris Matts develop Given/When/Then format   |
+------+------------------------------------------------------+
| 2005 | Steven Baker starts RSpec; Aslak Hellesøy            |
|      | contributes early                                    |
+------+------------------------------------------------------+
| 2006 | North publishes "Introducing BDD" in Better Software |
|      | (March)                                              |
+------+------------------------------------------------------+
| 2007 | RSpec 1.0 ships (May); David Chelimsky integrates    |
|      | RBehave into RSpec (Oct)                             |
+------+------------------------------------------------------+
| 2008 | Aslak Hellesøy creates Cucumber (Ruby); Gherkin      |
|      | language named                                       |
+------+------------------------------------------------------+
| 2010 | Gáspár Nagy launches SpecFlow for .NET               |
+------+------------------------------------------------------+
| 2015 | Matt Wynne develops Example Mapping                  |
+------+------------------------------------------------------+
| 2019 | SmartBear acquires Cucumber Ltd (June); Tricentis    |
|      | acquires SpecFlow                                    |
+------+------------------------------------------------------+
| 2020 | BDD Books: Discovery published (Nagy & Rose)         |
+------+------------------------------------------------------+
| 2023 | Matt Wynne laid off; "Cucumber is dying" discourse   |
|      | begins                                               |
+------+------------------------------------------------------+
| 2024 | Reqnroll forked from SpecFlow by Gáspár Nagy (Feb)   |
+------+------------------------------------------------------+
| 2025 | AI-assisted BDD scenario generation enters           |
|      | mainstream                                           |
+------+------------------------------------------------------+
The ecosystem
+---------------+-----------------+----------------------------------------+
| Framework     | Language        | Notes                                  |
+---------------+-----------------+----------------------------------------+
| Cucumber-Ruby | Ruby            | The original (2008). Aslak Hellesøy.   |
+---------------+-----------------+----------------------------------------+
| Cucumber-JVM  | Java / Kotlin   | Most widely used in enterprise.        |
+---------------+-----------------+----------------------------------------+
| Cucumber-JS   | JavaScript / TS | @cucumber/cucumber on npm.             |
+---------------+-----------------+----------------------------------------+
| Reqnroll      | C# / .NET       | Fork of SpecFlow (2024). Currently     |
|               |                 | recommended for .NET.                  |
+---------------+-----------------+----------------------------------------+
| Behave        | Python          | behave.readthedocs.io                  |
+---------------+-----------------+----------------------------------------+
| Godog         | Go              | Official Cucumber org repo.            |
+---------------+-----------------+----------------------------------------+
| Behat         | PHP             | Popular in the Symfony ecosystem.      |
+---------------+-----------------+----------------------------------------+
| Karate DSL    | Java / any      | API-testing-focused. No Java coding    |
|               |                 | required in steps.                     |
+---------------+-----------------+----------------------------------------+
| Serenity BDD  | Java / JS       | Rich reporting + screenplay pattern on |
|               |                 | top of Cucumber.                       |
+---------------+-----------------+----------------------------------------+
The SpecFlow to Reqnroll story
SpecFlow was the dominant BDD framework for .NET, created by Gáspár Nagy in 2010. Tricentis acquired it in 2019. In early 2024, Nagy forked SpecFlow and relaunched it as Reqnroll — a community-maintained open-source project. Migration from SpecFlow takes minutes. Reqnroll is now the recommended BDD framework for .NET.
The SmartBear acquisition
SmartBear acquired Hiptest (a BDD collaboration platform) in 2018, then Cucumber Ltd in June 2019. The two were unified into CucumberStudio in December 2019 — a commercial tool for managing Gherkin scenarios at scale.
In February 2023, Matt Wynne — the last of Cucumber’s co-founders still actively employed on the project — was laid off. This triggered the “Cucumber is dying” discourse. The framework is not dead (it has millions of users and active maintainers), but paid, dedicated open-source stewardship has diminished. Development continues under the Cucumber GitHub organization.
Integration patterns
Browser automation
The classic combination is Cucumber + Selenium WebDriver — step definitions drive a browser instance. This works but produces brittle tests when scenarios are written at the UI level.
Playwright has largely replaced Selenium for new projects (2024-2025). Its superior stability and built-in waiting mechanics reduce flakiness. Serenity/JS provides a template project for Cucumber + Playwright specifically.
API testing
For API tests, Cucumber step definitions typically call REST-Assured (Java) or fetch/axios (JS) and assert on the HTTP response. Karate DSL is the specialist choice: it has built-in HTTP primitives, JSON path assertions, and its own assertion language, requiring no Java code in step definitions.
CI/CD and reporting
Cucumber produces reports in multiple formats:
- JUnit XML — understood by every CI server (Jenkins, GitHub Actions, CircleCI, GitLab CI)
- JSON — consumed by dashboards and third-party reporters like Allure and maven-cucumber-reporting
- HTML — self-contained report
- Cucumber Messages — newer streaming format used internally
The Cucumber Reports service provides free cloud-hosted report aggregation for open-source projects. Jenkins has an official Cucumber Reports plugin.
Antipatterns and criticisms
+-------------------------------+----------------------------------+----------------------------------+
| Pattern                       | Problem                          | Fix                              |
+-------------------------------+----------------------------------+----------------------------------+
| Imperative steps              | "Click login button, type admin" | Declarative: "I am an            |
|                               | (coupled to UI)                  | authenticated admin"             |
+-------------------------------+----------------------------------+----------------------------------+
| QA-only authoring             | Business never reads the specs   | Three Amigos + Example Mapping   |
|                               |                                  | before code                      |
+-------------------------------+----------------------------------+----------------------------------+
| Gherkin after code            | Scenarios document what exists,  | Discovery first, then            |
|                               | not what should exist            | Formulation, then Automation     |
+-------------------------------+----------------------------------+----------------------------------+
| One massive step def file     | Unmaintainable, duplicate regex  | Organize by domain concept, not  |
|                               | conflicts                        | by feature file                  |
+-------------------------------+----------------------------------+----------------------------------+
| Testing every path in Gherkin | Combinatorial explosion of       | Cover key examples in Gherkin,   |
|                               | scenarios                        | edge cases in unit tests         |
+-------------------------------+----------------------------------+----------------------------------+
The Cucumber Tax
The “Cucumber Tax” is the overhead of maintaining a Gherkin layer on top of code: step definitions need to be written, kept in sync with scenario text, refactored when wording changes, and debugged when pattern matching breaks. Critics argue this indirection layer rarely justifies its cost unless there is genuine business stakeholder involvement.
This is a fair criticism. If the only people reading your feature files are the same engineers who write the step definitions, you are paying the Cucumber Tax for no return. The value of Gherkin comes from the shared language — if nobody outside the development team reads it, you have an expensive testing DSL.
Imperative vs. declarative scenarios
This is the single most common mistake teams make with Gherkin:
Imperative (fragile, unreadable):
Scenario: Login
  Given I navigate to "/login"
  When I type "admin" into the field with id "username"
  And I type "password123" into the field with id "password"
  And I click the button with text "Sign In"
  Then I should see the element with class "dashboard"
Declarative (stable, business-readable):
Scenario: Admin accesses the dashboard
  Given I am an authenticated admin user
  When I access the admin dashboard
  Then I should see the administration controls
The imperative version is coupled to the UI. Every redesign breaks it. It reads like Selenium commands wrapped in English, which is exactly what it is. The declarative version describes intent: it survives redesigns, reads naturally, and can be understood by anyone in the business.
The Cucumber anti-patterns guide and better Gherkin guide document this in detail.
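The UI details do not disappear in the declarative style; they move out of the feature file and into the step definition layer, where a redesign means changing one method instead of every scenario. The sketch below shows that idea in plain Java under some assumptions: Browser and RecordingBrowser are hypothetical stand-ins for a real driver such as Playwright or Selenium, and in a Cucumber-JVM project the method would carry a @Given annotation, omitted here so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: all names here are hypothetical, not a Cucumber or browser-driver API.
final class DeclarativeStepsSketch {

    // Hypothetical abstraction over the UI; a real project might back it with Playwright or Selenium.
    interface Browser {
        void open(String path);
        void type(String fieldId, String text);
        void click(String buttonText);
    }

    // Test double that records what the step asked the UI to do.
    static final class RecordingBrowser implements Browser {
        final List<String> actions = new ArrayList<>();
        public void open(String path) { actions.add("open " + path); }
        public void type(String fieldId, String text) { actions.add("type " + fieldId); }
        public void click(String buttonText) { actions.add("click " + buttonText); }
    }

    private final Browser browser;

    DeclarativeStepsSketch(Browser browser) {
        this.browser = browser;
    }

    // In Cucumber-JVM this would be annotated @Given("I am an authenticated admin user").
    // The imperative clicks and keystrokes live here, hidden from the scenario text.
    void iAmAnAuthenticatedAdminUser() {
        browser.open("/login");
        browser.type("username", "admin");
        browser.type("password", "password123");
        browser.click("Sign In");
    }

    public static void main(String[] args) {
        RecordingBrowser browser = new RecordingBrowser();
        new DeclarativeStepsSketch(browser).iAmAnAuthenticatedAdminUser();
        // Prints the recorded UI actions, one per line.
        browser.actions.forEach(System.out::println);
    }
}
```

If the login form changes, only this method changes; the scenario "Given I am an authenticated admin user" stays exactly as the business wrote it.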
The stakeholder myth
Matt Wynne described this honestly in “10 Easy Ways to Fail at BDD”: the premise of BDD — that feature files serve as a shared specification language between business and technical people — rarely survives contact with reality in most organizations. Business stakeholders do not pull up .feature files.
This does not mean BDD is useless. It means the value is in the Discovery conversations (Three Amigos, Example Mapping), not in the artifacts. The feature files are a record of what was agreed, but the shared understanding was built in the room, not in the file.
When BDD is overkill
BDD adds the most value when:
- Multiple stakeholder groups need to agree on behaviour
- There is long-term maintenance and a large test suite
- Domain language is complex and misalignments are costly
- The cost of building the wrong thing is high
BDD is likely overkill for: internal tools, solo projects, rapid prototypes, data pipelines, libraries with no business-domain concepts, or teams where a developer is the business analyst.
Dan North’s own evolution
North’s position has evolved considerably. His 2019 talk “BDD Is Not About Testing” says it in the title. Most teams claiming to “do BDD” are writing tests in Gherkin and calling it done — which misses the entire point. North now treats BDD as one tool among many for improving team communication, not a silver bullet for software quality.
Liz Keogh, who rewrote JBehave 2 and has been a senior BDD practitioner since the beginning, has written extensively about the distinction between BDD as a practice (conversations, discovery, shared understanding) and BDD as a tool (Cucumber, Gherkin, step definitions). The practice is valuable. The tool is optional.
BDD done right vs. Gherkin-flavoured test automation
This distinction is worth repeating because it is the most consequential thing in this entire post.
BDD done right: The team uses Discovery conversations before writing any code. Gherkin scenarios emerge from collaborative sessions. The business language in the scenarios is genuinely readable by non-technical stakeholders. Scenarios document business rules, not UI interactions. The feature files are living documentation maintained by the whole team.
Gherkin-flavoured test automation: Teams hear “BDD” and write automated tests using Gherkin syntax after writing the code. Stakeholders never read the feature files. Scenarios are written by QA automators and describe implementation steps. This is test automation with extra steps — literally. It has the cost of Cucumber (step definition maintenance, indirection, regex debugging) without the benefit (shared understanding, living documentation, outside-in design).
Most teams that say they “do BDD” are doing the second thing. The Cucumber team knows this.
Alternatives to Cucumber
+-----------------+--------------------+-----------------------+------------------------------+
| Tool            | Syntax             | Language              | Differentiator               |
+-----------------+--------------------+-----------------------+------------------------------+
| Cucumber        | Gherkin (.feature) | Ruby / Java / JS / Go | The original. Largest        |
|                 |                    |                       | ecosystem.                   |
+-----------------+--------------------+-----------------------+------------------------------+
| Gauge           | Markdown (.spec)   | Any (plugin)          | ThoughtWorks. Simpler        |
|                 |                    |                       | syntax.                      |
+-----------------+--------------------+-----------------------+------------------------------+
| Robot Framework | Tabular / keyword  | Python                | Strong in test automation,   |
|                 |                    |                       | less BDD-focused.            |
+-----------------+--------------------+-----------------------+------------------------------+
| Concordion      | HTML               | Java                  | Specs are formatted HTML     |
|                 |                    |                       | documents.                   |
+-----------------+--------------------+-----------------------+------------------------------+
| Karate          | Own DSL            | Java                  | API-first. Built-in HTTP,    |
|                 |                    |                       | JSON assertions.             |
+-----------------+--------------------+-----------------------+------------------------------+
Gauge (ThoughtWorks) uses Markdown instead of Gherkin and supports any language through plugins. Its syntax is simpler, but the ecosystem is smaller.
Robot Framework is keyword-driven and Python-based. It has a massive library ecosystem for web testing, API testing, database testing, and more. It is more of a general test automation framework than a BDD tool, but it fills a similar niche.
Concordion takes a different approach entirely: specifications are written as HTML documents with embedded assertions. The rendered output looks like a formatted specification with green/red highlights showing pass/fail status.
AI and BDD (2025)
Large language models have entered the BDD conversation in three ways:
Generating Gherkin from requirements. Tools like Gherkinizer use LLMs to convert user stories into Gherkin scenarios. Studies show LLMs produce syntactically correct Gherkin but often generate imperative, non-business-focused scenarios without careful prompt engineering. A 2025 paper on arXiv explores this in detail.
Generating step definitions from Gherkin. Given a scenario, an LLM can produce boilerplate step definition code in any supported language. This reduces the “Cucumber Tax” for initial setup but does not eliminate the ongoing maintenance burden.
The philosophical tension. Some argue AI re-energizes BDD by lowering the formulation cost. Others argue it reinforces the worst antipattern: generating Gherkin without genuine Discovery collaboration. If the point of BDD is shared understanding through conversation, automating the conversation artifacts misses the point entirely.
The Automation Panda put it well in March 2025: the question is not whether BDD is dying, but whether it was ever alive in most organizations that claimed to practice it.
The people who built this
+-----------------+------------------------------------------------+
| Person          | Contribution                                   |
+-----------------+------------------------------------------------+
| Dan North       | Coined BDD (2003), created JBehave, wrote      |
|                 | "Introducing BDD" (2006)                       |
+-----------------+------------------------------------------------+
| Chris Matts     | Co-developed Given/When/Then; invented Feature |
|                 | Injection                                      |
+-----------------+------------------------------------------------+
| Aslak Hellesøy  | Early RSpec contributor; created Cucumber      |
|                 | (2008); co-wrote The Cucumber Book             |
+-----------------+------------------------------------------------+
| Matt Wynne      | Core Cucumber contributor; invented Example    |
|                 | Mapping; co-wrote The Cucumber Book            |
+-----------------+------------------------------------------------+
| David Chelimsky | Led RSpec; integrated RBehave; co-authored The |
|                 | RSpec Book                                     |
+-----------------+------------------------------------------------+
| Liz Keogh       | Rewrote JBehave 2; coined Deliberate           |
|                 | Discovery; senior BDD practitioner             |
+-----------------+------------------------------------------------+
| Gáspár Nagy     | Created SpecFlow; forked Reqnroll (2024);      |
|                 | co-authored BDD Books                          |
+-----------------+------------------------------------------------+
| Seb Rose        | Co-authored BDD Books: Discovery and           |
|                 | Formulation                                    |
+-----------------+------------------------------------------------+
Further reading
Primary sources:
- Dan North — “Introducing BDD” (March 2006)
- Martin Fowler — GivenWhenThen
- Dan North — “BDD Is Not About Testing” (Beauty in Code, 2019)
Official documentation:
- Cucumber docs — BDD overview, Gherkin reference, step definitions, anti-patterns
- Gherkin reference — complete syntax
- Cucumber Expressions — the modern step matching system
Books:
- The Cucumber Book (2nd ed.) — Matt Wynne, Aslak Hellesøy — Pragmatic Bookshelf
- BDD Books: Discovery — Seb Rose, Gáspár Nagy — bddbooks.com
- BDD Books: Formulation — Seb Rose, Gáspár Nagy — Amazon
Community and analysis:
- Liz Keogh — Behaviour Driven Development
- Liz Keogh — ATDD vs BDD and a potted history
- Automation Panda — Is BDD Dying? (March 2025)
- Matt Wynne — 10 Easy Ways to Fail at BDD
BDD was never about Gherkin. It was never about Cucumber. It was about a team sitting in a room, using concrete examples to discover what they were actually building — and then writing those examples down in a language everyone could read. The tool is optional. The conversation is not.