Don Johnson

Using Grammatical Evolution to Discover Test Payloads: A New Frontier in API Testing

"What if your test cases evolved on their own, like digital bacteria, probing your API for weaknesses while you slept?"

Welcome to the Strange Future of API Testing

Grammatical Evolution (GE), when fused with Genetic Algorithms (GA), opens up a beautifully chaotic new approach to payload discovery. This post introduces a fully containerized system that mutates and evolves API payloads using the DEAP Python library to hunt for:

  • Validation failures
  • Authentication edge cases
  • Timeouts and memory issues
  • Vulnerabilities (SQLi, auth bypass, broken logic)

🧬 A Brief History of Grammatical Evolution

Grammatical Evolution originated in the late 1990s as a way to evolve programs and expressions using a formal grammar. Unlike standard genetic programming (GP), which typically evolves tree structures, GE operates on binary strings that are mapped to syntactically valid programs using Backus-Naur Form (BNF) grammars.

This means you can define what legal inputs look like — and GE will evolve valid structures within those constraints.

GE Core Components:

  • Genome: a binary string (or in our case, a structured JSON payload)
  • Grammar: a set of rules that maps the genome to actual values (our grammar.py)
  • Fitness Function: how "interesting" or "destructive" a payload is
  • Operators: mutation, crossover, and selection mechanisms
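
To make the genome-to-grammar step concrete, here is a minimal sketch of the classic GE mapping, assuming integer codons instead of raw bits. The grammar, the rule names, and `map_genome` are hypothetical illustrations and are not taken from the repo's grammar.py.

```python
import json

# Hypothetical BNF-style grammar: each non-terminal maps to a list of
# productions, and each production is a sequence of symbols.
GRAMMAR = {
    "<payload>": [['{"name": ', "<name>", ', "hobbies": [', "<hobbies>", "]}"]],
    "<name>": [['"alice"'], ['"bob"'], ["\"' OR '1'='1\""]],
    "<hobbies>": [["<hobby>"], ["<hobby>", ", ", "<hobbies>"]],
    "<hobby>": [['"fishing"'], ['"chess"']],
}


def map_genome(codons, grammar=GRAMMAR, start="<payload>", max_wraps=2):
    """Map a list of integer codons to a string via leftmost derivation.

    A codon is consumed only when a non-terminal has more than one
    production; the choice is codon % number_of_productions. When the
    codons run out, reading wraps to the start, at most max_wraps times.
    Returns None if the derivation does not finish within that budget.
    """
    symbols = [start]  # pending symbols, leftmost first
    out = []
    used = 0
    limit = len(codons) * (max_wraps + 1)
    while symbols:
        sym = symbols.pop(0)
        if sym not in grammar:  # terminal: emit as-is
            out.append(sym)
            continue
        productions = grammar[sym]
        if len(productions) == 1:
            choice = 0
        else:
            if used >= limit:
                return None  # ran out of codons: invalid individual
            choice = codons[used % len(codons)] % len(productions)
            used += 1
        symbols = list(productions[choice]) + symbols
    return "".join(out)


text = map_genome([2, 0, 1])
payload = json.loads(text)  # the grammar guarantees well-formed JSON
# payload == {"name": "' OR '1'='1", "hobbies": ["chess"]}
```

Because every derivation goes through the grammar, mutation and crossover can scramble the codons freely and the result is still either a syntactically valid payload or a clearly rejected individual (`None`).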

TL;DR

You can clone this repo and run:

docker-compose up --build

You'll get a Flask API in one container and a genetic evolution engine in another, so you can watch payloads evolve over generations.


🔬 The Setup

ge_api_tester/
├── api/               # Flask server
├── harness/           # Evolution engine (DEAP)
├── docker-compose.yml
└── requirements.txt

The key file is harness/evolve_tester.py, which:

  1. Randomly generates JSON payloads from a Python grammar file
  2. Evaluates each payload by sending it to a Flask API (/predict, /auth, etc.)
  3. Scores the payload based on the response (e.g., 5xx errors = high fitness)
  4. Uses selection, crossover, and mutation to evolve new payloads

Over time, the system finds payloads that break your assumptions.
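
The four steps above can be sketched as a small loop. The repo drives this with DEAP; the version below deliberately uses only the Python standard library so it runs anywhere, and it swaps the HTTP call for a local stub. `FIELDS`, `fake_api`, and every function name here are assumptions made for illustration, not the contents of evolve_tester.py.

```python
import random

# Hypothetical search space: a few candidate values per field.
FIELDS = {
    "name": ["alice", "bob", "' OR '1'='1"],
    "zip": ["10001", "90210", "00000"],
    "memleak": [False, True],
}


def random_payload(rng):
    return {field: rng.choice(values) for field, values in FIELDS.items()}


def fake_api(payload):
    """Stand-in for the real HTTP request; returns a status code."""
    if payload["memleak"]:
        return 503
    if payload["zip"] == "90210":
        return 500
    if "'" in payload["name"]:
        return 400
    return 200


def fitness(payload):
    """Server errors score highest, client errors in between."""
    status = fake_api(payload)
    if status >= 500:
        return 1.0
    if status >= 400:
        return 0.5
    return 0.0


def crossover(a, b, rng):
    # Uniform crossover: each field is copied from one parent or the other.
    return {k: (a if rng.random() < 0.5 else b)[k] for k in FIELDS}


def mutate(payload, rng, rate=0.2):
    # Each field is re-drawn from its candidates with probability `rate`.
    return {
        k: (rng.choice(FIELDS[k]) if rng.random() < rate else v)
        for k, v in payload.items()
    }


def tournament(population, rng, k=3):
    # Pick k random individuals and keep the fittest.
    return max(rng.sample(population, k), key=fitness)


def evolve(generations=10, pop_size=20, seed=0):
    rng = random.Random(seed)
    population = [random_payload(rng) for _ in range(pop_size)]
    for _ in range(generations):
        elite = max(population, key=fitness)  # elitism: the best survives
        children = [
            mutate(crossover(tournament(population, rng),
                             tournament(population, rng), rng), rng)
            for _ in range(pop_size - 1)
        ]
        population = [elite] + children
    return max(population, key=fitness)


best = evolve()
```

The toy search space is tiny, so this converges almost immediately. The point is the shape of the loop: generate, score, select, recombine, mutate, repeat.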


🧠 Why Grammatical Evolution?

Because fuzzing with raw strings is dumb.

GE allows you to define meaningful structure while leaving room for chaos. Each payload is a composite of rules and randomness. Our grammar lets us inject:

  • SQL payloads (' OR '1'='1) into name fields
  • Email formatting errors
  • Rare zip code anomalies (e.g., 90210 triggers errors in the API)
  • Auth fields that evolve into valid (or invalid) token formats
  • Deliberate memory or delay triggers
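
A grammar along these lines might look like the sketch below: one small generator per field, each mixing well-formed values with hostile ones. The `name`, `email`, and `memleak` fields appear in the tracked output later in this post; `zip`, `token`, `delay`, the probabilities, and all function names are assumptions, not the repo's actual grammar.py.

```python
import json
import random
import string

SQLI = ["' OR '1'='1", "'; DROP TABLE users; --"]


def gen_name(rng):
    # Mostly random alphanumerics, occasionally an SQL injection string.
    if rng.random() < 0.2:
        return rng.choice(SQLI)
    return "".join(rng.choices(string.ascii_letters + string.digits, k=7))


def gen_email(rng):
    # One valid shape and several deliberately malformed ones.
    user = "".join(rng.choices(string.ascii_lowercase, k=5))
    return rng.choice([
        f"{user}@example.com",
        f"{user}@@example.com",
        f"{user}example.com",
        f"{user}@",
    ])


def gen_zip(rng):
    # Either the known trouble value or a random five-digit code.
    return rng.choice(["90210", f"{rng.randrange(100000):05d}"])


def gen_token(rng):
    # Empty, truncated, or plausibly shaped bearer tokens.
    hex_part = "".join(rng.choices("0123456789abcdef", k=32))
    return rng.choice(["", "Bearer ", "Bearer " + hex_part])


# Field name -> production rule.
RULES = {
    "name": gen_name,
    "email": gen_email,
    "zip": gen_zip,
    "token": gen_token,
    "memleak": lambda rng: rng.random() < 0.1,
    "delay": lambda rng: rng.choice([0, 0, 0, 2]),
}


def generate(rng):
    """Expand every rule once to produce a JSON-serializable payload."""
    return {field: rule(rng) for field, rule in RULES.items()}


print(json.dumps(generate(random.Random(42)), indent=2))
```

Keeping each field behind its own rule means a mutation can re-roll one field without touching the rest of the payload.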

All of this is tracked with a custom PayloadTracker that persists each flagged payload along with the response it triggered:

tracker.track_high_fitness(payload, response, score)
tracker.track_sql_injection(payload, response)
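
For a sense of what sits behind those two calls, here is one way such a tracker could be written. Only the method names `track_high_fitness` and `track_sql_injection` come from the snippet above; the JSON Lines storage, the output directory, and the assumption that `response` exposes a `status_code` attribute are guesses for the sake of the sketch.

```python
import json
import time
from pathlib import Path


class PayloadTracker:
    """Append-only tracker: one JSON Lines file per category (assumed layout)."""

    def __init__(self, out_dir="tracked_payloads"):
        self.out_dir = Path(out_dir)
        self.out_dir.mkdir(parents=True, exist_ok=True)

    def _append(self, category, record):
        # Stamp the record and append it as a single JSON line.
        record = {"category": category, "timestamp": time.time(), **record}
        path = self.out_dir / f"{category}.jsonl"
        with open(path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")

    def track_high_fitness(self, payload, response, score):
        self._append("high_fitness", {
            "payload": payload,
            "status": getattr(response, "status_code", None),
            "score": score,
        })

    def track_sql_injection(self, payload, response):
        self._append("sql_injection", {
            "payload": payload,
            "status": getattr(response, "status_code", None),
        })
```

Appending one line per hit keeps writes cheap during a long run and makes the files easy to replay later.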

🔥 Real Output

Here's a real tracked payload from our engine:

{
  "name": "w13D2bA",
  "email": "[email protected]",
  "hobbies": ["fishing"],
  "memleak": true
}

It triggered:

  • 503 Server Error
  • 2s response time
  • Fitness score: 0.8
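
How might a status code and a response time turn into a single number? The function below is a hypothetical scoring rule with made-up weights. It is not the repo's fitness function and is not meant to reproduce the 0.8 above.

```python
def score_response(status_code, elapsed_seconds, slow_threshold=1.0):
    """Combine error class and latency into a score in [0, 1].

    The weights are illustrative assumptions: server errors count most,
    client errors a little, and slow responses add a latency bonus.
    """
    score = 0.0
    if status_code >= 500:
        score += 0.5
    elif status_code >= 400:
        score += 0.25
    if elapsed_seconds >= slow_threshold:
        score += 0.25
    return min(score, 1.0)


print(score_response(503, 2.0))  # server error plus slow response
print(score_response(200, 0.1))  # healthy response
```

Whatever the exact weights, the fitness function defines what "interesting" means, so it is the main knob for steering the search toward crashes, slowdowns, or validation gaps.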

🧪 Bonus: Automatic Result Discovery

After the run, we pass all high-fitness payloads into a pytest suite that:

  • Groups payloads by category
  • Charts the fitness distribution
  • Replays payloads against the API to confirm vulnerabilities
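
The three steps could be approximated with plain functions like these. The record shape (`category`, `payload`, `status`, `score`) and the injected `send` callable are assumptions. In the real suite each replay would presumably be a pytest test case, which this dependency-free sketch leaves out.

```python
from collections import defaultdict


def group_by_category(records):
    """Bucket tracked records by their category label."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["category"]].append(rec)
    return dict(groups)


def fitness_histogram(records, bins=5):
    """Count records per equal-width score bin over [0, 1]."""
    counts = [0] * bins
    for rec in records:
        idx = min(int(rec["score"] * bins), bins - 1)
        counts[idx] += 1
    return counts


def replay(records, send):
    """Re-send each payload; keep those that reproduce the recorded status.

    `send` is any callable that takes a payload and returns a status code,
    for example a thin wrapper around an HTTP client.
    """
    return [rec for rec in records if send(rec["payload"]) == rec["status"]]


records = [
    {"category": "high_fitness", "payload": {"memleak": True}, "status": 503, "score": 0.8},
    {"category": "sql_injection", "payload": {"name": "x"}, "status": 500, "score": 1.0},
    {"category": "high_fitness", "payload": {"zip": "90210"}, "status": 500, "score": 0.5},
]


def stub_send(payload):
    # Stand-in for the API: only the memleak payload still fails.
    return 503 if payload.get("memleak") else 200


confirmed = replay(records, stub_send)
```

Replaying matters because a single 5xx during evolution can be noise; only payloads that fail again are worth promoting to regression tests.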

🤖 Why You Should Care

If you're a QA engineer, security tester, or automation engineer, this system can:

  • Find bugs you didn’t know existed
  • Generate regression payloads from real API behavior
  • Surface edge cases you'd never think to write manually
  • Serve as a powerful regression discovery layer in CI

🚀 Future Work

  • Integrate with GraphQL APIs
  • Add LLMs for seed mutation
  • Store evolution paths and lineage of high-fitness payloads
  • Expose the entire platform as a GitHub Action or CI/CD fuzz stage

🧰 Repo and Source

All source is in copyleftdev/ge_api_tester. You can plug in your own Flask/Django/FastAPI apps and let evolution break them.

This technique is underutilized. We're just scratching the surface. It's not fuzzing. It's payload evolution.

🧬
