Generating Mock Data: From JSON Schema to Realistic Test Fixtures
Every application needs test data. The question is whether you spend hours crafting it by hand or let a generator do it in seconds. This guide covers the principles behind effective mock data and how to avoid the traps that make test fixtures unreliable.
Why mock data matters
Real production data is usually off-limits for testing: privacy regulations, sheer data volume, and sensitivity all make it impractical. But trivial placeholders like "test" and 123 miss bugs that only surface with realistic inputs:
- Name fields that break on Unicode characters
- Email validation that passes "aaa" but rejects "user+tag@sub.domain.co.uk"
- Date parsing that works for 2026-01-15 but fails on the leap day 2024-02-29
- Pagination bugs that only show up with 500+ records
Good mock data is realistic enough to catch these edge cases and structured enough to be predictable in tests.
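One way to put such edge cases to work is a small table of realistic inputs run against the code under test. The sketch below is only illustrative: `isValidEmail` is a hypothetical stand-in for your application's validator, and the regex inside it is a deliberately simple assumption, not a recommended email check.

```javascript
// Hypothetical validator under test; replace with your application's own.
function isValidEmail(value) {
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(value);
}

// Each case pairs a realistic input with the result the validator should give.
const emailCases = [
  { input: "aaa", expected: false },
  { input: "user+tag@sub.domain.co.uk", expected: true },
  { input: "zoë@example.com", expected: true },
];

// Collect every case where the validator disagrees with the expectation.
const failures = emailCases.filter((c) => isValidEmail(c.input) !== c.expected);
console.log(failures.length === 0 ? "all cases pass" : failures);
```

Keeping the cases in a plain array makes it easy to append generated values later without touching the test logic.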
Schema-driven generation
The most reliable approach is to define a JSON Schema and let the generator produce data that conforms to it. This guarantees type correctness and constraint satisfaction without manual effort.
{
  "type": "object",
  "properties": {
    "id": { "type": "integer", "minimum": 1 },
    "name": { "type": "string", "minLength": 2, "maxLength": 50 },
    "email": { "type": "string", "format": "email" },
    "role": { "type": "string", "enum": ["admin", "editor", "viewer"] },
    "createdAt": { "type": "string", "format": "date-time" }
  },
  "required": ["id", "name", "email", "role"]
}
A schema-aware generator reads this and produces:
{
  "id": 4217,
  "name": "Samantha Chen",
  "email": "samantha.chen@example.net",
  "role": "editor",
  "createdAt": "2025-11-03T14:22:08Z"
}
Every field satisfies its constraints. The name looks like a real name. The email passes validation. The role is one of the allowed values.
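To make the idea concrete, here is a minimal sketch of how a schema-driven generator might walk a schema like the one above. It is not any particular library's implementation: `generateFromSchema` and `mulberry32` are names chosen for illustration, only the keywords used in the example schema are handled, and strings are random letters rather than realistic names.

```javascript
// Small seeded PRNG (mulberry32) so that runs are repeatable.
function mulberry32(seed) {
  let a = seed | 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical generator: supports object, integer, string, enum,
// and the "email" and "date-time" formats only.
function generateFromSchema(schema, rand) {
  if (schema.enum) return schema.enum[Math.floor(rand() * schema.enum.length)];
  switch (schema.type) {
    case "object": {
      const out = {};
      for (const [key, sub] of Object.entries(schema.properties ?? {})) {
        out[key] = generateFromSchema(sub, rand);
      }
      return out;
    }
    case "integer": {
      const min = schema.minimum ?? 0;
      const max = schema.maximum ?? min + 9999; // assumed default upper bound
      return min + Math.floor(rand() * (max - min + 1));
    }
    case "string": {
      if (schema.format === "email") {
        return `user${Math.floor(rand() * 1e6)}@example.com`;
      }
      if (schema.format === "date-time") {
        // An arbitrary instant within 2025.
        const start = Date.UTC(2025, 0, 1);
        return new Date(start + Math.floor(rand() * 365 * 86400000)).toISOString();
      }
      const min = schema.minLength ?? 0;
      const max = schema.maxLength ?? min + 10; // assumed default upper bound
      const len = min + Math.floor(rand() * (max - min + 1));
      let s = "";
      for (let i = 0; i < len; i++) s += String.fromCharCode(97 + Math.floor(rand() * 26));
      return s;
    }
    default:
      throw new Error(`unsupported schema type: ${schema.type}`);
  }
}

const userSchema = {
  type: "object",
  properties: {
    id: { type: "integer", minimum: 1 },
    name: { type: "string", minLength: 2, maxLength: 50 },
    email: { type: "string", format: "email" },
    role: { type: "string", enum: ["admin", "editor", "viewer"] },
    createdAt: { type: "string", format: "date-time" },
  },
  required: ["id", "name", "email", "role"],
};

const user = generateFromSchema(userSchema, mulberry32(42));
console.log(user);
```

The sketch fills every declared property, including optional ones; a fuller generator would also decide when to omit fields that are not listed in `required`.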
Realistic data types
A good mock generator goes beyond random strings. It recognizes common field semantics and produces contextually appropriate values:
| Field pattern | Generated as |
|---|---|
| name, fullName | Realistic person names |
| email | Valid email addresses |
| phone | Formatted phone numbers |
| address, street | Plausible addresses |
| url, website | Valid URLs |
| date, createdAt | ISO 8601 timestamps |
| price, amount | Numeric values with appropriate ranges |
| avatar, image | Placeholder image URLs |
This semantic awareness means you spend less time configuring and more time testing.
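A rough sketch of how such field-name recognition could work is a table of patterns checked in order. Everything here is an assumption made for illustration: the `fieldRules` table, the `generateByFieldName` helper, the sample name lists, and the value ranges. A real generator would cover far more patterns and guard against false matches such as `filename`.

```javascript
const FIRST_NAMES = ["Samantha", "Marco", "Aiko", "Lena"];
const LAST_NAMES = ["Chen", "Rivera", "Tanaka", "Fischer"];
const pick = (list, rand) => list[Math.floor(rand() * list.length)];

// Hypothetical rule table; the first matching pattern wins.
const fieldRules = [
  {
    pattern: /email/i,
    make: (rand) => `user${Math.floor(rand() * 10000)}@example.com`,
  },
  {
    pattern: /name/i,
    make: (rand) => `${pick(FIRST_NAMES, rand)} ${pick(LAST_NAMES, rand)}`,
  },
  {
    // Matches "date", "birthDate", "createdAt", "updatedAt", and similar keys.
    pattern: /[dD]ate$|At$/,
    make: (rand) =>
      new Date(Date.UTC(2025, 0, 1) + Math.floor(rand() * 365) * 86400000).toISOString(),
  },
  {
    // Prices between 0.00 and 500.00, rounded to two decimals.
    pattern: /price|amount/i,
    make: (rand) => Math.round(rand() * 50000) / 100,
  },
];

// Returns a value for recognized field names, or undefined so the caller
// can fall back to plain type-based generation.
function generateByFieldName(key, rand = Math.random) {
  const rule = fieldRules.find((r) => r.pattern.test(key));
  return rule ? rule.make(rand) : undefined;
}

console.log(generateByFieldName("fullName"));
console.log(generateByFieldName("createdAt"));
console.log(generateByFieldName("unknownField")); // undefined
```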
Controlling array sizes
When your schema includes arrays, you need control over how many items get generated. Too few and you miss pagination bugs. Too many and your test suite slows down.
{
  "type": "array",
  "items": { "$ref": "#/definitions/User" },
  "minItems": 5,
  "maxItems": 20
}
For unit tests, keep arrays small (3-5 items). For integration tests that exercise pagination or virtual scrolling, generate hundreds. For load testing, generate thousands — but write them to a file rather than holding them in memory.
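As an illustration of size control, the sketch below picks a length between `minItems` and `maxItems` and lets the caller request a specific count. The `generateArray` helper and its `count` parameter are hypothetical, and clamping the requested count to the schema bounds is one possible design choice, not the only one.

```javascript
// Hypothetical helper: builds an array whose length respects the schema bounds.
// `makeItem` receives the index and returns one generated item.
function generateArray(schema, makeItem, { rand = Math.random, count } = {}) {
  const min = schema.minItems ?? 0;
  const max = schema.maxItems ?? min + 10; // assumed default upper bound
  const length =
    count === undefined
      ? min + Math.floor(rand() * (max - min + 1)) // random size within bounds
      : Math.min(max, Math.max(min, count)); // requested size, clamped to bounds
  return Array.from({ length }, (_, i) => makeItem(i));
}

const listSchema = { type: "array", minItems: 5, maxItems: 20 };
const makeUser = (i) => ({ id: i + 1 });

console.log(generateArray(listSchema, makeUser).length); // somewhere from 5 to 20
console.log(generateArray(listSchema, makeUser, { count: 7 }).length); // 7
console.log(generateArray(listSchema, makeUser, { count: 100 }).length); // 20 (clamped)
```

For pagination or load tests that need more items than the schema allows, raising `maxItems` in a test-only copy of the schema keeps the generated data valid against what the test actually declares.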
Seeding for reproducibility
Random data is great for exploration but terrible for assertions. If your test expects "Samantha Chen" but the generator produces "Marco Rivera" on the next run, the test fails for no real reason.
The solution is deterministic seeding:
// Same seed always produces the same output
const data = generate(schema, { seed: 42 });
// data.name will always be "Samantha Chen" with seed 42
expect(data.name).toBe("Samantha Chen");
Use a fixed seed in CI pipelines and automated tests. Use random seeds (or no seed) during manual exploration to discover edge cases.
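Under the hood, seeding usually means replacing `Math.random` with a pseudo-random number generator whose state starts from the seed. The sketch below uses mulberry32, a small public-domain PRNG, as one possible choice; the `pickName` helper and the name list are assumptions for illustration, and which name a given seed selects depends entirely on the generator used.

```javascript
// mulberry32: a small 32-bit PRNG. The same seed yields the same sequence.
function mulberry32(seed) {
  let a = seed | 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const NAMES = ["Samantha Chen", "Marco Rivera", "Aiko Tanaka", "Lena Fischer"];

// Hypothetical helper: every random choice goes through the seeded generator.
function pickName(seed) {
  const rand = mulberry32(seed);
  return NAMES[Math.floor(rand() * NAMES.length)];
}

// Two independent calls with the same seed agree, run after run.
console.log(pickName(42) === pickName(42)); // true
```

The key design point is that the generator never calls `Math.random` directly; a single unseeded call anywhere in the pipeline is enough to make fixtures drift between runs.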
Nested objects and relations
Real APIs return nested data — a user has orders, each order has items, each item references a product. A flat generator that handles each object independently produces data where foreign keys point to nothing.
Better generators support relational mocking:
{
  "user": {
    "id": 1,
    "orders": [
      {
        "id": 101,
        "userId": 1,
        "items": [
          { "productId": 55, "quantity": 2 }
        ]
      }
    ]
  }
}
The userId in the order matches the parent user's id. The productId references a product that exists in the dataset. This referential integrity is what makes mock data usable for integration testing.
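One simple way to get this integrity is to generate parent records first and have children draw their keys from the records that already exist. The sketch below assumes a hypothetical `buildDataset` helper with made-up option names; it is a minimal illustration, not how any specific tool does it.

```javascript
// Hypothetical builder: products are created first, then users, and each
// order copies its userId from its parent and picks an existing productId.
function buildDataset({ userCount, productCount, ordersPerUser }, rand = Math.random) {
  const products = Array.from({ length: productCount }, (_, i) => ({ id: i + 1 }));
  let nextOrderId = 101;

  const users = Array.from({ length: userCount }, (_, i) => {
    const id = i + 1;
    const orders = Array.from({ length: ordersPerUser }, () => ({
      id: nextOrderId++,
      userId: id, // always matches the enclosing user
      items: [
        {
          // Chosen from the generated products, so the reference always resolves.
          productId: products[Math.floor(rand() * products.length)].id,
          quantity: 1 + Math.floor(rand() * 3),
        },
      ],
    }));
    return { id, orders };
  });

  return { users, products };
}

const dataset = buildDataset({ userCount: 2, productCount: 5, ordersPerUser: 2 });
console.log(JSON.stringify(dataset.users[0], null, 2));
```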
Common pitfalls
- Hardcoding test data — Manually written fixtures become stale as schemas evolve. Generate from the schema and they always match.
- Ignoring edge cases — Add explicit generators for empty strings, null values, maximum-length strings, and boundary dates.
- Over-generating — Producing 10,000 records when 50 will do wastes time. Match the volume to the test scenario.
- Forgetting localization — If your app supports multiple locales, generate names, addresses, and dates in those locales.
- Using production-like emails — Always use @example.com or @example.net (reserved by RFC 2606) to avoid accidentally emailing real people.
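For the edge-case pitfall in particular, boundary values can be derived from the schema instead of being written by hand. The sketch below assumes a hypothetical `stringEdgeCases` helper and only produces values that stay within the declared bounds; out-of-bounds inputs such as empty strings or null, which are useful for negative tests, would need a separate list.

```javascript
// Hypothetical helper: returns in-bounds boundary strings for a string schema.
function stringEdgeCases(schema) {
  const min = schema.minLength ?? 0;
  const max = schema.maxLength ?? min + 20; // assumed default upper bound
  const candidates = [
    "x".repeat(min), // shortest allowed value
    "x".repeat(max), // longest allowed value
    "Zoë Ångström", // non-ASCII sample
  ];
  // Deduplicate, then keep only values whose length in code points fits the bounds.
  return [...new Set(candidates)].filter((s) => {
    const length = [...s].length;
    return length >= min && length <= max;
  });
}

const nameCases = stringEdgeCases({ type: "string", minLength: 2, maxLength: 50 });
console.log(nameCases.map((s) => s.length)); // [ 2, 50, 12 ]
```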
Workflow integration
The fastest workflow is: define your API schema once, then generate mock data for frontend development, unit tests, API documentation examples, and seed scripts. One schema, many uses.
Try our Mock Data Generator to create realistic test fixtures instantly — right in your browser, no upload required.