Generating Mock Data: From JSON Schema to Realistic Test Fixtures

Every application needs test data. The question is whether you spend hours crafting it by hand or let a generator do it in seconds. This guide covers the principles behind effective mock data and how to avoid the traps that make test fixtures unreliable.

Why mock data matters

Real production data is usually off-limits for testing: privacy regulations, sheer volume, and sensitivity all make it impractical. But trivial placeholders like "test" and 123 miss bugs that only surface with realistic inputs:

Name fields that break on Unicode characters
Email validation that accepts "aaa" but rejects "user+tag@sub.domain.co.uk"
Date parsing that works for 2026-01-15 but fails on the leap day 2028-02-29
Pagination bugs that only show up with 500+ records

Good mock data is realistic enough to catch these edge cases and structured enough to be predictable in tests.

Schema-driven generation

The most reliable approach is to define a JSON Schema and let the generator produce data that conforms to it. Because every value is derived from the schema, the output matches the declared types and constraints without manual effort.

{
  "type": "object",
  "properties": {
    "id": { "type": "integer", "minimum": 1 },
    "name": { "type": "string", "minLength": 2, "maxLength": 50 },
    "email": { "type": "string", "format": "email" },
    "role": { "type": "string", "enum": ["admin", "editor", "viewer"] },
    "createdAt": { "type": "string", "format": "date-time" }
  },
  "required": ["id", "name", "email", "role"]
}

A schema-aware generator reads this and produces:

{
  "id": 4217,
  "name": "Samantha Chen",
  "email": "samantha.chen@example.net",
  "role": "editor",
  "createdAt": "2025-11-03T14:22:08Z"
}

Every field satisfies its constraints. The name looks like a real name. The email passes validation. The role is one of the allowed values.
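To make the idea concrete, here is a minimal sketch of how a schema-aware generator might walk a schema like the one above. The function and constant names are hypothetical and not the API of any particular library. It handles only the keywords used in this article, treats every unformatted string as a person's name, and fills in every property whether or not it is listed in "required".

```javascript
// Minimal sketch of schema-driven generation. All names are hypothetical.
const FIRST_NAMES = ["Samantha", "Marco", "Aiko", "Priya"];
const LAST_NAMES = ["Chen", "Rivera", "Tanaka", "Patel"];

// Pick one element using rng(), which must return a float in [0, 1).
function pick(items, rng) {
  return items[Math.floor(rng() * items.length)];
}

function generateValue(schema, rng = Math.random) {
  // An enum restricts the value to one of the listed options.
  if (schema.enum) return pick(schema.enum, rng);

  switch (schema.type) {
    case "integer": {
      const min = schema.minimum ?? 0;
      const max = schema.maximum ?? min + 9999; // arbitrary default range
      return min + Math.floor(rng() * (max - min + 1));
    }
    case "string": {
      if (schema.format === "email") {
        const first = pick(FIRST_NAMES, rng).toLowerCase();
        const last = pick(LAST_NAMES, rng).toLowerCase();
        return `${first}.${last}@example.net`;
      }
      if (schema.format === "date-time") {
        // An instant within the 365 days following 2025-01-01T00:00:00Z.
        const start = Date.UTC(2025, 0, 1);
        const offsetMs = Math.floor(rng() * 365 * 24 * 60 * 60 * 1000);
        return new Date(start + offsetMs).toISOString();
      }
      // Simplification: every other string is generated as a person's name,
      // then padded or truncated so minLength and maxLength hold.
      const name = `${pick(FIRST_NAMES, rng)} ${pick(LAST_NAMES, rng)}`;
      const min = schema.minLength ?? 0;
      const max = schema.maxLength ?? Infinity;
      return name.padEnd(min, "x").slice(0, max);
    }
    case "object": {
      const result = {};
      for (const [key, propSchema] of Object.entries(schema.properties ?? {})) {
        result[key] = generateValue(propSchema, rng);
      }
      return result;
    }
    default:
      throw new Error(`Unsupported schema type: ${schema.type}`);
  }
}
```

Passing the random source in as a parameter is a deliberate choice in this sketch: it lets a test swap in a predictable function, or a seeded one, without touching the generator itself.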

Realistic data types

A good mock generator goes beyond random strings. It recognizes common field semantics and produces contextually appropriate values:

Field pattern	Generated as
`name`, `fullName`	Realistic person names
`email`	Valid email addresses
`phone`	Formatted phone numbers
`address`, `street`	Plausible addresses
`url`, `website`	Valid URLs
`date`, `createdAt`	ISO 8601 timestamps
`price`, `amount`	Numeric values with appropriate ranges
`avatar`, `image`	Placeholder image URLs

This semantic awareness means you spend less time configuring and more time testing.
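One way such field-name recognition could work is an ordered list of patterns, each paired with a small value factory. The sketch below is illustrative only: the rule list, the function name, and the fallback value are assumptions, and name-based heuristics like these can misfire (for example, a field called "candidate" would match the date rule).

```javascript
// Hypothetical field-name heuristics. Rules are checked in order and the
// first match wins; a real generator would cover more patterns and locales.
const FIELD_RULES = [
  { pattern: /email/i, make: (rng) => `user${Math.floor(rng() * 1000)}@example.com` },
  // 555-0100 through 555-0199 are reserved for fictional use in North America.
  { pattern: /phone/i, make: (rng) => `555-01${String(Math.floor(rng() * 100)).padStart(2, "0")}` },
  { pattern: /url|website/i, make: () => "https://example.com/" },
  // A price between 0.00 and 500.00, rounded to two decimal places.
  { pattern: /price|amount/i, make: (rng) => Math.round(rng() * 50000) / 100 },
  // Fixed timestamp to keep the sketch short; a real rule would vary it.
  { pattern: /date|At$/, make: () => new Date(Date.UTC(2025, 0, 1)).toISOString() },
  { pattern: /name/i, make: (rng) => (rng() < 0.5 ? "Samantha Chen" : "Marco Rivera") },
];

function generateForField(fieldName, rng = Math.random) {
  const rule = FIELD_RULES.find((r) => r.pattern.test(fieldName));
  // Fall back to a generic placeholder when no heuristic matches.
  return rule ? rule.make(rng) : "lorem ipsum";
}

console.log(generateForField("createdAt")); // an ISO 8601 timestamp
console.log(generateForField("sku")); // no rule matches, so the placeholder is used
```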

Controlling array sizes

When your schema includes arrays, you need control over how many items get generated. Too few and you miss pagination bugs. Too many and your test suite slows down.

{
  "type": "array",
  "items": { "$ref": "#/definitions/User" },
  "minItems": 5,
  "maxItems": 20
}

For unit tests, keep arrays small (3-5 items). For integration tests that exercise pagination or virtual scrolling, generate hundreds. For load testing, generate thousands — but write them to a file rather than holding them in memory.
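The two concerns above, staying within minItems/maxItems and not holding a large dataset in memory, can be sketched separately. The helper names below are hypothetical, and the default cap used when maxItems is absent is an arbitrary assumption.

```javascript
// Choose an array length within the schema's bounds.
function pickLength(arraySchema, rng = Math.random) {
  const min = arraySchema.minItems ?? 0;
  const max = arraySchema.maxItems ?? min + 10; // arbitrary default cap
  return min + Math.floor(rng() * (max - min + 1));
}

// Produce records lazily: each one is created only when the consumer asks
// for it, so a large run never has to exist in memory all at once.
function* generateRecords(count, makeRecord) {
  for (let i = 0; i < count; i++) {
    yield makeRecord(i);
  }
}

const arraySchema = { type: "array", minItems: 5, maxItems: 20 };
const users = [...generateRecords(pickLength(arraySchema), (i) => ({ id: i + 1 }))];
console.log(users.length); // somewhere between 5 and 20
```

For a load test, the same generator can be consumed with a for...of loop that writes each record to a file stream as it is produced, instead of spreading everything into an array.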

Seeding for reproducibility

Random data is great for exploration but terrible for assertions. If your test expects "Samantha Chen" but the generator produces "Marco Rivera" on the next run, the test fails for no real reason.

The solution is deterministic seeding:

// Same seed always produces the same output
const data = generate(schema, { seed: 42 });

// data.name will always be "Samantha Chen" with seed 42
expect(data.name).toBe("Samantha Chen");

Use a fixed seed in CI pipelines and automated tests. Use random seeds (or no seed) during manual exploration to discover edge cases.
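Seeding works because the generator draws from a pseudo-random number generator whose whole sequence is determined by its starting state. JavaScript's Math.random cannot be seeded, so a generator needs its own. The sketch below uses mulberry32, a small, widely circulated 32-bit PRNG; generateName is a hypothetical stand-in for a real generator's seed option.

```javascript
// mulberry32: a compact seeded PRNG. Fine for test fixtures, not suitable
// for anything security-related.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function next() {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // float in [0, 1)
  };
}

const NAMES = ["Samantha Chen", "Marco Rivera", "Aiko Tanaka", "Priya Patel"];

// Hypothetical stand-in for generate(schema, { seed }).
function generateName(seed) {
  const rng = mulberry32(seed);
  return NAMES[Math.floor(rng() * NAMES.length)];
}

console.log(generateName(42) === generateName(42)); // true: same seed, same name
```

Note that a seed only pins the output for a given generator version and schema. Upgrading the generator or reordering schema fields can change which values a seed produces, so assertions on exact values may need updating after such changes.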

Nested objects and relations

Real APIs return nested data — a user has orders, each order has items, each item references a product. A flat generator that handles each object independently produces data where foreign keys point to nothing.

Better generators support relational mocking:

{
  "user": {
    "id": 1,
    "orders": [
      {
        "id": 101,
        "userId": 1,
        "items": [
          { "productId": 55, "quantity": 2 }
        ]
      }
    ]
  }
}

The userId in the order matches the parent user's id. The productId must likewise reference a product that exists elsewhere in the dataset. This referential integrity is what makes mock data usable for integration testing.
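A simple way to get this property is to generate parent records first and let child records choose foreign keys only from ids that already exist. The sketch below assumes a hypothetical buildDataset helper with at least one product; the option names are made up for illustration.

```javascript
// Build parents first, then have children reference only existing ids.
// Assumes productCount >= 1.
function buildDataset({ userCount, productCount, ordersPerUser }, rng = Math.random) {
  const products = Array.from({ length: productCount }, (_, i) => ({ id: i + 1 }));
  let nextOrderId = 101;

  const users = Array.from({ length: userCount }, (_, i) => {
    const userId = i + 1;
    const orders = Array.from({ length: ordersPerUser }, () => ({
      id: nextOrderId++,
      userId, // always the parent user's id
      items: [
        {
          // Pick the foreign key from products that actually exist.
          productId: products[Math.floor(rng() * products.length)].id,
          quantity: 1 + Math.floor(rng() * 3),
        },
      ],
    }));
    return { id: userId, orders };
  });

  return { users, products };
}

// Check that every foreign key resolves to a record in the dataset.
function hasReferentialIntegrity({ users, products }) {
  const productIds = new Set(products.map((p) => p.id));
  return users.every((user) =>
    user.orders.every(
      (order) =>
        order.userId === user.id &&
        order.items.every((item) => productIds.has(item.productId))
    )
  );
}

const dataset = buildDataset({ userCount: 3, productCount: 5, ordersPerUser: 2 });
console.log(hasReferentialIntegrity(dataset)); // true
```

A check like hasReferentialIntegrity is also useful on its own: running it against any generated dataset catches dangling references before they turn into confusing integration-test failures.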

Common pitfalls

Hardcoding test data — Manually written fixtures become stale as schemas evolve. Generate from the schema and they always match.

Ignoring edge cases — Add explicit generators for empty strings, null values, maximum-length strings, and boundary dates.

Over-generating — Producing 10,000 records when 50 will do wastes time. Match the volume to the test scenario.

Forgetting localization — If your app supports multiple locales, generate names, addresses, and dates in those locales.

Using production-like emails — Always use @example.com or @example.net (reserved by RFC 2606) to avoid accidentally emailing real people.
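For the edge-case pitfall above, boundary values can be derived from the schema rather than left to chance. This sketch assumes a hypothetical boundaryStrings helper that reads minLength and maxLength; the list of boundary dates is only an illustrative starting point.

```javascript
// Derive boundary-length strings from a string schema so that the shortest
// and longest allowed values are always covered.
function boundaryStrings(schema, fill = "a") {
  const min = schema.minLength ?? 0;
  const max = schema.maxLength ?? min + 1; // arbitrary cap when unbounded
  const lengths = new Set([min, min + 1, max - 1, max]);
  return [...lengths]
    .filter((len) => len >= min && len <= max)
    .sort((x, y) => x - y)
    .map((len) => fill.repeat(len));
}

// Dates worth testing explicitly: a leap day, a year end, the Unix epoch.
const BOUNDARY_DATES = ["2028-02-29", "2025-12-31", "1970-01-01"];

const nameCases = boundaryStrings({ type: "string", minLength: 2, maxLength: 50 });
console.log(nameCases.map((s) => s.length)); // [2, 3, 49, 50]
```

Appending these cases to a seeded random set gives each run both realistic values and the boundaries that random generation alone would rarely hit.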

Workflow integration

The fastest workflow is: define your API schema once, then generate mock data for frontend development, unit tests, API documentation examples, and seed scripts. One schema, many uses.

Try our Mock Data Generator to create realistic test fixtures instantly — right in your browser, no upload required.
