Test Data Management

Status: Complete
Category: Testing
Default enforcement: Soft
Author: PushBackLog team

Summary

Test data management is the set of practices that ensure automated tests have access to the data they need, that each test is isolated from the side effects of other tests, and that test data reflects the realistic complexity of production data without exposing real user information. Poor test data management produces flaky tests, tests that interfere with each other, and tests that pass against synthetic data but fail against real data with edge cases.

Rationale

Shared mutable test state is the root cause of flaky tests

Tests that share a database and do not clean up after themselves pass in isolation but fail when run together, because each test leaves the database in a modified state that the next test does not expect. This produces non-deterministic test suites where run order matters, some tests fail only when run alongside others, and CI passes intermittently. These failures erode confidence in the test suite until people stop trusting it.
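The order dependence can be reproduced without a database. The following is a minimal sketch that uses an in-memory array as a stand-in for a shared table; `sharedOrders`, `testCreatesOrder`, and `testStartsEmpty` are hypothetical names and not part of any test framework.

```typescript
// Stand-in for a shared table that no test cleans up (hypothetical).
const sharedOrders: string[] = [];

// "Test" A: inserts a row and leaves it behind.
function testCreatesOrder(): boolean {
  sharedOrders.push('order-1');
  return sharedOrders.length >= 1;
}

// "Test" B: assumes the table starts empty.
function testStartsEmpty(): boolean {
  return sharedOrders.length === 0;
}

// Order B, then A: both checks pass.
const emptyFirst = testStartsEmpty();
const createAfter = testCreatesOrder();

// B again, after A has run: it now fails although B itself did not change.
const emptyAfterCreate = testStartsEmpty();
```

Here `emptyFirst` and `createAfter` are `true` while `emptyAfterCreate` is `false`: the same check passes or fails depending only on what ran before it.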

Production data is not acceptable test data

Using a copy of production data in test environments solves the realism problem but creates serious compliance and security risks. Real names, emails, payment details, and health information must not be present in any non-production environment. Beyond compliance, production data changes over time, making tests that depend on specific production records brittle and unrepeatable.

Guidance

Test isolation strategies

1. Transaction rollback per test (database tests)

Wrap each test in a database transaction and roll it back at the end. Provided the code under test runs on the same connection and does not commit on its own, the database returns to its pre-test state after every test.

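// Jest + TypeORM example
import { QueryRunner } from 'typeorm';

// A pooled DataSource may run each query() call on a different connection,
// so BEGIN and ROLLBACK must go through one dedicated QueryRunner.
let queryRunner: QueryRunner;

beforeEach(async () => {
  queryRunner = dataSource.createQueryRunner();
  await queryRunner.connect();
  await queryRunner.startTransaction();
});

afterEach(async () => {
  await queryRunner.rollbackTransaction();
  await queryRunner.release(); // return the connection to the pool
});

test('creates an order', async () => {
  // This insert will be rolled back; no cleanup needed.
  // Assumes orderService and orderRepository use queryRunner.manager.
  await orderService.create(orderDto);
  const orders = await orderRepository.findAll();
  expect(orders).toHaveLength(1);
});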

2. Database truncation between tests

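Truncate all affected tables after every test or test suite. This is slower than rollback, but it also works when the code under test commits its own transactions, uses several connections, or writes asynchronously outside the test's transaction.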

afterEach(async () => {
  await truncateTables(['orders', 'order_items', 'payments']);
});
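The `truncateTables` helper is not defined in the example above. One possible shape is sketched below, assuming PostgreSQL syntax and an executor object with a `query(sql)` method (the `SqlExecutor` interface is an assumption for illustration; a TypeORM `DataSource` exposes a method of this shape). Unlike the call above, this version takes the executor as an explicit first argument.

```typescript
// Minimal interface for whatever runs SQL in the test suite (assumed).
interface SqlExecutor {
  query(sql: string): Promise<unknown>;
}

// Quote an identifier and reject anything that is not a plain table name,
// because table names cannot be passed as bound parameters.
function quoteIdentifier(name: string): string {
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(name)) {
    throw new Error(`Unsafe table name: ${name}`);
  }
  return `"${name}"`;
}

// Build one TRUNCATE statement covering every table (PostgreSQL syntax).
// RESTART IDENTITY resets sequences; CASCADE also truncates tables that
// reference the listed ones through foreign keys.
function buildTruncateSql(tables: string[]): string {
  const list = tables.map(quoteIdentifier).join(', ');
  return `TRUNCATE TABLE ${list} RESTART IDENTITY CASCADE`;
}

async function truncateTables(db: SqlExecutor, tables: string[]): Promise<void> {
  if (tables.length === 0) return; // nothing to do
  await db.query(buildTruncateSql(tables));
}
```

Issuing a single statement for all tables avoids having to order the truncations by foreign-key dependency. Because `CASCADE` can reach tables that were not listed, seed tables that must survive should not reference the truncated ones.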

3. Test-owned data (per-test factories)

Each test creates exactly the data it needs using factory functions. No test relies on data created by another test.

// Factory function pattern
const buildUser = (overrides: Partial<User> = {}): User => ({
  id: generateId(),
  email: `test-${generateId()}@example.com`,
  plan: 'free',
  createdAt: new Date('2026-01-01'),
  ...overrides,
});

test('premium users can access billing features', async () => {
  const user = await userRepository.save(buildUser({ plan: 'premium' }));
  const result = await billingService.canAccess(user.id, 'billing');
  expect(result).toBe(true);
});
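The factory above relies on a `generateId` helper that is not shown. A self-contained sketch follows, assuming a simple in-process counter is enough for uniqueness; the reduced `User` shape and the `u` prefix are hypothetical.

```typescript
// Hypothetical User shape, reduced to the fields the factory sets.
interface User {
  id: string;
  email: string;
  plan: 'free' | 'premium';
  createdAt: Date;
}

// Counter-based IDs are unique only within one process (an assumption).
let nextId = 0;
const generateId = (): string => `u${++nextId}`;

const buildUser = (overrides: Partial<User> = {}): User => {
  // Reuse one ID for both fields so a failing row is easy to trace.
  const id = generateId();
  return {
    id,
    email: `test-${id}@example.com`,
    plan: 'free',
    createdAt: new Date('2026-01-01'),
    ...overrides, // caller-supplied fields win over defaults
  };
};

// Two users built back to back never collide on id or email.
const freeUser = buildUser();
const premiumUser = buildUser({ plan: 'premium' });
```

If several test processes write to the same database in parallel, a counter restarts in each process and can collide; a random identifier such as the one returned by `crypto.randomUUID()` in Node.js is a common alternative.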

Test data factories and builders

// Factory using a builder pattern for complex objects
class OrderFactory {
  private dto: CreateOrderDto = {
    customerId: 'cus_default',
    items: [{ productId: 'prod_001', quantity: 1, pricePerUnit: 1000 }],
    currency: 'GBP',
  };

  withCustomer(customerId: string): this {
    this.dto.customerId = customerId;
    return this;
  }

  withItems(items: OrderItemDto[]): this {
    this.dto.items = items;
    return this;
  }

  async create(): Promise<Order> {
    return orderService.create(this.dto);
  }
}

// In tests:
const order = await new OrderFactory()
  .withCustomer(user.id)
  .withItems([{ productId: 'prod_premium', quantity: 2, pricePerUnit: 5000 }])
  .create();

Seed data vs. test-specific data

Type	Use for	Strategy
Seed data	Reference data that tests assume exists (product catalogue, roles, config)	Load once at test suite startup; treat as read-only
Test-specific data	Data created and consumed by individual tests	Create per test; clean up after each test
Fixture files	Static JSON/YAML test inputs	Version-control alongside tests; never use production exports
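Where seed data is also held in memory (for example, as a lookup that factories read from), the read-only rule can be made harder to break by accident. The sketch below is one way to do that under that assumption; `deepFreeze` and `seedRoles` are hypothetical names, and it does not protect rows in the database itself.

```typescript
// Recursively freeze an object graph so that later writes are rejected.
function deepFreeze<T>(value: T): Readonly<T> {
  if (value !== null && typeof value === 'object' && !Object.isFrozen(value)) {
    Object.freeze(value);
    const record = value as unknown as Record<string, unknown>;
    for (const key of Object.keys(record)) {
      deepFreeze(record[key]); // freeze nested objects and arrays too
    }
  }
  return value;
}

// Hypothetical reference data, loaded once at suite startup.
const seedRoles = deepFreeze([
  { id: 'role_admin', permissions: ['read', 'write'] },
  { id: 'role_viewer', permissions: ['read'] },
]);
```

A test that needs a modified role would then build its own copy instead of editing the shared object.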

Data anonymisation for realistic test data

When realistic data shapes are needed:

Use a data generation library (Faker.js, factory_boy) to produce realistic-looking but entirely synthetic data
If production data exports are used, run them through an anonymisation pipeline before the data reaches any non-production environment
Never store real production data in environment variables, seed files, or fixture files

import { faker } from '@faker-js/faker';

const buildRealisticUser = (): CreateUserDto => ({
  name: faker.person.fullName(),
  email: faker.internet.email(),
  address: {
    line1: faker.location.streetAddress(),
    city: faker.location.city(),
    postcode: faker.location.zipCode(),
  },
});
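For the case where a production export is used, the anonymisation step might look like the sketch below. It is an illustration under stated assumptions, not a complete pipeline: the `ExportedUser` shape and the `anonymiseUsers` function are hypothetical, and a real pipeline would also have to cover free-text fields, logs, and any other column that can identify a person.

```typescript
// Hypothetical shape of one record in a production export.
interface ExportedUser {
  id: number;
  name: string;
  email: string;
  plan: string;
}

// Replace identifying fields with sequential pseudonyms.
// The same original email always maps to the same pseudonym within one run,
// so relationships between records survive, but nothing in the output is
// derived from the original value.
function anonymiseUsers(users: ExportedUser[]): ExportedUser[] {
  const pseudonyms = new Map<string, number>();
  return users.map((user) => {
    let n = pseudonyms.get(user.email);
    if (n === undefined) {
      n = pseudonyms.size + 1;
      pseudonyms.set(user.email, n);
    }
    // Allow-list: copy only fields known to be non-identifying.
    return {
      id: n,
      name: `Test User ${n}`,
      email: `user-${n}@example.com`,
      plan: user.plan,
    };
  });
}

// Invented sample input; not real data.
const anonymised = anonymiseUsers([
  { id: 501, name: 'Alice Example', email: 'alice@corp.test', plan: 'premium' },
  { id: 502, name: 'Bob Example', email: 'bob@corp.test', plan: 'free' },
]);
```

Building the output from an explicit list of fields, instead of copying the record and overwriting a few properties, means a column added to the export later is dropped by default and cannot leak through.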

Review checklist

Tests do not share mutable state — each test creates its own data or is fully isolated
Tests clean up after themselves (rollback/truncate/factory pattern)
No real production data in test environments
Seed data is version-controlled and environment-independent
Factory functions generate unique identifiers to avoid conflicts
Flaky tests are treated as bugs and fixed, not retried
