PushBackLog

Test Data Management

Status: Complete
Category: Testing
Default enforcement: Soft
Author: PushBackLog team


Tags

  • Topic: testing, quality
  • Skillset: backend, devops
  • Technology: generic
  • Stage: execution, review

Summary

Test data management is the set of practices that ensure automated tests have access to the data they need, that each test is isolated from the side effects of other tests, and that test data reflects the realistic complexity of production data without exposing real user information. Poor test data management produces flaky tests, tests that interfere with each other, and tests that pass against synthetic data but fail against real data with edge cases.


Rationale

Shared mutable test state is a common root cause of flaky tests

Tests that share a database and do not clean up after themselves pass in isolation but fail when run together, because each test leaves the database in a modified state that the next test does not expect. This produces non-deterministic test suites where run order matters, some tests fail only when run alongside others, and CI passes intermittently. These failures erode confidence in the test suite until people stop trusting it.
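
The order dependence described above can be reproduced without a database. The following is a minimal sketch, assuming an in-memory array (`sharedOrders`, a hypothetical stand-in for a shared table) and two made-up tests that each expect to see only their own row.

```typescript
// In-memory stand-in for a shared test database (hypothetical).
const sharedOrders: string[] = [];

// Each "test" inserts one order and expects to find exactly one.
const createsAnOrder = (): boolean => {
  sharedOrders.push('order-a');
  return sharedOrders.length === 1;
};

const createsAnotherOrder = (): boolean => {
  sharedOrders.push('order-b');
  return sharedOrders.length === 1;
};

// Run together without cleanup: the second test sees the first test's row.
const withoutCleanup = [createsAnOrder(), createsAnotherOrder()];

// Run with a reset before each test: both start from an empty store.
sharedOrders.length = 0;
const first = createsAnOrder();
sharedOrders.length = 0;
const second = createsAnotherOrder();
const withCleanup = [first, second];

console.log(withoutCleanup); // [ true, false ]
console.log(withCleanup); // [ true, true ]
```

Either test passes when run alone; only the combination fails, which is why this kind of defect tends to surface in CI rather than on a developer's machine.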

Production data is not acceptable test data

Using a copy of production data in test environments solves the realism problem but creates serious compliance and security risks. Real names, emails, payment details, and health information must not be present in any non-production environment. Beyond compliance, production data changes over time, making tests that depend on specific production records brittle and unrepeatable.


Guidance

Test isolation strategies

1. Transaction rollback per test (database tests)

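Wrap each test in a database transaction and roll it back at the end, so the database returns to its pre-test state. This only holds if the code under test runs on the same connection as the transaction and does not commit on its own; with a connection pool, a bare BEGIN may execute on a different connection from the queries that follow.

// Jest + TypeORM example
// Assumes `dataSource` is an initialised DataSource and that the code under test
// issues its queries through `queryRunner.manager`, i.e. on this same connection.
import { QueryRunner } from 'typeorm';

let queryRunner: QueryRunner;

beforeEach(async () => {
  queryRunner = dataSource.createQueryRunner();
  await queryRunner.connect();
  await queryRunner.startTransaction();
});

afterEach(async () => {
  await queryRunner.rollbackTransaction();
  await queryRunner.release(); // return the connection to the pool
});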

test('creates an order', async () => {
  // This insert will be rolled back; no cleanup needed
  await orderService.create(orderDto);
  const orders = await orderRepository.findAll();
  expect(orders).toHaveLength(1);
});

2. Database truncation between tests

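Truncate all tables after every test or test suite. This is slower than rollback, but it still works when the code under test commits its own transactions or writes through several connections, cases that a wrapping transaction cannot undo.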

afterEach(async () => {
  await truncateTables(['orders', 'order_items', 'payments']);
});
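
The `truncateTables` helper above is not defined in this article. One possible shape is sketched below, assuming PostgreSQL syntax and an injected query function (the `QueryFn` type, `quoteIdentifier`, `buildTruncateSql`, and the second parameter are all hypothetical, added so the sketch does not depend on a particular ORM).

```typescript
// Minimal query signature; with TypeORM this could be (sql) => dataSource.query(sql).
type QueryFn = (sql: string) => Promise<unknown>;

// Quote an identifier so unusual table names cannot break the statement.
const quoteIdentifier = (name: string): string => `"${name.replace(/"/g, '""')}"`;

// Build a single TRUNCATE statement covering every table (PostgreSQL syntax).
// RESTART IDENTITY resets sequences; CASCADE also empties tables that reference
// the listed ones through foreign keys, so keep seed tables out of its reach.
const buildTruncateSql = (tables: string[]): string =>
  `TRUNCATE TABLE ${tables.map(quoteIdentifier).join(', ')} RESTART IDENTITY CASCADE`;

// Hypothetical helper: the query function is injected rather than imported.
const truncateTables = async (tables: string[], query: QueryFn): Promise<void> => {
  if (tables.length === 0) return; // nothing to clear
  await query(buildTruncateSql(tables));
};
```

Issuing one statement for all tables avoids having to order the truncations by foreign-key dependency. Other databases differ: MySQL, for example, truncates one table per statement, so the helper would need a loop there.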

3. Test-owned data (per-test factories)

Each test creates exactly the data it needs using factory functions. No test relies on data created by another test.

// Factory function pattern
const buildUser = (overrides: Partial<User> = {}): User => ({
  id: generateId(),
  email: `test-${generateId()}@example.com`,
  plan: 'free',
  createdAt: new Date('2026-01-01'),
  ...overrides,
});

test('premium users can access billing features', async () => {
  const user = await userRepository.save(buildUser({ plan: 'premium' }));
  const result = await billingService.canAccess(user.id, 'billing');
  expect(result).toBe(true);
});
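
`generateId` is left undefined in the factory above. A minimal self-contained sketch follows, assuming a simple per-process counter and a `User` shape inferred from the factory's fields (both are assumptions, not part of the original example).

```typescript
// Assumed shape of User, inferred from the fields the factory sets.
interface User {
  id: string;
  email: string;
  plan: 'free' | 'premium';
  createdAt: Date;
}

// Hypothetical ID source: a counter that is unique within one process only.
let sequence = 0;
const generateId = (): string => `usr_${++sequence}`;

const buildUser = (overrides: Partial<User> = {}): User => ({
  id: generateId(),
  email: `test-${generateId()}@example.com`,
  plan: 'free',
  createdAt: new Date('2026-01-01'), // fixed date keeps assertions repeatable
  ...overrides, // caller-supplied fields win over the defaults
});

const freeUser = buildUser();
const premiumUser = buildUser({ plan: 'premium' });

console.log(freeUser.id !== premiumUser.id); // true
console.log(premiumUser.plan); // premium
```

A counter is deterministic, which helps when reading failure output, but it restarts in every process and in every Jest test file. When several workers write to the same database, a random identifier such as a UUID is the safer choice.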

Test data factories and builders

// Factory using a builder pattern for complex objects
class OrderFactory {
  private dto: CreateOrderDto = {
    customerId: 'cus_default',
    items: [{ productId: 'prod_001', quantity: 1, pricePerUnit: 1000 }],
    currency: 'GBP',
  };

  withCustomer(customerId: string): this {
    this.dto.customerId = customerId;
    return this;
  }

  withItems(items: OrderItemDto[]): this {
    this.dto.items = items;
    return this;
  }

  async create(): Promise<Order> {
    return orderService.create(this.dto);
  }
}

// In tests:
const order = await new OrderFactory()
  .withCustomer(user.id)
  .withItems([{ productId: 'prod_premium', quantity: 2, pricePerUnit: 5000 }])
  .create();

Seed data vs. test-specific data

Type | Use for | Strategy
Seed data | Reference data that tests assume exists (product catalogue, roles, config) | Load once at test suite startup; treat as read-only
Test-specific data | Data created and consumed by individual tests | Create per test; clean up after each test
Fixture files | Static JSON/YAML test inputs | Version-control alongside tests; never use production exports
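
For seed data held in memory (for example, fixtures imported by unit tests), "treat as read-only" can be enforced rather than left to convention. The sketch below assumes a hypothetical `deepFreeze` helper and made-up seed values; it does not cover seed rows stored in a database, which would need permissions or a per-test rollback instead.

```typescript
// Recursively freeze plain objects and arrays so accidental writes are rejected.
const deepFreeze = <T>(value: T): Readonly<T> => {
  if (typeof value === 'object' && value !== null && !Object.isFrozen(value)) {
    Object.freeze(value); // freeze first so cyclic references terminate
    Object.values(value).forEach(deepFreeze);
  }
  return value;
};

// Hypothetical reference data, loaded once and shared by every test.
const seedData = deepFreeze({
  roles: ['admin', 'member'],
  products: [{ id: 'prod_001', name: 'Starter plan', pricePerUnit: 1000 }],
});

console.log(Object.isFrozen(seedData.products[0])); // true
```

A test that tries to push a new role or change a price now fails at the point of the write, instead of silently altering what later tests see.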

Data anonymisation for realistic test data

When realistic data shapes are needed:

  • Use a data generation library (Faker.js, factory_boy) to produce realistic-looking but entirely synthetic data
  • If using production data exports, run an anonymisation pipeline before it reaches any non-production environment
  • Never store real production data in environment variables, seed files, or fixture files

import { faker } from '@faker-js/faker';

const buildRealisticUser = (): CreateUserDto => ({
  name: faker.person.fullName(),
  email: faker.internet.exampleEmail(), // example.* domain, so no mail can reach a real inbox
  address: {
    line1: faker.location.streetAddress(),
    city: faker.location.city(),
    postcode: faker.location.zipCode(),
  },
});
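
For the anonymisation pipeline mentioned in the list above, one step is typically to replace identifying fields with stable synthetic tokens. The following is a minimal sketch of that single step, assuming a hypothetical `CustomerRecord` shape and a hypothetical `Pseudonymiser` class. A real pipeline would also need to handle free-text fields, addresses, and any other column that can identify a person.

```typescript
// Hypothetical export row; only name and email are treated as identifying here.
interface CustomerRecord {
  id: number;
  name: string;
  email: string;
  plan: string;
}

// Maps each distinct real value to a stable synthetic token, so records that
// shared a value before anonymisation still share one afterwards.
class Pseudonymiser {
  private readonly seen = new Map<string, string>();
  private readonly label: string;

  constructor(label: string) {
    this.label = label;
  }

  replace(realValue: string): string {
    let token = this.seen.get(realValue);
    if (token === undefined) {
      token = `${this.label}-${this.seen.size + 1}`;
      this.seen.set(realValue, token);
    }
    return token;
  }
}

const anonymiseCustomers = (records: CustomerRecord[]): CustomerRecord[] => {
  const names = new Pseudonymiser('customer');
  const emails = new Pseudonymiser('user');
  return records.map((record) => ({
    ...record, // id and plan pass through unchanged
    name: names.replace(record.name),
    email: `${emails.replace(record.email.toLowerCase())}@example.com`,
  }));
};

// Made-up input standing in for a production export.
const anonymised = anonymiseCustomers([
  { id: 1, name: 'Alice Example', email: 'alice@corp.example', plan: 'premium' },
  { id: 2, name: 'Bob Example', email: 'bob@corp.example', plan: 'free' },
  { id: 3, name: 'Alice Example', email: 'ALICE@corp.example', plan: 'free' },
]);

console.log(anonymised[2].email); // user-1@example.com
```

The mapping lives only in memory for the duration of the run, so the tokens cannot be reversed afterwards, while duplicates and joins between records keep the same shape as in the source data.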

Review checklist

  • Tests do not share mutable state — each test creates its own data or is fully isolated
  • Tests clean up after themselves (rollback or truncation), or create uniquely identified data through factories so leftovers cannot collide
  • No real production data in test environments
  • Seed data is version-controlled and environment-independent
  • Factory functions generate unique identifiers to avoid conflicts
  • Flaky tests are treated as bugs and fixed, not retried