Test Data Management
Status: Complete
Category: Testing
Default enforcement: Soft
Author: PushBackLog team
Tags
- Topic: testing, quality
- Skillset: backend, devops
- Technology: generic
- Stage: execution, review
Summary
Test data management is the set of practices that ensure automated tests have access to the data they need, that each test is isolated from the side effects of other tests, and that test data reflects the realistic complexity of production data without exposing real user information. Poor test data management produces flaky tests, tests that interfere with each other, and tests that pass against synthetic data but fail against real data with edge cases.
Rationale
Shared mutable test state is the root cause of flaky tests
Tests that share a database and do not clean up after themselves pass in isolation but fail when run together, because each test leaves the database in a modified state that the next test does not expect. The result is a non-deterministic suite: run order matters, some tests fail only alongside others, and CI passes intermittently. Over time the suite stops being trusted, and genuine failures get dismissed as flakiness.
Production data is not acceptable test data
Using a copy of production data in test environments solves the realism problem but creates serious compliance and security risks. Real names, emails, payment details, and health information must not be present in any non-production environment. Beyond compliance, production data changes over time, making tests that depend on specific production records brittle and unrepeatable.
Guidance
Test isolation strategies
1. Transaction rollback per test (database tests)
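Wrap each test in a database transaction and roll it back at the end, so the database is always restored to its pre-test state. This only works when every query the test triggers runs on that transaction's connection and the code under test does not commit or manage its own transactions.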
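```typescript
// Jest + TypeORM example
import { QueryRunner } from 'typeorm';
// dataSource.query() borrows a pooled connection per call, so separate BEGIN and
// ROLLBACK queries may run on different connections. Pin one QueryRunner instead.
let queryRunner: QueryRunner;

beforeEach(async () => {
  queryRunner = dataSource.createQueryRunner();
  await queryRunner.connect();
  await queryRunner.startTransaction();
});

afterEach(async () => {
  await queryRunner.rollbackTransaction();
  await queryRunner.release();
});

test('creates an order', async () => {
  // Assumes orderService accepts an EntityManager so its writes join this
  // transaction; the insert is rolled back afterwards, so no cleanup is needed
  await orderService.create(orderDto, queryRunner.manager);
  const orders = await queryRunner.manager.find(Order);
  expect(orders).toHaveLength(1);
});
```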
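Because the rollback has to happen even when an assertion fails, the pattern can also be expressed as a small wrapper built on `try`/`finally`. This is a sketch under assumptions: `TransactionalConnection`, `withRollback`, and `InMemoryConnection` are hypothetical names, the method names mirror TypeORM's `QueryRunner` so a connected query runner should fit the interface, and the in-memory class stands in for a real database only to make the behaviour observable.

```typescript
// Hypothetical structural interface; method names mirror TypeORM's QueryRunner.
interface TransactionalConnection {
  startTransaction(): Promise<void>;
  rollbackTransaction(): Promise<void>;
}

// Runs `body` inside a transaction that is always rolled back, even when the
// body throws (for example, because an assertion failed).
async function withRollback<T>(
  conn: TransactionalConnection,
  body: () => Promise<T>,
): Promise<T> {
  await conn.startTransaction();
  try {
    return await body();
  } finally {
    await conn.rollbackTransaction();
  }
}

// Hypothetical in-memory stand-in for a database connection, used only to show
// the behaviour: startTransaction snapshots the rows, rollback restores them.
class InMemoryConnection implements TransactionalConnection {
  rows: string[] = [];
  private snapshot: string[] | null = null;

  async startTransaction(): Promise<void> {
    this.snapshot = [...this.rows];
  }

  async rollbackTransaction(): Promise<void> {
    if (this.snapshot !== null) {
      this.rows = this.snapshot;
      this.snapshot = null;
    }
  }
}
```

`afterEach` also runs after a failing test, so the hooks above give the same guarantee; a wrapper like this is mainly useful when only some tests in a file need a transaction.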
2. Database truncation between tests
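Truncate the tables the tests write to after each test, or after each test file if speed matters more than strict isolation. This is slower than rollback, but it also works when the code under test commits its own transactions, uses more than one connection, or writes from asynchronous jobs that a per-test transaction cannot wrap.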
```typescript
afterEach(async () => {
  await truncateTables(['orders', 'order_items', 'payments']);
});
```
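`truncateTables` is not defined above. One possible implementation, assuming PostgreSQL and a caller-supplied function that executes raw SQL (with TypeORM, something like `(sql) => dataSource.query(sql)`), issues a single `TRUNCATE` for all listed tables. `ExecuteSql`, `buildTruncateSql`, and this two-argument signature are hypothetical.

```typescript
// Hypothetical: any function that executes a raw SQL string.
type ExecuteSql = (sql: string) => Promise<unknown>;

// Only plain identifiers are accepted, so a table name can never inject SQL.
const SAFE_IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]*$/;

function buildTruncateSql(tables: string[]): string {
  if (tables.length === 0) {
    throw new Error('truncateTables needs at least one table');
  }
  for (const table of tables) {
    if (!SAFE_IDENTIFIER.test(table)) {
      throw new Error(`Unsafe table name: ${table}`);
    }
  }
  const list = tables.map((t) => `"${t}"`).join(', ');
  // One statement for all tables (PostgreSQL): foreign keys between the listed
  // tables are not a problem, RESTART IDENTITY resets their sequences, and
  // CASCADE also truncates other tables that reference them.
  return `TRUNCATE TABLE ${list} RESTART IDENTITY CASCADE`;
}

async function truncateTables(execute: ExecuteSql, tables: string[]): Promise<void> {
  await execute(buildTruncateSql(tables));
}
```

Truncating every table in one statement avoids ordering problems between tables that reference each other. Because `CASCADE` empties any other table with a foreign key into the list, a helper like this should only ever run against a dedicated test database.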
3. Test-owned data (per-test factories)
Each test creates exactly the data it needs using factory functions. No test relies on data created by another test.
```typescript
// Factory function pattern
const buildUser = (overrides: Partial<User> = {}): User => ({
  id: generateId(),
  email: `test-${generateId()}@example.com`,
  plan: 'free',
  createdAt: new Date('2026-01-01'),
  ...overrides,
});

test('premium users can access billing features', async () => {
  const user = await userRepository.save(buildUser({ plan: 'premium' }));
  const result = await billingService.canAccess(user.id, 'billing');
  expect(result).toBe(true);
});
```
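The factory above leaves `generateId` and `User` undefined. A self-contained sketch, assuming Node.js and a hypothetical `User` shape, could use `randomUUID` from `node:crypto` so identifiers stay unique even when several test files run in parallel against the same database:

```typescript
import { randomUUID } from 'node:crypto';

// Hypothetical User shape, for illustration only.
interface User {
  id: string;
  email: string;
  plan: 'free' | 'premium';
  createdAt: Date;
}

// One possible generateId: random UUIDs do not collide across parallel workers.
const generateId = (): string => randomUUID();

const buildUser = (overrides: Partial<User> = {}): User => ({
  id: generateId(),
  // Unique email avoids clashes with a unique constraint on users.email
  email: `test-${generateId()}@example.com`,
  plan: 'free',
  // Fixed timestamp keeps date-dependent assertions deterministic
  createdAt: new Date('2026-01-01'),
  // Each test overrides only the fields it cares about
  ...overrides,
});
```

A counter-based ID is easier to read in failure output, but it can collide when tests run in parallel Jest workers, because each worker process keeps its own counter.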
Test data factories and builders
```typescript
// Factory using a builder pattern for complex objects
class OrderFactory {
  private dto: CreateOrderDto = {
    customerId: 'cus_default',
    items: [{ productId: 'prod_001', quantity: 1, pricePerUnit: 1000 }],
    currency: 'GBP',
  };

  withCustomer(customerId: string): this {
    this.dto.customerId = customerId;
    return this;
  }

  withItems(items: OrderItemDto[]): this {
    this.dto.items = items;
    return this;
  }

  async create(): Promise<Order> {
    return orderService.create(this.dto);
  }
}

// In tests:
const order = await new OrderFactory()
  .withCustomer(user.id)
  .withItems([{ productId: 'prod_premium', quantity: 2, pricePerUnit: 5000 }])
  .create();
```
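A variation worth considering separates building from persisting: a `build()` method returns a plain DTO (useful in unit tests with no database), and every builder starts from a deep copy of shared defaults, so one test's changes cannot leak into another's. `OrderBuilder`, `DEFAULT_ORDER`, and `addItem` are hypothetical, and `structuredClone` assumes Node.js 17 or later.

```typescript
// Hypothetical DTO shapes, mirroring the fields used by OrderFactory above.
interface OrderItemDto {
  productId: string;
  quantity: number;
  pricePerUnit: number;
}

interface CreateOrderDto {
  customerId: string;
  items: OrderItemDto[];
  currency: string;
}

// Defaults are defined once and never handed out directly.
const DEFAULT_ORDER: CreateOrderDto = {
  customerId: 'cus_default',
  items: [{ productId: 'prod_001', quantity: 1, pricePerUnit: 1000 }],
  currency: 'GBP',
};

class OrderBuilder {
  // Deep copy per builder: mutating this.dto.items can never reach DEFAULT_ORDER.
  private dto: CreateOrderDto = structuredClone(DEFAULT_ORDER);

  withCustomer(customerId: string): this {
    this.dto.customerId = customerId;
    return this;
  }

  // Hypothetical extra method: appends to the items instead of replacing them.
  addItem(item: OrderItemDto): this {
    this.dto.items.push(item);
    return this;
  }

  // Returns a fresh copy, so later builder calls cannot change a DTO
  // that a test already holds.
  build(): CreateOrderDto {
    return structuredClone(this.dto);
  }
}
```

A persisting variant could then call `orderService.create(builder.build())`, keeping the database step in one place.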
Seed data vs. test-specific data
| Type | Use for | Strategy |
|---|---|---|
| Seed data | Reference data that tests assume exists (product catalogue, roles, config) | Load once at test suite startup; treat as read-only |
| Test-specific data | Data created and consumed by individual tests | Create per test; clean up after each test |
| Fixture files | Static JSON/YAML test inputs | Version-control alongside tests; never use production exports |
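For seed data that tests load into memory (for example, a product catalogue used to compute expected values), "treat as read-only" can be enforced rather than left to convention. A minimal sketch with hypothetical `SeedData`, `deepFreeze`, and `getSeedData` names follows. It protects only the in-memory copy; seed rows in the database still need protecting some other way, such as a database role without write access to reference tables.

```typescript
// Hypothetical seed shape; real seed data would typically come from a
// version-controlled file rather than an inline literal.
interface SeedData {
  roles: { id: string; name: string }[];
  products: { id: string; name: string; pricePerUnit: number }[];
}

// Recursively freezes plain data (assumed acyclic, as JSON is). Afterwards,
// assignments throw in strict mode and are silently ignored otherwise, so a
// test cannot change the shared seed for the tests that follow it.
function deepFreeze<T>(value: T): T {
  if (value !== null && typeof value === 'object') {
    for (const child of Object.values(value)) {
      deepFreeze(child);
    }
    Object.freeze(value);
  }
  return value;
}

let cachedSeed: SeedData | undefined;

// Loaded once per module instance (Jest re-evaluates modules for each test
// file) and shared read-only by every test in that file.
function getSeedData(): SeedData {
  if (cachedSeed === undefined) {
    cachedSeed = deepFreeze<SeedData>({
      roles: [{ id: 'role_admin', name: 'admin' }],
      products: [{ id: 'prod_001', name: 'Starter plan', pricePerUnit: 1000 }],
    });
  }
  return cachedSeed;
}
```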
Data anonymisation for realistic test data
When realistic data shapes are needed:
- Use a data generation library (Faker.js, factory_boy) to produce realistic-looking but entirely synthetic data
- If you use production data exports, run them through an anonymisation pipeline before the data reaches any non-production environment
- Never store real production data in environment variables, seed files, or fixture files
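```typescript
import { faker } from '@faker-js/faker';

// A fixed seed makes the generated values repeatable between runs (for the
// same sequence of faker calls), which makes failures reproducible.
faker.seed(12345);

const buildRealisticUser = (): CreateUserDto => ({
  name: faker.person.fullName(),
  email: faker.internet.email(),
  address: {
    line1: faker.location.streetAddress(),
    city: faker.location.city(),
    postcode: faker.location.zipCode(),
  },
});
```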
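When an anonymisation pipeline is used, one common step is deterministic pseudonymisation: replacing identifying values with a keyed hash so the same customer maps to the same token in every exported table and relationships survive. The sketch below assumes HMAC-SHA-256 from `node:crypto`, a secret key held by the pipeline and never copied into test environments, and a hypothetical `ExportedCustomer` record. It covers one step only: pseudonymised data may still count as personal data under regulations such as GDPR, and free-text fields, addresses, and rare combinations of values need their own treatment.

```typescript
import { createHmac } from 'node:crypto';

// Hypothetical exported record shape.
interface ExportedCustomer {
  id: string;
  name: string;
  email: string;
  orderCount: number;
}

// Keyed hash: the same input and key always give the same token, but someone
// without the key cannot confirm a guessed email by hashing it themselves.
function pseudonym(value: string, key: string): string {
  return createHmac('sha256', key).update(value).digest('hex').slice(0, 16);
}

function anonymiseCustomer(c: ExportedCustomer, key: string): ExportedCustomer {
  // Normalised so case variants of one address map to the same token
  const token = pseudonym(c.email.trim().toLowerCase(), key);
  return {
    id: c.id, // internal surrogate key, assumed not to identify a person on its own
    name: `Customer ${token.slice(0, 8)}`,
    email: `user-${token}@example.com`,
    orderCount: c.orderCount, // non-identifying field kept for a realistic shape
  };
}
```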
Review checklist
- Tests do not share mutable state — each test creates its own data or is fully isolated
- Tests clean up after themselves (rollback/truncate/factory pattern)
- No real production data in test environments
- Seed data is version-controlled and environment-independent
- Factory functions generate unique identifiers to avoid conflicts
- Flaky tests are treated as bugs and fixed, not retried