PushBackLog


Load & Performance Testing

Status: Complete
Category: Testing
Default enforcement: Soft
Author: PushBackLog team


Tags

  • Topic: testing, performance, reliability
  • Skillset: backend, devops
  • Technology: generic
  • Stage: execution, review

Summary

Load testing verifies that a system meets its performance requirements under realistic and peak traffic conditions before those conditions occur in production. Unlike unit or integration tests, load tests reveal system-level properties — throughput, latency percentiles, resource exhaustion, and degradation under stress — that only emerge at scale. Running load tests before launch or after significant architectural changes is one of the most reliable ways to prevent avoidable production performance incidents.


Rationale

Performance requirements not specified are performance requirements not met

A feature that works correctly for one user may behave unacceptably for ten thousand. Database queries that complete in 10ms against 1,000 records can take 30 seconds against 10,000,000. Thread pools that safely handle 50 concurrent requests saturate and queue at 500. These failures are entirely predictable and entirely preventable, but only if tested before they occur in production.

The most common failure mode is not that teams do not care about performance, but that they do not define performance requirements in measurable terms and do not verify them before shipping. “It should be fast” is not a requirement. “p95 response time < 500ms at 1,000 concurrent users” is.

Production is not the right place to discover capacity limits

Discovering a capacity ceiling in production during a peak traffic event means users experience the failure. Discovering it in a load test means engineers fix it quietly at a time of their choosing. The second outcome is strictly better.


Guidance

Test types

Test type   | Purpose                                                                 | Characteristics
----------- | ----------------------------------------------------------------------- | --------------------------------------
Load test   | Verify behaviour at expected traffic level                              | Sustained expected peak load
Stress test | Find the breaking point                                                 | Gradually increase load until failure
Soak test   | Detect degradation over time (memory leaks, connection pool exhaustion) | Moderate load sustained for hours
Spike test  | Verify behaviour under sudden traffic spikes                            | Rapid ramp from low to high and back
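Each row above maps to a different load profile. The sketch below expresses them as k6-style stage arrays; the durations and user counts are illustrative assumptions, not prescriptions — tune them to your own expected peak:

```javascript
// Illustrative stage profiles for each test type, in the shape k6's
// `options.stages` expects. All numbers here are assumptions.

const loadStages = [
  { duration: '2m', target: 100 },  // ramp to expected peak
  { duration: '10m', target: 100 }, // hold at expected peak
  { duration: '2m', target: 0 },    // ramp down
];

const stressStages = [
  { duration: '2m', target: 100 },
  { duration: '2m', target: 200 },
  { duration: '2m', target: 400 },
  { duration: '2m', target: 800 },  // keep increasing until something breaks
];

const soakStages = [
  { duration: '5m', target: 80 },   // ramp to moderate load
  { duration: '4h', target: 80 },   // hold for hours to expose leaks
  { duration: '5m', target: 0 },
];

const spikeStages = [
  { duration: '30s', target: 20 },  // baseline
  { duration: '30s', target: 500 }, // sudden spike
  { duration: '2m', target: 500 },  // hold the spike briefly
  { duration: '30s', target: 20 },  // back to baseline
];
```

The distinguishing variable is the shape of the ramp: stress tests keep climbing, soak tests hold for hours, spike tests jump and drop.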

Setting performance targets

Write performance requirements before writing load tests. Define targets in SLO-compatible terms:

# Performance SLOs for the orders API
- endpoint: GET /orders
  p50_response_time: < 100ms
  p95_response_time: < 300ms
  p99_response_time: < 1000ms
  error_rate: < 0.1%
  target_concurrent_users: 500

- endpoint: POST /orders
  p95_response_time: < 500ms
  error_rate: < 0.5%
  target_concurrent_users: 100
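Targets written this way can be checked programmatically against a test run. The sketch below (plain JavaScript, with fabricated sample data) computes latency percentiles using the nearest-rank method and compares them to thresholds mirroring the GET /orders targets above:

```javascript
// Compute a latency percentile (nearest-rank method) from raw samples
// and verify each against an SLO threshold. The samples are made up
// for illustration; a real run would use the load tool's raw output.

function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[Math.max(rank - 1, 0)];
}

function checkSlo(samplesMs, sloMs) {
  return Object.entries(sloMs).map(([name, limit]) => {
    const p = Number(name.replace('p', ''));   // 'p95' -> 95
    const observed = percentile(samplesMs, p);
    return { name, observed, limit, pass: observed < limit };
  });
}

// Hypothetical response times (ms) from one run. The single 950ms
// outlier is enough to push p95 over a 300ms target.
const samples = [42, 55, 61, 70, 88, 95, 120, 180, 240, 950];
const results = checkSlo(samples, { p50: 100, p95: 300, p99: 1000 });
```

Note how one slow outlier fails the p95 target while the median still looks healthy — this is why SLOs should be defined on tail percentiles, not averages.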

Example load test (k6)

// k6 load test for order creation endpoint
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Trend, Rate } from 'k6/metrics';

const responseTime = new Trend('response_time');
const errorRate = new Rate('error_rate');

export const options = {
  stages: [
    { duration: '2m', target: 50 },   // Ramp up to 50 users
    { duration: '5m', target: 100 },  // Ramp up to expected peak
    { duration: '5m', target: 100 },  // Hold at peak
    { duration: '2m', target: 0 },    // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'],  // 95% of requests under 500ms
    error_rate: ['rate<0.01'],          // Error rate under 1%
  },
};

export default function () {
  const payload = JSON.stringify({
    customerId: 'cus_test_123',
    items: [{ productId: 'prod_001', quantity: 1 }],
  });

  const res = http.post('http://api.staging.example.com/orders', payload, {
    headers: { 'Content-Type': 'application/json' },
  });

  responseTime.add(res.timings.duration);
  errorRate.add(res.status !== 201);

  check(res, {
    'status is 201': (r) => r.status === 201,
    'response time < 500ms': (r) => r.timings.duration < 500,
  });

  sleep(1);
}

What to measure

Metric                              | Meaning
----------------------------------- | ------------------------------------------------------
Throughput (req/s)                  | How many requests the system handles per second
Latency p50/p95/p99                 | Median, 95th and 99th percentile response times
Error rate                          | Percentage of responses with 4xx/5xx status codes
CPU/memory utilisation              | Resource consumption under load
Database connection pool exhaustion | Pool saturation indicates a scalability limit
GC pressure                         | Excessive garbage collection under load (JVM/Node.js)
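The first two rows are derived, not observed directly: throughput and error rate come from aggregating individual request records. A minimal sketch, with fabricated request data:

```javascript
// Derive throughput and error rate from recorded requests. The request
// records here are fabricated; a real run would use the load tool's output.

function summarize(requests, durationSeconds) {
  const errors = requests.filter((r) => r.status >= 400).length;
  return {
    throughputRps: requests.length / durationSeconds,
    errorRate: errors / requests.length,
  };
}

// 1,000 hypothetical requests over a 10-second window, 5 of them 5xx.
const requests = Array.from({ length: 1000 }, (_, i) => ({
  status: i < 5 ? 503 : 201,
}));

const summary = summarize(requests, 10);
// -> { throughputRps: 100, errorRate: 0.005 }
```

An error rate of 0.005 (0.5%) would pass the POST /orders target above but fail the stricter GET /orders one — always evaluate per endpoint.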

Where to run load tests

  • Against staging, not production — load tests generate artificial traffic that distorts production metrics and may trigger real side effects (emails sent, payments charged)
  • With production-representative data volumes — a staging database with 100 rows does not reveal N+1 query problems that only appear at 10 million rows
  • As part of CI on a pre-release gate — run a shorter smoke-load test (1 minute, expected volume) on every production deployment

Common performance problems revealed by load tests

  • N+1 queries (latency increases linearly with record count)
  • Missing database indexes (full table scans appear at high concurrency)
  • Thread/connection pool exhaustion (latency spikes and timeouts at concurrent user limits)
  • Memory leaks (soak tests — memory grows monotonically over time)
  • Synchronous blocking in async frameworks (event loop starvation in Node.js)
  • Unindexed pagination (OFFSET pagination degrades at large page numbers)
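The N+1 pattern in the first bullet is worth making concrete, since it is the single most common finding. The in-memory "database" and query counter below are hypothetical stand-ins for a real ORM:

```javascript
// Contrast N+1 querying with a batched fetch. `db` and its query
// counter are a hypothetical stand-in for a real ORM and database.

const db = {
  queries: 0,
  orders: [
    { id: 1, customerId: 10 },
    { id: 2, customerId: 11 },
    { id: 3, customerId: 10 },
  ],
  customers: { 10: { name: 'Ada' }, 11: { name: 'Grace' } },
  findOrders() { this.queries += 1; return this.orders; },
  findCustomer(id) { this.queries += 1; return this.customers[id]; },
  findCustomers(ids) { this.queries += 1; return ids.map((id) => this.customers[id]); },
};

// N+1: one query for the orders, then one more per order.
db.queries = 0;
for (const order of db.findOrders()) {
  db.findCustomer(order.customerId);
}
const naiveQueries = db.queries;   // 1 + 3 = 4; grows linearly with row count

// Batched: one query for the orders, one for all their customers.
db.queries = 0;
const orders = db.findOrders();
const ids = [...new Set(orders.map((o) => o.customerId))];
db.findCustomers(ids);
const batchedQueries = db.queries; // always 2, regardless of row count
```

With 3 orders the difference is 4 queries versus 2; with 10,000 orders it is 10,001 versus 2 — which is exactly why the problem is invisible in a small staging database and obvious under production-representative data volumes.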