PushBackLog

Database Indexing Strategy

Soft enforcement Complete by PushBackLog team
Topic: performance Topic: database Skillset: backend Skillset: fullstack Technology: generic Stage: execution Stage: review

Database Indexing Strategy

Status: Complete
Category: Performance
Default enforcement: Soft
Author: PushBackLog team


Tags

  • Topic: performance, database
  • Skillset: backend, fullstack
  • Technology: generic
  • Stage: execution, review

Summary

A database index is a data structure that allows the database engine to locate rows matching a query condition without scanning every row in the table. Missing indexes on frequently queried columns are one of the most common and highest-impact performance problems in production systems — a query that takes 2ms against an indexed column can take 30 seconds against 10 million rows without one. Conversely, excessive or redundant indexes slow down writes and consume storage. The goal is to index what is queried and no more.


Rationale

Full table scans are invisible in development and catastrophic at scale

In development, database tables have hundreds or thousands of rows. Queries complete in milliseconds regardless of indexing. The same query against a production table with millions of rows performs a full sequential scan — reading every row to find the matching ones. The difference between a 2ms indexed query and a 30-second full scan is invisible in development and a production incident in production.

Indexing decisions made at design time (or caught in review) are far cheaper to correct than indexing problems discovered under production load.

Over-indexing has real costs

Every index is stored on disk and maintained on every write. A table with fifteen indexes on it has fifteen data structures to update on every INSERT, UPDATE, and DELETE. For write-heavy tables, excessive indexes slow write throughput and add storage overhead. Unused indexes should be identified and removed.


Guidance

Index types

TypeUse for
B-Tree (default)Equality and range queries (=, >, <, BETWEEN, LIKE 'prefix%')
HashEquality-only queries; faster than B-Tree for equality at the cost of range support
GINFull-text search, JSONB queries, array containment
GiST / SP-GiSTGeospatial queries, range types
BRINVery large, naturally ordered tables (time-series, sequential IDs) — low overhead

Single-column indexes

Add a single-column index to any column that appears frequently in:

  • WHERE clauses (WHERE email = $1)
  • JOIN conditions (ON users.id = orders.user_id)
  • ORDER BY clauses on large result sets
-- Index for user lookup by email (authentication hot path)
CREATE INDEX CONCURRENTLY idx_users_email ON users(email);

-- Index for order lookup by customer
CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders(customer_id);

-- Index for filtering active subscriptions
CREATE INDEX CONCURRENTLY idx_subscriptions_status ON subscriptions(status)
WHERE status = 'active'; -- Partial index — smaller, more selective

Composite indexes

When queries filter on multiple columns, a composite index can serve the entire filter in one index scan:

-- Query: WHERE user_id = $1 AND status = $2 ORDER BY created_at DESC
CREATE INDEX CONCURRENTLY idx_orders_user_status_created
ON orders(user_id, status, created_at DESC);

Column order matters: the index is useful for queries that filter on a prefix of the index columns. An index on (user_id, status, created_at) serves:

  • WHERE user_id = $1
  • WHERE user_id = $1 AND status = $2
  • WHERE user_id = $1 AND status = $2 ORDER BY created_at

It does not efficiently serve WHERE status = $1 alone.

Covering indexes

A covering index includes all columns needed by a query so the engine can answer it entirely from the index without accessing the table:

-- If this query is frequent:
SELECT id, email, plan FROM users WHERE status = 'active';

-- A covering index avoids table access entirely:
CREATE INDEX idx_users_status_covering ON users(status) INCLUDE (id, email, plan);

Detecting missing indexes

-- PostgreSQL: identify sequential scans on large tables
SELECT
  schemaname, relname, seq_scan, seq_tup_read,
  idx_scan, idx_tup_fetch
FROM pg_stat_user_tables
WHERE seq_scan > 0
  AND n_live_tup > 10000
ORDER BY seq_scan DESC;

-- Identify slow queries (requires pg_stat_statements)
SELECT query, mean_exec_time, calls
FROM pg_stat_statements
WHERE mean_exec_time > 100
ORDER BY mean_exec_time DESC;

-- Use EXPLAIN ANALYZE to understand query plans
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 'cus_123';

An Seq Scan on a large table in an EXPLAIN ANALYZE output is a signal that an index may be missing or not being used.

Identifying unused indexes

-- PostgreSQL: indexes never used
SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND schemaname = 'public';

Drop unused indexes — they cost writes without providing reads.

Review checklist

  • Any column appearing in a WHERE, JOIN, or ORDER BY on a table > 100k rows has an index
  • Composite indexes are ordered with the most selective filter column first
  • Indexes are created with CONCURRENTLY to avoid locking the table in production
  • Partial indexes are used where a common filtered value makes a full index oversized
  • Unused indexes are identified and scheduled for removal in the next migration
  • EXPLAIN ANALYZE is reviewed for new queries on large tables before deployment