Backup Strategies

Status: Complete
Category: Infrastructure
Default enforcement: Soft
Author: PushBackLog team

Summary

A backup strategy defines how, how often, where, and for how long copies of critical data are made and retained. Its goal is to ensure, with high confidence, that data can be recovered with no more loss than the organisation's Recovery Point Objective (RPO) allows. The most dangerous misconception about backups is that having a backup equals having recovery capability; the two are different things. A backup strategy is incomplete without a tested, documented restore procedure: an unverified backup is not a backup.

Rationale

An untested backup is not a backup

It is common for organisations to run automated backup jobs for years without ever verifying them. The job reports success every night while producing corrupt files, writing an unrestorable format, or backing up an empty directory. The first real restore then happens during a crisis, the moment of maximum pressure and minimum time. Restore verification must be a regular, scheduled activity, not an assumption on paper.

The 3-2-1 rule provides resilience by construction

The 3-2-1 rule is the simplest mental model for a resilient backup strategy: 3 copies of data, 2 different types of media, 1 off-site copy. The rationale is that any single failure mode — hardware failure, data centre loss, accidental deletion — cannot destroy all three copies simultaneously. A database backup written only to an S3 bucket in the same region as the database violates this principle.

Guidance

The 3-2-1 rule

Factor	What it means	Example
3 copies	Original + two backups	Production DB + S3 daily snapshot + S3 Glacier archive
2 media types	Two different storage technologies	Database block storage + S3 object storage (or on-premises tape)
1 off-site copy	At least one copy geographically separate	Cross-region S3 replication
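The three factors in the table can be checked mechanically against an inventory of copies. The sketch below assumes each copy is described by a media label and a location label; the `BackupCopy` type, its labels, and the `check_3_2_1` helper are hypothetical and not part of any AWS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    """One copy of the data. Labels are free-form (hypothetical)."""
    name: str
    media: str      # e.g. "ebs", "s3", "tape"
    location: str   # e.g. "us-east-1", "eu-west-1", "on-prem"


def check_3_2_1(copies, primary_location):
    """Return a list of 3-2-1 violations; an empty list means the rule holds.

    `copies` includes the original data as well as every backup. A copy
    counts as off-site if its location differs from `primary_location`.
    """
    problems = []
    if len(copies) < 3:
        problems.append(f"need 3 copies, have {len(copies)}")
    media = {c.media for c in copies}
    if len(media) < 2:
        problems.append(f"need 2 media types, have {sorted(media)}")
    if not any(c.location != primary_location for c in copies):
        problems.append("no off-site copy")
    return problems


# Database plus two S3 copies, all in the same region
same_region = [
    BackupCopy("prod-db", "ebs", "us-east-1"),
    BackupCopy("daily-snapshot", "s3", "us-east-1"),
    BackupCopy("archive", "s3", "us-east-1"),
]
print(check_3_2_1(same_region, "us-east-1"))  # → ['no off-site copy']
```

Applied to the same-region case described in the rationale, the copy and media counts pass but the missing off-site copy is flagged.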

Backup types

Type	Description	Achievable RPO	Storage impact
Full backup	Complete copy of all data	Up to 24 hours (if daily)	High
Incremental	Only changes since the previous backup of any type	Up to 1 hour (if hourly)	Low
Differential	Changes since the last full backup	Up to the backup interval (typically hours)	Medium
Continuous / point-in-time	Transaction log replay to any moment	Seconds to minutes	Varies
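The storage figures trade off against restore effort. Restoring from incrementals needs the last full backup plus every incremental taken since; a differential needs only the last full backup plus the latest differential. A minimal sketch of that selection, assuming backups are modelled as hypothetical `(timestamp, kind)` tuples (real backup tools track these chains themselves):

```python
from datetime import datetime


def restore_chain(backups, target):
    """Return the backups to apply, oldest first, to restore to `target`.

    `backups` is a list of (taken_at, kind) with kind in {"full",
    "incremental", "differential"}. Incrementals hold changes since the
    previous backup of any kind; differentials hold changes since the
    last full backup.
    """
    usable = sorted(b for b in backups if b[0] <= target)
    fulls = [i for i, (_, kind) in enumerate(usable) if kind == "full"]
    if not fulls:
        raise ValueError("no full backup at or before target")
    base = fulls[-1]
    chain = [usable[base]]
    for taken_at, kind in usable[base + 1:]:
        if kind == "differential":
            # A differential supersedes everything since the full backup
            chain = [usable[base], (taken_at, kind)]
        elif kind == "incremental":
            chain.append((taken_at, kind))
    return chain


# Weekly full backup on day 1, then one backup per day for six days
days = [datetime(2024, 6, d) for d in range(1, 8)]
incr = [(days[0], "full")] + [(d, "incremental") for d in days[1:]]
diff = [(days[0], "full")] + [(d, "differential") for d in days[1:]]
print(len(restore_chain(incr, days[6])))  # → 7 (full + 6 incrementals)
print(len(restore_chain(diff, days[6])))  # → 2 (full + latest differential)
```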

AWS RDS automated backups

# Terraform: configure RDS backup window and retention
resource "aws_db_instance" "primary" {
  identifier     = "myapp-prod"
  engine         = "postgres"
  engine_version = "15.3"
  instance_class = "db.t3.medium"
  # allocated_storage and master credentials omitted for brevity

  # Backup configuration
  backup_retention_period  = 30             # Days to retain automated backups (maximum 35)
  backup_window            = "03:00-04:00"  # UTC; low-traffic window
  delete_automated_backups = false          # Keep automated backups if the instance is deleted
  deletion_protection      = true           # Prevent accidental deletion

  # Point-in-time recovery is enabled automatically when retention > 0
}

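Point-in-time recovery (PITR) allows restoration to any second within the retention window, up to the latest restorable time (typically within the last five minutes). On RDS the restore creates a new DB instance rather than overwriting the source. This is critical for recovering from data corruption: when a bad migration or faulty job corrupts data gradually, you can restore to the moment just before it began, even if the damage is only noticed days later.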
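Choosing the restore target is the step most easily got wrong under pressure. A minimal sketch, assuming you already know when the corruption started and the bounds of the restorable window (for RDS, `describe_db_instances` reports `LatestRestorableTime`). `choose_restore_time`, `restore_before_corruption`, the one-minute safety margin, and the `-pitr` target suffix are illustrative choices; `restore_db_instance_to_point_in_time` is the boto3 RDS client method.

```python
from datetime import datetime, timedelta, timezone


def choose_restore_time(corruption_started, earliest, latest,
                        margin=timedelta(minutes=1)):
    """Pick a PITR target shortly before the corruption began.

    `earliest` and `latest` bound the restorable window. Raises ValueError
    if the target falls outside it.
    """
    target = corruption_started - margin
    if target < earliest:
        raise ValueError("corruption predates the retention window")
    if target > latest:
        raise ValueError("target is newer than the latest restorable time")
    return target


def restore_before_corruption(rds, source_id, corruption_started,
                              earliest, latest):
    """Start a point-in-time restore into a new instance.

    `rds` is a boto3 RDS client. The source instance is left untouched;
    the "-pitr" suffix for the new instance is a hypothetical convention.
    """
    target = choose_restore_time(corruption_started, earliest, latest)
    rds.restore_db_instance_to_point_in_time(
        SourceDBInstanceIdentifier=source_id,
        TargetDBInstanceIdentifier=f"{source_id}-pitr",
        RestoreTime=target,
    )
    return target


# Example: corruption found to have started at 09:30 UTC on 10 June
print(choose_restore_time(
    datetime(2024, 6, 10, 9, 30, tzinfo=timezone.utc),
    earliest=datetime(2024, 6, 1, tzinfo=timezone.utc),
    latest=datetime(2024, 6, 15, 12, tzinfo=timezone.utc),
))  # → 2024-06-10 09:29:00+00:00
```

In production you would pass `boto3.client("rds")` as `rds`, then point the application (or a data-repair script) at the restored instance.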

Cross-region backup replication

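# Replicate automated backups (snapshots and transaction logs) to another region for DR.
# The resource must be created through a provider configured for the destination
# region; "aws.dr" is an example provider alias.
resource "aws_db_instance_automated_backups_replication" "dr" {
  provider               = aws.dr
  source_db_instance_arn = aws_db_instance.primary.arn
  retention_period       = 7  # Days to retain replicated backups in the DR region

  # Needed when the source instance is encrypted:
  # the KMS key must exist in the destination region
  kms_key_id = aws_kms_key.backup_dr.arn
}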

Or schedule manual snapshot copies with a Lambda:

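# Lambda: copy the latest RDS automated snapshot to the DR region
import boto3

SOURCE_REGION = 'us-east-1'
DR_REGION = 'eu-west-1'

def handler(event, context):
    source = boto3.client('rds', region_name=SOURCE_REGION)
    dest   = boto3.client('rds', region_name=DR_REGION)

    # List automated snapshots (paginated: one call returns at most 100)
    paginator = source.get_paginator('describe_db_snapshots')
    snaps = [
        s
        for page in paginator.paginate(
            DBInstanceIdentifier='myapp-prod',
            SnapshotType='automated',
        )
        for s in page['DBSnapshots']
        if s['Status'] == 'available'  # skip snapshots still being created
    ]
    if not snaps:
        raise RuntimeError('No available automated snapshots for myapp-prod')
    latest = max(snaps, key=lambda s: s['SnapshotCreateTime'])

    # Automated snapshot IDs look like "rds:myapp-prod-2024-01-01-03-00";
    # ':' is not allowed in a target identifier, so strip the prefix.
    target_id = 'dr-copy-' + latest['DBSnapshotIdentifier'].replace('rds:', '', 1)

    # Cross-region copies must reference the source by ARN; boto3 uses
    # SourceRegion to generate the presigned URL. Encrypted snapshots also
    # need KmsKeyId (a key in DR_REGION).
    dest.copy_db_snapshot(
        SourceDBSnapshotIdentifier=latest['DBSnapshotArn'],
        TargetDBSnapshotIdentifier=target_id,
        SourceRegion=SOURCE_REGION,
        CopyTags=True,
    )
    return {'copied': latest['DBSnapshotArn'], 'target': target_id}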
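Snapshot copies made this way are manual snapshots, so the automated retention period does not apply to them and they accumulate until deleted. A hedged sketch of a companion cleanup step, assuming it runs with an RDS client for the DR region and that every copy uses the `dr-copy-` prefix from the Lambda above; the `keep_days=7` default and the helper names are assumptions.

```python
from datetime import datetime, timedelta, timezone


def snapshots_to_prune(snapshots, now, keep_days=7, prefix="dr-copy-"):
    """Select DR copies older than `keep_days`, always sparing the newest.

    `snapshots` has the shape of describe_db_snapshots()['DBSnapshots'].
    """
    copies = sorted(
        (s for s in snapshots
         if s["DBSnapshotIdentifier"].startswith(prefix)
         and s.get("Status") == "available"),
        key=lambda s: s["SnapshotCreateTime"],
    )
    cutoff = now - timedelta(days=keep_days)
    # Never delete the most recent copy, even if it is older than the cutoff
    return [s["DBSnapshotIdentifier"] for s in copies[:-1]
            if s["SnapshotCreateTime"] < cutoff]


def prune(rds, keep_days=7):
    """Delete expired DR copies; `rds` is a boto3 RDS client for the DR region."""
    paginator = rds.get_paginator("describe_db_snapshots")
    snaps = [s for page in paginator.paginate(SnapshotType="manual")
             for s in page["DBSnapshots"]]
    expired = snapshots_to_prune(snaps, datetime.now(timezone.utc), keep_days)
    for snapshot_id in expired:
        rds.delete_db_snapshot(DBSnapshotIdentifier=snapshot_id)
    return expired
```

Keeping the newest copy unconditionally means a stalled copy job never leaves the DR region with no snapshot at all.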

S3 versioning and cross-region replication

For object storage:

resource "aws_s3_bucket_versioning" "primary" {
  bucket = aws_s3_bucket.primary.id

  versioning_configuration {
    status = "Enabled"  # Retains all versions; enables recovery from accidental deletion
  }
}

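resource "aws_s3_bucket_replication_configuration" "dr" {
  # Replication requires versioning on the source (and destination) bucket first
  depends_on = [aws_s3_bucket_versioning.primary]

  bucket = aws_s3_bucket.primary.id
  role   = aws_iam_role.replication.arn

  rule {
    id     = "replicate-all"
    status = "Enabled"

    # Applies to objects written after the rule is enabled;
    # existing objects need S3 Batch Replication
    destination {
      bucket        = aws_s3_bucket.dr_region.arn
      storage_class = "STANDARD_IA"  # Cheaper for rarely accessed DR data
    }
  }
}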
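With versioning enabled, deleting an object only places a delete marker on top of its previous versions, so recovery means removing that marker. A minimal sketch using the boto3 S3 client's `list_object_versions` and `delete_object`; the `undelete` helper is hypothetical and assumes the key's latest delete marker appears in the first listing page (versions of a key are listed newest first).

```python
def undelete(s3, bucket, key):
    """Recover an object deleted in a versioned bucket by removing its delete marker.

    `s3` is a boto3 S3 client. Deleting the latest delete marker (by its
    VersionId) makes the most recent real version current again.
    Returns True if a marker was removed, False if the object was not deleted.
    """
    resp = s3.list_object_versions(Bucket=bucket, Prefix=key)
    # Prefix also matches longer keys, so keep only exact matches
    markers = [m for m in resp.get("DeleteMarkers", [])
               if m["Key"] == key and m["IsLatest"]]
    if not markers:
        return False
    s3.delete_object(Bucket=bucket, Key=key, VersionId=markers[0]["VersionId"])
    return True
```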

Retention policy

Define retention tiers based on compliance and operational requirements:

Tier	Retention	Storage class
Hourly backups	48 hours	S3 Standard
Daily backups	30 days	S3 Standard-IA
Weekly backups	12 weeks	S3 Glacier Instant Retrieval
Monthly backups	1 year	S3 Glacier Flexible Retrieval
Yearly backups	7 years (compliance)	S3 Glacier Deep Archive

Use S3 lifecycle rules or AWS Backup lifecycle settings to transition backups between storage classes and expire them automatically.
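When every tier is fed from one backup schedule, deciding which backups each tier keeps is a grandfather-father-son selection: each tier keeps the earliest backup in each hour, day, ISO week, month, or year inside its window. A minimal sketch, assuming a single stream of backup timestamps and approximating 1 year as 365 days and 7 years as 7 × 365 days; `TIERS` and `select_retained` are hypothetical names.

```python
from datetime import datetime, timedelta

# (tier, retention window, bucket key) mirroring the retention table.
TIERS = [
    ("hourly", timedelta(hours=48), lambda t: (t.year, t.month, t.day, t.hour)),
    ("daily", timedelta(days=30), lambda t: (t.year, t.month, t.day)),
    ("weekly", timedelta(weeks=12), lambda t: tuple(t.isocalendar())[:2]),
    ("monthly", timedelta(days=365), lambda t: (t.year, t.month)),
    ("yearly", timedelta(days=7 * 365), lambda t: (t.year,)),
]


def select_retained(backups, now):
    """Map each retained backup time to the set of tiers that keep it.

    Within each tier's window, the earliest backup in each bucket is kept.
    Backups absent from the result are eligible for expiry.
    """
    retained = {}
    for tier, window, bucket_of in TIERS:
        earliest_in_bucket = {}
        for t in sorted(backups):
            if now - t <= window:
                earliest_in_bucket.setdefault(bucket_of(t), t)
        for t in earliest_in_bucket.values():
            retained.setdefault(t, set()).add(tier)
    return retained


# Example: hourly snapshots going back 400 days
now = datetime(2024, 6, 15, 12, 0)
hourly = [now - timedelta(hours=h) for h in range(400 * 24)]
kept = select_retained(hourly, now)
print(len(hourly), "backups,", len(kept), "retained")
```

The tier set for each retained backup tells a lifecycle job which storage class it belongs in; everything else can be expired.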

Backup verification schedule

Test	Frequency	Procedure
Automated restore test	Weekly	Spin up a test DB from the latest snapshot; run integrity queries
Full recovery drill	Monthly	Restore to a separate environment; verify application starts and data is correct
Cross-region restore	Quarterly	Verify the DR region backup can be restored successfully

An automated weekly restore test is the minimum bar:

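# CI/CD backup verification job (weekly cron)
set -euo pipefail
VERIFY_ID="backup-verify-$(date +%Y%m%d)"
# Most recent automated snapshot of the production instance
LATEST=$(aws rds describe-db-snapshots --db-instance-identifier myapp-prod \
  --snapshot-type automated --output text \
  --query 'sort_by(DBSnapshots, &SnapshotCreateTime)[-1].DBSnapshotIdentifier')
aws rds restore-db-instance-from-db-snapshot \
  --db-instance-identifier "$VERIFY_ID" \
  --db-snapshot-identifier "$LATEST" \
  --db-instance-class db.t3.medium
aws rds wait db-instance-available --db-instance-identifier "$VERIFY_ID"
# Run integrity queries against the restored instance here, then clean up
aws rds delete-db-instance --db-instance-identifier "$VERIFY_ID" --skip-final-snapshot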
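The integrity-query step can be a small script run against the restored instance. A minimal sketch, assuming a DB-API connection: the in-memory `sqlite3` database stands in for the restored PostgreSQL instance, and the `orders` table and both checks are hypothetical. With a PostgreSQL driver the same loop works, but timestamp columns come back as `datetime` objects, so the freshness predicate would drop the `fromisoformat` parse.

```python
import sqlite3
from datetime import datetime, timedelta


def run_integrity_checks(conn, checks):
    """Run (description, sql, predicate) checks on a DB-API connection.

    Each SQL statement must return one row with one value; the predicate
    decides whether that value is acceptable. Returns failure messages
    (an empty list means every check passed).
    """
    failures = []
    cur = conn.cursor()
    for description, sql, is_ok in checks:
        cur.execute(sql)
        (value,) = cur.fetchone()
        if not is_ok(value):
            failures.append(f"{description}: got {value!r}")
    return failures


def example_checks(snapshot_time, rpo):
    """Hypothetical checks for an 'orders' table with ISO-8601 created_at text."""
    return [
        ("orders table is not empty",
         "SELECT COUNT(*) FROM orders",
         lambda n: n > 0),
        # The newest row should be no older than the RPO relative to the snapshot
        ("newest order is within RPO of snapshot",
         "SELECT MAX(created_at) FROM orders",
         lambda ts: ts is not None
         and snapshot_time - datetime.fromisoformat(ts) <= rpo),
    ]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the restored instance
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT)")
    conn.execute("INSERT INTO orders (created_at) VALUES ('2024-06-15T02:55:00')")
    snapshot = datetime(2024, 6, 15, 3, 0)
    print(run_integrity_checks(conn, example_checks(snapshot, timedelta(hours=24))))
```

A non-empty result should fail the CI job so that a bad backup is noticed the week it happens rather than during an incident.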

Review checklist

3-2-1 rule satisfied: 3 copies, 2 media types, 1 off-site
Point-in-time recovery enabled for all production databases
Backup frequency meets the RPO, and the retention period meets operational and compliance requirements
Automated restore verification is scheduled and running
Cross-region backup replication is active for DR
Retention lifecycle policies are configured to manage storage cost
Access to backup data is restricted (least privilege — backups are highly sensitive)
