PushBackLog

Blue/Green Deployments


Status: Complete
Category: Infrastructure
Default enforcement: Soft
Author: PushBackLog team


Tags

  • Topic: infrastructure, delivery, reliability
  • Skillset: devops, engineering
  • Technology: generic
  • Stage: delivery, operations

Summary

Blue/green deployment maintains two identical production environments — blue (current live version) and green (new version) — and switches traffic between them instantaneously by updating a load balancer or DNS record. The previous version remains running and idle after the switch, enabling a near-instant rollback if the new version has problems. Blue/green eliminates deployment downtime and minimises the rollback window for production failures.


Rationale

Zero-downtime deployments

A traditional in-place deployment — stopping the old version, deploying the new version — creates a window of unavailability. For high-traffic services, even thirty seconds of downtime during off-peak hours can represent thousands of failed requests. Blue/green deployments switch traffic atomically: the old version handles requests until the switch, and the new version handles them after. Users experience no interruption.
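To put a rough number on that claim (the traffic figure is illustrative, not from any real service):

```python
# Back-of-envelope cost of an in-place deploy window (illustrative numbers)
requests_per_second = 150   # assumed off-peak traffic for a high-volume service
downtime_seconds = 30       # the stop-old / start-new gap of an in-place deploy
failed_requests = requests_per_second * downtime_seconds
print(failed_requests)      # 4500 failed requests in a single deploy
```

Even modest traffic turns a half-minute gap into thousands of errors; blue/green removes the gap entirely.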

Instant, safe rollback

When a deployment introduces a regression that is not caught in pre-production, the rollback procedure in a blue/green setup is to switch the load balancer back to the previous environment. This takes seconds and requires no re-deployment. The risk of “the rollback itself fails” is minimised because the old version is still running and warmed up.


Guidance

How it works

Internet
    │
    ▼
Load Balancer ──┬────── Blue  (v1.4.2) [idle after promotion]
                │
                └─────► Green (v1.5.0) [currently live]
  1. Deploy new version to the idle environment (green)
  2. Run smoke tests against green (not yet receiving live traffic)
  3. Switch the load balancer to point at green
  4. Monitor error rates and latency
  5. If healthy: celebrate; eventually de-provision or recycle blue
  6. If problem: switch load balancer back to blue (< 30 seconds)
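The decision flow in steps 1–6 can be sketched as a small function. This is a hypothetical model, not a real deployment API: `smoke_test` and `healthy` stand in for your smoke-test suite and post-switch monitoring, and the "switch" is just a swap of slot labels.

```python
def promote(slots, smoke_test, healthy):
    """Run one blue/green promotion; return the slot left serving traffic.

    slots: e.g. {"live": "blue", "idle": "green"} — the idle slot holds the new version.
    """
    if not smoke_test(slots["idle"]):                                # step 2: gate on smoke tests
        return slots["live"]                                         # never switch to a failing env
    slots["live"], slots["idle"] = slots["idle"], slots["live"]      # step 3: atomic traffic switch
    if not healthy(slots["live"]):                                   # step 4: monitor error rates
        slots["live"], slots["idle"] = slots["idle"], slots["live"]  # step 6: instant rollback
    return slots["live"]

# A healthy deploy promotes green; a post-switch problem rolls back to blue.
print(promote({"live": "blue", "idle": "green"}, lambda s: True, lambda s: True))   # green
print(promote({"live": "blue", "idle": "green"}, lambda s: True, lambda s: False))  # blue
```

Note that both failure paths leave traffic on a version that was already serving successfully, which is the core safety property of the pattern.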

AWS implementation

With Elastic Load Balancer target group swap:

# Get current live (blue) and idle (green) target group ARNs
# LIVE_TG_ARN is kept so the same swap command can be reused for rollback
LIVE_TG_ARN=$(aws elbv2 describe-target-groups \
  --names "api-blue" --query "TargetGroups[0].TargetGroupArn" --output text)
IDLE_TG_ARN=$(aws elbv2 describe-target-groups \
  --names "api-green" --query "TargetGroups[0].TargetGroupArn" --output text)

# Deploy new version to idle target group
# ... deploy containers to green ECS service ...

# Run smoke tests against the idle environment's own endpoint
./scripts/smoke-test.sh --target "$GREEN_URL"

# Perform the swap — atomic traffic switch on the listener
aws elbv2 modify-listener \
  --listener-arn "$LISTENER_ARN" \
  --default-actions "Type=forward,TargetGroupArn=$IDLE_TG_ARN"

With AWS CodeDeploy (ECS):

# appspec.yaml
version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        TaskDefinition: <TASK_DEFINITION>  # placeholder substituted by CodeDeploy
        LoadBalancerInfo:
          ContainerName: api
          ContainerPort: 8080
        PlatformVersion: LATEST
Hooks:
  - BeforeAllowTraffic: ValidateServiceFunction  # Lambda smoke test
  - AfterAllowTraffic: MonitorProductionTraffic  # 5-minute canary window

Kubernetes with two Deployments

# Service always points to the 'live' selector
apiVersion: v1
kind: Service
metadata:
  name: api
spec:
  selector:
    app: api
    slot: green        # <-- change to 'blue' for rollback
  ports:
    - port: 80
      targetPort: 8080
---
# Deploy green in parallel with blue
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
      slot: green
  template:
    metadata:
      labels:
        app: api
        slot: green
    spec:
      containers:
        - name: api
          image: myorg/api:v1.5.0

Rollback: change the Service selector from green to blue — no re-deployment required.
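The selector flip can be scripted rather than edited by hand. The snippet below builds the strategic-merge patch body you would pass to `kubectl patch service api -p` (a minimal sketch; the `app`/`slot` labels assume the manifests above):

```python
import json

def slot_patch(slot):
    """Build the Service-selector patch that points traffic at the given slot."""
    return json.dumps({"spec": {"selector": {"app": "api", "slot": slot}}})

# Rollback: point the Service back at the blue slot
print(slot_patch("blue"))
# {"spec": {"selector": {"app": "api", "slot": "blue"}}}
```

Generating the patch keeps the rollback a one-command, no-redeploy operation that can live in a runbook script.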

Database migrations with blue/green

Database schema changes are the hardest part of blue/green deployments. Both versions must be able to run against the same database simultaneously:

  1. Expand: add new columns/tables in migration; old code ignores them; new code uses them
  2. Switch traffic: old version still working (uses old schema); new version uses new schema
  3. Contract: remove old columns/tables in a subsequent migration only after the old version is decommissioned

Migrations that rename columns, change types, or remove columns must be split into multiple releases.
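The expand step can be demonstrated end-to-end with an in-memory SQLite database (a minimal sketch; the table and column names are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# Expand migration: add a nullable column — old code never references it
con.execute("ALTER TABLE users ADD COLUMN email TEXT")

# Old version (blue) keeps writing the old shape; new version (green) uses the column
con.execute("INSERT INTO users (name) VALUES ('from-blue')")
con.execute("INSERT INTO users (name, email) VALUES ('from-green', 'a@example.com')")

rows = con.execute("SELECT name, email FROM users ORDER BY id").fetchall()
print(rows)  # [('from-blue', None), ('from-green', 'a@example.com')]
```

Because the new column is nullable, both versions write successfully against the same schema during the switch window; the contract migration that tightens or removes columns only runs after blue is gone.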

When blue/green is overkill

  Scenario                                               Better approach
  ─────────────────────────────────────────────────────  ──────────────────────────
  Gradual traffic shifting to validate the new version   Canary Releases
  Simple stateless services with fast health checks      Rolling deployment
  Services with negligible traffic / no SLA              Direct in-place deployment

Review checklist

  • Both environments are identically provisioned (same instance types, same config)
  • Smoke test suite runs against the idle environment before switching
  • Database migrations are backwards-compatible (expand/contract pattern)
  • Traffic switch is automated (not a manual DNS change)
  • Rollback procedure is documented, tested, and takes less than 5 minutes
  • Monitoring dashboards are reviewed for 5–15 minutes after each switch