Blue/Green Deployments
Status: Complete
Category: Infrastructure
Default enforcement: Soft
Author: PushBackLog team
Tags
- Topic: infrastructure, delivery, reliability
- Skillset: devops, engineering
- Technology: generic
- Stage: delivery, operations
Summary
Blue/green deployment maintains two identical production environments — blue (current live version) and green (new version) — and switches traffic between them instantaneously by updating a load balancer or DNS record. The previous version remains running and idle after the switch, enabling a near-instant rollback if the new version has problems. Blue/green eliminates deployment downtime and minimises the rollback window for production failures.
Rationale
Zero-downtime deployments
A traditional in-place deployment — stopping the old version, then deploying the new one — creates a window of unavailability. For a high-traffic service, even thirty seconds of downtime can mean thousands of failed requests, even during off-peak hours. Blue/green deployments switch traffic atomically: the old version handles requests until the switch, and the new version handles them afterwards. Users experience no interruption.
Instant, safe rollback
When a deployment introduces a regression that is not caught in pre-production, the rollback procedure in a blue/green setup is to switch the load balancer back to the previous environment. This takes seconds and requires no re-deployment. The risk of “the rollback itself fails” is minimised because the old version is still running and warmed up.
Guidance
How it works
Internet
│
▼
Load Balancer ──── Blue (v1.4.2) [idle after promotion]
│
└──────────────► Green (v1.5.0) [currently live]
- Deploy new version to the idle environment (green)
- Run smoke tests against green (not yet receiving live traffic)
- Switch the load balancer to point at green
- Monitor error rates and latency
- If healthy: celebrate; eventually de-provision or recycle blue
- If problem: switch load balancer back to blue (< 30 seconds)
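The promote-and-rollback flow above can be modelled as a tiny state machine. This Python sketch is purely illustrative (no cloud API involved); it makes explicit that rollback is the same atomic swap run in reverse:

```python
class BlueGreen:
    """Toy model of the two slots; the load balancer forwards to `live`."""

    def __init__(self) -> None:
        self.live, self.idle = "blue", "green"

    def promote(self) -> None:
        # Atomic switch: traffic moves to the freshly deployed idle slot.
        self.live, self.idle = self.idle, self.live

    def rollback(self) -> None:
        # Rollback is the identical swap; the old slot is still running and warm.
        self.live, self.idle = self.idle, self.live
```

Because both operations are a single pointer swap, there is no intermediate state in which neither environment receives traffic.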
AWS implementation
With an Application Load Balancer target group swap:
# Get current live and idle target group ARNs
LIVE_TG_ARN=$(aws elbv2 describe-target-groups \
  --names "api-blue" --query "TargetGroups[0].TargetGroupArn" --output text)
IDLE_TG_ARN=$(aws elbv2 describe-target-groups \
  --names "api-green" --query "TargetGroups[0].TargetGroupArn" --output text)

# Deploy new version to idle target group
# ... deploy containers to green ECS service ...

# Run smoke tests against green
./scripts/smoke-test.sh --target "$GREEN_URL"

# Perform the swap — atomic traffic switch
aws elbv2 modify-listener \
  --listener-arn "$LISTENER_ARN" \
  --default-actions "Type=forward,TargetGroupArn=$IDLE_TG_ARN"
With AWS CodeDeploy (ECS):
# appspec.yaml
version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        TaskDefinition: !Ref NewTaskDefinitionArn
        LoadBalancerInfo:
          ContainerName: api
          ContainerPort: 8080
        PlatformVersion: LATEST
Hooks:
  - BeforeAllowTraffic: ValidateServiceFunction   # Lambda smoke test
  - AfterAllowTraffic: MonitorProductionTraffic   # 5-minute canary window
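A `BeforeAllowTraffic` hook is typically a small Lambda that smoke-tests green and reports the outcome back to CodeDeploy. The sketch below assumes a hypothetical health endpoint and function names; the one real API it relies on is boto3's `put_lifecycle_event_hook_execution_status`:

```python
import urllib.request

def run_smoke_tests(base_url: str) -> bool:
    """Hit the green environment's health endpoint (path is hypothetical)."""
    try:
        with urllib.request.urlopen(f"{base_url}/healthz", timeout=5) as resp:
            return resp.status == 200
    except OSError:
        return False

def hook_status(passed: bool) -> str:
    """Map a smoke-test result to the status string CodeDeploy expects."""
    return "Succeeded" if passed else "Failed"

def handler(event, context):
    # CodeDeploy invokes this Lambda at BeforeAllowTraffic and blocks the
    # traffic switch until the hook reports Succeeded or Failed.
    import boto3  # imported lazily so the helpers above run without AWS
    passed = run_smoke_tests("http://green.internal")  # hypothetical URL
    boto3.client("codedeploy").put_lifecycle_event_hook_execution_status(
        deploymentId=event["DeploymentId"],
        lifecycleEventHookExecutionId=event["LifecycleEventHookExecutionId"],
        status=hook_status(passed),
    )
```

Reporting `Failed` aborts the deployment before any live traffic reaches green, which is what makes the hook safe to gate on.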
Kubernetes with two Deployments
# Service always points to the 'live' selector
apiVersion: v1
kind: Service
metadata:
  name: api
spec:
  selector:
    app: api
    slot: green   # <-- change to 'blue' for rollback
  ports:
    - port: 80
      targetPort: 8080
---
# Deploy green in parallel with blue
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
      slot: green
  template:
    metadata:
      labels:
        app: api
        slot: green
    spec:
      containers:
        - name: api
          image: myorg/api:v1.5.0
Rollback: change the Service selector from green to blue — no re-deployment required.
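With the official Kubernetes Python client, that selector change is a one-line patch. In this sketch `rollback_patch` and `switch_slot` are invented helper names; `patch_namespaced_service` is the real client method (assumes the `kubernetes` package and kubeconfig credentials):

```python
def rollback_patch(slot: str) -> dict:
    """Strategic-merge patch repointing the Service selector to `slot`."""
    return {"spec": {"selector": {"app": "api", "slot": slot}}}

def switch_slot(slot: str = "blue", namespace: str = "default") -> None:
    # Requires the 'kubernetes' package and access to a cluster; imported
    # lazily so rollback_patch stays testable offline.
    from kubernetes import client, config
    config.load_kube_config()
    client.CoreV1Api().patch_namespaced_service("api", namespace, rollback_patch(slot))
```

Because only the Service object changes, both Deployments keep their warm pods and the rollback takes effect as soon as the endpoints update.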
Database migrations with blue/green
Database schema changes are the hardest part of blue/green deployments. Both versions must be able to run against the same database simultaneously:
- Expand: add new columns/tables in a migration; the old code ignores them, the new code uses them
- Switch traffic: the old version keeps working against the expanded schema; the new version uses the new columns
- Contract: remove the old columns/tables in a later migration, only after the old version is decommissioned
Migrations that rename columns, change types, or remove columns must be split into multiple releases.
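The expand step can be exercised end-to-end even in SQLite. This sketch uses invented table and column names to show both versions writing to the same expanded schema during the switch window:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# Expand: add the new column. Blue (old code) never references it.
conn.execute("ALTER TABLE users ADD COLUMN full_name TEXT")

# During the switch window both versions write to the same table:
conn.execute("INSERT INTO users (name) VALUES ('ada')")  # blue
conn.execute(
    "INSERT INTO users (name, full_name) VALUES ('ada', 'Ada Lovelace')"
)  # green

# Contract: only in a later release, after blue is decommissioned, e.g.
#   ALTER TABLE users DROP COLUMN name
# (deliberately not executed here).
cols = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
```

The key property is that every intermediate schema is valid for both the old and the new application version, so the traffic switch can happen at any point between expand and contract.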
When blue/green is overkill
| Scenario | Better approach |
|---|---|
| Gradual traffic shifting to validate the new version | Canary Releases |
| Simple stateless services with fast health checks | Rolling deployment |
| Services with negligible traffic/no SLA | Direct in-place deployment |
Review checklist
- Both environments are identically provisioned (same instance types, same config)
- Smoke test suite runs against the idle environment before switching
- Database migrations are backwards-compatible (expand/contract pattern)
- Traffic switch is automated (not a manual DNS change)
- Rollback procedure is documented, tested, and takes less than 5 minutes
- Monitoring dashboards are reviewed for 5–15 minutes after each switch