Canary vs Blue-Green vs Rolling: Comparing Deployment Strategies
Deployments are nerve-wracking. The moment you push code, something can go wrong. So how you deploy matters just as much as what you deploy.
There are three major zero-downtime deployment strategies: rolling updates, blue-green, and canary. Each fits different situations with different trade-offs. Let's get clear on which to use when.
Why Zero Downtime Matters
It used to be normal to post a maintenance notice and deploy at 3am. Not anymore.
- Global services have users in every timezone
- A 99.9% availability SLA allows only about 8.76 hours of downtime per year
- Teams doing CI/CD can't afford scheduled maintenance windows
Knowing how to deploy without downtime is now a baseline skill, not a bonus.
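That 99.9% figure falls out of simple arithmetic; a quick sketch of the downtime budget an SLA leaves you:

```javascript
// Downtime budget implied by an availability SLA.
// 99.9% uptime leaves 0.1% of the year for outages.
function downtimeHoursPerYear(slaPercent) {
  const hoursPerYear = 365 * 24; // 8760
  return hoursPerYear * (1 - slaPercent / 100);
}

console.log(downtimeHoursPerYear(99.9).toFixed(2));  // "8.76"
console.log(downtimeHoursPerYear(99.99).toFixed(2)); // "0.88"
```

Each extra nine cuts the budget by a factor of ten, which is why "we'll just take a maintenance window" stops being an option.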
1. Rolling Update
How It Works
Replace instances one at a time (or in batches). Bring up a new version pod, then take down an old version pod.
Before: [v1] [v1] [v1] [v1]
Step 1: [v2] [v1] [v1] [v1]
Step 2: [v2] [v2] [v1] [v1]
Step 3: [v2] [v2] [v2] [v1]
After: [v2] [v2] [v2] [v2]
During the roll, both v1 and v2 serve traffic simultaneously, so the two versions must be mutually compatible: a DB schema change or breaking API change made for v2 can break the v1 pods that are still running.
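In application code, that compatibility usually means tolerant reading: v2 must accept data shaped for v1 while v1 pods are still serving. A minimal sketch (the field names here are hypothetical, not from any real API):

```javascript
// v1 payloads carry `username`; v2 adds an optional `displayName`.
// While both versions run, v2 code must accept both shapes.
function renderUser(user) {
  // Fall back to the v1 field when the v2 field is absent.
  const name = user.displayName ?? user.username;
  return `Hello, ${name}`;
}

renderUser({ username: "kim" });                         // v1-shaped payload
renderUser({ username: "kim", displayName: "Kim Lee" }); // v2-shaped payload
```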
Basic Kubernetes Config
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # how many extra pods can exist during rollout
      maxUnavailable: 0  # how many pods can be down (0 = always keep 4 running)
  selector:
    matchLabels:
      app: api-service
  template:
    metadata:
      labels:
        app: api-service
    spec:
      containers:
        - name: api-service
          image: my-registry/api-service:v2
          readinessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 5
With maxUnavailable: 0, at least 4 pods always run. The readinessProbe ensures traffic only flows to pods that are ready.
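The interaction of those two settings is easy to simulate. A sketch of how the pod counts evolve with maxSurge: 1 and maxUnavailable: 0 (an illustration of the mechanics, not how the controller is actually implemented):

```javascript
// Simulate a rolling update with maxSurge=1, maxUnavailable=0:
// at each step, surge one v2 pod up, then retire one v1 pod.
function rollingUpdate(replicas, maxSurge) {
  let v1 = replicas, v2 = 0;
  const states = [];
  while (v1 > 0) {
    v2 += Math.min(maxSurge, v1); // bring up new pods (the surge)
    states.push({ v1, v2, total: v1 + v2 });
    v1 -= Math.min(maxSurge, v1); // old pods drain and terminate
  }
  states.push({ v1, v2, total: v1 + v2 });
  return states;
}

const steps = rollingUpdate(4, 1);
// Capacity never drops below 4 and never exceeds 4 + maxSurge = 5.
```

Raising maxSurge speeds up the rollout at the cost of temporarily running more pods; raising maxUnavailable speeds it up at the cost of reduced capacity.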
Rolling Back
# Roll back to previous version immediately
kubectl rollout undo deployment/api-service
# Roll back to a specific revision
kubectl rollout undo deployment/api-service --to-revision=3
# Check rollout status
kubectl rollout status deployment/api-service
Trade-offs
Pros
- No extra infrastructure needed (cost-efficient)
- Native Kubernetes support, simple to configure
- Gradual rollout surfaces problems early
Cons
- v1 and v2 coexist during the rollout — backward compat is required
- Rollback isn't instant (it re-rolls)
- Complex DB migrations are tricky to pair with this strategy
2. Blue-Green Deployment
How It Works
Maintain two identical production environments. "Blue" is currently live; "Green" is the new version. When the new version is ready, switch all traffic at once.
Phase 1 (before switch):
Traffic → [Blue: v1] [Blue: v1] [Blue: v1] [Blue: v1]
[Green: v2] [Green: v2] [Green: v2] [Green: v2] (standby)
Phase 2 (switch):
Traffic → [Green: v2] [Green: v2] [Green: v2] [Green: v2]
[Blue: v1] [Blue: v1] [Blue: v1] [Blue: v1] (standby, for rollback)
Phase 3 (after stabilization):
Decommission Blue, or repurpose it as the next deployment's "Blue"
Kubernetes Implementation
# blue-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service-blue
spec:
  replicas: 4
  selector:
    matchLabels:
      app: api-service
      version: blue
  template:
    metadata:
      labels:
        app: api-service
        version: blue
    spec:
      containers:
        - name: api-service
          image: my-registry/api-service:v1
---
# green-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service-green
spec:
  replicas: 4
  selector:
    matchLabels:
      app: api-service
      version: green
  template:
    metadata:
      labels:
        app: api-service
        version: green
    spec:
      containers:
        - name: api-service
          image: my-registry/api-service:v2
---
# service.yaml — traffic switch happens here
apiVersion: v1
kind: Service
metadata:
  name: api-service
spec:
  selector:
    app: api-service
    version: blue  # ← change this label to switch traffic
  ports:
    - port: 80
      targetPort: 3000
Switching traffic means changing just one label in the Service selector:
# Switch to green
kubectl patch service api-service \
-p '{"spec":{"selector":{"version":"green"}}}'
# Something wrong? Instant rollback to blue (1-2 seconds)
kubectl patch service api-service \
-p '{"spec":{"selector":{"version":"blue"}}}'
Trade-offs
Pros
- Rollback is instant (just a label switch)
- No two versions serving traffic simultaneously — fewer compat concerns
- You can thoroughly test Green before switching
Cons
- Infrastructure cost doubles during the switchover period
- DB schema changes are still complex (both envs share the same DB)
- Tricky with stateful services
3. Canary Deployment
How It Works
Named after the canary in the coal mine — miners carried canaries underground because the birds reacted to toxic gas before humans did. You expose the new version to a small percentage of traffic first, validate that it's safe, then gradually increase the share.
Traffic split (gradually increasing):
90% → [v1] [v1] [v1]
10% → [v2] ← canary
After validation:
70% → [v1] [v1] [v1]
30% → [v2] [v2]
Full rollout:
0% → (v1 removed)
100% → [v2] [v2] [v2] [v2]
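The ingress controller handles the split for you, but the core idea is simple weighted routing. A sketch (illustrative only, not nginx's actual algorithm), with deterministic hashing so the same user keeps hitting the same version:

```javascript
// Deterministic weighted routing: hash a stable key (e.g. a user ID)
// into 0–99 and send the lowest `canaryWeight` buckets to the canary.
// Stickiness matters: a user bouncing between versions sees odd behavior.
function routeVersion(userId, canaryWeight) {
  let hash = 0;
  for (const ch of userId) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // simple string hash
  }
  return hash % 100 < canaryWeight ? "canary" : "stable";
}

routeVersion("user-42", 10); // ~10% of users land on the canary
routeVersion("user-42", 0);  // weight 0: everyone stays on stable
```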
Kubernetes + Nginx Ingress Implementation
# canary-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-service-canary
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"  # 10% of traffic
spec:
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api-service-canary
                port:
                  number: 80
---
# stable-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-service-stable
spec:
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api-service-stable
                port:
                  number: 80
Adjusting the canary weight is just an annotation update:
# Increase canary to 30%
kubectl annotate ingress api-service-canary \
nginx.ingress.kubernetes.io/canary-weight="30" --overwrite
# If clean, bump to 50%
kubectl annotate ingress api-service-canary \
nginx.ingress.kubernetes.io/canary-weight="50" --overwrite
# After full rollout, update stable and remove canary ingress
kubectl set image deployment/api-service-stable api-service=my-registry/api-service:v2
kubectl delete ingress api-service-canary
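The promotion loop behind those commands can be sketched as a pure function: step through the weights, check a metrics probe between steps, and bail out if the canary misbehaves. Here `getErrorRate` is a stand-in for a real Prometheus query, not an actual API:

```javascript
// Step through canary weights, checking an injected metrics probe
// between steps; roll back on the first unhealthy reading.
function progressiveRollout(weights, getErrorRate, maxErrorRate = 0.01) {
  for (const weight of weights) {
    // In reality: kubectl annotate ... canary-weight="<weight>", then wait.
    if (getErrorRate(weight) > maxErrorRate) {
      return { action: "rollback", weight }; // shift traffic back to stable
    }
  }
  return { action: "promote", weight: 100 };
}

progressiveRollout([10, 30, 50, 100], () => 0.002); // healthy → promote
progressiveRollout([10, 30, 50, 100], () => 0.05);  // errors → rollback at 10%
```

This is essentially what tools like Argo Rollouts (covered below) automate for you.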
Header-Based Canary
Send only specific users (internal staff, beta testers) to the new version:
metadata:
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-by-header: "X-Canary"
    nginx.ingress.kubernetes.io/canary-by-header-value: "true"
Only requests with the X-Canary: true header get routed to the canary.
What to Monitor During a Canary
const CANARY_METRICS = {
  errorRate: {
    threshold: 0.01, // error rate below 1%
    query: 'rate(http_requests_total{status=~"5.."}[5m]) / rate(http_requests_total[5m])'
  },
  p99Latency: {
    threshold: 500, // P99 latency below 500ms
    query: 'histogram_quantile(0.99, rate(http_request_duration_ms_bucket[5m]))'
  },
  successRate: {
    threshold: 0.99, // success rate above 99%
    query: 'rate(http_requests_total{status=~"2.."}[5m]) / rate(http_requests_total[5m])'
  }
};
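One way to turn those thresholds into a go/no-go decision (a sketch; in practice the sampled values would come from running the Prometheus queries above):

```javascript
// Evaluate sampled metric values against the thresholds above.
// Latency and error rate are upper bounds; success rate is a lower bound.
function evaluateCanary(samples) {
  const violations = [];
  if (samples.errorRate > 0.01) violations.push("errorRate");
  if (samples.p99Latency > 500) violations.push("p99Latency");
  if (samples.successRate < 0.99) violations.push("successRate");
  return { healthy: violations.length === 0, violations };
}

evaluateCanary({ errorRate: 0.002, p99Latency: 320, successRate: 0.997 });
// → healthy
evaluateCanary({ errorRate: 0.03, p99Latency: 620, successRate: 0.95 });
// → unhealthy: all three thresholds violated
```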
Trade-offs
Pros
- Tests with real production traffic (staging data doesn't capture everything)
- Problems are contained to a small user percentage
- Enables metric-based automatic rollback
Cons
- v1 and v2 serve simultaneously — backward compat still required
- Requires traffic-splitting infrastructure (Ingress, service mesh)
- Higher setup complexity
Comparison Table
| Factor | Rolling Update | Blue-Green | Canary |
|---|---|---|---|
| Downtime | None | None | None |
| Rollback Speed | Slow (re-roll) | Very fast (seconds) | Fast |
| Cost | Low | High (2x infra) | Medium |
| Complexity | Low | Medium | High |
| Blast Radius | Gradual | Full (on switch) | Limited (10–30%) |
| Version Coexistence | Yes | No | Yes |
| DB Migration | Tricky | Tricky | Tricky |
| Best Fit | Small/medium teams | Medium/large teams | Large, high-availability |
Choosing the Right Strategy
When Rolling Update Fits
- Startups or small services where infra cost matters
- APIs with well-maintained backward compatibility
- Teams new to Kubernetes who want to keep things simple
When Blue-Green Fits
- When instant rollback is a hard business requirement
- When you want to thoroughly test before exposing to users
- When managing cross-service compatibility is complex
When Canary Fits
- High-traffic services where you need real-user validation
- When combining with A/B testing for feature validation
- When you have SRE capacity to build metric-gated automation pipelines
In practice, many teams use a mix: rolling for routine deploys, blue-green for major releases, canary for large feature launches.
Pairing with DB Migrations
Regardless of strategy, schema changes need to be managed separately. The expand-contract pattern:
-- Step 1: Add new column (backward-compatible, before deploy)
ALTER TABLE users ADD COLUMN display_name VARCHAR(100);
-- Step 2: Deploy new code (starts writing display_name)
-- Old code still running, so column must be nullable
-- Step 3: Backfill data
UPDATE users SET display_name = username WHERE display_name IS NULL;
-- Step 4: Add NOT NULL constraint (after old version is fully gone)
ALTER TABLE users ALTER COLUMN display_name SET NOT NULL;
-- Step 5: Drop old column (a few deploys later)
ALTER TABLE users DROP COLUMN username;
This staged approach works safely with any deployment strategy.
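During steps 2 through 4, the application has to dual-write: populate both the old and new columns so that whichever version reads the row gets what it expects. A sketch of that write path (column names follow the SQL above):

```javascript
// While v1 and v2 coexist, v2 writes both fields so v1 reads still work.
// Stop writing `username` only once v1 is fully retired (step 5).
function buildUserRow(input) {
  return {
    username: input.name,     // old column: keep populating for v1 readers
    display_name: input.name, // new column: what v2 reads going forward
  };
}

buildUserRow({ name: "kim" });
// → { username: "kim", display_name: "kim" }
```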
Automating with Argo Rollouts
Argo Rollouts is a Kubernetes controller that makes canary and blue-green far easier to manage:
# rollout.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: api-service
spec:
  replicas: 4
  strategy:
    canary:
      steps:
        - setWeight: 10   # 10% canary
        - pause: {}       # wait for manual approval
        - setWeight: 30   # bump to 30%
        - pause:
            duration: 10m # auto-wait 10 minutes
        - setWeight: 60
        - pause:
            duration: 10m
        - setWeight: 100  # full rollout
      analysis:
        templates:
          - templateName: success-rate
        startingStep: 2
        args:
          - name: service-name
            value: api-service-canary
  template:
    # ... Pod spec
If error rate exceeds the threshold, Argo Rollouts automatically rolls back. This is "Progressive Delivery" in practice.
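The success-rate template the Rollout references has to be defined separately as an AnalysisTemplate. A hedged sketch of what it might look like — the Prometheus address, query, and 99% condition here are assumptions, not values from the original setup:

```yaml
# analysis-template.yaml — referenced by the Rollout's analysis.templates
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
    - name: service-name
  metrics:
    - name: success-rate
      interval: 1m
      successCondition: result[0] >= 0.99  # abort the rollout below 99%
      failureLimit: 3
      provider:
        prometheus:
          address: http://prometheus.monitoring.svc:9090  # assumed address
          query: |
            sum(rate(http_requests_total{service="{{args.service-name}}",status=~"2.."}[5m]))
            /
            sum(rate(http_requests_total{service="{{args.service-name}}"}[5m]))
```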
Wrap-Up
Choosing a deployment strategy comes down to balancing three trade-offs:
- Cost vs rollback speed (blue-green's strength)
- Simplicity vs safety (rolling vs canary)
- Speed vs blast radius control (rolling vs canary)
The right fit depends on team size, service criticality, and deploy frequency. Whichever you pick, the goal is the same: make deployment a routine process rather than a stressful event, and a well-chosen strategy gets you most of the way there.