Canary vs Blue-Green vs Rolling: Comparing Deployment Strategies
Deployments are nerve-wracking. The moment you push code, something can go wrong. So how you deploy matters just as much as what you deploy.
There are three major zero-downtime deployment strategies: rolling updates, blue-green, and canary. Each fits different situations with different trade-offs. Let's get clear on which to use when.
Why Zero Downtime Matters
It used to be normal to post a maintenance notice and deploy at 3am. Not anymore.
- Global services have users in every timezone
- A 99.9% availability SLA allows only about 8.76 hours of downtime per year
- Teams doing CI/CD can't afford scheduled maintenance windows
Knowing how to deploy without downtime is now a baseline skill, not a bonus.
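That 99.9% figure falls out of simple arithmetic; a quick sketch of the downtime budget an SLA leaves you:

```javascript
// Downtime budget implied by an availability SLA.
// 99.9% uptime leaves 0.1% of the year for outages.
function downtimeHoursPerYear(slaPercent) {
  const hoursPerYear = 365 * 24; // 8760
  return hoursPerYear * (1 - slaPercent / 100);
}

console.log(downtimeHoursPerYear(99.9).toFixed(2));  // "8.76"
console.log(downtimeHoursPerYear(99.99).toFixed(2)); // "0.88"
```

Each extra nine cuts the budget by a factor of ten, which is why "we'll just take a maintenance window" stops being an option.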
1. Rolling Update
How It Works
Replace instances one at a time (or in batches). Bring up a new version pod, then take down an old version pod.
Before: [v1] [v1] [v1] [v1]
Step 1: [v2] [v1] [v1] [v1]
Step 2: [v2] [v2] [v1] [v1]
Step 3: [v2] [v2] [v2] [v1]
After: [v2] [v2] [v2] [v2]
During the roll, both v1 and v2 serve traffic simultaneously, so the two versions must be mutually compatible: a DB schema change or breaking API change made for v2 can break the v1 pods that are still running.
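In application code, that compatibility usually means tolerant reading: v2 must accept data shaped for v1 while v1 pods are still serving. A minimal sketch (the field names here are hypothetical, not from any real API):

```javascript
// v1 payloads carry `username`; v2 adds an optional `displayName`.
// While both versions run, v2 code must accept both shapes.
function renderUser(user) {
  // Fall back to the v1 field when the v2 field is absent.
  const name = user.displayName ?? user.username;
  return `Hello, ${name}`;
}

renderUser({ username: "kim" });                         // v1-shaped payload
renderUser({ username: "kim", displayName: "Kim Lee" }); // v2-shaped payload
```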
Basic Kubernetes Config
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # how many extra pods can exist during rollout
      maxUnavailable: 0  # how many pods can be down (0 = always keep 4 running)
  selector:
    matchLabels:
      app: api-service
  template:
    metadata:
      labels:
        app: api-service
    spec:
      containers:
        - name: api-service
          image: my-registry/api-service:v2
          readinessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 5
With maxUnavailable: 0, at least 4 pods always run. The readinessProbe ensures traffic only flows to pods that are ready.
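The interaction of those two settings is easy to simulate. A sketch of how the pod counts evolve with maxSurge: 1 and maxUnavailable: 0 (an illustration of the mechanics, not how the controller is actually implemented):

```javascript
// Simulate a rolling update with maxSurge=1, maxUnavailable=0:
// at each step, surge one v2 pod up, then retire one v1 pod.
function rollingUpdate(replicas, maxSurge) {
  let v1 = replicas, v2 = 0;
  const states = [];
  while (v1 > 0) {
    v2 += Math.min(maxSurge, v1); // bring up new pods (the surge)
    states.push({ v1, v2, total: v1 + v2 });
    v1 -= Math.min(maxSurge, v1); // old pods drain and terminate
  }
  states.push({ v1, v2, total: v1 + v2 });
  return states;
}

const steps = rollingUpdate(4, 1);
// Capacity never drops below 4 and never exceeds 4 + maxSurge = 5.
```

Raising maxSurge speeds up the rollout at the cost of temporarily running more pods; raising maxUnavailable speeds it up at the cost of reduced capacity.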
Rolling Back
# Roll back to previous version immediately
kubectl rollout undo deployment/api-service
# Roll back to a specific revision
kubectl rollout undo deployment/api-service --to-revision=3
# Check rollout status
kubectl rollout status deployment/api-service
Trade-offs
Pros
- No extra infrastructure needed (cost-efficient)
- Native Kubernetes support, simple to configure
- Gradual rollout surfaces problems early
Cons
- v1 and v2 coexist during the rollout — backward compat is required
- Rollback isn't instant (it re-rolls)
- Complex DB migrations are tricky to pair with this strategy
2. Blue-Green Deployment
How It Works
Maintain two identical production environments. "Blue" is currently live; "Green" is the new version. When the new version is ready, switch all traffic at once.
Phase 1 (before switch):
Traffic → [Blue: v1] [Blue: v1] [Blue: v1] [Blue: v1]
[Green: v2] [Green: v2] [Green: v2] [Green: v2] (standby)
Phase 2 (switch):
Traffic → [Green: v2] [Green: v2] [Green: v2] [Green: v2]
[Blue: v1] [Blue: v1] [Blue: v1] [Blue: v1] (standby, for rollback)
Phase 3 (after stabilization):
Decommission Blue, or repurpose it as the next deployment's "Blue"
Kubernetes Implementation
# blue-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service-blue
spec:
  replicas: 4
  selector:
    matchLabels:
      app: api-service
      version: blue
  template:
    metadata:
      labels:
        app: api-service
        version: blue
    spec:
      containers:
        - name: api-service
          image: my-registry/api-service:v1
---
# green-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service-green
spec:
  replicas: 4
  selector:
    matchLabels:
      app: api-service
      version: green
  template:
    metadata:
      labels:
        app: api-service
        version: green
    spec:
      containers:
        - name: api-service
          image: my-registry/api-service:v2
---
# service.yaml — traffic switch happens here
apiVersion: v1
kind: Service
metadata:
  name: api-service
spec:
  selector:
    app: api-service
    version: blue  # ← change this label to switch traffic
  ports:
    - port: 80
      targetPort: 3000
Switching traffic means changing just one label in the Service selector:
# Switch to green
kubectl patch service api-service \
-p '{"spec":{"selector":{"version":"green"}}}'
# Something wrong? Instant rollback to blue (1-2 seconds)
kubectl patch service api-service \
-p '{"spec":{"selector":{"version":"blue"}}}'
Trade-offs
Pros
- Rollback is instant (just a label switch)
- No two versions serving traffic simultaneously — fewer compat concerns
- You can thoroughly test Green before switching
Cons
- Infrastructure cost doubles during the switchover period
- DB schema changes are still complex (both envs share the same DB)
- Tricky with stateful services
3. Canary Deployment
How It Works
Named after the canary in the coal mine — miners carried canaries underground because the birds reacted to toxic gas before humans did. You expose the new version to a small percentage of traffic first, validate that it's safe, then gradually increase the share.
Traffic split (gradually increasing):
90% → [v1] [v1] [v1]
10% → [v2] ← canary
After validation:
70% → [v1] [v1] [v1]
30% → [v2] [v2]
Full rollout:
0% → (v1 removed)
100% → [v2] [v2] [v2] [v2]
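The ingress controller handles the split for you, but the core idea is simple weighted routing. A sketch (illustrative only, not nginx's actual algorithm), with deterministic hashing so the same user keeps hitting the same version:

```javascript
// Deterministic weighted routing: hash a stable key (e.g. a user ID)
// into 0–99 and send the lowest `canaryWeight` buckets to the canary.
// Stickiness matters: a user bouncing between versions sees odd behavior.
function routeVersion(userId, canaryWeight) {
  let hash = 0;
  for (const ch of userId) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // simple string hash
  }
  return hash % 100 < canaryWeight ? "canary" : "stable";
}

routeVersion("user-42", 10); // ~10% of users land on the canary
routeVersion("user-42", 0);  // weight 0: everyone stays on stable
```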
Kubernetes + Nginx Ingress Implementation
# canary-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-service-canary
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"  # 10% of traffic
spec:
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api-service-canary
                port:
                  number: 80
---
# stable-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-service-stable
spec:
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api-service-stable
                port:
                  number: 80
Adjusting the canary weight is just an annotation update:
# Increase canary to 30%
kubectl annotate ingress api-service-canary \
nginx.ingress.kubernetes.io/canary-weight="30" --overwrite
# If clean, bump to 50%
kubectl annotate ingress api-service-canary \
nginx.ingress.kubernetes.io/canary-weight="50" --overwrite
# After full rollout, update stable and remove canary ingress
kubectl set image deployment/api-service-stable api-service=my-registry/api-service:v2
kubectl delete ingress api-service-canary
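The promotion loop behind those commands can be sketched as a pure function: step through the weights, check a metrics probe between steps, and bail out if the canary misbehaves. Here `getErrorRate` is a stand-in for a real Prometheus query, not an actual API:

```javascript
// Step through canary weights, checking an injected metrics probe
// between steps; roll back on the first unhealthy reading.
function progressiveRollout(weights, getErrorRate, maxErrorRate = 0.01) {
  for (const weight of weights) {
    // In reality: kubectl annotate ... canary-weight="<weight>", then wait.
    if (getErrorRate(weight) > maxErrorRate) {
      return { action: "rollback", weight }; // shift traffic back to stable
    }
  }
  return { action: "promote", weight: 100 };
}

progressiveRollout([10, 30, 50, 100], () => 0.002); // healthy → promote
progressiveRollout([10, 30, 50, 100], () => 0.05);  // errors → rollback at 10%
```

This is essentially what tools like Argo Rollouts (covered below) automate for you.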
Header-Based Canary
Send only specific users (internal staff, beta testers) to the new version:
metadata:
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-by-header: "X-Canary"
    nginx.ingress.kubernetes.io/canary-by-header-value: "true"
Only requests with the X-Canary: true header get routed to the canary.
What to Monitor During a Canary
const CANARY_METRICS = {
  errorRate: {
    threshold: 0.01, // error rate below 1%
    query: 'rate(http_requests_total{status=~"5.."}[5m]) / rate(http_requests_total[5m])'
  },
  p99Latency: {
    threshold: 500, // P99 latency below 500ms
    query: 'histogram_quantile(0.99, rate(http_request_duration_ms_bucket[5m]))'
  },
  successRate: {
    threshold: 0.99, // success rate above 99%
    query: 'rate(http_requests_total{status=~"2.."}[5m]) / rate(http_requests_total[5m])'
  }
};
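One way to turn those thresholds into a go/no-go decision (a sketch; in practice the sampled values would come from running the Prometheus queries above):

```javascript
// Evaluate sampled metric values against the thresholds above.
// Latency and error rate are upper bounds; success rate is a lower bound.
function evaluateCanary(samples) {
  const violations = [];
  if (samples.errorRate > 0.01) violations.push("errorRate");
  if (samples.p99Latency > 500) violations.push("p99Latency");
  if (samples.successRate < 0.99) violations.push("successRate");
  return { healthy: violations.length === 0, violations };
}

evaluateCanary({ errorRate: 0.002, p99Latency: 320, successRate: 0.997 });
// → healthy
evaluateCanary({ errorRate: 0.03, p99Latency: 620, successRate: 0.95 });
// → unhealthy: all three thresholds violated
```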
Trade-offs
Pros
- Tests with real production traffic (staging data doesn't capture everything)
- Problems are contained to a small user percentage
- Enables metric-based automatic rollback
Cons
- v1 and v2 serve simultaneously — backward compat still required
- Requires traffic-splitting infrastructure (Ingress, service mesh)
- Higher setup complexity
Comparison Table
| Factor | Rolling Update | Blue-Green | Canary |
|---|---|---|---|
| Downtime | None | None | None |
| Rollback Speed | Slow (re-roll) | Very fast (seconds) | Fast |
| Cost | Low | High (2x infra) | Medium |
| Complexity | Low | Medium | High |
| Blast Radius | Gradual | Full (on switch) | Limited (10–30%) |
| Version Coexistence | Yes | No | Yes |
| DB Migration | Tricky | Tricky | Tricky |
| Best Fit | Small/medium teams | Medium/large teams | Large, high-availability |
Choosing the Right Strategy
When Rolling Update Fits
- Startups or small services where infra cost matters
- APIs with well-maintained backward compatibility
- Teams new to Kubernetes who want to keep things simple
When Blue-Green Fits
- When instant rollback is a hard business requirement
- When you want to thoroughly test before exposing to users
- When managing cross-service compatibility is complex
When Canary Fits
- High-traffic services where you need real-user validation
- When combining with A/B testing for feature validation
- When you have SRE capacity to build metric-gated automation pipelines
In practice, many teams use a mix: rolling for routine deploys, blue-green for major releases, canary for large feature launches.
Pairing with DB Migrations
Regardless of strategy, schema changes need to be managed separately. The expand-contract pattern:
-- Step 1: Add new column (backward-compatible, before deploy)
ALTER TABLE users ADD COLUMN display_name VARCHAR(100);
-- Step 2: Deploy new code (starts writing display_name)
-- Old code still running, so column must be nullable
-- Step 3: Backfill data
UPDATE users SET display_name = username WHERE display_name IS NULL;
-- Step 4: Add NOT NULL constraint (after old version is fully gone)
ALTER TABLE users ALTER COLUMN display_name SET NOT NULL;
-- Step 5: Drop old column (a few deploys later)
ALTER TABLE users DROP COLUMN username;
This staged approach works safely with any deployment strategy.
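During steps 2 through 4, the application has to dual-write: populate both the old and new columns so that whichever version reads the row gets what it expects. A sketch of that write path (column names follow the SQL above):

```javascript
// While v1 and v2 coexist, v2 writes both fields so v1 reads still work.
// Stop writing `username` only once v1 is fully retired (step 5).
function buildUserRow(input) {
  return {
    username: input.name,     // old column: keep populating for v1 readers
    display_name: input.name, // new column: what v2 reads going forward
  };
}

buildUserRow({ name: "kim" });
// → { username: "kim", display_name: "kim" }
```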
Automating with Argo Rollouts
Argo Rollouts is a Kubernetes controller that makes canary and blue-green far easier to manage:
# rollout.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: api-service
spec:
  replicas: 4
  strategy:
    canary:
      steps:
        - setWeight: 10   # 10% canary
        - pause: {}       # wait for manual approval
        - setWeight: 30   # bump to 30%
        - pause:
            duration: 10m # auto-wait 10 minutes
        - setWeight: 60
        - pause:
            duration: 10m
        - setWeight: 100  # full rollout
      analysis:
        templates:
          - templateName: success-rate
        startingStep: 2
        args:
          - name: service-name
            value: api-service-canary
  template:
    # ... Pod spec
If error rate exceeds the threshold, Argo Rollouts automatically rolls back. This is "Progressive Delivery" in practice.
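The success-rate template the Rollout references has to be defined separately as an AnalysisTemplate. A hedged sketch of what it might look like — the Prometheus address, query, and 99% condition here are assumptions, not values from the original setup:

```yaml
# analysis-template.yaml — referenced by the Rollout's analysis.templates
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
    - name: service-name
  metrics:
    - name: success-rate
      interval: 1m
      successCondition: result[0] >= 0.99  # abort the rollout below 99%
      failureLimit: 3
      provider:
        prometheus:
          address: http://prometheus.monitoring.svc:9090  # assumed address
          query: |
            sum(rate(http_requests_total{service="{{args.service-name}}",status=~"2.."}[5m]))
            /
            sum(rate(http_requests_total{service="{{args.service-name}}"}[5m]))
```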
Wrap-Up
Choosing a deployment strategy comes down to balancing three trade-offs:
- Cost vs rollback speed (blue-green's strength)
- Simplicity vs safety (rolling vs canary)
- Speed vs blast radius control (rolling vs canary)
The right fit depends on team size, service criticality, and deploy frequency. Whichever you pick, the goal is the same: make deployment a routine process rather than a stressful event, and a well-chosen strategy gets you most of the way there.