Deployment Strategies — Blue-Green, Canary, Rolling, and Feature Flags
Your code passed all tests, the PR is approved, and you're ready to deploy. But how you deploy matters as much as what you deploy. A bad deployment strategy turns a minor bug into a site-wide outage, while a good one lets you roll back in seconds with zero customer impact.
Why Deployment Strategy Matters
The deployment strategy you choose directly affects:
- Risk: How many users are impacted if something goes wrong?
- Speed: How quickly can you roll back?
- Cost: How much extra infrastructure do you need?
- Complexity: How hard is it to operate?
There's no single "best" strategy. The right choice depends on your application, traffic patterns, and risk tolerance.
Rolling Update (Kubernetes Default)
A rolling update gradually replaces old pods with new ones, maintaining availability throughout:
# rolling-update-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-api
spec:
  replicas: 6
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1  # At most 1 pod down during update
      maxSurge: 2        # At most 2 extra pods during update
  selector:
    matchLabels:
      app: payment-api
  template:
    metadata:
      labels:
        app: payment-api
    spec:
      containers:
        - name: payment-api
          image: payment-api:v2.1.0
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 3
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 10
Rolling Update Timeline (6 replicas):
Time 0: [v1] [v1] [v1] [v1] [v1] [v1] ← All old
Time 1: [v1] [v1] [v1] [v1] [v1] [v2] [v2] ← 2 new pods surge
Time 2: [v1] [v1] [v1] [v1] [v2] [v2] [v2] ← Old pods terminating
Time 3: [v1] [v1] [v2] [v2] [v2] [v2]
Time 4: [v2] [v2] [v2] [v2] [v2] [v2] ← All new
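Pros: Simple, built into Kubernetes, no extra infrastructure.
Cons: Both versions serve traffic at the same time, so they must be API-compatible. Rollback is itself another rolling update (for example `kubectl rollout undo deployment/payment-api`), so it takes minutes rather than seconds.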
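To make the effect of `maxSurge` and `maxUnavailable` concrete, here is a minimal sketch that computes the pod-count envelope a rolling update stays inside. The helper names (`resolveCount`, `rollingUpdateBounds`) are hypothetical and not part of any Kubernetes client; the rounding rule for percentage values (surge rounds up, unavailable rounds down) follows the Kubernetes Deployment documentation.

```javascript
// Resolve an absolute count or a percentage string such as "25%" against
// the replica count. Kubernetes rounds maxSurge up and maxUnavailable down.
function resolveCount(value, replicas, roundUp) {
  if (typeof value === 'number') return value;
  const fraction = (parseFloat(value) / 100) * replicas;
  return roundUp ? Math.ceil(fraction) : Math.floor(fraction);
}

// Hypothetical helper: the limits a rolling update must respect.
function rollingUpdateBounds({ replicas, maxSurge, maxUnavailable }) {
  const surge = resolveCount(maxSurge, replicas, true);
  const unavailable = resolveCount(maxUnavailable, replicas, false);
  return {
    maxTotalPods: replicas + surge,           // old + new pods at any moment
    minAvailablePods: replicas - unavailable, // pods that must stay ready
  };
}

// Values from the manifest above: 6 replicas, maxSurge 2, maxUnavailable 1.
console.log(rollingUpdateBounds({ replicas: 6, maxSurge: 2, maxUnavailable: 1 }));
// { maxTotalPods: 8, minAvailablePods: 5 }
```

With the manifest's settings, the cluster never runs more than 8 pods and never drops below 5 ready pods, which is why capacity planning for a rolling update only needs headroom for the surge.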
Blue-Green Deployment
Blue-green maintains two identical environments. Traffic switches instantly from "blue" (current) to "green" (new):
# blue-green with Kubernetes services
# Step 1: Green deployment is running alongside blue
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-api-green
spec:
  replicas: 6
  selector:
    matchLabels:
      app: payment-api
      version: green
  template:
    metadata:
      labels:
        app: payment-api
        version: green
    spec:
      containers:
        - name: payment-api
          image: payment-api:v2.1.0
---
# Step 2: Switch traffic by updating the service selector
apiVersion: v1
kind: Service
metadata:
  name: payment-api
spec:
  selector:
    app: payment-api
    version: green  # ← Change from "blue" to "green"
  ports:
    - port: 80
      targetPort: 8080
Blue-Green Switch:
Before:   Users → Load Balancer → [Blue: v1.0]  (active)
                                  [Green: v2.0] (idle, tested)
Switch:   Users → Load Balancer → [Blue: v1.0]  (idle, standby)
                                  [Green: v2.0] (active)
Rollback: Users → Load Balancer → [Blue: v1.0]  (active again)
                                  [Green: v2.0] (idle)
Pros: Instant switchover, instant rollback, zero downtime.
Cons: Double the infrastructure cost while both environments run. Blue and green usually share one database, so schema changes must stay compatible with both versions.
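Because the cutover and the rollback are the same operation with a different label value, both can be generated from one small function. The sketch below builds the patch for the `payment-api` Service and renders it as a `kubectl patch` command. The helper names are hypothetical, and it assumes the current deployment carries the label `version: blue`, which the manifests above imply but do not show.

```javascript
// Build the patch that points the Service selector at one color.
function selectorPatch(color) {
  if (color !== 'blue' && color !== 'green') {
    throw new Error(`unknown color: ${color}`);
  }
  return { spec: { selector: { app: 'payment-api', version: color } } };
}

// Render the kubectl command for a cutover or a rollback.
function switchCommand(color) {
  const patch = JSON.stringify(selectorPatch(color));
  return `kubectl patch service payment-api -p '${patch}'`;
}

console.log(switchCommand('green')); // cut over to the new version
console.log(switchCommand('blue'));  // roll back to the old version
```

Keeping the blue deployment running after the switch is what makes the rollback instant; once it is scaled down, rolling back means redeploying.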
Canary Deployment
A canary deployment routes a small percentage of traffic to the new version before rolling it out to everyone:
Canary Progression:
Stage 1: 5% traffic → v2.0 | 95% traffic → v1.0
Monitor for 10 minutes...
✓ Error rate < 0.1%, latency normal
Stage 2: 25% traffic → v2.0 | 75% traffic → v1.0
Monitor for 15 minutes...
✓ Error rate < 0.1%, latency normal
Stage 3: 50% traffic → v2.0 | 50% traffic → v1.0
Monitor for 15 minutes...
✓ Error rate < 0.1%, latency normal
Stage 4: 100% traffic → v2.0
✓ Deployment complete
If any stage fails the health criteria, traffic automatically rolls back to v1.0.
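The progression above boils down to a loop with an early exit. Here is a minimal sketch of that control flow, leaving out the waiting periods and the latency check for brevity. The `observe` callback is a hypothetical stand-in for whatever metrics source reports the canary's error rate; it is not a real API.

```javascript
// Partial-traffic stages from the progression above; 100% follows once
// every stage has passed.
const STAGES = [5, 25, 50];
const MAX_ERROR_RATE = 0.001; // "error rate < 0.1%"

// `observe(weight)` is a hypothetical callback returning the canary's
// error rate (0..1) measured while it receives `weight` percent of traffic.
function runCanary(observe, stages = STAGES, maxErrorRate = MAX_ERROR_RATE) {
  const history = [];
  for (const weight of stages) {
    const errorRate = observe(weight);
    history.push({ weight, errorRate });
    if (errorRate >= maxErrorRate) {
      // Health criteria failed: send all traffic back to the stable version.
      return { outcome: 'rolled-back', canaryWeight: 0, history };
    }
  }
  return { outcome: 'promoted', canaryWeight: 100, history };
}

// A canary that degrades once it sees a quarter of the traffic.
const result = runCanary((weight) => (weight >= 25 ? 0.02 : 0.0005));
console.log(result.outcome, result.history.length); // rolled-back 2
```

The key property is that a failure at any stage caps the blast radius at that stage's traffic share: in the example, the bad version never serves more than 25% of requests.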
Canary with Argo Rollouts
Argo Rollouts extends Kubernetes with advanced deployment strategies:
# argo-rollout-canary.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: payment-api
spec:
  replicas: 10
  strategy:
    canary:
      canaryService: payment-api-canary
      stableService: payment-api-stable
      trafficRouting:
        istio:
          virtualServices:
            - name: payment-api-vsvc
              routes:
                - primary
      steps:
        # Step 1: 5% traffic to canary
        - setWeight: 5
        - pause: { duration: 10m }
        # Step 2: Run analysis
        - analysis:
            templates:
              - templateName: success-rate
            args:
              - name: service-name
                value: payment-api-canary
        # Step 3: 25% traffic
        - setWeight: 25
        - pause: { duration: 15m }
        # Step 4: 50% traffic
        - setWeight: 50
        - pause: { duration: 15m }
        # Step 5: Full rollout
        - setWeight: 100
  selector:
    matchLabels:
      app: payment-api
  template:
    metadata:
      labels:
        app: payment-api
    spec:
      containers:
        - name: payment-api
          image: payment-api:v2.1.0
---
# Automated analysis template
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
    - name: service-name
  metrics:
    - name: success-rate
      interval: 60s
      successCondition: result[0] >= 0.99
      provider:
        prometheus:
          address: http://prometheus:9090
          query: |
            sum(rate(http_requests_total{service="{{args.service-name}}",status=~"2.."}[5m]))
            /
            sum(rate(http_requests_total{service="{{args.service-name}}"}[5m]))
Argo Rollouts automates the entire canary process: it shifts traffic in steps, runs the Prometheus queries, and rolls back automatically if the metrics degrade.
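The Prometheus query in the analysis template divides the rate of 2xx responses by the rate of all responses. The sketch below performs the same arithmetic on a plain object of request counts, which can be handy for sanity-checking a threshold before putting it in a template. The function names and the input shape are assumptions for illustration; this is not how Argo Rollouts evaluates the metric internally.

```javascript
// Request counts keyed by HTTP status code, e.g. { 200: 990, 500: 10 }.
function successRate(countsByStatus) {
  let ok = 0;
  let total = 0;
  for (const [status, count] of Object.entries(countsByStatus)) {
    total += count;
    if (/^2..$/.test(status)) ok += count; // same match as status=~"2.."
  }
  return total === 0 ? NaN : ok / total; // no traffic: no meaningful rate
}

// Mirrors the template's successCondition of >= 0.99.
function passes(countsByStatus, threshold = 0.99) {
  return successRate(countsByStatus) >= threshold;
}

console.log(passes({ 200: 990, 500: 10 }));          // true  (0.99)
console.log(passes({ 200: 980, 404: 10, 500: 10 })); // false (0.98)
```

Note that 4xx responses count against the success rate here, just as they do in the query above, so a canary that receives many invalid client requests can fail analysis without having a bug.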
Feature Flags
Feature flags decouple deployment from release. You deploy code to production but control who sees it:
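// Feature flag with Unleash (open-source)
// Express is assumed as the web framework that provides `app`.
const express = require('express');
const { initialize, isEnabled } = require('unleash-client');

const app = express();

const unleash = initialize({
  url: 'https://unleash.internal/api',
  appName: 'payment-service',
  customHeaders: { Authorization: process.env.UNLEASH_API_TOKEN },
});

// Assumes auth middleware has populated req.user, and that
// newCheckoutHandler and currentCheckoutHandler are defined elsewhere.
app.post('/checkout', async (req, res) => {
  // Context the flag service uses to decide who sees the feature
  const context = {
    userId: req.user.id,
    properties: {
      region: req.user.region,
      plan: req.user.plan,
    },
  };

  if (isEnabled('new-payment-flow', context)) {
    // New checkout experience (rolling out to 10% of users)
    return newCheckoutHandler(req, res);
  }

  // Existing checkout experience
  return currentCheckoutHandler(req, res);
});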
Feature Flag Rollout:
Day 1: Deploy v2.0 with flag OFF (0% see new feature)
Day 2: Enable for internal team (dogfooding)
Day 3: Enable for 5% of users (beta)
Day 7: Enable for 25% of users
Day 14: Enable for 100% of users
Day 21: Remove flag and old code (cleanup!)
        ↑ Don't forget this step!
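A percentage rollout like the one above only works if each user gets a stable answer: someone in the 5% group on Day 3 should still see the feature at 25% on Day 7. One common way to get that is to hash the user ID into a fixed bucket. The sketch below uses an FNV-1a style hash purely because it is short and dependency-free; flag services such as Unleash use their own hashing, so treat this as an illustration of the idea rather than their algorithm. All function names are hypothetical.

```javascript
// FNV-1a style 32-bit hash over the string's UTF-16 code units.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep it an unsigned 32-bit value
  }
  return hash;
}

// Map a user to a stable bucket in 0..99, salted with the flag name so
// different flags do not always select the same users.
function bucketFor(flagName, userId) {
  return fnv1a(`${flagName}:${userId}`) % 100;
}

// A user is in the rollout when their bucket falls below the percentage.
function inRollout(flagName, userId, percentage) {
  return bucketFor(flagName, userId) < percentage;
}

const user = 'user-42';
console.log(inRollout('new-payment-flow', user, 0));   // false
console.log(inRollout('new-payment-flow', user, 100)); // true
```

Because the bucket never changes, raising the percentage only ever adds users; nobody who already has the feature loses it mid-rollout.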
Popular feature flag tools:
| Tool | Type | Pricing | Best For |
|---|---|---|---|
| LaunchDarkly | SaaS | $$$ | Enterprise, advanced targeting |
| Unleash | OSS / SaaS | Free / $$ | Self-hosted, full control |
| Flipt | OSS | Free | Simple, GitOps-friendly |
| Flagsmith | OSS / SaaS | Free / $$ | Multi-platform SDKs |
| ConfigCat | SaaS | Free tier | Small teams, simple needs |
Strategy Comparison
| Strategy | Risk | Rollback Speed | Infra Cost | Complexity | Best For |
|---|---|---|---|---|---|
| Rolling Update | Medium | Minutes (re-deploy) | 1x | Low | Standard deployments |
| Blue-Green | Low | Seconds (switch) | 2x | Medium | Critical services |
| Canary | Very Low | Seconds (route back) | 1.1x | High | High-traffic services |
| A/B Testing | Low | Seconds | 1.1x | High | UX experiments |
| Shadow/Dark | None | N/A (no user impact) | 2x | Very High | Major rewrites |
| Feature Flags | Very Low | Instant (toggle) | 1x | Medium | Decoupled releases |
Database Migrations During Deployments
The hardest part of any deployment strategy is the database. When v1 and v2 run simultaneously, they must both work with the same schema:
Safe Migration Pattern (Expand-Contract):
Phase 1 — Expand (backward compatible):
┌──────────────────────────────────────┐
│ ALTER TABLE users │
│ ADD COLUMN email_v2 VARCHAR(255); │
│ │
│ -- Both v1 (uses email) and │
│ -- v2 (uses email_v2) work │
└──────────────────────────────────────┘
Phase 2 — Migrate data:
┌──────────────────────────────────────┐
│ UPDATE users │
│ SET email_v2 = LOWER(email) │
│ WHERE email_v2 IS NULL; │
└──────────────────────────────────────┘
Phase 3 — Contract (after v1 is fully gone):
┌──────────────────────────────────────┐
│ ALTER TABLE users │
│ DROP COLUMN email; │
│ ALTER TABLE users │
│ RENAME COLUMN email_v2 TO email; │
└──────────────────────────────────────┘
UNSAFE migration (breaks blue-green/canary):
ALTER TABLE users RENAME COLUMN email TO email_address;
-- v1 queries fail immediately: column "email" does not exist
SAFE migration:
1. Add new column (v1 and v2 both work)
2. Deploy v2 (writes to both columns)
3. Backfill old data
4. Remove v1 completely
5. Drop old column
Never rename or drop a column while both versions are running. The expand-contract pattern ensures zero-downtime migrations.
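Step 2 of the safe migration, "v2 writes to both columns", is the part that lives in application code. A minimal sketch of what that dual write and the matching read fallback might look like is shown below. The function names and the `id` column are assumptions, and the `{ text, values }` object is the parameterized-query shape that node-postgres accepts; adapt it to whatever database client you use.

```javascript
// Hypothetical v2 write path during the expand phase: keep the old column
// populated for v1 readers and fill the new one for v2.
function buildEmailUpdate(userId, email) {
  return {
    text: 'UPDATE users SET email = $1, email_v2 = $2 WHERE id = $3',
    values: [email, email.toLowerCase(), userId],
  };
}

// Hypothetical v2 read path: prefer the new column, and fall back to the
// old one for rows the backfill has not reached yet.
function readEmail(row) {
  return row.email_v2 ?? row.email.toLowerCase();
}

console.log(buildEmailUpdate(7, 'Ada@Example.com').values);
// [ 'Ada@Example.com', 'ada@example.com', 7 ]
console.log(readEmail({ email: 'Ada@Example.com', email_v2: null }));
// ada@example.com
```

Once v1 is gone and the backfill has finished, both the dual write and the fallback can be deleted along with the old column.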
Closing Note
The best deployment strategy is the one your team can operate confidently. Start with rolling updates, add canary when your traffic justifies it, and use feature flags to decouple deployments from releases. In the next post, we'll explore Infrastructure Testing — how to verify your Terraform, Ansible, and cloud resources are correct before they hit production.
