Prologue: The "Friday 5 PM, No Deploy" Rule
In the days of manual deployment, there was an unwritten law:
"Never deploy on Friday."
The reason was simple: Deploy → bug → weekend overtime.
I've been there. Pushed a "small bug fix" at 4:50 PM on a Friday. Changed just one line... and the entire service went down after deployment.
// Before
const result = data.filter(item => item.status === "active")
// After - "safely" added optional chaining
const result = data?.filter(item => item.status === "active")
// Problem: when data was undefined, this line no longer threw. result silently
// became undefined, and code elsewhere that expected an array had no error handling
// → cascading errors → total service outage
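To make that failure mode concrete, here is a minimal sketch of what the one-line change does when data is undefined. The helper names (activeItemsUnsafe, activeItemsSafe) are hypothetical and only illustrate the pattern; they are not the original service code.

```javascript
// Hypothetical helpers illustrating the optional-chaining pitfall.

// Mirrors the "After" version: returns undefined instead of throwing
// when data is null or undefined.
function activeItemsUnsafe(data) {
  return data?.filter(item => item.status === "active");
}

// One possible fix: fall back to an empty array so callers
// always receive an array.
function activeItemsSafe(data) {
  return (data ?? []).filter(item => item.status === "active");
}

const unsafe = activeItemsUnsafe(undefined); // undefined, no error yet
const safe = activeItemsSafe(undefined);     // []

// Downstream code that assumes an array only works with the safe version.
console.log(safe.length);
// Reading unsafe.length would throw a TypeError, far away from the changed line.
```

The point is that optional chaining moves the crash rather than removing it, so whatever consumes the result has to handle the undefined case too.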
"But tests passed locally!" Turns out I had only tested on my machine, never in the actual production environment. Spending half the night fixing the server is a memory I'll never forget.
Why I Studied CI/CD: This Experience as Motivation
After that incident, I started reading CI/CD documentation properly.
CI/CD Pipeline: Push code → robots automatically test and deploy.
"So humans don't have to do it step by step?" Exactly. Robots handle it. The very structure designed to prevent Friday disasters.
That's when I realized: The problem wasn't just an individual mistake—it was the lack of automation in the entire process.
My First Solo Deployment Experience
# Step 1: Build locally
$ npm run build
# Step 2: Open FTP client (Filezilla)
[Drag and drop 200 files]
[Progress: 32%... connection dropped]
[Retry upload]
# Step 3: SSH to server
$ ssh user@production-server.com
$ cd /var/www/app
$ pm2 restart app
# Step 4: Check browser
"Why isn't the page loading?"
# Step 5: Check logs
$ pm2 logs
"Error: Cannot find module..."
# Oh right, updated package.json but forgot npm install
# Step 6: Install again
$ npm install
$ pm2 restart app
# Step 7: Check again
"Still broken..."
# This time forgot to upload .env file
# Step 8: Upload .env
[Back to FTP to upload .env]
# Step 9: 30 minutes later, finally success
Deploy 10 times a day like this and you'll lose your mind. And this was my workflow in 2024. Ridiculous, right?
What Confused Me Initially
"CI (Continuous Integration)" and "CD (Continuous Delivery/Deployment)"
- "Continuous" means ongoing, but ongoing what?
- "Integration" means merge, but merge what exactly? Code? Servers?
- What's the difference between Delivery and Deployment? Isn't deployment just deployment?
The terminology was too abstract. And worse, no one around me could explain it clearly. Everyone just said "Oh that? It's automation, that's all" and moved on.
And while "auto-running tests" made some sense, I didn't understand why it mattered. "I don't even write tests, so why bother?"
The Aha Moment: The "Factory Assembly Line" Metaphor
The car factory analogy is what finally made CI/CD click for me:
"In the old days, one craftsman built a car from start to finish. (Manual deployment)
Modern factories use conveyor belts:
- Robot #1: Welds the frame (Build)
- Robot #2: Applies paint (Lint & Format)
- Robot #3: Quality inspection (Test)
- Robot #4: Loads onto shipping truck (Deploy)
Humans only draw blueprints (code). Robots do everything else."
That was it. CI/CD was essentially "deployment factory automation." After this metaphor, everything clicked. Manual FTP uploads were like hand-assembling cars one at a time.
1. CI: Automated Code Integration
Definition in My Own Words
When multiple developers write code at the same time, their changes can conflict. CI (Continuous Integration) means merging everyone's code into a shared branch frequently and automatically building and testing it on every merge, so problems are caught early.
Developer A → Git Push → CI Server
↓
1. Pull code
2. Install dependencies
3. Build
4. Run linter
5. Run unit tests
6. Run integration tests
↓
All pass → ✅ Allow merge
Any fail → ❌ Block merge + Notify developer
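The flow above boils down to "run the steps in order and stop at the first failure." A minimal sketch of that gate, assuming each step is just a function that returns true on success (the step names and the runPipeline helper are hypothetical, not part of any CI tool):

```javascript
// Run steps in order and stop at the first failure, like the flow above.
// Each step is { name, run }, where run() returns true on success.
function runPipeline(steps) {
  for (const step of steps) {
    let ok = false;
    try {
      ok = step.run() === true;
    } catch (err) {
      ok = false; // a thrown error counts as a failed step
    }
    if (!ok) {
      return { allowMerge: false, failedStep: step.name };
    }
  }
  return { allowMerge: true, failedStep: null };
}

// Hypothetical steps; real ones would shell out to npm, a linter, and so on.
const result = runPipeline([
  { name: "build", run: () => true },
  { name: "lint", run: () => true },
  { name: "unit tests", run: () => false },
  { name: "integration tests", run: () => true },
]);

console.log(result);
```

Because the failing step short-circuits the rest, the integration tests never run here, which is also why cheap checks usually go first.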
Why Is It Needed? My "It Works on My Machine" Hell
Real Case 1: OS Path Differences
// Developer A on Mac
const path = require('path')
const filePath = 'data/users.json' // Works fine
// Developer B on Windows
const filePath = 'data\\users.json' // Works fine
// Production server (Linux)
// Both versions pushed → on Linux the backslash is part of the file name, not a separator → file not found → service down
Works locally, breaks in production. This was exactly the problem I faced that Friday night.
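For this particular case, Node's built-in path module avoids hardcoding a separator. The sketch below is one way to write it; the data/users.json location is taken from the example above, and path.posix / path.win32 are used only to show each platform's behavior on any machine.

```javascript
const path = require("path");

// path.join uses the separator of the OS the code runs on,
// so the same source line works on Mac, Windows, and Linux.
const filePath = path.join("data", "users.json");

// path.posix and path.win32 expose each platform's behavior explicitly.
const onLinux = path.posix.join("data", "users.json");
const onWindows = path.win32.join("data", "users.json");

// On Linux a backslash is an ordinary file-name character, so the
// Windows-style string is read as one file name with no directory part.
const misread = path.posix.basename("data\\users.json");

console.log(filePath, onLinux, onWindows, misread);
```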
CI's Solution: Test in Production-Identical Environment
# .github/workflows/ci.yml
name: CI Pipeline
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest # Same as production environment
    strategy:
      matrix:
        node-version: [16, 18, 20] # Test across multiple versions
    steps:
      - name: Checkout code
        uses: actions/checkout@v3
      - name: Setup Node.js ${{ matrix.node-version }}
        uses: actions/setup-node@v3
        with:
          node-version: ${{ matrix.node-version }}
      - name: Install dependencies
        run: npm ci # Safer than npm install (uses package-lock.json)
      - name: Run linter
        run: npm run lint
      - name: Type check
        run: npm run typecheck
      - name: Unit tests
        run: npm test -- --coverage
      - name: Integration tests
        run: npm run test:integration
      - name: Build
        run: npm run build
The moment you push code:
- Builds in the same environment (ubuntu-latest)
- Automatically runs lint, type checks, unit tests, integration tests
- Must pass build to allow merge
- Fails → Blocks PR merge + Shows red X on GitHub
Catches problems before production deployment. No more "it worked on my machine" excuses.
My Personal CI Experience
Before CI: The Blame Game
Monday morning:
5 developers submit PRs simultaneously
→ Senior manually reviews and merges one by one
→ PR #1 merged: OK
→ PR #2 merged: OK
→ PR #3 merged: OK
→ Attempt build → 💥 Fails
→ "Who broke the build?" hunt begins
→ 30 minutes later: PR #2 and #3 modified the same function
→ Slack message: "@DeveloperC, please fix the build"
→ DeveloperC: "I'm in a meeting right now..."
→ 1 hour later, finally fixed
→ Entire team's development delayed
After CI: Instant Feedback
Monday morning:
5 developers submit PRs simultaneously
→ CI automatically runs for each PR
→ PR #1: ✅ "All checks passed"
→ PR #2: ✅ "All checks passed"
→ PR #3: ❌ "Tests failed: 2 conflicts detected with PR #2"
→ DeveloperC immediately notified via GitHub
→ DeveloperC reviews PR #2 code and fixes immediately
→ Re-pushes after fix
→ ✅ Passes → Merges
→ Entire team continues development uninterrupted
Find problems instantly, fix instantly. That's the core of CI. This was the lightbulb moment for me.
2. CD: Automated Deployment
Delivery vs Deployment - Understanding the Difference
Continuous Delivery (Semi-Automated)
- Automates deployment preparation only
- Actual production release requires a human button click
- "This version is verified; you can deploy whenever business is ready"
- When to use? Finance, healthcare, or when deployment timing is business-critical
Continuous Deployment (Fully Automated)
- Fully automated deployment
- Tests pass → Automatically deploys to production immediately
- No human intervention
- When to use? Startups, SaaS where rapid iteration is critical
I started with Delivery (button click required), then evolved to Deployment (fully automated). Full automation from the start can be scary.
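The only real difference between the two is who makes the final call. A tiny sketch of that decision, using a hypothetical shouldReleaseToProduction helper and made-up mode names:

```javascript
// mode: "delivery"   -> a human must approve the production release
// mode: "deployment" -> passing tests are enough
function shouldReleaseToProduction({ testsPassed, mode, approvedByHuman = false }) {
  if (!testsPassed) return false;                   // both modes stop on failing tests
  if (mode === "deployment") return true;           // fully automated
  if (mode === "delivery") return approvedByHuman;  // waits for the button click
  throw new Error(`Unknown mode: ${mode}`);
}

console.log(shouldReleaseToProduction({ testsPassed: true, mode: "delivery" }));
console.log(shouldReleaseToProduction({ testsPassed: true, mode: "deployment" }));
```

Everything before that final check (build, test, packaging) is identical in both modes, which is why moving from Delivery to Deployment later is a small step.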
Real Pipeline Example: What I Currently Use
# .github/workflows/deploy.yml
name: Production Deploy
on:
  push:
    branches: [main] # Only on main branch pushes
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3
      - name: Setup Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '18'
          cache: 'npm'
      - name: Install dependencies
        run: npm ci # Full reinstall (safer than npm install)
      - name: Run tests
        run: npm test
      - name: Build production
        run: npm run build
        env:
          NODE_ENV: production
          NEXT_PUBLIC_API_URL: ${{ secrets.API_URL }}
          DATABASE_URL: ${{ secrets.DATABASE_URL }}
      - name: Deploy to Vercel
        uses: amondnet/vercel-action@v25
        with:
          vercel-token: ${{ secrets.VERCEL_TOKEN }}
          vercel-project-id: ${{ secrets.VERCEL_PROJECT_ID }}
          vercel-org-id: ${{ secrets.VERCEL_ORG_ID }}
          vercel-args: '--prod'
      - name: Wait for deployment
        run: sleep 30
      - name: Run smoke tests
        run: |
          npx wait-on https://my-app.vercel.app --timeout 60000
          npx cypress run --spec "cypress/e2e/smoke.cy.js"
      - name: Notify Slack on success
        if: success()
        uses: 8398a7/action-slack@v3
        with:
          status: custom
          custom_payload: |
            {
              text: "✅ Deployment Successful",
              attachments: [{
                color: 'good',
                text: `Version: ${{ github.sha }}\nAuthor: ${{ github.actor }}\nMessage: ${{ github.event.head_commit.message }}`
              }]
            }
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}
      - name: Notify Slack on failure
        if: failure()
        uses: 8398a7/action-slack@v3
        with:
          status: custom
          custom_payload: |
            {
              text: "❌ Deployment Failed",
              attachments: [{
                color: 'danger',
                text: `Check logs: https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}`
              }]
            }
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}
Pushing to main branch automatically triggers:
- Dependency installation (fast with caching)
- Test execution (stops here if fails)
- Production build (injects environment variables)
- Deploy to Vercel
- Wait for deployed site to be live
- Smoke tests (basic functionality check)
- Slack notification of success/failure
Humans only push code. Everything else is fully automated.
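The "wait for the site, then smoke test" part is essentially polling until the deployment answers. Here is a rough sketch of that idea, assuming a hypothetical waitUntilLive helper; the check function is injected so the example runs without a network, and in a real pipeline it might wrap a request to the deployed URL.

```javascript
// Poll a check function until it succeeds or attempts run out.
async function waitUntilLive(check, { attempts = 5, delayMs = 100 } = {}) {
  for (let i = 1; i <= attempts; i++) {
    try {
      if (await check()) return { live: true, attempts: i };
    } catch (err) {
      // treat errors (for example, connection refused) as "not live yet"
    }
    if (i < attempts) {
      await new Promise(resolve => setTimeout(resolve, delayMs));
    }
  }
  return { live: false, attempts };
}

// Hypothetical check that starts succeeding on the third call.
let calls = 0;
const fakeCheck = async () => ++calls >= 3;

waitUntilLive(fakeCheck, { attempts: 5, delayMs: 10 }).then(result => {
  console.log(result);
});
```

Polling with a cap on attempts tends to be more reliable than a fixed sleep, since it returns as soon as the site is up and still fails loudly if it never comes up.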
3. Real-World Application: Before vs After
Before: Manual Deployment Hell
# Friday 5:50 PM
$ ssh user@production-server.com
$ cd /var/www/my-app
$ git pull origin main
$ npm install
# [Wait 5 minutes]
$ npm run build
# [Wait 3 minutes]
$ pm2 restart app
# 6:05 PM
> "Wait, server isn't running?"
> Check logs: "Error: Missing environment variable"
> "Oh... forgot to update .env file"
$ vim .env
[Manually type environment variables]
$ pm2 restart app
# 6:25 PM
> "This time it seems to work..."
> "Wait, API responses look weird?"
> "Oh no, didn't run database migrations"
$ npm run migrate
# [Error: Previous migration scripts are tangled]
$ npm run migrate:rollback
$ npm run migrate
# 7:15 PM
> Slack alert: "Production down - 500 errors"
> Team lead: "What's going on?"
# 8:00 PM
> Finally recovered
> Leave at 10 PM
After: CI/CD Pipeline
# Friday 5:55 PM
$ git add .
$ git commit -m "Fix: Critical user authentication bug"
$ git push origin main
# GitHub Actions automatically starts
# [I go get coffee]
# About 3 minutes later (automatic progression)
✅ Checkout: 0.5s
✅ Install dependencies: 12s (cached)
✅ Lint: 3s
✅ Type check: 5s
✅ Unit tests: 18s (247/247 passed)
✅ Integration tests: 25s (58/58 passed)
✅ Build: 45s
✅ Deploy to Vercel: 30s
✅ Smoke tests: 15s (12/12 passed)
# 6:02 PM
> Slack notification:
> "✅ v1.2.3 deployed successfully by @yourname"
> "Deployment took 2m 53s"
> "Test coverage: 87.3%"
> "0 errors, 0 warnings"
> "Live at: https://my-app.vercel.app"
# 6:05 PM
> Leave work
Friday deployments are no longer scary. After experiencing this difference firsthand, I can't imagine developing without CI/CD.
4. Tool Comparison: What I've Used
GitHub Actions
Pros:
- Perfect GitHub integration (results show directly on PRs)
- Free for public repos
- Private repos get 2,000 free minutes/month
- Simple YAML syntax
- Thousands of actions in Marketplace (one-line installation)
- No server management required
Cons:
- Paid after free minutes on private repos
- Complex pipelines lead to long YAML files
- Debugging slightly inconvenient (can't run locally easily)
My choice: I use GitHub Actions 99% of the time. If you're already on GitHub, this is the most convenient.
Jenkins
Pros:
- Completely open source (free)
- Massive plugin ecosystem
- Maximum customization freedom
- Can configure complex pipelines
Cons:
- Must host your own server (EC2 costs)
- UI feels like 2010
- Complex initial setup (must learn Groovy)
- Must manually manage security updates
When to use? Enterprise environments, legacy systems, on-premise requirements
GitLab CI/CD
Pros:
- Perfect GitLab integration
- Self-hosting possible
- Docker-based runners
- Better UI/UX than GitHub Actions
Cons:
- Must use GitLab (pointless if using GitHub)
CircleCI
Pros:
- Fastest performance (especially Docker builds)
- Excellent UI/UX
- Well-optimized parallel execution
Cons:
- Expensive (very limited free tier)
- Starts at $30/month for small teams
My recommendation: GitHub Actions for small projects, Jenkins for large companies, and CircleCI when speed matters most.
5. Latest Trend: GitOps (Declarative Deployment)
"If infrastructure is managed as code, deployment state should also be managed as code."
Traditional Approach (Push-based)
CI tool (Jenkins) pushes deployments to cluster using kubectl apply commands.
Problems:
- Jenkins needs cluster Admin permissions (security vulnerability)
- If someone manually edits with kubectl edit, creates Git/cluster state mismatch
- Rollback is difficult
GitOps Approach (Pull-based) - ArgoCD
ArgoCD runs inside the cluster and continuously monitors the Git repository.
Git Repo (manifest.yaml)
image: myapp:v2
↑
│ (ArgoCD polls Git every 3 minutes by default)
│
ArgoCD (inside cluster)
↓
"Wait, Git says v2 but cluster is running v1?"
↓
Automatic sync
↓
Cluster state = Git state
Benefits:
- No external cluster access needed (enhanced security)
- Git = Single Source of Truth
- Even if manually changed, automatically reverts to Git state
- Rollback = Git revert (simple)
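The core of the pull-based model is a reconcile loop: compare what Git says with what is running, then compute the difference. The sketch below is a toy model of that idea, not how ArgoCD is implemented; the reconcile function and the state shape (app name mapped to image tag) are assumptions for illustration, and whether extra resources actually get deleted depends on how the real tool is configured.

```javascript
// Compare desired state (from Git) with actual state (in the cluster)
// and return the changes needed to make them match.
// Both are plain objects mapping an app name to an image tag.
function reconcile(desired, actual) {
  const actions = [];
  for (const [app, image] of Object.entries(desired)) {
    if (!(app in actual)) {
      actions.push({ type: "create", app, image });
    } else if (actual[app] !== image) {
      actions.push({ type: "update", app, from: actual[app], to: image });
    }
  }
  for (const app of Object.keys(actual)) {
    if (!(app in desired)) {
      actions.push({ type: "delete", app });
    }
  }
  return actions;
}

const gitState = { myapp: "myapp:v2" };
const clusterState = { myapp: "myapp:v1" };
console.log(reconcile(gitState, clusterState));
```

Running the same comparison after a manual change in the cluster produces an action that puts it back, which is the "automatically reverts to Git state" benefit in miniature.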
6. DevSecOps: Security in the Pipeline
"Security teams shouldn't be gatekeepers before deployment—they should be baked into the pipeline."
Adding Security Stages to CI Pipeline
jobs:
  security-scan:
    steps:
      # 1. SAST (Static Analysis): Code security vulnerabilities
      - name: SonarQube Scan
        run: sonar-scanner
      # 2. SCA (Dependency Check): Library vulnerabilities
      - name: Snyk Dependency Check
        run: snyk test
      # 3. Secret Scanning: Hardcoded passwords
      - name: GitLeaks
        run: gitleaks detect
      # 4. Container Scanning: Docker image vulnerabilities
      - name: Trivy Scan
        run: trivy image myapp:latest
Real experience: Snyk automatically detected Log4Shell vulnerability and blocked deployment before it reached production. Would've missed it if done manually.
7. Docker CI/CD
Docker-Based Pipeline
name: Docker Build & Deploy
on:
  push:
    branches: [main]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
      # Assumes AWS credentials were configured in an earlier step
      - name: Login to ECR
        uses: aws-actions/amazon-ecr-login@v1
      - name: Build and push
        uses: docker/build-push-action@v4
        with:
          context: .
          push: true
          tags: |
            123456789.dkr.ecr.ap-northeast-2.amazonaws.com/myapp:latest
            123456789.dkr.ecr.ap-northeast-2.amazonaws.com/myapp:${{ github.sha }}
          # Inline cache lives inside the pushed image, so read it back from that image
          cache-from: type=registry,ref=123456789.dkr.ecr.ap-northeast-2.amazonaws.com/myapp:latest
          cache-to: type=inline
      - name: Update Kubernetes manifest
        run: |
          sed -i "s|image: myapp:.*|image: myapp:${{ github.sha }}|" k8s/deployment.yaml
          git config user.name "GitHub Actions"
          git config user.email "actions@github.com"
          git add k8s/deployment.yaml
          git commit -m "Update image to ${{ github.sha }}"
          git push
Key insight: Proper Docker layer caching reduces build time from 10 minutes → 1 minute.
8. Mistakes and Lessons (Things I Actually Experienced)
Mistake 1: Hardcoded Secrets in Code
# ❌ Never do this
- name: Deploy to AWS
  run: |
    export AWS_ACCESS_KEY_ID="AKIAIOSFODNN7EXAMPLE"
    export AWS_SECRET_ACCESS_KEY="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
    aws s3 sync ./build s3://my-bucket
5 minutes after pushing to GitHub:
- GitHub Security Alert: "AWS credentials exposed"
- Email from AWS: "Your key has been compromised"
- Bots attempted cryptocurrency mining on my account
- Account temporarily suspended
Solution:
# ✅ Use GitHub Secrets
- name: Deploy to AWS
  env:
    AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
    AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
  run: aws s3 sync ./build s3://my-bucket
Lesson: Never put secrets in code. Use GitHub Secrets, AWS Secrets Manager, etc.
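Bots find leaked keys so quickly because the key format is easy to match. The sketch below shows the idea with a single regular expression for AWS access key IDs (the "AKIA" prefix followed by 16 uppercase letters or digits). It is a simplified stand-in for what secret scanners do, not the rule set of any real tool, and the findAwsAccessKeyIds name is hypothetical.

```javascript
// Long-term AWS access key IDs start with "AKIA" followed by
// 16 uppercase letters or digits.
const AWS_ACCESS_KEY_ID_PATTERN = /\bAKIA[0-9A-Z]{16}\b/g;

// Return every string in the text that looks like an access key ID.
function findAwsAccessKeyIds(text) {
  return text.match(AWS_ACCESS_KEY_ID_PATTERN) ?? [];
}

// The documentation example key from the "never do this" snippet.
const leaked = 'export AWS_ACCESS_KEY_ID="AKIAIOSFODNN7EXAMPLE"';
// The secrets-based version contains no key material at all.
const clean = "AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}";

console.log(findAwsAccessKeyIds(leaked));
console.log(findAwsAccessKeyIds(clean));
```

A check like this could run as a pre-commit hook so the key never reaches the remote in the first place.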
Mistake 2: Slow Tests
E2E tests took 10 minutes, making every PR wait unbearable.
Solution 1: Parallel Execution (Matrix Strategy)
strategy:
  matrix:
    browser: [chrome, firefox, safari]
    node-version: [16, 18, 20]
# 3 browsers × 3 versions = 9 simultaneous jobs
# 10 minutes → ~2 minutes
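The job count comes from taking every combination of the matrix values. A small sketch of that expansion, using a hypothetical expandMatrix helper rather than anything from GitHub Actions itself:

```javascript
// Expand a matrix object into every combination (Cartesian product).
function expandMatrix(matrix) {
  return Object.entries(matrix).reduce(
    (combos, [key, values]) =>
      combos.flatMap(combo => values.map(value => ({ ...combo, [key]: value }))),
    [{}]
  );
}

const jobs = expandMatrix({
  browser: ["chrome", "firefox", "safari"],
  "node-version": [16, 18, 20],
});

console.log(jobs.length); // 9
console.log(jobs[0]);
```

Each added matrix key multiplies the job count, so it is worth checking the total against your free minutes before adding another dimension.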
Solution 2: Dependency Caching
- name: Cache node_modules
  uses: actions/cache@v3
  with:
    path: |
      ~/.npm
      node_modules
    key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
    restore-keys: |
      ${{ runner.os }}-node-
# npm install time: 2m 30s → 8s
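The cache key is what makes this safe: it only changes when the lockfile changes. The sketch below imitates that idea with a hypothetical cacheKey function that hashes the lockfile contents with SHA-256; it illustrates the principle and is not the actual hashFiles implementation.

```javascript
const crypto = require("crypto");

// Build a cache key from the OS name and a hash of the lockfile contents.
function cacheKey(os, lockfileContents) {
  const hash = crypto.createHash("sha256").update(lockfileContents).digest("hex");
  return `${os}-node-${hash}`;
}

// Made-up lockfile contents for illustration.
const before = cacheKey("Linux", '{"lodash":"4.17.20"}');
const same = cacheKey("Linux", '{"lodash":"4.17.20"}');
const after = cacheKey("Linux", '{"lodash":"4.17.21"}');

console.log(before === same);  // true: unchanged lockfile reuses the cache
console.log(before === after); // false: a dependency bump invalidates it
```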
Mistake 3: Flaky Tests (Intermittent Failures)
// ❌ Time-dependent test: the assertion runs on a fixed timer that the test never waits for
test('debounce function', () => {
  fireEvent.click(button);
  setTimeout(() => {
    expect(apiMock).toHaveBeenCalled();
  }, 100); // 100ms might not be enough on CI servers
});
// ✅ Explicit waiting (using waitFor)
test('debounce function', async () => {
  fireEvent.click(button);
  await waitFor(() => {
    expect(apiMock).toHaveBeenCalled();
  }, { timeout: 3000 });
});
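What makes the waitFor version stable is that it retries the assertion until it passes or a deadline is reached, instead of betting on one fixed delay. Here is a stripped-down sketch of that retry loop; waitForSketch is a hypothetical stand-in written for this post, not the testing library's real implementation.

```javascript
// Retry the assertion until it stops throwing or the timeout elapses.
async function waitForSketch(assertion, { timeout = 1000, interval = 50 } = {}) {
  const deadline = Date.now() + timeout;
  let lastError;
  while (true) {
    try {
      return assertion(); // success: stop polling
    } catch (err) {
      lastError = err;
    }
    if (Date.now() >= deadline) throw lastError;
    await new Promise(resolve => setTimeout(resolve, interval));
  }
}

// Hypothetical slow side effect, standing in for a debounced API call.
let called = false;
setTimeout(() => { called = true; }, 120);

waitForSketch(() => {
  if (!called) throw new Error("not called yet");
}, { timeout: 3000 }).then(() => console.log("assertion passed"));
```

On a fast machine this resolves after a few polls; on a slow CI runner it simply polls longer, up to the timeout.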
9. Cost Analysis: Calculating ROI
Before CI/CD
| Item | Monthly Cost | Time Cost |
|---|---|---|
| Manual deploy (2x/day) | $0 | 40 hrs/month |
| Emergency hotfixes | $0 | 10 hrs/month |
| Bug-related downtime | Revenue loss | - |
| Total | $0 | 50 hrs/month |
After CI/CD
| Item | Monthly Cost | Time Cost |
|---|---|---|
| GitHub Actions (Pro) | $21/month | 0 hrs |
| Emergency hotfixes | $0 | 2 hrs/month |
| Downtime | Nearly zero | - |
| Total | $21/month | 2 hrs/month |
Invest $21/month → Save 48 hours/month. At a $50/hr rate → Save $2,400/month ($2,379 net savings)
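The arithmetic behind those numbers, written out so you can plug in your own values. The figures are the ones from the tables above; the variable names are just for illustration.

```javascript
const hoursBefore = 40 + 10; // manual deploys + emergency hotfixes (hrs/month)
const hoursAfter = 0 + 2;    // automated deploys + emergency hotfixes (hrs/month)
const hourlyRate = 50;       // USD per hour
const toolingCost = 21;      // USD per month

const hoursSaved = hoursBefore - hoursAfter;   // 48
const grossSavings = hoursSaved * hourlyRate;  // 2400
const netSavings = grossSavings - toolingCost; // 2379

console.log({ hoursSaved, grossSavings, netSavings });
```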
Summary: What I Learned Through CI/CD
- "Let Robots Do It" - Humans skip steps when tired; scripts run the same steps every time
- "Integrate Often, Deploy Often" - Smaller changes = smaller risks
- "Tests Are Mandatory" - CI without tests eventually breaks production
- "Fearless Fridays" - Automation creates confidence
Initially I thought "Writing tests is tedious... isn't it a waste of time?" Now I think "How did I ever develop without CI/CD?"
The bottom line: CI/CD isn't just a tool—it's the fundamental infrastructure of modern development.
Epilogue: Now, Friday 5:55 PM
I now deploy on Friday afternoons.
$ git push origin main
And leave work at 6 PM.
Robots automatically test, build, deploy, verify, and notify me.
This is how we develop in 2025.