CDN: 넷플릭스가 전 세계에서 버퍼링 없이 재생되는 이유 (완전정복)

1. 프롤로그 - "빛의 속도도 느리다"는 사실을 받아들였다

제가 처음 글로벌 웹 서비스를 런칭했을 때의 일입니다. AWS us-east-1 (미국 버지니아)에 서버를 두고 한국에서 접속했더니 이미지 하나 뜨는 데 3초가 걸렸습니다. 미국 친구는 "엄청 빠른데?"라고 하는데 말이죠.

"빛의 속도(30만 km/s)라서 순식간 아니야?"

아닙니다. 이게 제가 처음으로 물리적 한계를 체감한 순간이었습니다.

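In theory: about 133 ms for a round trip to the far side of the Earth (Earth's circumference, 40,000 km ÷ speed of light in a vacuum, 300,000 km/s). Light in optical fiber travels at roughly two thirds of that speed, so the realistic floor is closer to 200 ms.
In practice: undersea cables do not run in a straight line, packets pass through many routers, and lost packets are retransmitted, so the round trip ends up at 200-300 ms or more.
With dozens of HTTP requests that each pay that round trip? The delay adds up to whole seconds.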
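The arithmetic above is easy to check. This is a minimal sketch, assuming light in fiber travels at about two thirds of its vacuum speed and that 30 requests are issued strictly one after another (both are my assumptions, not measurements):

```python
# Back-of-the-envelope round-trip time (RTT) to the far side of the Earth.
EARTH_CIRCUMFERENCE_KM = 40_000   # figure used in the text
C_VACUUM_KM_S = 300_000           # speed of light in a vacuum (rounded)
FIBER_FACTOR = 2 / 3              # assumption: light in glass travels at ~2/3 c


def rtt_ms(one_way_km: float, speed_km_s: float) -> float:
    """Round-trip time in milliseconds for a given one-way distance."""
    return 2 * one_way_km / speed_km_s * 1000


one_way_km = EARTH_CIRCUMFERENCE_KM / 2          # the antipode is half a lap away
vacuum_rtt = rtt_ms(one_way_km, C_VACUUM_KM_S)
fiber_rtt = rtt_ms(one_way_km, C_VACUUM_KM_S * FIBER_FACTOR)

# Assumption: 30 requests, each waiting for the previous one to finish.
sequential_s = 30 * fiber_rtt / 1000

print(f"vacuum RTT: {vacuum_rtt:.0f} ms")
print(f"fiber RTT:  {fiber_rtt:.0f} ms")
print(f"30 sequential requests: {sequential_s:.1f} s")
```

Browsers issue requests in parallel, so real pages do better than the sequential worst case, but the per-round-trip floor does not go away.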

제가 정말 화가 났던 건, 제 서버가 느린 게 아니라 물리학이 문제라는 거였습니다. 아무리 코드를 최적화해도 이 지연은 못 줄입니다.

그때 이해했다: "서버를 빠르게 하는 게 아니라, 서버를 가까이 보내야 한다."

CDN (Content Delivery Network) is the technology built to get around this physical limit. That was the answer in the end: most of my speed problem was a distance problem.

2. 역사 (History) - MIT 교수가 만든 "인터넷 택배 시스템"

1990년대 후반의 악몽: "World Wide Wait"

1990년대 후반, 인터넷 트래픽이 폭증하면서 웹사이트 하나 열 때마다 몇 분씩 걸렸습니다. 사람들은 비아냥거리며 "World Wide Web"을 "World Wide Wait"라고 불렀죠.

당시 상황을 이해하려면 이걸 상상해보세요:

Yahoo! 메인 페이지 하나 = 100KB
56k 모뎀 속도 = 실제론 4KB/s
로딩 시간 = 25초
이미지가 10개면? 4분

MIT에서 나온 혁신 - Akamai의 탄생

1998년, MIT의 응용수학 교수 톰 레이튼(Tom Leighton)과 제자 대니 루인(Danny Lewin)이 이 문제를 수학적으로 접근했습니다.

핵심 아이디어:

"본점 하나가 아니라 전국에 지점을 만들자"
"손님을 가장 가까운 지점으로 안내하자"
"지점이 늘어나도 시스템이 안 깨지게 하자" → 이게 바로 일관된 해싱(Consistent Hashing)

그렇게 Akamai Technologies가 설립되었고, 이것이 상용 CDN의 시초입니다.

재미있는 사실: Akamai는 하와이 원주민 말로 "똑똑한"이라는 뜻입니다. 실제로 똑똑한 선택이었죠.

현재 - 인터넷 트래픽의 절반 이상

지금은 Cloudflare, AWS CloudFront, Fastly, Akamai 등이 전 세계 인터넷 트래픽의 50% 이상을 처리합니다. 넷플릭스, 유튜브, 페이스북 모두 CDN 없이는 작동 불가능합니다.

Cloudflare alone reports data centers in more than 310 cities. If you are reading this in a large city, there is a good chance one of its servers sits in the same metro area.

3. 핵심 원리 - "본점 말고 지점" 전략을 정리해본다

1) 오리진(Origin) vs 엣지(Edge)

제가 처음엔 이 용어가 헷갈렸는데, 프랜차이즈 카페로 비유하니까 와닿았다:

Origin Server (본점): 원본 데이터가 있는 메인 서버
- 예: 미국 버지니아 AWS 데이터센터
- 역할: "진짜" 데이터 보관, 엣지 서버한테 복사본 제공
Edge Server (지점): 전 세계 300여 개 도시에 흩어진 캐시 서버
- 예: 서울 가산디지털단지, 도쿄 시부야, 런던 등
- 역할: 근처 손님들한테 빠르게 서빙

2) 실제 동작 과정 (자세히)

이걸 실제 시나리오로 따라가봅시다:

[한국 사용자] → "logo.png 주세요" → [어디로 가야 하지?]

Step 1: DNS Resolution (가장 가까운 지점 찾기)

# 사용자가 example.com/logo.png를 요청
# 브라우저는 먼저 DNS 조회를 함

$ dig example.com

;; ANSWER SECTION:
example.com.  60  IN  A  104.16.132.229  # Cloudflare Anycast IP

여기서 마법이 일어납니다. 이 IP는 하나가 아닙니다. 전 세계 수천 대 서버가 같은 IP를 공유합니다 (Anycast).

BGP(Border Gateway Protocol) 라우팅이 자동으로 가장 가까운 서버로 연결해줍니다:

한국에서 접속 → 서울 엣지 서버
미국에서 접속 → 뉴욕 엣지 서버
같은 IP, 다른 물리적 위치

Step 2: Edge Server에서 처리

[서울 엣지 서버 로그]
2025-05-19 14:32:15 KST
Request: GET /logo.png
Cache Status: MISS (처음 요청이라 캐시에 없음)
Action: Fetching from Origin (us-east-1)
Origin Response Time: 285ms
Saved to Cache with TTL: 86400s (24시간)
Response to Client: 320ms (총 소요 시간)

Step 3: 다음 사용자는 빠르다

[서울 엣지 서버 로그]
2025-05-19 14:32:18 KST
Request: GET /logo.png
Cache Status: HIT (캐시에 있음!)
Response Time: 4ms (엣지에서 바로 전달)

285ms → 4ms, 70배 빠름

3) Cache Hit Rate: 성공률이 수익을 결정한다

CDN의 성능은 Cache Hit Rate(캐시 적중률)로 측정됩니다.

Cache Hit Rate = (Cache Hits / Total Requests) × 100%

90% Hit Rate: 100번 요청 중 90번은 엣지에서 처리, 10번만 오리진으로
50% Hit Rate: 절반은 미국까지 왕복... 비효율적

CDN을 도입하고 캐시 전략을 잘 세우면 Hit Rate를 60%대에서 95%까지 끌어올릴 수 있다. 그렇게 되면 서버 비용과 응답 시간이 크게 줄어든다는 사례가 많다.
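To see why the hit rate matters so much, here is a small sketch that plugs the numbers from the edge logs above (4 ms for a hit, 320 ms for a miss) into a weighted average. The function names are mine, and a real CDN's latency distribution is of course not two fixed values:

```python
def expected_latency_ms(hit_rate: float, hit_ms: float = 4.0, miss_ms: float = 320.0) -> float:
    """Average response time when a fraction `hit_rate` of requests is served from the edge."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_ms + (1.0 - hit_rate) * miss_ms


def origin_requests(total_requests: int, hit_rate: float) -> int:
    """How many requests still have to travel to the origin."""
    return round(total_requests * (1.0 - hit_rate))


for rate in (0.50, 0.60, 0.90, 0.95):
    print(f"hit rate {rate:.0%}: "
          f"avg {expected_latency_ms(rate):6.1f} ms, "
          f"{origin_requests(1_000_000, rate):>7,} of 1,000,000 requests reach the origin")
```

Going from 60% to 95% cuts the number of origin requests by a factor of eight, which is where the savings in the next sections come from.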

어떻게? 다음 섹션에서 설명합니다.

4. 일관된 해싱 (Consistent Hashing) - Akamai의 핵심 특허 한 걸음 더

실무에서 CDN이나 분산 캐시를 논할 때 빠질 수 없는 주제입니다. 저도 처음엔 "왜 이게 중요한지" 이해 못 했는데, 실제 상황을 겪어보니 와닿았다.

문제 상황 - 단순 해싱의 재앙

엣지 서버가 100대 있을 때, logo.png를 어떤 서버에 저장할까요?

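Naive approach: the modulo operation

def get_server(key):
    # Note: Python's built-in hash() is randomized per process for strings,
    # so a real system would use a stable hash (MD5, MurmurHash, ...).
    server_id = hash(key) % 100  # one of 0-99
    return f"edge-{server_id}"

# Example (the server numbers are illustrative)
get_server("logo.png")  # edge-42
get_server("video.mp4")  # edge-87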

문제없어 보이죠? 그런데...

서버 1대가 추가되면?

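def get_server(key):
    server_id = hash(key) % 101  # now 101 servers
    return f"edge-{server_id}"

# Same files, but the remainder is different (illustrative values)
get_server("logo.png")  # edge-43 (changed!)
get_server("video.mp4")  # edge-88 (changed!)

Result: the hash values themselves stay the same, but the remainder changes for roughly 99% of keys, so almost every lookup becomes a cache miss.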

실제 시나리오:

트래픽이 늘어서 서버 1대 추가
Almost every cached object is effectively invalidated (its key now maps to a different server number)
모든 요청이 오리진으로 몰려감 (Cache Stampede)
오리진 서버 다운
서비스 중단

제가 처음 이걸 겪었을 때 진짜 패닉했습니다. "서버 추가했는데 왜 서비스가 죽어?"
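You can measure the damage without a cluster. This sketch hashes 10,000 made-up file names with MD5 (used here only as a stable hash, my choice) and counts how many change servers when the pool grows from 100 to 101:

```python
import hashlib


def stable_hash(key: str) -> int:
    """A hash that is identical across processes, unlike Python's built-in hash()."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


def modulo_server(key: str, server_count: int) -> int:
    """Naive placement: hash modulo the number of servers."""
    return stable_hash(key) % server_count


keys = [f"file-{i}.png" for i in range(10_000)]   # hypothetical object names
moved = sum(modulo_server(k, 100) != modulo_server(k, 101) for k in keys)
moved_fraction = moved / len(keys)

print(f"{moved_fraction:.1%} of keys changed servers after adding one server")
```

A key keeps its server only when its hash gives the same remainder for 100 and for 101, which happens for about 1 key in 101. Everything else turns into a cache miss at the same moment.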

해결 - Consistent Hashing (일관된 해싱)

MIT 교수들이 만든 이 알고리즘의 핵심은 "서버가 추가/삭제되어도 대부분의 키는 그대로 유지된다"입니다.

원리: 링(Ring) 구조

        0° (= 360°)
           │
    ┌──────┼──────┐
    │      ↓      │
  Server-A      Server-C
    │              │
    │    Ring      │
    │  (0-2^32)    │
    │              │
  Server-B ←─────┘

동작 방식:

import bisect
import hashlib

class ConsistentHash:
    def __init__(self, replicas=150):
        self.replicas = replicas  # virtual nodes per server
        self.ring = {}            # {hash_value: server_name}
        self.sorted_keys = []     # ring positions in ascending order

    @staticmethod
    def _hash(value):
        # MD5 is used only as a stable, well-spread hash, not for security
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add_server(self, server_name):
        # Place the server on the ring as many virtual nodes
        for i in range(self.replicas):
            self.ring[self._hash(f"{server_name}:{i}")] = server_name
        self.sorted_keys = sorted(self.ring)

    def get_server(self, key):
        if not self.sorted_keys:
            raise LookupError("no servers on the ring")
        # First ring position clockwise from the key's hash (binary search)
        idx = bisect.bisect_left(self.sorted_keys, self._hash(key))
        if idx == len(self.sorted_keys):
            idx = 0  # past the last position: wrap around, the ring is circular
        return self.ring[self.sorted_keys[idx]]

# Test
ch = ConsistentHash()
ch.add_server("edge-seoul")
ch.add_server("edge-tokyo")
ch.add_server("edge-osaka")

print(ch.get_server("logo.png"))      # one of the three servers; which one depends on the hash
print(ch.get_server("video.mp4"))

# Add a server
ch.add_server("edge-busan")

# Each key either stays where it was or moves to edge-busan.
# With 4 servers roughly a quarter of all keys move; going from 100 to 101 it is about 1%.
print(ch.get_server("logo.png"))
print(ch.get_server("video.mp4"))
print(ch.get_server("new-file.jpg"))  # new files go to whichever server owns their ring position

핵심 포인트:

Virtual Nodes (가상 노드): 한 서버를 링에 여러 번 배치 (150개)
- 이유: 균등 분산. 안 그러면 특정 서버에 부하 몰림
서버 추가 시: 인접한 일부 키만 이동
- 100대 → 101대: 약 1%의 키만 재배치 (100%가 아니라!)
서버 삭제 시: 그 서버 키들만 다음 서버로
- 장애 난 서버의 트래픽만 이웃으로 분산
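The "about 1%" figure can be checked the same way as the modulo case. This is a self-contained sketch with my own class name (HashRing) and made-up file names; it uses 150 virtual nodes per server as in the text:

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, vnodes: int = 150):
        self.vnodes = vnodes
        self._points = []   # sorted ring positions
        self._owner = {}    # ring position -> server name

    def add_server(self, name: str) -> None:
        for i in range(self.vnodes):
            point = stable_hash(f"{name}:{i}")
            self._owner[point] = name
            bisect.insort(self._points, point)

    def get_server(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring is empty")
        idx = bisect.bisect_left(self._points, stable_hash(key))
        if idx == len(self._points):   # wrap around the ring
            idx = 0
        return self._owner[self._points[idx]]


ring = HashRing()
for n in range(100):
    ring.add_server(f"edge-{n}")

keys = [f"file-{i}.png" for i in range(10_000)]
before = {k: ring.get_server(k) for k in keys}

ring.add_server("edge-100")                      # grow from 100 to 101 servers
after = {k: ring.get_server(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
moved_fraction = len(moved) / len(keys)
print(f"{moved_fraction:.1%} of keys moved, all of them to the new server: "
      f"{all(after[k] == 'edge-100' for k in moved)}")
```

Every key that moves lands on the new server, and nothing is shuffled between the old ones. That property is what keeps the origin alive during a scale-out.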

Real-world example: partitioning in Amazon's Dynamo

Amazon's Dynamo paper (2007), the design that inspired AWS DynamoDB, partitions data with consistent hashing:

데이터를 여러 노드에 분산
노드 추가/삭제 시 최소한의 데이터만 이동
그래서 페타바이트 규모에서도 안정적

결국 이거였다: 확장 가능한 분산 시스템의 핵심은 Consistent Hashing

5. 엣지 컴퓨팅 (Edge Computing) - 캐시를 넘어선 혁명

단순히 정적 파일(이미지, CSS)만 전달하던 시대는 끝났습니다. 이제는 엣지 서버에서 코드를 실행합니다.

기존 방식의 한계

[한국 사용자] "이미지를 400x300으로 리사이즈 해줘"
    ↓
[서울 엣지] "나 그거 못 해. 미국 본사로 가봐"
    ↓ (200ms)
[미국 오리진] "알았어" → 이미지 리사이징 → 응답
    ↓ (200ms)
[한국 사용자] 받음 (총 400ms+)

새로운 방식: Serverless at the Edge

[한국 사용자] "이미지를 400x300으로 리사이즈 해줘"
    ↓
[서울 엣지] "내가 바로 해줄게!" → V8 엔진으로 코드 실행 → 응답
    ↓ (15ms)
[한국 사용자] 받음 (총 15ms)

26배 빠름

실제 코드 - Cloudflare Workers 예시

// Cloudflare Workers = V8 엔진이 전 세계 310개 도시에서 돌아감

addEventListener('fetch', event => {
  event.respondWith(handleRequest(event.request))
})

async function handleRequest(request) {
  const url = new URL(request.url)

  // 1. A/B test (decided right at the edge; in practice you would store the
  //    variant in a cookie so each user keeps seeing the same one)
  const variant = Math.random() < 0.5 ? 'A' : 'B'

  // 2. Geo-blocking (per-country access control)
  const country = request.cf.country  // detected automatically by Cloudflare
  if (country === 'CN') {
    return new Response('Not available in your region', { status: 403 })
  }

  // 3. Image resizing (requires Cloudflare's image resizing feature on the zone)
  if (url.pathname.endsWith('.jpg')) {
    // Passing the original request keeps its method and headers
    return fetch(request, {
      cf: { image: { width: 400, quality: 85 } }
    })
  }

  // 4. Custom cache key (cf.cacheKey is available on Enterprise plans)
  const lang = request.headers.get('Accept-Language') ?? ''
  return fetch(request, {
    cf: { cacheKey: `${url.pathname}:${lang}` }
  })
}

실제 사용 사례들:

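Auth check: verify JWT tokens at the edge
- Rejected requests never reach the origin, which can cut origin load substantially
Language redirect: branch to /ko/ or /en/ based on the Accept-Language header
- No server-side logic needed
Bot blocking: block abusive crawlers by User-Agent
- Stops them before they reach the origin (User-Agent is easy to spoof, so treat this as a first filter only)
Image optimization: WebP for browsers that support it, JPEG for the rest
- Saves a meaningful share of bandwidth on image-heavy pages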
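The language redirect is the easiest of these to reason about in isolation. Below is a sketch of the decision logic only, written in Python rather than as a Worker so it can be run anywhere. The function names and the supported-locale list are my assumptions, and the parsing is deliberately simpler than the full HTTP grammar:

```python
from typing import Optional, Sequence


def pick_locale(accept_language: Optional[str],
                supported: Sequence[str] = ("ko", "en"),
                default: str = "en") -> str:
    """Pick the best supported language from an Accept-Language header value."""
    if not accept_language:
        return default

    candidates = []
    for part in accept_language.split(","):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0                      # no q= parameter means q=1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0              # malformed weight: ignore this entry
        candidates.append((quality, tag.strip().lower()))

    # sorted() is stable, so entries with equal weight keep their header order
    for quality, tag in sorted(candidates, key=lambda c: c[0], reverse=True):
        primary = tag.split("-")[0]        # "ko-KR" -> "ko"
        if quality > 0 and primary in supported:
            return primary
    return default


def redirect_path(path: str, accept_language: Optional[str]) -> str:
    """Prefix the path with the chosen locale, e.g. /pricing -> /ko/pricing."""
    return f"/{pick_locale(accept_language)}{path}"


print(redirect_path("/pricing", "ko-KR,ko;q=0.9,en-US;q=0.8"))
print(redirect_path("/pricing", "fr-FR,fr;q=0.9,en;q=0.5"))
```

Keeping the decision a pure function of the header makes it cheap to run on every request and easy to test before it is deployed to hundreds of locations.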

AWS Lambda@Edge 예시

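// Lambda can run at four points in CloudFront:
// Viewer Request → Origin Request → Origin Response → Viewer Response
// This handler has to be attached to the Origin Request trigger,
// the only one where request.origin can be changed.

exports.handler = async (event) => {
  const request = event.Records[0].cf.request
  const headers = request.headers

  // Detect mobile devices and send them to a different origin.
  // CloudFront only passes the real User-Agent to this stage if the header
  // is forwarded; otherwise consider CloudFront-Is-Mobile-Viewer instead.
  const userAgent = headers['user-agent']?.[0]?.value ?? ''
  if (/Mobile|Android|iPhone/i.test(userAgent)) {
    const domainName = 'mobile-api.example.com'
    request.origin = {
      custom: {
        domainName,
        port: 443,
        protocol: 'https',
        path: '',
        sslProtocols: ['TLSv1.2'],
        readTimeout: 30,
        keepaliveTimeout: 5,
        customHeaders: {}
      }
    }
    // The Host header has to match the new origin
    headers['host'] = [{ key: 'Host', value: domainName }]
  }

  // URL normalization (sorting query parameters improves cache efficiency)
  const params = new URLSearchParams(request.querystring)
  params.sort()
  request.querystring = params.toString()

  return request
}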

저는 이걸 써서 오리진 요청을 60% 줄였습니다. 엣지에서 해결 가능한 건 엣지에서 처리하니까요.
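The query-string normalization is worth a closer look, because it is where a lot of hit rate is quietly lost: /list?a=1&b=2 and /list?b=2&a=1 are the same page but two different cache keys. A sketch of the same idea in Python, using only the standard library (the function name is mine):

```python
from urllib.parse import parse_qsl, urlencode


def normalize_query(querystring: str) -> str:
    """Return the query string with parameters sorted, so equivalent URLs share one cache key."""
    pairs = parse_qsl(querystring, keep_blank_values=True)
    return urlencode(sorted(pairs))


print(normalize_query("b=2&a=1"))
print(normalize_query("a=1&b=2"))
print(normalize_query("utm=x&page=3&sort=asc"))
```

Both of the first two calls produce the same string, so the CDN stores one copy instead of two. Note that urlencode re-encodes values, so spaces come back as "+".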

6. 스트리밍 프로토콜 - 넷플릭스와 유튜브의 비밀을 이해했다

우리가 넷플릭스로 4K 영화를 볼 때, 수 기가바이트 통파일을 다운로드받는 게 아닙니다. 그랬다간 재생까지 1시간이 걸릴 테니까요.

여기서 HLS(HTTP Live Streaming)와 DASH(Dynamic Adaptive Streaming over HTTP)가 등장합니다.

전통적 방식의 문제

[넷플릭스 영화: 15GB]
    ↓
"15GB 다 받을 때까지 기다려주세요..."
    ↓
30분 후
    ↓
"자, 이제 재생됩니다!"

누가 이렇게 기다립니까?

HLS/DASH: 잘게 쪼개서 보낸다 (Chunking)

[원본 영화 2시간 = 15GB]
    ↓
[트랜스코딩 서버]
    ↓
┌──────────────────────────────────┐
│ 2초짜리 조각 3,600개로 쪼갬      │
│                                  │
│ chunk_0000.ts (1080p) - 4MB      │
│ chunk_0001.ts (1080p) - 4MB      │
│ chunk_0002.ts (1080p) - 4MB      │
│ ...                              │
│ chunk_3599.ts (1080p) - 4MB      │
│                                  │
│ 동시에 다른 화질도 생성:         │
│ chunk_0000.ts (720p)  - 2MB      │
│ chunk_0000.ts (480p)  - 1MB      │
│ chunk_0000.ts (360p)  - 0.5MB    │
└──────────────────────────────────┘
    ↓
[CDN에 전부 업로드]

재생 목록 파일 (master.m3u8):

#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=8000000,RESOLUTION=1920x1080
1080p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=4000000,RESOLUTION=1280x720
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2000000,RESOLUTION=854x480
480p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1000000,RESOLUTION=640x360
360p/playlist.m3u8

각 화질의 playlist.m3u8:

#EXTM3U
#EXT-X-TARGETDURATION:2
#EXTINF:2.0,
chunk_0000.ts
#EXTINF:2.0,
chunk_0001.ts
#EXTINF:2.0,
chunk_0002.ts
...
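A player's first job is to turn the master playlist into a list of variants it can choose from. Here is a minimal parsing sketch using the master.m3u8 shown above. It handles only the BANDWIDTH and RESOLUTION attributes and is not a full HLS parser; the function name is mine:

```python
import re

MASTER_PLAYLIST = """\
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=8000000,RESOLUTION=1920x1080
1080p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=4000000,RESOLUTION=1280x720
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2000000,RESOLUTION=854x480
480p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1000000,RESOLUTION=640x360
360p/playlist.m3u8
"""

# KEY=value pairs; quoted values may contain commas
ATTR_RE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def parse_master(text: str) -> list:
    """Return the variants of a master playlist, sorted from lowest to highest bandwidth."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    variants = []
    for i, line in enumerate(lines):
        if not line.startswith("#EXT-X-STREAM-INF:"):
            continue
        attrs = {k: v.strip('"') for k, v in ATTR_RE.findall(line.split(":", 1)[1])}
        if i + 1 >= len(lines) or lines[i + 1].startswith("#"):
            continue  # the tag must be followed by the variant's URI line
        variants.append({
            "bandwidth": int(attrs["BANDWIDTH"]),
            "resolution": attrs.get("RESOLUTION"),
            "uri": lines[i + 1],
        })
    return sorted(variants, key=lambda v: v["bandwidth"])


variants = parse_master(MASTER_PLAYLIST)
for variant in variants:
    print(variant)
```

Sorting by bandwidth gives the "ladder" that the adaptive bitrate logic climbs up and down.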

적응형 비트레이트 (Adaptive Bitrate Streaming)

네트워크가 느려지면 화질이 1080p에서 720p로 자동으로 바뀌죠?

Client logic (pseudocode; parseMaster, getBufferLength, selectLowerQuality, selectHigherQuality, appendToBuffer and sleep are left undefined):

class HLSPlayer {
  constructor() {
    this.currentBandwidth = 0
    this.bufferHealth = 0
  }

  async playVideo() {
    // 1. Fetch master.m3u8 and parse the variant list
    const master = await (await fetch('master.m3u8')).text()
    const qualities = this.parseMaster(master)  // [1080p, 720p, 480p, 360p]

    // 2. Pick the initial quality (conservatively, in the middle)
    let currentQuality = qualities[1]  // 720p

    while (true) {
      // 3. Download the next chunk
      const startTime = Date.now()
      const response = await fetch(currentQuality.nextChunk)
      const chunk = await response.arrayBuffer()
      const downloadTime = Math.max(Date.now() - startTime, 1)  // ms, never 0

      // 4. Measure bandwidth
      const bandwidth = (chunk.byteLength * 8) / (downloadTime / 1000)  // bps
      this.currentBandwidth = bandwidth

      // 5. Check the buffer (seconds of video already downloaded)
      this.bufferHealth = this.getBufferLength()

      // 6. Adjust quality
      if (this.bufferHealth < 5 && bandwidth < currentQuality.bandwidth) {
        // Less than 5 s buffered and the link is too slow → lower quality
        currentQuality = this.selectLowerQuality(qualities, bandwidth)
        console.log('Switching to lower quality:', currentQuality.resolution)
      } else if (this.bufferHealth > 20 && bandwidth > currentQuality.bandwidth * 1.5) {
        // Plenty buffered and the link is fast → higher quality
        currentQuality = this.selectHigherQuality(qualities, bandwidth)
        console.log('Switching to higher quality:', currentQuality.resolution)
      }

      // 7. Hand the chunk to the playback buffer
      this.appendToBuffer(chunk)

      // 8. Keep downloading ahead of playback; pause only when the buffer is full
      while (this.getBufferLength() > 30) {
        await this.sleep(500)
      }
    }
  }
}

실제 동작 시나리오:

[00:00] 재생 시작 → 720p 선택 (안전한 선택)
[00:10] 대역폭 8Mbps 측정 → 1080p로 업그레이드
[00:45] Entering the subway, bandwidth drops to 1.5 Mbps → downgrade to 360p (the 480p variant needs 2 Mbps)
[01:12] WiFi 연결, 대역폭 12Mbps → 1080p로 복귀

사용자는 끊김 없이 계속 볼 수 있습니다. 이게 버퍼링 없는 스트리밍의 비밀입니다.
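The switching rule itself fits in a few lines. This sketch reuses the thresholds from the pseudocode (5 s and 20 s of buffer, 1.5x headroom) and the bitrates from the master playlist. How to pick the target variant is my assumption: the highest one whose declared bandwidth fits the measured throughput. Production players use more elaborate estimators.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    bandwidth: int  # bits per second, as declared in the master playlist


LADDER = [
    Variant("360p", 1_000_000),
    Variant("480p", 2_000_000),
    Variant("720p", 4_000_000),
    Variant("1080p", 8_000_000),
]


def choose_variant(current: Variant, measured_bps: float, buffer_s: float,
                   ladder=LADDER, low_buffer_s: float = 5.0,
                   high_buffer_s: float = 20.0, headroom: float = 1.5) -> Variant:
    """Decide which variant to download next."""
    fitting = [v for v in ladder if v.bandwidth <= measured_bps]

    # Buffer nearly empty and the link cannot sustain the current quality: step down
    if buffer_s < low_buffer_s and measured_bps < current.bandwidth:
        return fitting[-1] if fitting else ladder[0]

    # Buffer comfortable and the link is clearly faster than needed: step up
    if buffer_s > high_buffer_s and measured_bps > current.bandwidth * headroom:
        return fitting[-1] if fitting else current

    return current  # otherwise stay put to avoid flapping between qualities


p360, p480, p720, p1080 = LADDER
print(choose_variant(p720, 8_000_000, buffer_s=25).name)    # fast link, full buffer
print(choose_variant(p1080, 1_500_000, buffer_s=3).name)    # slow link, draining buffer
print(choose_variant(p720, 5_000_000, buffer_s=10).name)    # nothing to change
```

The asymmetry is deliberate: stepping down happens as soon as the buffer is at risk, while stepping up requires both a healthy buffer and spare throughput.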

CDN이 핵심인 이유

이 모든 조각(수천 개)이 CDN 엣지 서버에 캐싱되어 있습니다:

[한국 사용자 A] → chunk_0042.ts 요청 → [서울 엣지] 있음! → 4ms 응답
[한국 사용자 B] → chunk_0042.ts 요청 → [서울 엣지] 있음! → 4ms 응답
...
[한국 사용자 Z] → chunk_0042.ts 요청 → [서울 엣지] 있음! → 4ms 응답

만약 CDN 없이 미국 오리진에서 직접 받는다면:

200ms × 1,800조각(1시간) = 6분의 누적 지연

CDN 덕분에 우리는 지구 반대편 영화를 실시간처럼 봅니다.

7. DDoS 방어 - 거대한 방파제로서의 CDN

CDN은 보안 장비로도 필수입니다.

DDoS 공격 시나리오

[AWS CloudWatch Alarm]
🚨 EC2 CPU 100%
🚨 Network In: 15 Gbps (평소 0.2 Gbps)
🚨 Server Unreachable

DDoS 공격이 들어오면 이런 상황이 벌어진다. 작은 서버(1Gbps 네트워크)에 15Gbps 트래픽이 쏟아지면?

결과: 서버 즉사. 정상 사용자도 접속 불가.

CDN이 막아주는 원리

1) 용량전(Volumetric Attack) 흡수

[공격자] 100 Gbps 트래픽 발사
    ↓
[CDN Global Network]
┌─────────────────────────────────────┐
│ 전 세계 310개 도시, 수천 대 서버   │
│ 총 용량: 200 Tbps (200,000 Gbps)   │
│                                     │
│ 서울 엣지: 3 Gbps 받음              │
│ 도쿄 엣지: 5 Gbps 받음              │
│ 런던 엣지: 2 Gbps 받음              │
│ ...                                 │
│                                     │
│ → 각 서버는 여유롭게 처리           │
└─────────────────────────────────────┘
    ↓
[오리진 서버] 정상 트래픽만 받음 (CDN이 필터링)

2) Anycast 라우팅으로 분산

# CDN은 전 세계에서 같은 IP를 광고함 (BGP Anycast)

[공격자 in 중국] → 공격 트래픽 → [가장 가까운 CDN = 홍콩 엣지]
[공격자 in 러시아] → 공격 트래픽 → [가장 가까운 CDN = 모스크바 엣지]
[공격자 in 브라질] → 공격 트래픽 → [가장 가까운 CDN = 상파울루 엣지]

# 공격 트래픽이 자동으로 전 세계로 분산됨
# 각 엣지는 소량만 받아서 처리 가능

3) 악성 패턴 자동 차단

// Firewall-style rules as pseudocode (this is not Cloudflare's actual rule syntax)

// Rate Limiting: 같은 IP에서 초당 100회 이상 요청 → 차단
if (request.countPerSecond(request.ip) > 100) {
  return { action: 'block', reason: 'rate_limit' }
}

// Challenge: 의심스러운 User-Agent → CAPTCHA
if (/curl|bot|python/i.test(request.userAgent)) {
  return { action: 'challenge' }
}

// Geo-blocking: my service only targets Korea and the US → block everything else
if (!['KR', 'US'].includes(request.country)) {
  return { action: 'block', reason: 'geo' }
}

// WAF Rule: SQL Injection 패턴 감지
if (/union.*select|drop.*table/i.test(request.url)) {
  return { action: 'block', reason: 'sql_injection' }
}
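The rate-limiting rule is the one that hides the most logic behind a single line. Here is a sketch of one common way to implement "more than N requests per second from the same IP", a sliding-window counter. The class name is mine, and timestamps are passed in explicitly so the behavior is easy to test; a real edge would share this state across many machines.

```python
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    """Allow at most `limit` requests per client within any `window_s`-second window."""

    def __init__(self, limit: int = 100, window_s: float = 1.0):
        self.limit = limit
        self.window_s = window_s
        self._hits = defaultdict(deque)   # client IP -> timestamps of accepted requests

    def allow(self, client_ip: str, now: float) -> bool:
        hits = self._hits[client_ip]
        # Forget requests that have fallen out of the window
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False                  # over the limit: block (or challenge)
        hits.append(now)
        return True


limiter = SlidingWindowRateLimiter(limit=3, window_s=1.0)
decisions = [limiter.allow("203.0.113.7", t) for t in (0.0, 0.1, 0.2, 0.3, 1.05)]
print(decisions)
```

With a limit of 3, the fourth request inside the same second is rejected, and the client is allowed again once the oldest request leaves the window.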

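Real-world defense: the GitHub DDoS (2018)

Attack size: 1.35 Tbps (the largest publicly reported attack at the time)
Attack type: memcached reflection/amplification
GitHub's response: rerouted traffic through Akamai's DDoS mitigation service
Result: mitigated in roughly 8 minutes, service back to normal

GitHub could not have absorbed this with its own servers alone. The global capacity of a CDN-scale network was essential.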

8. 실제 사례 - 인터넷이 멈춘 날 (Fastly Outage 2021)

2021년 6월 8일 오전, 제가 아마존 들어가려는데 안 됐습니다. 레딧도 안 되고, 트위치도 안 되고, CNN도 안 되고...

"우리 집 와이파이 문제인가?" → 아니었습니다. 전 세계가 동시에 접속 불능이었습니다.

원인 - Fastly CDN 장애

조사 결과, Fastly CDN의 글로벌 장애였습니다.

영향받은 사이트들:

Amazon
Reddit
Twitch
GitHub
Stack Overflow
New York Times
Financial Times
BBC
CNN

A large share of the web's most visited sites went dark at the same moment.

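Why did this happen?

Surprisingly, it was neither a hack nor a hardware failure.

Timeline (times from Fastly's public incident summary):

[May 12] Fastly ships a software deployment that contains a latent bug
    ↓
[June 8, 09:47 UTC] A customer pushes a valid configuration change that triggers the bug; global disruption begins
    ↓
[Result] About 85% of Fastly's network returns errors
    ↓
[10:27 UTC] Engineers identify the customer configuration that triggered it
    ↓
[10:36 UTC] Affected services begin to recover
    ↓
[11:00 UTC] Most services are back
    ↓
[12:35 UTC] Incident mitigated

Fastly has not published the buggy code, so the exact mechanism is not public. A well-documented case of one rule exhausting CPU across an entire CDN is a different incident: Cloudflare's outage on July 2, 2019, where a newly deployed WAF rule contained a regular expression that backtracked catastrophically.

A simplified pattern with the same failure mode (illustrative, not the rule from either incident):

# Nested quantifiers: the engine tries every way to split the run of "a"s
/^(a+)+$/

# An input that almost matches: "aaaaaaaaaaaaaaaaaaaaaaaaaaaa!"
# → the work roughly doubles with every extra "a" (ReDoS)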
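Catastrophic backtracking is easy to reproduce on a laptop. This sketch times a textbook nested-quantifier pattern against an equivalent flat one in Python's re module. The patterns are illustrative examples of mine, and the input lengths are kept small on purpose, because each extra character roughly doubles the work:

```python
import re
import time

NESTED = re.compile(r"^(a+)+$")   # nested quantifier: exponential backtracking on near-misses
FLAT = re.compile(r"^a+$")        # accepts the same strings without the nesting


def time_match(pattern, text):
    """Return (match result, elapsed seconds) for one match attempt."""
    start = time.perf_counter()
    result = pattern.match(text)
    return result, time.perf_counter() - start


for n in (10, 14, 18):
    text = "a" * n + "!"          # almost matches, fails only at the last character
    _, nested_s = time_match(NESTED, text)
    _, flat_s = time_match(FLAT, text)
    print(f"n={n:2d}  nested: {nested_s * 1000:9.3f} ms   flat: {flat_s * 1000:7.3f} ms")
```

Both patterns reject the input, but the nested one explores every way of splitting the run of "a"s first. On a CDN node that handles thousands of requests per second, one such rule is enough to pin every core.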

교훈 1: SPOF(Single Point of Failure)

CDN 하나에만 의존하면 그 CDN이 죽었을 때 대안이 없습니다.

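Solution: a multi-CDN strategy

In practice multi-CDN is usually done with DNS-based traffic steering (weighted or failover records with health checks) or client-side fallback, because a single proxy in front of several CDNs would itself become a single point of failure. The nginx snippet below only illustrates the weighting and failover idea; the hostnames are placeholders, not real endpoints.

# Illustration: weighted upstreams with a backup

upstream cdn_pool {
  server cdn1.cloudflare.com:443 weight=3;
  server cdn2.fastly.com:443 weight=2;
  server cdn3.akamai.com:443 backup;  # unused normally, emergency only
}

server {
  location /static/ {
    proxy_pass https://cdn_pool;
    proxy_ssl_server_name on;  # send SNI to the upstream
    proxy_next_upstream error timeout http_503;  # on failure, try the next CDN
  }
}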

대기업 전략:

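Netflix: relied on commercial CDNs such as Akamai and Limelight in its early streaming years, then moved video delivery to its own CDN (Open Connect)
Facebook: its own PoPs, historically alongside Akamai
Shopify: Fastly and Cloudflare (reportedly)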

교훈 2 - 설정 배포의 위험성

코드가 아닌 설정(Config) 변경 하나가 전 세계를 멈출 수도 있습니다.

Best Practice:

Canary Deployment: 설정 변경을 1% 서버에만 먼저 적용
Circuit Breaker: 에러율이 임계값 넘으면 자동 롤백
Gradual Rollout: 5% → 25% → 50% → 100% 단계적 적용

# Fastly는 이 사고 후 이런 시스템 도입 (추정)

deployment:
  strategy: canary
  stages:
    - percent: 1
      duration: 10m
      error_threshold: 0.1%  # 에러율 0.1% 넘으면 중단
    - percent: 10
      duration: 30m
      error_threshold: 0.5%
    - percent: 100
      auto_rollback: true
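The staged rollout in the YAML above boils down to a loop with an abort condition. A sketch of that control flow follows. The stage values mirror the YAML, the final-stage threshold of 0.5% is my assumption (the YAML gives none), and observe_error_rate is a hypothetical stand-in for whatever monitoring you have:

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    percent: int             # share of servers that receive the new config
    error_threshold: float   # abort if the observed error rate exceeds this (0.001 == 0.1%)


STAGES = [
    Stage(percent=1, error_threshold=0.001),
    Stage(percent=10, error_threshold=0.005),
    Stage(percent=100, error_threshold=0.005),   # assumed threshold for the last stage
]


def run_rollout(stages: List[Stage], observe_error_rate: Callable[[int], float]) -> dict:
    """Apply stages in order; roll everything back as soon as one stage looks unhealthy."""
    deployed = 0
    for stage in stages:
        error_rate = observe_error_rate(stage.percent)
        if error_rate > stage.error_threshold:
            return {"status": "rolled_back", "failed_at": stage.percent, "deployed_percent": 0}
        deployed = stage.percent
    return {"status": "completed", "failed_at": None, "deployed_percent": deployed}


# A healthy change, and one that only misbehaves once real traffic hits it
healthy = run_rollout(STAGES, lambda percent: 0.0002)
broken = run_rollout(STAGES, lambda percent: 0.0002 if percent == 1 else 0.40)
print(healthy)
print(broken)
```

In the second run the problem is caught at 10% of servers instead of 100%, which is the entire point of paying for a slower deploy.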

이 사건으로 저는 깨달았습니다: 인터넷은 생각보다 취약하다. 그리고 CDN은 생각보다 중요하다.

9. 실제 Lab - Cache Invalidation (캐시 무효화) - 개발자의 영원한 숙제

Phil Karlton의 명언:

"컴퓨터 과학에는 두 가지 어려운 문제가 있다: 캐시 무효화와 이름 짓기."

제가 가장 골머리 앓았던 부분입니다. "서버 배포했는데 왜 예전 화면이 나오죠?"

문제 상황

[11:00] 개발자: style.css 수정 → 서버 배포 완료
[11:01] 사용자: "아직도 옛날 스타일인데요?"
[11:02] 개발자: "제 브라우저에선 잘 나오는데요?" (로컬 서버 보고 있음 🤦)
[11:03] 사용자: "Ctrl+F5 눌러도 그대로예요"
[11:05] 개발자: "CDN 캐시 때문이네... 어떻게 지우지?"

방법 1 - 강제 퍼지 (Purge/Invalidation) - 비추천

Cloudflare 예시:

# API로 캐시 전체 삭제
curl -X POST "https://api.cloudflare.com/client/v4/zones/{zone_id}/purge_cache" \
  -H "Authorization: Bearer {api_token}" \
  -H "Content-Type: application/json" \
  --data '{"purge_everything":true}'

AWS CloudFront 예시:

# 특정 파일 무효화
aws cloudfront create-invalidation \
  --distribution-id E1234567890ABC \
  --paths "/css/*" "/js/*" "/images/logo.png"

문제점:

전파 지연: 전 세계 엣지에 퍼지는 데 시간이 걸림 (몇 초 ~ 몇 분)
비용: AWS는 월 1,000건까지 무료, 초과 시 건당 $0.005
Race Condition: 무효화 요청 중에 누군가 요청하면 옛날 버전이 다시 캐싱됨

방법 2 - Versioning (권장) - 파일명 바꾸기

개념:

Before: /static/style.css
After:  /static/style.v2.css
        또는 /static/style.a8b3c9.css (Hash)

CDN은 다른 파일로 인식하므로 캐시를 안 찾습니다 → 즉시 반영

Webpack/Vite 자동화:

// webpack.config.js
const MiniCssExtractPlugin = require('mini-css-extract-plugin')

module.exports = {
  output: {
    filename: '[name].[contenthash].js',  // main.a8b3c9f2.js
    chunkFilename: '[name].[contenthash].js',
  },
  plugins: [
    new MiniCssExtractPlugin({
      filename: '[name].[contenthash].css',  // style.b4d8e1a3.css
    }),
  ],
}

결과:

<!-- 빌드 전 -->
<link rel="stylesheet" href="/static/style.css">

<!-- 빌드 후 -->
<link rel="stylesheet" href="/static/style.b4d8e1a3.css">

Because the file name changes whenever the file's content changes:

CDN은 새 파일로 인식 → Cache Miss → 오리진에서 새 파일 받아옴
무효화 비용 0원
즉시 반영 (전파 지연 없음)
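What the bundler does with [contenthash] can be sketched in a few lines. This is not webpack's algorithm, only the idea: derive part of the file name from the bytes of the file. SHA-256 and the 8-character length are my choices.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_name(path: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a short content hash before the extension: style.css -> style.<hash>.css"""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


v1 = hashed_name("/static/style.css", b"body { color: black; }")
v1_again = hashed_name("/static/style.css", b"body { color: black; }")
v2 = hashed_name("/static/style.css", b"body { color: navy; }")

print(v1)
print(v2)
print("unchanged content keeps its name:", v1 == v1_again)
```

Unchanged files keep their URL across deploys and stay cached, while a changed file gets a new URL that no cache has ever seen. This only works if the HTML that references those URLs is itself not cached for long.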

방법 3 - Cache-Control 헤더 전략

HTTP 헤더로 캐싱 제어:

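# nginx configuration
# Regex locations are tried in order and the first match wins,
# so the hashed-file rule has to come before the generic CSS/JS rule.

# 1. HTML: never cache (always check for the latest version)
location ~ \.html$ {
  add_header Cache-Control "no-cache, no-store, must-revalidate";
  add_header Pragma "no-cache";
  add_header Expires 0;
}

# 2. Files with a content hash in the name (e.g. style.b4d8e1a3.css): cache permanently
#    (location matching ignores the query string, so a pattern like \?v= would never match)
location ~ "\.[0-9a-f]{8,}\.(css|js)$" {
  add_header Cache-Control "public, max-age=31536000, immutable";
}

# 3. CSS/JS without a hash: cache briefly
location ~ \.(css|js)$ {
  add_header Cache-Control "public, max-age=3600";  # 1 hour
}

# 4. Images: cache for a long time
location ~ \.(jpg|jpeg|png|gif|webp)$ {
  add_header Cache-Control "public, max-age=31536000, immutable";  # 1 year
}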

헤더 설명:

헤더	의미	캐싱 위치
`public`	누구나 캐싱 가능	CDN, 프록시, 브라우저
`private`	브라우저만 캐싱	브라우저만
`no-cache`	캐싱하되, 매번 서버에 확인	조건부 캐싱
`no-store`	절대 캐싱 금지	없음 (보안 중요 데이터)
`max-age=3600`	3600초(1시간) 동안 유효	브라우저 + CDN
`s-maxage=3600`	CDN용 TTL (max-age 오버라이드)	CDN만
`immutable`	절대 안 바뀜 (재검증 불필요)	브라우저
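To make the table concrete, here is a sketch of how a shared cache such as a CDN might turn a Cache-Control header into a TTL. It covers only the directives in the table and is deliberately conservative: a response with no explicit lifetime is treated as not cacheable, which is my simplification, since real caches may apply heuristics.

```python
def parse_cache_control(header: str) -> dict:
    """Split a Cache-Control value into {directive: value or None}."""
    directives = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') if sep else None
    return directives


def shared_cache_ttl(header: str) -> int:
    """Seconds a shared cache (CDN) may serve the response without going back to the origin."""
    d = parse_cache_control(header)
    if "no-store" in d or "private" in d or "no-cache" in d:
        return 0                              # not storable, or must revalidate every time
    for name in ("s-maxage", "max-age"):      # s-maxage overrides max-age for shared caches
        if name in d:
            try:
                return max(0, int(d[name]))
            except (TypeError, ValueError):
                return 0                      # missing or malformed value
    return 0


for header in (
    "public, max-age=3600",
    "public, max-age=60, s-maxage=3600",
    "private, max-age=300",
    "no-cache, no-store, must-revalidate",
    "public, max-age=31536000, immutable",
):
    print(f"{header!r:45} -> CDN TTL {shared_cache_ttl(header)} s")
```

The second header is the useful trick: browsers keep the file for a minute while the CDN keeps it for an hour, so a purge at the CDN reaches users quickly.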

실제 예시:

// Express.js 서버
app.get('/api/user', (req, res) => {
  res.set({
    'Cache-Control': 'private, max-age=300',  // 브라우저만 5분 캐싱
    'Vary': 'Authorization',  // 인증 헤더별로 다른 캐시
  })
  res.json({ name: 'John' })
})

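app.get('/static/logo.png', (req, res) => {
  res.set({
    'Cache-Control': 'public, max-age=31536000, immutable',  // CDN + browser, 1 year
  })
  // sendFile needs an absolute path or a root option (the static directory is assumed here)
  res.sendFile('logo.png', { root: `${__dirname}/static` })
})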

제가 쓰는 전략 (하이브리드)

1. HTML 파일: no-cache (항상 최신)
2. CSS/JS: 파일명에 해시 포함 + immutable (영구 캐싱)
3. 이미지: 1년 캐싱 (잘 안 바뀜)
4. API: private + 짧은 TTL (사용자별 다름)

결과:

캐시 Hit Rate: 95%
배포 시 즉시 반영 (Purge 불필요)
서버 비용 대폭 절감 (Hit Rate가 높아질수록 오리진 요금이 줄어드는 구조)

10. 비용 최적화 - 대역폭 요금 줄이기

For a bandwidth-heavy service, egress (traffic leaving the cloud) can easily become the largest line on the bill.

제가 받았던 청구서 (Before CDN)

AWS 월 청구서
────────────────────────────────
EC2 t3.medium        $30
RDS db.t3.small      $25
Data Transfer Out    $740 ← 😱
────────────────────────────────
Total                $795

트래픽이 늘어나자 서버보다 대역폭 비용이 25배 비쌌습니다.

AWS 대역폭 요금 구조

Data Transfer OUT from EC2 to Internet:

First 10 TB/month:    $0.09 per GB
Next 40 TB/month:     $0.085 per GB
Next 100 TB/month:    $0.07 per GB
Over 150 TB/month:    $0.05 per GB

My service transferred about 8 TB a month:

8,000 GB × $0.09 = $720 (the $740 on the bill above corresponds to a little over 8.2 TB)
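Tiered pricing is simple until the usage crosses a tier boundary, so here is a small calculator built from the two price tables in this section. It uses 1 TB = 1,000 GB as the text does; the rates are the ones quoted here and will drift over time and by region.

```python
# (tier size in GB, price per GB), applied in order
EC2_TIERS = [(10_000, 0.09), (40_000, 0.085), (100_000, 0.07), (float("inf"), 0.05)]
CLOUDFRONT_TIERS = [(10_000, 0.085), (40_000, 0.080), (100_000, 0.060)]  # table stops at 150 TB


def tiered_cost(gb: float, tiers) -> float:
    """Monthly data-transfer cost in USD for `gb` gigabytes under tiered pricing."""
    remaining = gb
    cost = 0.0
    for size_gb, price_per_gb in tiers:
        used = min(remaining, size_gb)
        cost += used * price_per_gb
        remaining -= used
        if remaining <= 0:
            break
    if remaining > 0:
        raise ValueError("usage exceeds the tiers provided")
    return cost


for label, gb in (("8 TB", 8_000), ("4.8 TB (after compression)", 4_800), ("20 TB", 20_000)):
    print(f"{label:28} EC2 direct: ${tiered_cost(gb, EC2_TIERS):8,.2f}   "
          f"CloudFront: ${tiered_cost(gb, CLOUDFRONT_TIERS):8,.2f}")
```

At 8 TB everything falls into the first tier, so the per-GB difference between the two tables is only half a cent. The bigger levers are sending fewer bytes and where the bytes are served from.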

CloudFront CDN 요금

Data Transfer OUT from CloudFront:

First 10 TB/month:    $0.085 per GB
Next 40 TB/month:     $0.080 per GB
Next 100 TB/month:    $0.060 per GB

약간 싸지만... 진짜 마법은 다음입니다.

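Transfer from AWS origins to CloudFront is free

[EC2] → [CloudFront] : free (origin fetches from AWS origins are not billed as data transfer out)
[CloudFront] → [user] : $0.085/GB (US/Europe rate; other regions, including Korea, cost more)

[EC2] → [user] : $0.09/GB

Key point: cache misses add no transfer charge, and with a high cache hit rate only a small fraction of requests reaches the origin at all, so the origin can be much smaller.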

제 경우:

월 트래픽: 8 TB
Cache Hit Rate: 95%

Origin → CDN: 0.4 TB (5% Miss) → 무료
CDN → 사용자: 8 TB → $0.085 × 8,000 = $680

하지만!

Origin 트래픽이 줄어서 EC2 네트워크도 작은 인스턴스로 가능
EC2 t3.medium → t3.small ($30 → $17)

최종 비용:
CloudFront: $680
EC2: $17
RDS: $25
────────────
Total: $722

추가 최적화 (압축):
Gzip/Brotli 압축으로 트래픽 40% 감소
8TB → 4.8TB
$680 → $408

최종: $450

$795 → $450, 43% 절감

Cloudflare의 혁신: Bandwidth Alliance

Cloudflare set up the Bandwidth Alliance, under which partner providers waive or discount egress fees for traffic sent to Cloudflare.

파트너 목록:

Backblaze B2
DigitalOcean Spaces
Linode Object Storage
Vultr Object Storage

예시:

일반적인 구조:
[S3] → [CloudFront] → [사용자]
     무료           $0.085/GB

Bandwidth Alliance:
[Backblaze B2] → [Cloudflare CDN] → [사용자]
            무료                  무료 (!)

실제 비용 비교 (월 10TB 전송):

AWS S3 + CloudFront:
- S3 저장: $23 (1TB)
- CDN 전송: $850 (10TB × $0.085)
- 합계: $873

Backblaze B2 + Cloudflare:
- B2 저장: $5 (1TB)
- CDN 전송: $0 (Bandwidth Alliance)
- 합계: $5 (!)

174배 차이

저는 이걸 알고 즉시 마이그레이션했습니다. 월 비용이 $450 → $45로 떨어졌습니다.

최적화 체크리스트

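// 1. Image optimization (Express.js)
// Before: 2 MB JPEG
// After: 300 KB WebP (compression + format conversion)

// Use Cloudflare Polish, or implement it yourself
const path = require('path')
const IMAGE_DIR = path.join(__dirname, 'images')  // assumed image directory

app.get('/images/:name', (req, res) => {
  const acceptsWebP = req.headers.accept?.includes('image/webp')

  // The response depends on the Accept header, so caches must key on it
  res.set('Vary', 'Accept')

  if (acceptsWebP) {
    res.sendFile(`${req.params.name}.webp`, { root: IMAGE_DIR })  // about 85% smaller
  } else {
    res.sendFile(req.params.name, { root: IMAGE_DIR })  // original JPEG
  }
})

# 2. Gzip/Brotli compression (nginx handles this automatically)
gzip on;
gzip_types text/css application/javascript application/json;
gzip_min_length 1000;

# Brotli (stronger compression, modern browsers; needs the ngx_brotli module)
brotli on;
brotli_types text/css application/javascript;

// 3. Remove unnecessary data (JavaScript)
// Drop null fields from API responses
const response = {
  id: 123,
  name: "John",
  avatar: null,  // to be removed
  bio: null,     // to be removed
}

// A JSON.stringify replacer that returns undefined drops the key
res.json(
  JSON.parse(JSON.stringify(response, (k, v) => v === null ? undefined : v))
)

<!-- 4. Serve resized images (HTML) -->
<!-- Do not send a 2000×2000 original to mobile devices -->
<img
  src="image-400.jpg"
  srcset="image-400.jpg 400w, image-800.jpg 800w, image-1200.jpg 1200w"
  sizes="(max-width: 600px) 400px, (max-width: 1200px) 800px, 1200px"
  alt=""
/>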

결국 이거였다: "어디에 저장하고 어디로 쏘느냐"가 클라우드 비용의 90%를 결정한다.

11. 고급 주제 - CDN의 미래와 트렌드

1) Compute@Edge의 진화

엣지에서 돌아가는 코드가 점점 복잡해지고 있습니다.

현재:

Cloudflare Workers: V8 JavaScript/WASM
Fastly Compute@Edge: WASM (Rust, C++, Go 등)
AWS Lambda@Edge: Node.js, Python

미래:

전체 웹 애플리케이션이 엣지에서 실행
데이터베이스 쿼리도 엣지에서 (Cloudflare D1, PlanetScale 등)
AI 추론도 엣지에서 (텍스트 분석, 이미지 인식 등)

예시: 엣지에서 AI 실행

// Cloudflare Workers AI (module syntax; assumes an AI binding named "AI" is configured)
export default {
  async fetch(request, env) {
    const response = await env.AI.run('@cf/meta/llama-2-7b-chat-int8', {
      prompt: 'How do I get from Seoul to Busan?'
    })

    return new Response(JSON.stringify(response), {
      headers: { 'Content-Type': 'application/json' }
    })
  }
}

// The request is handled on Cloudflare's network instead of travelling to a US origin.
// That removes the long network round trip (about 200 ms); model inference time comes on top.

2) Real-time 최적화

WebSocket, Server-Sent Events(SSE) 같은 실시간 연결도 CDN을 거치게 되었습니다.

// Cloudflare Durable Objects = Stateful Edge Computing

export class ChatRoom {
  constructor(state, env) {
    this.state = state
    this.sessions = []
  }

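  async fetch(request) {
    // Create the WebSocket pair
    const [client, server] = Object.values(new WebSocketPair())

    server.accept()  // required before the server side can send or receive
    this.sessions.push(server)

    server.addEventListener('message', event => {
      // Broadcast to everyone connected to this room (this Durable Object instance)
      this.sessions.forEach(session => {
        session.send(event.data)
      })
    })

    server.addEventListener('close', () => {
      // Drop closed connections so later broadcasts do not fail
      this.sessions = this.sessions.filter(session => session !== server)
    })

    return new Response(null, { status: 101, webSocket: client })
  }
}

// A Durable Object instance lives in one location, typically near where it was first created.
// Korean users chatting in a room created from Korea are served nearby, with no round trip to the US.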

3) Privacy-first CDN

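With GDPR and stricter privacy laws elsewhere (such as Korea's Personal Information Protection Act), the trend is to keep logs inside their own region.

Old way:
- EU user logs → sent to US headquarters → possible GDPR violation

New way:
- Korean user logs → stored only in Korea
- EU user logs → stored only in the EU
- Cross-region transfer restricted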

Cloudflare는 이미 Regional Services 옵션을 제공합니다.

12. 용어 사전 (Glossary)

CDN (Content Delivery Network): 콘텐츠를 효율적으로 전달하기 위해 전 세계에 분산된 서버 네트워크
Origin Server: 원본 콘텐츠가 저장된 메인 서버 (본점)
Edge Server/PoP (Point of Presence): 사용자와 가까운 위치에 설치된 캐시 서버 (지점)
Cache Hit/Miss: 요청 데이터가 캐시에 있으면 Hit (빠름), 없으면 Miss (오리진 조회 필요)
TTL (Time To Live): 캐시된 데이터가 유효한 시간 (초 단위)
Anycast: 1개의 IP를 여러 서버가 공유하여, 가장 가까운 서버가 응답하게 하는 라우팅 기술
Consistent Hashing: 분산 캐시 시스템에서 노드 추가/삭제 시 데이터 재배치를 최소화하는 알고리즘
Edge Computing: 데이터를 중앙 서버가 아닌, 사용자 근처의 엣지 서버에서 처리하는 기술
Cache Purge/Invalidation: 캐시된 데이터를 강제로 삭제하여 최신 데이터를 받아오게 하는 작업
Geo-blocking: 접속자의 IP 위치를 기반으로 특정 국가의 접속을 차단/허용하는 기능
WAF (Web Application Firewall): SQL Injection, XSS 같은 웹 해킹을 막는 방화벽
Cache Stampede: 캐시가 만료되는 순간 수많은 요청이 동시에 오리진 서버로 몰려 서버가 다운되는 현상
Dynamic Content Acceleration: 캐싱되지 않는 동적 콘텐츠를 가속하기 위해, 오리진까지의 최적 경로를 찾아주는 기술
Last Mile: 최종 사용자에게 도달하는 마지막 통신 구간 (집 앞 전봇대 → 집)
Pre-fetching: 사용자가 요청하기 전에 미리 데이터를 엣지로 가져다 놓는 기술
HLS (HTTP Live Streaming): Apple이 만든 비디오 스트리밍 프로토콜, 작은 조각으로 나눠서 전송
DASH (Dynamic Adaptive Streaming over HTTP): 국제 표준 스트리밍 프로토콜, HLS와 유사
Egress: 데이터센터에서 외부로 나가는 트래픽 (요금이 비쌈)
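ReDoS (Regular Expression Denial of Service): an attack (or accident) in which a regular expression backtracks so heavily on certain inputs that it consumes CPU until the server stops responding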
BGP (Border Gateway Protocol): 인터넷 라우터들이 경로를 광고하고 선택하는 프로토콜

13. FAQ

Q1: CDN은 정적 파일만 처리하나요?

A: 예전엔 그랬지만, 지금은 동적 컨텐츠도 가속합니다.

정적: 이미지, CSS, JS → 캐싱
동적: API, 로그인 → 최적 경로로 전달 (Argo Smart Routing)
Compute: 엣지에서 코드 실행 (Workers, Lambda@Edge)

Q2: Cloudflare 무료 플랜과 유료 플랜 차이는?

무료 ($0/월):
✅ 무제한 대역폭
✅ 기본 DDoS 방어
✅ SSL/TLS
✅ Anycast DNS
❌ 이미지 자동 최적화
❌ 상세 WAF 규칙
❌ 중국 가속

Pro ($20/월):
✅ 위 모든 기능
✅ Image Optimization (Polish)
✅ Mobile Redirect
✅ WAF 규칙 5개

Business ($200/월):
✅ WAF 규칙 무제한
✅ 우선 지원
✅ PCI 준수

Enterprise (가격 협상):
✅ 중국 네트워크
✅ 전담 팀
✅ SLA 보장

개인/스타트업은 무료로도 충분합니다.

Q3: 개인 프로젝트에도 CDN이 필요한가요?

경우의 수:

전 세계 사용자 → 필수
- 미국 서버인데 한국 사용자 있으면 CDN 없인 너무 느림
한국만 → 선택
- 필수는 아니지만 무료(Cloudflare)라면 쓰는 게 이득
- 이유: HTTPS 자동, DDoS 방어, 캐싱 등
대용량 파일 (이미지, 비디오) → 강력 추천
- 서버 대역폭 요금 절감

Q4: CDN 설정 얼마나 어렵나요?

Cloudflare 예시 (5분 완성):

1. Cloudflare 가입
2. 도메인 추가 (example.com)
3. DNS 네임서버 변경 (도메인 등록 업체에서)
   - 기존: ns1.godaddy.com
   - 새로: chad.ns.cloudflare.com
4. 끝! (자동으로 CDN 활성화)

DNS가 전파되면 (5분~48시간) 모든 트래픽이 Cloudflare를 거칩니다.

AWS CloudFront (좀 더 복잡):

1. CloudFront Distribution 생성
2. Origin 설정 (EC2, S3 등)
3. Behavior 설정 (캐싱 규칙)
4. Add a CNAME record in DNS
   - cdn.example.com → d1234abcd.cloudfront.net

Q5: CDN이 느려질 수도 있나요?

있습니다:

첫 요청 (Cold Start): Cache Miss라 오리진까지 왕복 → 느림
- 해결: Pre-fetching, 인기 컨텐츠 미리 로딩
지역에 PoP 없음: 가까운 엣지가 없으면 효과 반감
- 예: 아프리카 일부 지역
잘못된 캐싱 설정: 캐싱하면 안 되는 걸 캐싱 → 오히려 문제
- 예: 사용자별 다른 API를 public 캐싱 → 다른 사람 정보 노출

Q6: Multi-CDN은 언제 필요한가요?

기준:

트래픽 > 월 100TB → 고려
중요도 높음 (금융, 의료) → 고려
SLA 필요 (99.99% 이상) → 필수

일반 스타트업은 단일 CDN으로 충분합니다.

14. 마무리 - 제가 CDN에서 배운 것들을 정리해본다

3년 전 제가 처음 글로벌 서비스를 만들 때, 저는 "좋은 코드가 빠른 서비스를 만든다"고 믿었습니다.

틀렸습니다.

빠른 서비스는 물리학과의 싸움입니다.

빛의 속도는 못 바꿉니다
하지만 거리는 줄일 수 있습니다

CDN은 단순한 캐싱 서버가 아니라, 지리적 한계를 극복하는 인프라였습니다.

제가 이해한 핵심:

거리가 지연을 만든다 → 서버를 가까이 보내라
Caching saves money → at a 95% hit rate, only 1/20 of requests reach the origin
분산이 안정성을 만든다 → DDoS도, 장애도 견딤
엣지가 미래다 → 단순 캐시에서 컴퓨팅 플랫폼으로

결국 이거였다: 현대 웹은 CDN 없이 불가능하다.

넷플릭스, 유튜브, 페이스북... 우리가 쓰는 모든 서비스 뒤에는 CDN이 있습니다. 그게 보이지 않을 뿐이죠.

이제 여러분도 압니다. 버퍼링 없는 스트리밍, 빠른 웹사이트, 끊기지 않는 서비스... 그 모든 것의 비밀을.

CDN: How Netflix Streams Globally Without Buffering

1. Prologue: The Day I Learned Physics Beats Code

When I first launched a global web service, I hosted it on AWS us-east-1 (Virginia). My American friend said, "Wow, this is fast!" Meanwhile, users in Korea complained that a single image took 3 seconds to load.

I thought, "Isn't light supposed to be fast? 300,000 km/s?"

Wrong.

The Reality Check:

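Theory: Round trip to the other side of Earth = 133 ms at the speed of light in a vacuum (closer to 200 ms in optical fiber, where light travels at roughly two thirds of that speed)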
Practice: Undersea cables aren't straight, packets route through 15+ hops, packet loss triggers retransmission → 200-300ms RTT
With 50 HTTP requests: Total delay reaches several seconds

I was furious. Not at my server, not at my code, but at physics itself. No amount of optimization can beat the speed of light.

That's when I realized: "Don't make the server faster. Bring the server closer."

This physical limitation is what CDN (Content Delivery Network) solves. It's not about speed - it's about distance.

2. History: How MIT Professors Built the "Internet Delivery Network"

1998: The Birth of Akamai

In the late 1990s, the web was so slow that people sarcastically called it the "World Wide Wait" instead of "World Wide Web."

The Problem:

Yahoo! homepage = 100KB
56k modem = 4KB/s actual speed
Load time = 25 seconds
With 10 images? 4 minutes

The Solution:

MIT math professor Tom Leighton and his student Danny Lewin approached this mathematically:

"Don't put all eggs in one basket - create branches nationwide"
"Route customers to their nearest branch"
"Make sure adding branches doesn't break the system" → This became Consistent Hashing

They founded Akamai Technologies (Hawaiian for "smart"), the first commercial CDN.

Today: Over 50% of global internet traffic flows through CDNs like Cloudflare, AWS CloudFront, Fastly, and Akamai. Netflix, YouTube, and Facebook would be impossible without them.

Cloudflare alone operates in 310+ cities worldwide. If you are reading this in a large city, there is a good chance one of its servers sits in the same metro area.

3. Core Mechanics: "Headquarters vs Branches"

Origin vs Edge Servers

Think of it like a franchise coffee shop:

Origin Server (Headquarters): The main server with the "real" data
- Example: AWS Virginia data center
- Role: Store master copies, feed Edge servers
Edge Server (Branch): Thousands of cache servers worldwide
- Examples: Seoul, Tokyo, London, São Paulo
- Role: Serve nearby customers instantly

How It Actually Works

Step 1: DNS Resolution (Finding the Nearest Branch)

# User requests example.com/logo.png
# Browser does DNS lookup first

$ dig example.com

;; ANSWER SECTION:
example.com.  60  IN  A  104.16.132.229  # Cloudflare Anycast IP

Here's the magic: This IP isn't one server. Thousands of servers share the same IP (Anycast).

BGP routing automatically connects you to the nearest server:

Request from Korea → Seoul Edge
Request from USA → New York Edge
Same IP, different physical location

Step 2: Edge Processing

[Seoul Edge Server Log]
2025-05-19 14:32:15 KST
Request: GET /logo.png
Cache Status: MISS (first request, not in cache)
Action: Fetching from Origin (us-east-1)
Origin Response Time: 285ms
Saved to Cache with TTL: 86400s (24 hours)
Response to Client: 320ms total

Step 3: Subsequent Requests Are Fast

[Seoul Edge Server Log]
2025-05-19 14:32:18 KST
Request: GET /logo.png
Cache Status: HIT (found in cache!)
Response Time: 4ms

285ms → 4ms = 70x faster

Cache Hit Rate: The Metric That Matters

CDN performance is measured by Cache Hit Rate:

Cache Hit Rate = (Cache Hits / Total Requests) × 100%

90% Hit Rate: 90 requests served from Edge, only 10 go to Origin
50% Hit Rate: Half the requests hit Origin → inefficient

Tuning cache strategies well can push Hit Rate from 60% to 95%. There are many reported cases where this kind of improvement leads to dramatic reductions in server costs and response times.

4. Deep Dive: Consistent Hashing - Akamai's Core Patent

This is crucial for distributed systems interviews. I didn't understand its importance until I experienced the disaster firsthand.

The Problem: Naive Hashing Breaks Everything

You have 100 Edge servers. Where do you store logo.png?

Naive Approach: Modulo Operation

def get_server(key):
    server_id = hash(key) % 100  # 0-99
    return f"edge-{server_id}"

get_server("logo.png")  # edge-42
get_server("video.mp4")  # edge-87

Looks fine, right? But then...

Add 1 Server:

def get_server(key):
    server_id = hash(key) % 101  # now 101 servers
    return f"edge-{server_id}"

get_server("logo.png")  # edge-43 (changed!)
get_server("video.mp4")  # edge-88 (changed!)

Result: About 99% of keys map to a different server → Cache Stampede → Origin server dies.

Real scenario I experienced:

Traffic increased, added 1 server
All cache mappings changed
All requests hit Origin
Origin server overloaded
Service down

Solution: Consistent Hashing

The breakthrough: "Adding/removing servers only affects a small portion of keys."

Concept: Ring Structure

import bisect
import hashlib

class ConsistentHash:
    def __init__(self, replicas=150):
        self.replicas = replicas  # virtual nodes per server
        self.ring = {}  # {hash_value: server_name}
        self.sorted_keys = []

    def _hash(self, value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add_server(self, server_name):
        # Place server on ring (with virtual nodes)
        for i in range(self.replicas):
            self.ring[self._hash(f"{server_name}:{i}")] = server_name

        self.sorted_keys = sorted(self.ring.keys())

    def get_server(self, key):
        if not self.sorted_keys:
            raise ValueError("no servers on the ring")

        # Find first server clockwise (binary search instead of a linear scan)
        idx = bisect.bisect_left(self.sorted_keys, self._hash(key))

        # Wrap around to the start of the ring if we ran past the last position
        return self.ring[self.sorted_keys[idx % len(self.sorted_keys)]]

# Test
ch = ConsistentHash()
ch.add_server("edge-seoul")
ch.add_server("edge-tokyo")
ch.add_server("edge-osaka")

keys = ["logo.png", "video.mp4", "style.css", "app.js"]
before = {k: ch.get_server(k) for k in keys}

# Add server
ch.add_server("edge-busan")

for k in keys:
    # Each key either stays where it was or moves to the new server (edge-busan)
    print(k, before[k], "→", ch.get_server(k))

Key Points:

Virtual Nodes: Each server appears 150 times on the ring
- Prevents uneven distribution
Adding servers: Only about 1/N of keys remap, e.g. ~1% when going from 100 to 101 servers (versus nearly all of them with modulo hashing)
Removing servers: Only that server's keys redistribute

Amazon's Dynamo paper describes the same technique for spreading partitions across nodes, and AWS DynamoDB builds on that design.
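To check the "only a small portion of keys move" claim yourself, the sketch below builds a ring for 100 servers, adds one more, and measures what changed. The `Ring` class, server names, and key names are illustrative assumptions, not Akamai's implementation.

```python
import bisect
import hashlib


def h(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class Ring:
    def __init__(self, servers, vnodes: int = 150):
        self.vnodes = vnodes
        self.points = []  # sorted hash positions on the ring
        self.owner = {}   # position -> server name
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        for i in range(self.vnodes):
            point = h(f"{server}:{i}")
            self.owner[point] = server
            bisect.insort(self.points, point)

    def lookup(self, key: str) -> str:
        # First position clockwise from the key's hash, wrapping at the end
        idx = bisect.bisect_left(self.points, h(key)) % len(self.points)
        return self.owner[self.points[idx]]


keys = [f"asset-{i}.png" for i in range(10_000)]  # hypothetical cache keys
ring = Ring([f"edge-{n}" for n in range(100)])

before = {k: ring.lookup(k) for k in keys}
ring.add("edge-100")
after = {k: ring.lookup(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
moved_fraction = len(moved) / len(keys)
print(f"{moved_fraction:.1%} of keys moved")
print("all moved keys went to the new server:",
      all(after[k] == "edge-100" for k in moved))
```

The fraction should come out near 1%, and every key that moves goes to the newly added server. The other 100 caches keep what they already had, so Origin only sees misses for the new server's share.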

5. Edge Computing: Beyond Simple Caching

CDNs have evolved from static file servers to code execution platforms.

Old Way: Everything Goes to Origin

[User] "Resize this image to 400x300"
    ↓
[Edge] "Can't do that, forwarding to Origin"
    ↓ (200ms)
[Origin] Resizes image → Response
    ↓ (200ms)
[User] Receives (400ms+ total)

New Way: Serverless at the Edge

[User] "Resize this image to 400x300"
    ↓
[Edge] Executes V8 JavaScript → Resizes → Response
    ↓ (15ms)
[User] Receives (15ms total)

About 27x faster (400ms ÷ 15ms)

Real Code: Cloudflare Workers

addEventListener('fetch', event => {
  event.respondWith(handleRequest(event.request))
})

async function handleRequest(request) {
  const url = new URL(request.url)

  // 1. A/B Testing at the Edge
  const variant = Math.random() < 0.5 ? 'A' : 'B'

  // 2. Geo-blocking
  const country = request.cf.country
  if (country === 'CN') {
    return new Response('Not available', { status: 403 })
  }

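  // 3. Image Resizing (pass the original request so its method and headers are preserved)
  if (url.pathname.endsWith('.jpg')) {
    return fetch(request, {
      cf: { image: { width: 400, quality: 85 } }
    })
  }

  // 4. Custom Cache Key: cache each language and A/B variant separately
  const cacheKey = `${url.pathname}:${request.headers.get('Accept-Language')}:${variant}`

  // Forward the variant to Origin (X-AB-Variant is a hypothetical header name)
  const headers = new Headers(request.headers)
  headers.set('X-AB-Variant', variant)

  return fetch(new Request(request, { headers }), { cf: { cacheKey } })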
}

Use Cases:

Auth checks: Validate JWT at Edge → 70% less Origin load
Bot blocking: Filter malicious crawlers before they reach Origin
Image optimization: Serve WebP to supporting browsers → 40% bandwidth savings

I used this to reduce Origin requests by 60%.
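The "validate JWT at Edge" idea is language-independent, so here is a minimal Python sketch of the check itself. It assumes HS256 tokens and a shared secret available at the edge. The function names and the secret are hypothetical, and a real deployment would normally use a maintained JWT library instead.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


def _b64url_decode(segment: str) -> bytes:
    # JWT strips base64 padding, so restore it before decoding
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Issue a token. Only here so the example is self-contained."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64url_encode(json.dumps(claims).encode())
    signature = hmac.new(secret, f"{header}.{body}".encode(), hashlib.sha256).digest()
    return f"{header}.{body}.{_b64url_encode(signature)}"


def verify_at_edge(token: str, secret: bytes, now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if the token is valid, otherwise None (reject without calling Origin)."""
    now = time.time() if now is None else now
    try:
        header_b64, body_b64, sig_b64 = token.split(".")
        if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
            return None
        expected = hmac.new(secret, f"{header_b64}.{body_b64}".encode(), hashlib.sha256).digest()
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            return None
        claims = json.loads(_b64url_decode(body_b64))
        if claims.get("exp", 0) <= now:
            return None  # expired
        return claims
    except (ValueError, TypeError, AttributeError):
        return None  # malformed token


SECRET = b"demo-secret"  # hypothetical shared secret
token = sign_hs256({"sub": "user-1", "exp": 2_000}, SECRET)
print(verify_at_edge(token, SECRET, now=1_000))  # claims dict
print(verify_at_edge(token, SECRET, now=3_000))  # None: expired
```

Requests with a missing, forged, or expired token get rejected right at the edge, so only requests that pass the check ever reach Origin.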

6. Streaming Protocols: The Netflix Secret

When you watch a 4K movie on Netflix, you're not downloading a 15GB file. That would take an hour to start.

HLS/DASH: Chunking Strategy

[Original Movie: 2 hours = 15GB]
    ↓
[Transcoding Server]
    ↓
Split into 3,600 × 2-second chunks per quality level:
- 1080p/chunk_0000.ts - 2MB (8Mbps × 2s)
- 720p/chunk_0000.ts  - 1MB (4Mbps × 2s)
- 480p/chunk_0000.ts  - 0.5MB
- 360p/chunk_0000.ts  - 0.25MB
...
    ↓
[Upload all to CDN]

Master Playlist (master.m3u8):

#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=8000000,RESOLUTION=1920x1080
1080p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=4000000,RESOLUTION=1280x720
720p/playlist.m3u8
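The first thing a player does with this file is turn it into a list of available renditions. A minimal parsing sketch follows. It only handles the `#EXT-X-STREAM-INF` tag shown above, and `parse_master_playlist` is a name made up for this example; real players handle many more tags.

```python
import re

MASTER = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=8000000,RESOLUTION=1920x1080
1080p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=4000000,RESOLUTION=1280x720
720p/playlist.m3u8
"""


def parse_master_playlist(text: str) -> list:
    """Return variants sorted by bandwidth (lowest first)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    variants = []
    for i, line in enumerate(lines):
        if not line.startswith("#EXT-X-STREAM-INF:"):
            continue
        # Attribute list: KEY=value pairs, values optionally quoted
        attrs = dict(re.findall(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)', line.split(":", 1)[1]))
        # The URI of the variant playlist is the next non-tag line
        if i + 1 < len(lines) and not lines[i + 1].startswith("#"):
            variants.append({
                "bandwidth": int(attrs["BANDWIDTH"]),
                "resolution": attrs.get("RESOLUTION"),
                "uri": lines[i + 1],
            })
    return sorted(variants, key=lambda v: v["bandwidth"])


variants = parse_master_playlist(MASTER)
for v in variants:
    print(v["resolution"], v["bandwidth"], v["uri"])
```

Sorting by bandwidth gives the player a "ladder" it can step up and down as network conditions change.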

Adaptive Bitrate Streaming

The client measures bandwidth in real-time and switches quality:

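// Simplified sketch of the player loop; production players are far more elaborate.
class HLSPlayer {
  // qualities: sorted low → high, e.g. [{ name: '720p', bandwidth: 4000000 }, ...]
  // fetchChunk(quality, index) and buffer are hypothetical helpers supplied by the caller
  constructor(qualities, fetchChunk, buffer) {
    this.qualities = qualities
    this.fetchChunk = fetchChunk
    this.buffer = buffer
  }

  async playVideo(totalChunks) {
    let level = Math.min(1, this.qualities.length - 1)  // Start conservatively

    for (let index = 0; index < totalChunks; index++) {
      const quality = this.qualities[level]
      const startTime = Date.now()
      const chunk = await this.fetchChunk(quality, index)  // resolves to an ArrayBuffer
      const downloadSeconds = Math.max(Date.now() - startTime, 1) / 1000

      // Measure bandwidth in bits per second
      const bandwidth = (chunk.byteLength * 8) / downloadSeconds

      // Adjust quality based on seconds of video buffered and measured bandwidth
      const bufferHealth = this.buffer.secondsBuffered()
      if (bufferHealth < 5 && bandwidth < quality.bandwidth && level > 0) {
        level--  // Downgrade
      } else if (bufferHealth > 20 && bandwidth > quality.bandwidth * 1.5 &&
                 level < this.qualities.length - 1) {
        level++  // Upgrade
      }

      this.buffer.append(chunk)
    }
  }
}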

Timeline:

[00:00] Start → 720p (safe choice)
[00:10] Bandwidth 8Mbps → Upgrade to 1080p
[00:45] Enter subway, 1.5Mbps → Downgrade to 480p
[01:12] WiFi connected, 12Mbps → Back to 1080p

No buffering, seamless experience.
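The switching rule is easy to replay offline. The sketch below ports the rule from the player loop (downgrade when the buffer is nearly empty and bandwidth is below the current bitrate, upgrade when the buffer is healthy and bandwidth exceeds 1.5× the current bitrate) and feeds it samples shaped like the timeline. The 1080p/720p bitrates come from the master playlist; the 480p/360p bitrates and the buffer values are assumptions for illustration.

```python
# Bitrate ladder, lowest first. 480p/360p bitrates are assumed values.
LADDER = [
    ("360p", 600_000),
    ("480p", 1_200_000),
    ("720p", 4_000_000),
    ("1080p", 8_000_000),
]


def next_level(level: int, buffer_seconds: float, measured_bps: float) -> int:
    """Move at most one rung per chunk, like the player loop does."""
    _, current_bps = LADDER[level]
    if buffer_seconds < 5 and measured_bps < current_bps and level > 0:
        return level - 1  # downgrade
    if (buffer_seconds > 20 and measured_bps > current_bps * 1.5
            and level < len(LADDER) - 1):
        return level + 1  # upgrade
    return level


# (seconds buffered, measured bits per second) after each chunk
samples = [
    (25, 8_000_000),   # good connection
    (3, 1_500_000),    # entering the subway
    (3, 1_500_000),
    (25, 12_000_000),  # WiFi connected
    (25, 12_000_000),
]

level = 2  # start at 720p
history = []
for buffer_seconds, bps in samples:
    level = next_level(level, buffer_seconds, bps)
    history.append(LADDER[level][0])

print(history)  # ['1080p', '720p', '480p', '720p', '1080p']
```

Stepping one rung at a time is a design choice that avoids jarring quality jumps. The trade-off is that recovery after a drop takes a few chunks.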

Why CDN Is Critical

All these chunks (thousands of them) are cached at Edge servers:

[Korean User A] → chunk_0042.ts → [Seoul Edge] Hit! → 4ms
[Korean User B] → chunk_0042.ts → [Seoul Edge] Hit! → 4ms
[Korean User Z] → chunk_0042.ts → [Seoul Edge] Hit! → 4ms

Without CDN (direct from US Origin):

200ms × 3,600 chunks = 12 minutes cumulative delay

With CDN: Watch a movie from the other side of the planet as if it's local.

7. DDoS Defense: The Giant Breakwater

CDN also serves as a critical security layer.

DDoS Attack Scenario

[AWS CloudWatch]
🚨 EC2 CPU 100%
🚨 Network In: 15 Gbps (normal: 0.2 Gbps)
🚨 Server Unreachable

When a DDoS attack hits, a small server (1Gbps network) getting flooded with 15Gbps traffic has no chance.

Result: Server dead. Legitimate users blocked.

How CDN Defends

1) Volumetric Attack Absorption

[Attacker] 100 Gbps attack
    ↓
[CDN Global Network]
Total capacity: 200 Tbps (200,000 Gbps)
310 cities, thousands of servers
    ↓
Seoul Edge: 3 Gbps (manageable)
Tokyo Edge: 5 Gbps (manageable)
London Edge: 2 Gbps (manageable)
...
    ↓
[Origin] Only legitimate traffic reaches here

2) Anycast Distribution

[Attacker in China] → Attack → [Nearest CDN = Hong Kong Edge]
[Attacker in Russia] → Attack → [Nearest CDN = Moscow Edge]
[Attacker in Brazil] → Attack → [Nearest CDN = São Paulo Edge]

# Attack traffic automatically distributed globally
# Each Edge receives manageable portion

3) Pattern-based Blocking

// Pseudocode in the spirit of Cloudflare firewall rules
// (real rules use Cloudflare's own expression syntax, not JavaScript)

// Rate limiting
if (request.countPerSecond(request.ip) > 100) {
  return { action: 'block', reason: 'rate_limit' }
}

// Bot detection
if (/curl|bot|python/i.test(request.userAgent)) {
  return { action: 'challenge' }  // CAPTCHA
}

// SQL Injection detection
if (/union.*select|drop.*table/i.test(request.url)) {
  return { action: 'block', reason: 'sql_injection' }
}
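Those three rules translate to ordinary code roughly as follows. This is a sketch only: `EdgeFilter` and its return values are invented for this example, the thresholds and regexes are copied from the rules above, and a real WAF uses far more robust detection than two regexes.

```python
import re
import time
from collections import defaultdict, deque

BOT_UA = re.compile(r"curl|bot|python", re.IGNORECASE)
SQLI = re.compile(r"union.*select|drop.*table", re.IGNORECASE)


class EdgeFilter:
    def __init__(self, max_per_second: int = 100):
        self.max_per_second = max_per_second
        self.hits = defaultdict(deque)  # ip -> request timestamps in the last second

    def decide(self, ip: str, user_agent: str, url: str, now=None) -> str:
        now = time.monotonic() if now is None else now

        # Rate limiting: sliding one-second window per IP
        window = self.hits[ip]
        while window and now - window[0] >= 1.0:
            window.popleft()
        window.append(now)
        if len(window) > self.max_per_second:
            return "block:rate_limit"

        # Bot detection by User-Agent
        if BOT_UA.search(user_agent):
            return "challenge"

        # Naive SQL injection detection
        if SQLI.search(url):
            return "block:sql_injection"

        return "allow"


edge = EdgeFilter()
print(edge.decide("203.0.113.7", "Mozilla/5.0", "/index.html", now=0.0))
print(edge.decide("203.0.113.8", "curl/8.0", "/index.html", now=0.0))
```

The `now` parameter exists only so the window logic can be exercised deterministically; in production the monotonic clock default would be used.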

Real Case: GitHub DDoS (2018)

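Attack size: 1.35 Tbps (largest on record at the time)
Attack type: Memcached reflection amplification
GitHub response: Rerouted traffic through Akamai Prolexic (Akamai's DDoS mitigation service)
Result: Traffic rerouted within about 10 minutes; the attack subsided roughly 8 minutes later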

GitHub's own servers couldn't handle it. CDN's global capacity was essential.

8. The Day the Internet Stopped: Fastly Outage 2021

June 8, 2021. I tried to access Amazon - down. Reddit - down. Twitch - down. CNN - down.

"Is my WiFi broken?" Nope. The entire internet was broken.

Cause: Fastly CDN Failure

Affected sites:

Amazon, Reddit, Twitch, GitHub, Stack Overflow
New York Times, Financial Times, BBC, CNN
~10% of global internet traffic

What Happened?

Not a hack. Not hardware failure.

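Timeline (UTC, per Fastly's public incident summary):

[May 12] Fastly rolls out a software update containing a latent bug
    ↓
[June 8, 09:47] A customer pushes a valid config change that triggers the bug
    ↓
[Result] About 85% of Fastly's network returns errors
    ↓
[Fix] Fastly identifies and disables the triggering configuration
    ↓
[10:36] Impacted services begin to recover (95% of the network back to normal within 49 minutes)
    ↓
[11:00] Majority of services recovered

Fastly did not publish the exact bug. The regex story often attached to this outage comes from a different incident: Cloudflare's July 2019 outage, where a new WAF rule contained a regex that backtracked excessively. The critical part of that pattern, per Cloudflare's postmortem:

.*(?:.*=.*)

# Several greedy .* in a row can split the same input in many different ways
# → Excessive backtracking → CPU pinned at 100% (ReDoS)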

Lessons

1) SPOF (Single Point of Failure)

One CDN dependency = no backup when it fails.

Solution: Multi-CDN

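# Illustrative sketch: the hostnames below are placeholders, and production
# multi-CDN setups usually steer traffic at the DNS layer rather than through one proxy
upstream cdn_pool {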
  server cdn1.cloudflare.com:443 weight=3;
  server cdn2.fastly.com:443 weight=2;
  server cdn3.akamai.com:443 backup;
}

server {
  location /static/ {
    proxy_pass https://cdn_pool;
    proxy_next_upstream error timeout http_503;
  }
}
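The same failover idea can also live in application code. The sketch below tries each CDN in order and falls through on connection errors or 5xx responses. The hostnames are placeholders, and the `fetch` callable is injected (a hypothetical stand-in for a real HTTP client) so the logic can be exercised without network access.

```python
from typing import Callable, Sequence, Tuple

# Placeholder hostnames, not real endpoints
CDN_HOSTS = ["cdn-a.example.com", "cdn-b.example.com", "cdn-c.example.com"]


def fetch_with_failover(
    path: str,
    hosts: Sequence[str],
    fetch: Callable[[str], Tuple[int, bytes]],
) -> Tuple[str, int, bytes]:
    """Return (host, status, body) from the first CDN that answers without a 5xx."""
    last_error = None
    for host in hosts:
        url = f"https://{host}{path}"
        try:
            status, body = fetch(url)
        except OSError as exc:  # connection errors and timeouts
            last_error = exc
            continue
        if status < 500:
            return host, status, body
        last_error = RuntimeError(f"{host} returned {status}")
    raise RuntimeError("all CDNs failed") from last_error


def fake_fetch(url: str) -> Tuple[int, bytes]:
    """Simulated network: first CDN times out, second is overloaded."""
    if "cdn-a" in url:
        raise TimeoutError("timed out")
    if "cdn-b" in url:
        return 503, b""
    return 200, b"ok"


result = fetch_with_failover("/static/app.js", CDN_HOSTS, fake_fetch)
print(result)  # ('cdn-c.example.com', 200, b'ok')
```

Client-side failover like this only helps for requests your own code makes. For browser traffic, the switch usually has to happen in DNS or in the HTML that references the assets.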

2) Config Changes Are Code

One config change crashed the internet.

Best Practice: Canary Deployment

deployment:
  strategy: canary
  stages:
    - percent: 1
      duration: 10m
      error_threshold: 0.1%  # Abort if >0.1% errors
    - percent: 10
      duration: 30m
    - percent: 100
      auto_rollback: true
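The control loop behind a config like this is short. Here is a sketch under stated assumptions: `run_canary` and `error_rate_at` are hypothetical names, and in a real pipeline `error_rate_at` would deploy to that percentage of servers, wait for the stage duration, and then read the error rate from monitoring.

```python
from typing import Callable, Sequence, Tuple


def run_canary(
    stages: Sequence[int],
    error_rate_at: Callable[[int], float],
    error_threshold: float = 0.001,  # 0.1%, as in the config above
) -> Tuple[str, int]:
    """Roll out stage by stage; abort at the first stage whose error rate is too high.

    Returns ("completed", last_percent) or ("rolled_back", failing_percent).
    """
    reached = 0
    for percent in stages:
        rate = error_rate_at(percent)
        if rate > error_threshold:
            return ("rolled_back", percent)
        reached = percent
    return ("completed", reached)


# A healthy change: error rate stays at 0.02% at every stage
healthy = run_canary([1, 10, 100], lambda percent: 0.0002)

# A change that breaks most requests: caught at the 1% stage
buggy = run_canary([1, 10, 100], lambda percent: 0.85)

print(healthy)  # ('completed', 100)
print(buggy)    # ('rolled_back', 1)
```

A broken change then stops at 1% of the fleet instead of 100%.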

9. Cache Invalidation: The Developer's Eternal Struggle

Phil Karlton said:

"There are only two hard things in Computer Science: cache invalidation and naming things."

My biggest headache: "Deployed new code but users see old version!"

Method 1: Purge (Not Recommended)

# Cloudflare API
curl -X POST "https://api.cloudflare.com/client/v4/zones/{id}/purge_cache" \
  -H "Authorization: Bearer {token}" \
  -H "Content-Type: application/json" \
  --data '{"purge_everything":true}'

Problems:

Propagation is not instant (seconds to minutes, depending on the CDN)
Can cost money (AWS CloudFront: $0.005 per invalidation path after the first 1,000 paths/month)
Race condition: Old version re-cached during purge

Method 2: Versioning (Recommended)

Before: /static/style.css
After:  /static/style.a8b3c9.css (hash)

CDN sees it as a different file → instant update, zero cost.

Webpack automation:

// webpack.config.js
module.exports = {
  output: {
    filename: '[name].[contenthash].js',  // main.a8b3c9.js
  },
  plugins: [
    new MiniCssExtractPlugin({
      filename: '[name].[contenthash].css',
    }),
  ],
}

Every build creates new filenames → No purge needed.
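If you are not using Webpack, the same idea is a few lines in any build script. The sketch below uses a truncated SHA-256 as a stand-in for Webpack's `[contenthash]` (Webpack uses its own hashing). `hashed_name` is a hypothetical helper name.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_name(path: str, content: bytes, length: int = 8) -> str:
    """Insert a short content hash before the file extension."""
    digest = hashlib.sha256(content).hexdigest()[:length]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


v1 = hashed_name("/static/style.css", b"body { color: black; }")
v2 = hashed_name("/static/style.css", b"body { color: navy; }")

print(v1)
print(v2)
print("same content, same name:",
      v1 == hashed_name("/static/style.css", b"body { color: black; }"))
```

Because the name depends only on the content, unchanged files keep their URL (and stay cached) across deployments, while changed files get a URL the CDN has never seen.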

Method 3: Cache-Control Headers

# HTML: Never cache
location ~ \.html$ {
  add_header Cache-Control "no-cache, no-store, must-revalidate";
}

# CSS/JS with hash: Cache forever
location ~ \.(css|js)$ {
  add_header Cache-Control "public, max-age=31536000, immutable";
}

# Images: 1 year
location ~ \.(jpg|png|webp)$ {
  add_header Cache-Control "public, max-age=31536000";
}

Headers explained:

Header	Meaning	Cached Where
`public`	Anyone can cache	CDN, proxies, browsers
`private`	Browser only	Browser
`no-cache`	Must revalidate before each reuse	Stored, but checked with Origin first
`no-store`	Never cache	Nowhere (sensitive data)
`max-age=3600`	Valid for 1 hour	Browser + CDN
`s-maxage=3600`	CDN TTL (overrides max-age)	CDN only
`immutable`	Never changes	Browser

My strategy:

HTML: no-cache
CSS/JS: Hash + immutable
Images: 1 year
APIs: private + short TTL

Result: 95% Hit Rate, instant deployments, significant cost reduction (the higher the Hit Rate, the less you pay for origin traffic).
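Written as one function, that strategy looks roughly like this. It is a sketch with assumptions: the `/api/` prefix, the 60-second API TTL, and the "at least six hex characters" test for a hashed filename are illustrative choices, not values from my actual config.

```python
import re

ONE_YEAR = 31536000  # seconds
HASHED_ASSET = re.compile(r"\.[0-9a-f]{6,}\.(css|js)$")  # e.g. style.a8b3c9.css
IMAGE = re.compile(r"\.(jpg|png|webp)$")


def cache_control_for(path: str) -> str:
    """Pick a Cache-Control header value based on the request path."""
    if path.startswith("/api/"):
        return "private, max-age=60"  # short TTL (60s is an assumed value)
    if path.endswith(".html") or path.endswith("/"):
        return "no-cache"  # always revalidate HTML
    if HASHED_ASSET.search(path):
        return f"public, max-age={ONE_YEAR}, immutable"
    if IMAGE.search(path):
        return f"public, max-age={ONE_YEAR}"
    return "no-cache"  # unknown or unhashed assets: stay conservative


for p in ["/index.html", "/static/style.a8b3c9.css", "/img/logo.png", "/api/user"]:
    print(p, "→", cache_control_for(p))
```

Falling back to `no-cache` for unhashed CSS/JS is deliberate: caching a file for a year is only safe when its name changes with its content.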

10. Cost Optimization: Slashing Bandwidth Bills

For content-heavy services, Egress (outbound traffic) can make up 60-80% of the cloud bill.

My Bill (Before CDN)

AWS Monthly Bill
────────────────────────
EC2 t3.medium        $30
RDS db.t3.small      $25
Data Transfer OUT    $740 ← 😱
────────────────────────
Total                $795

Bandwidth cost about 25x more than the EC2 instance ($740 vs $30).

AWS Bandwidth Pricing

EC2 to Internet:
First 10 TB:    $0.09/GB
Next 40 TB:     $0.085/GB
Next 100 TB:    $0.07/GB

My service: 8TB/month

8,000 GB × $0.09 = $720 (in line with the $740 on the bill)

CloudFront CDN Pricing

CloudFront to Internet:
First 10 TB:    $0.085/GB

Slightly cheaper, but the real magic:

EC2 → CloudFront = Free

[EC2] → [CloudFront]: FREE (AWS does not charge for origin-to-CloudFront transfer)
[CloudFront] → [Users]: $0.085/GB

With 95% Cache Hit Rate:

Monthly traffic: 8 TB
Cache Hit: 95%

Origin → CDN: 0.4 TB (5% Miss) → FREE
CDN → Users: 8 TB → $680

Plus:
Reduced Origin traffic → Smaller EC2 instance
t3.medium → t3.small: $30 → $17

Final cost:
CloudFront: $680
EC2: $17
RDS: $25
──────────
Total: $722

With compression (40% reduction):
8TB → 4.8TB
$722 → $450

$795 → $450 = 43% savings
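The arithmetic above is worth keeping as a small script so you can plug in your own traffic numbers. All prices and instance costs below are the ones quoted in this section (first-tier rates only); they are not a current AWS price list.

```python
CLOUDFRONT_PER_GB = 0.085  # CloudFront to Internet, first 10 TB
TRAFFIC_GB = 8_000         # 8 TB/month
RDS = 25

# Before CDN: figures straight from the bill
before = 30 + RDS + 740

# After CDN: Origin → CloudFront is free, users are served by CloudFront,
# and the smaller Origin load allows t3.medium → t3.small ($17)
cdn_transfer = TRAFFIC_GB * CLOUDFRONT_PER_GB
after_cdn = cdn_transfer + 17 + RDS

# With compression: 40% fewer bytes leave the CDN
compressed_gb = TRAFFIC_GB * (1 - 0.40)
after_compression = compressed_gb * CLOUDFRONT_PER_GB + 17 + RDS

savings = 1 - after_compression / before

print(f"Before CDN:       ${before}")
print(f"With CDN:         ${after_cdn:.0f}")
print(f"With compression: ${after_compression:.0f}")
print(f"Savings:          {savings:.0%}")
```

Note how little the CDN's per-GB discount contributes on its own ($795 → $722). Most of the saving comes from sending fewer bytes.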

Cloudflare Bandwidth Alliance

Cloudflare + certain storage partners = $0 egress.

Partners:

Backblaze B2
DigitalOcean Spaces
Linode Object Storage

Cost comparison (10TB/month):

AWS S3 + CloudFront:
- S3 storage: $23 (1TB)
- CDN transfer: $850 (10TB)
- Total: $873

Backblaze B2 + Cloudflare:
- B2 storage: $5 (1TB)
- CDN transfer: $0 (Bandwidth Alliance)
- Total: $5 (!)

174x cheaper

I migrated immediately. Monthly cost: $450 → $45.

11. Advanced Topics: The Future of CDN

1) Compute@Edge Evolution

Code running at the Edge is getting more complex:

Current:

Cloudflare Workers: V8 JavaScript/WASM
Fastly Compute@Edge: WASM (Rust, JavaScript, Go)
AWS Lambda@Edge: Node.js, Python

Future:

Full applications at Edge
Database queries at Edge (Cloudflare D1, PlanetScale)
AI inference at Edge

// AI at the Edge (module syntax: bindings such as env.AI are only available this way)
export default {
  async fetch(request, env) {
    // env.AI is a Workers AI binding configured for this Worker
    const response = await env.AI.run('@cf/meta/llama-2-7b-chat-int8', {
      prompt: "How to get from Seoul to Busan?"
    })

    return new Response(JSON.stringify(response), {
      headers: { 'Content-Type': 'application/json' }
    })
  }
}

// Inference runs on Cloudflare's network near the user, not at the US Origin
// Network round-trip: 200ms → 15ms (model inference time comes on top)

2) Real-time Optimization

WebSocket, SSE (Server-Sent Events) now go through CDN:

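// Cloudflare Durable Objects
export class ChatRoom {
  constructor(state, env) {
    this.sessions = []  // open WebSocket connections for this room
  }

  async fetch(request) {
    const [client, server] = Object.values(new WebSocketPair())

    server.accept()  // start handling this socket inside the Worker
    this.sessions.push(server)

    server.addEventListener('message', event => {
      // Broadcast to every user connected to this room
      this.sessions.forEach(s => s.send(event.data))
    })

    // Drop closed sockets so later broadcasts do not throw
    server.addEventListener('close', () => {
      this.sessions = this.sessions.filter(s => s !== server)
    })

    return new Response(null, { status: 101, webSocket: client })
  }
}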

// Korean users chatting → Seoul Edge handles it
// No round-trip to US → 4ms latency

3) Privacy-first CDN

Data-residency compliance (GDPR and similar laws): Logs stay in their region.

Old:
- EU user logs → Sent to US → Potential GDPR violation

New:
- Korean user logs → Korean Edge only
- EU user logs → EU Edge only
- No cross-region transfer

Cloudflare already offers Regional Services.
