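I Almost DDoSed My Own Server (The Complete Rate Limiting Guide)

1. The Fate of a Server Without a Gatekeeper

I first started studying Rate Limiting because of a simple thought experiment.

What happens when you run an event like "Chicken coupons for the first 100 people!"?

Real cases are common. Once a large marketing message goes out, users all start hitting refresh (F5) at the same time. A handful of IPs call the API 500 times per second. North Korean hackers? An attack from a competitor?

No. They are just ordinary users frantically clicking refresh to get a coupon. A server with no gatekeeper (Rate Limiter) accepts every request anyone sends until it dies of overwork. The monitoring dashboard turns solid red, CPU hits 100%, and the database connection pool runs dry. The users themselves become the DDoS attackers.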

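2. Three Reasons Rate Limiting Is Essential

An API server is not a public good. Its limited resources (CPU, memory, DB connections) have to be shared. A Rate Limiter is not optional; it is life insurance for the system.

DDoS and Brute Force defense: It is the first shield when an attacker tries random passwords (Brute Force) or tries to take the server down (DDoS).
Solving the Noisy Neighbor problem: It prevents one heavy user from monopolizing resources so that the other 99 regular users cannot connect. (Guarantees fairness.)
Cost reduction: It keeps auto-scaling from growing without bound, and it protects logic that costs money, such as sending SMS or calling paid APIs.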

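3. Core Algorithms - How Do We Block?

There are several ways to implement Rate Limiting. Pick the algorithm that fits the situation.

1) Token Bucket - the approach Amazon uses

The most widely used and the easiest to understand. EC2 CPU credits work this way too.

How it works:
- Tokens are added to the bucket at 10 per second.
- The bucket holds at most 100 tokens (overflow is discarded).
- Each API request takes one token out.
- If there are no tokens, the request is rejected (429 Too Many Requests).
Pro: It allows short traffic bursts. A user who has been idle for a while can send many requests at once and still be served.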

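2) Leaky Bucket - the Nginx default

How it works:
- Incoming requests are placed in a queue (the bucket).
- Requests are processed at a constant rate through the hole in the bottom of the bucket.
- When the bucket is full, overflowing requests are dropped.
Pro: It keeps the server's processing rate constant (Traffic Smoothing), which protects the DB from load spikes.
Con: Burst traffic is handled slowly.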

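3) Fixed Window - the easiest to implement

How it works: "Allow 100 requests per minute"
Critical flaw (Boundary Issue):
- 100 requests at 12:00:59.
- The counter resets at 12:01:00, then another 100 requests arrive.
- As a result, 200 requests get through within 2 seconds, twice the intended limit, and the server can go down.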

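4) Sliding Window Log / Counter - the most accurate

To fix the flaw of Fixed Window, the window moves along with time.

Counter variant: It weights the previous window's request count by how much of that window still overlaps the sliding window, then adds the current window's count. It is a good fit for Redis.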

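4. Actual Implementation - Redis + Lua Script (Atomicity Guaranteed)

In a distributed environment (multiple servers), you must not count in local memory, because server A's count and server B's count are not shared. A shared store such as Redis is the usual answer. But issuing several Redis commands in sequence (for example GET followed by INCR) can cause a Race Condition.

The fix is a Lua Script. While a script is running inside Redis, no other command can cut in (Atomic).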

-- redis-rate-limit.lua
-- Fixed-window counter: allow up to `limit` requests per `window` seconds.
local key = KEYS[1] -- e.g. rate_limit:user:123
local limit = tonumber(ARGV[1]) -- e.g. 100 requests
local window = tonumber(ARGV[2]) -- e.g. 60 seconds

local current = redis.call('get', key)

if current and tonumber(current) >= limit then
    return 0 -- blocked
else
    redis.call('incr', key)
    if not current then
        redis.call('expire', key, window) -- first request in the window: set the TTL
    end
    return 1 -- allowed
end

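In Node.js (NestJS/Express), you use it like this (assuming `redis` is an ioredis client and `luaScript` holds the script above as a string):

import { HttpException, HttpStatus } from '@nestjs/common';

// eval(script, numberOfKeys, ...keys, ...args)
const isAllowed = await redis.eval(
  luaScript,
  1,
  `rate_limit:user:${userId}`,
  100, // limit
  60   // window seconds
);

// The script returns 1 (allowed) or 0 (blocked).
if (!isAllowed) {
  throw new HttpException('Too Many Requests', HttpStatus.TOO_MANY_REQUESTS);
}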

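5. Soothing Blocked Users (UX)

If you just throw a bare 429 error, users think "What? Is the server broken?" and click even harder. Tell them kindly what is going on.

1) Retry-After header

A response header that tells the client "wait this long, then come back."

HTTP/1.1 429 Too Many Requests
Retry-After: 30

(Try again in 30 seconds.)

2) Exponential Backoff on the client

A frontend or mobile app must not retry immediately after receiving a 429. It should retry after 1 second, then 2, then 4, then 8, doubling the wait each time. This is network etiquette.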

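6. A Closer Look at Rate Limiting in GraphQL

For a REST API it is usually enough to limit the number of requests, but GraphQL is different, because a single request can fetch a million records (the "Nested Query attack").

So in GraphQL you need a limit based on Complexity.

User field: 1 point
Posts field: 5 points
Comments field: 10 points

When a request comes in, compute the total score of the query and deduct that many tokens. This lets you effectively rein in users who send "heavy queries."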

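7. Wrapping Up - Servers Need Seatbelts Too

With Rate Limiting in place, the server does not die during events. Users who spam clicks get a "Please try again in a moment" message, and regular users enjoy a smooth service.

If you are building an API, remember: do not trust users. With or without malice, they come to break your server. A Rate Limiter is not optional; it is essential.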

How I Accidentally DDoSed My Own Server (The Ultimate Rate Limiting Guide)

1. A Server Without a Bouncer

Rate Limiting clicked for me when I thought through a simple scenario.

Imagine sending a push notification: "First 100 people get a free Chicken Coupon!"

What happens next is predictable. Users immediately start hammering the Refresh button. A few IP addresses end up hitting the API 500 times per second. North Korean hackers? Corporate espionage?

No. Just "users spamming Refresh to get the coupon."

A server without a Bouncer (Rate Limiter) politely tries to process every single request until it collapses. Dashboards turn red, CPU hits 100%, DB connection pools exhaust. In effect, well-meaning users become a Distributed Denial of Service (DDoS) attack. This scenario is common enough that Rate Limiting isn't optional. It's infrastructure.

2. Three Reasons Why Rate Limiting is Mandatory

API servers are shared resources. If one person hogs the CPU or the DB, the other 99 people can't connect. Rate Limiting is System Life Insurance.

Defense against DDoS & Brute Force: It slows down attackers who try thousands of password combinations or flood the server with requests.
Solving the Noisy Neighbor Problem: Guarantees fairness so that one heavy user doesn't degrade the experience for everyone else.
Cost Control: Prevents auto-scaling bills from exploding ($$$) and protects expensive 3rd party API calls (like SMS or AI tokens).

3. Core Algorithms: How to Block?

1) Token Bucket (Amazon's Choice)

Most common and easy to understand. AWS EBS Burst Balance uses this.

Logic:
- A bucket refills tokens at a fixed rate (e.g., 10 tokens/sec).
- Max bucket size is 100.
- Each API call consumes 1 token.
- No tokens = 429 Too Many Requests.
Pro: Allows Traffic Bursts. Users can be inactive for a while and then make a burst of requests (up to max bucket size).
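
The token bucket logic above can be sketched in a few lines. This is a minimal in-memory version for a single process, not what AWS actually runs; the class name `TokenBucket`, its method `tryConsume`, and the injectable `now` clock are illustrative assumptions.

```javascript
// In-memory token bucket for a single process (a sketch, not production code).
class TokenBucket {
  // capacity: maximum tokens; refillPerSec: tokens added per second.
  // now: clock returning milliseconds (injectable so the behavior can be tested).
  constructor(capacity, refillPerSec, now = () => Date.now()) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.now = now;
    this.tokens = capacity; // start with a full bucket
    this.lastRefill = now();
  }

  // Consumes one token and returns true, or returns false when the bucket
  // is empty (the caller would answer 429 in that case).
  tryConsume() {
    const t = this.now();
    // Refill lazily from the elapsed time instead of running a timer.
    const refill = ((t - this.lastRefill) * this.refillPerSec) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + refill);
    this.lastRefill = t;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

// Usage: 10 tokens/sec, bucket size 100, as in the example above.
const bucket = new TokenBucket(100, 10);
console.log(bucket.tryConsume()); // true: the bucket starts full
```

Refilling lazily on each call keeps the state to two numbers per user, which is also why this algorithm maps well onto a shared store.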

2) Leaky Bucket (Nginx Default)

Logic:
- Requests enter a queue (bucket).
- They "leak" out (are processed) at a constant rate.
- If the bucket is full, new requests are discarded.
Pro: Traffic Smoothing. Ensures a stable load on the database.
Con: Slows down valid burst traffic.
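
As a rough illustration of the queue behavior described above, here is a leaky bucket that tracks only the queue level and reports how long a request would wait. It is a simplified sketch under assumed names (`LeakyBucket`, `tryEnqueue`), not Nginx's implementation.

```javascript
// Leaky bucket tracked as a queue level (a sketch; it does not run the requests).
class LeakyBucket {
  // capacity: max queued requests; leakPerSec: requests processed per second.
  constructor(capacity, leakPerSec, now = () => Date.now()) {
    this.capacity = capacity;
    this.leakPerSec = leakPerSec;
    this.now = now;
    this.level = 0; // requests currently waiting in the bucket
    this.lastLeak = now();
  }

  // Returns the wait in milliseconds before the request is processed,
  // or null when the bucket is full and the request is dropped.
  tryEnqueue() {
    const t = this.now();
    // Drain whatever leaked out since the last call.
    const leaked = ((t - this.lastLeak) * this.leakPerSec) / 1000;
    this.level = Math.max(0, this.level - leaked);
    this.lastLeak = t;
    if (this.level + 1 > this.capacity) return null;
    const waitMs = (this.level * 1000) / this.leakPerSec;
    this.level += 1;
    return waitMs;
  }
}

// Usage: queue of 5, processing 10 requests/sec.
const leaky = new LeakyBucket(5, 10);
console.log(leaky.tryEnqueue()); // 0: an empty bucket processes immediately
```

The returned wait time makes the "Con" visible: in a burst, each accepted request waits behind the ones already queued.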

3) Fixed Window (The Simplest)

Logic: "100 requests per 1 minute window".
The Critical Flaw (Boundary Issue):
- User sends 100 requests at 12:00:59.
- Window resets at 12:01:00.
- User sends another 100 requests at 12:01:01.
- Result: 200 requests in 2 seconds. This can crash the server.
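
The boundary issue is easy to reproduce. The sketch below is an in-memory fixed window counter (the names `FixedWindowLimiter` and `countAllowed` are made up for this example), with timestamps given as millisecond offsets from 12:00:00.

```javascript
// Fixed window counter: at most `limit` requests per aligned window.
class FixedWindowLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.windowStart = -1; // start of the window currently being counted
    this.count = 0;
  }

  // nowMs: current time in milliseconds.
  allow(nowMs) {
    const start = Math.floor(nowMs / this.windowMs) * this.windowMs;
    if (start !== this.windowStart) {
      // A new window began: reset the counter.
      this.windowStart = start;
      this.count = 0;
    }
    if (this.count >= this.limit) return false;
    this.count += 1;
    return true;
  }
}

// Sends n requests at the same instant and counts how many pass.
function countAllowed(limiter, nowMs, n) {
  let allowed = 0;
  for (let i = 0; i < n; i++) if (limiter.allow(nowMs)) allowed++;
  return allowed;
}

const fixed = new FixedWindowLimiter(100, 60_000);
const before = countAllowed(fixed, 59_000, 150); // "12:00:59"
const after = countAllowed(fixed, 61_000, 150);  // "12:01:01"
console.log(before + after); // 200 requests accepted within 2 seconds
```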

4) Sliding Window Counter (The Best)

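Solves most of the boundary issue by weighting the previous window's count by how much of that window still overlaps the sliding window, then adding the current window's count. It is an approximation, but it needs only two counters per key, which makes it a common choice in production.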
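
Here is one way the weighted calculation could look in memory. The estimate used is `previousCount * overlap + currentCount`, where `overlap` is the fraction of the previous window still inside the sliding window. The class name and the exact rejection rule are assumptions for this sketch.

```javascript
// Sliding window counter: approximates a rolling window with two counters.
class SlidingWindowCounter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.currentStart = 0;
    this.currentCount = 0;
    this.previousCount = 0;
  }

  allow(nowMs) {
    const start = Math.floor(nowMs / this.windowMs) * this.windowMs;
    if (start !== this.currentStart) {
      // Keep the old count only if it belongs to the directly preceding window.
      const adjacent = start - this.currentStart === this.windowMs;
      this.previousCount = adjacent ? this.currentCount : 0;
      this.currentStart = start;
      this.currentCount = 0;
    }
    // Fraction of the previous window still covered by the sliding window.
    const overlap = 1 - (nowMs - start) / this.windowMs;
    const estimated = this.previousCount * overlap + this.currentCount;
    if (estimated + 1 > this.limit) return false;
    this.currentCount += 1;
    return true;
  }
}

// Same scenario as the Fixed Window boundary issue (ms offsets from 12:00:00).
const sliding = new SlidingWindowCounter(100, 60_000);
let firstBurst = 0;
for (let i = 0; i < 150; i++) if (sliding.allow(59_000)) firstBurst++;
let secondBurst = 0;
for (let i = 0; i < 150; i++) if (sliding.allow(61_000)) secondBurst++;
console.log(firstBurst, secondBurst); // 100 1
```

One second into the new window, about 98% of the previous window still counts, so only a single extra request fits under the limit of 100.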

4. Implementation: Redis + Lua Script (Atomic)

In a distributed system (multiple servers), you cannot store the counter in local memory variables. You need a shared store like Redis. However, typical Redis operations (GET then INCR) suffer from Race Conditions.

The solution is Lua Script. Scripts executed inside Redis are Atomic (no other commands can interrupt them).

-- redis-rate-limit.lua
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])

local current = redis.call('get', key)

if current and tonumber(current) >= limit then
    return 0 -- Block
else
    redis.call('incr', key)
    if not current then
        redis.call('expire', key, window) -- Set TTL if first time
    end
    return 1 -- Allow
end

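Because the check and the increment run as one atomic script, concurrent requests from many Node.js servers cannot race past the limit. Note that this script is a Fixed Window counter, so the boundary issue described earlier still applies.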

5. Handling 429 Errors (UX Patterns)

Don't just throw an error. Tell the user when to come back.

1) Retry-After Header

Standard HTTP header that tells the client how long to wait, given here in seconds (it can also carry an HTTP date).

HTTP/1.1 429 Too Many Requests
Retry-After: 30
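
For a Fixed Window limiter, the header value can be derived from the time left in the current window. The helper names below (`retryAfterSeconds`, `tooManyRequests`) are hypothetical, and the returned object is a plain description of the response rather than any framework's API.

```javascript
// Seconds until the current fixed window ends, rounded up and never below 1.
function retryAfterSeconds(nowMs, windowMs) {
  const windowEnd = (Math.floor(nowMs / windowMs) + 1) * windowMs;
  return Math.max(1, Math.ceil((windowEnd - nowMs) / 1000));
}

// Describes the response for a blocked request: status plus headers.
function tooManyRequests(nowMs, windowMs) {
  return {
    status: 429,
    headers: { 'Retry-After': String(retryAfterSeconds(nowMs, windowMs)) },
  };
}

// 30 seconds into a 60-second window, the client should wait 30 more.
console.log(tooManyRequests(30_000, 60_000).headers['Retry-After']); // 30
```

With the Redis script above, the key's remaining TTL would play the same role as the computed window end.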

2) Client-Side Exponential Backoff

If you are building the frontend or mobile app, NEVER retry immediately on a 429. Use Exponential Backoff:

Wait 1s, retry.
Wait 2s, retry.
Wait 4s, retry.
Wait 8s, retry.

This "Network Etiquette" prevents your client from DDoSing your own server again.
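
The doubling schedule can be written as a small pure function. In this sketch the 30-second cap is an assumption rather than something the schedule above specifies, and `nextDelayMs` is a hypothetical helper that prefers the server's Retry-After value when it is a plain number of seconds.

```javascript
// attempt is 0-based: 0 -> 1s, 1 -> 2s, 2 -> 4s, 3 -> 8s, capped at maxMs.
function backoffDelayMs(attempt, baseMs = 1000, maxMs = 30_000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Uses Retry-After when it is a number of seconds; otherwise (missing header,
// or an HTTP date that this sketch does not parse) falls back to backoff.
function nextDelayMs(attempt, retryAfterHeader) {
  if (retryAfterHeader != null && retryAfterHeader !== '') {
    const seconds = Number(retryAfterHeader);
    if (Number.isFinite(seconds) && seconds >= 0) return seconds * 1000;
  }
  return backoffDelayMs(attempt);
}

console.log([0, 1, 2, 3].map((n) => backoffDelayMs(n))); // [ 1000, 2000, 4000, 8000 ]
```

Many clients also add a random jitter to each delay so that blocked users do not all retry at the same instant; that is left out here to keep the schedule identical to the list above.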

6. Conclusion: Seatbelts for Servers

After applying Rate Limiting, my server survived the next marketing event. Spammers received a polite "429 Too Many Requests", while normal users redeemed their coupons smoothly.

If you are building an API, remember: Never trust the client. Whether intentional or accidental, they have the power to crush your infrastructure. Rate Limiting is not a feature; it is a necessity.

7. Advanced: Sliding Window Log Algorithm

While "Sliding Window Counter" is efficient, it's an approximation. For 100% accuracy, we use Sliding Window Log.

Logic: Keep a sorted set (Redis ZSET) of timestamps for each user.
Process:
1. Remove all timestamps older than the window (e.g., older than 1 minute).
2. Count the remaining timestamps.
3. If count < limit, add current timestamp and allow.
4. Else, block.
Pros: Perfectly accurate. No boundary issues.
Cons: Expensive RAM usage. Storing 1 million timestamps takes a lot of memory compared to a single integer counter. Use this only for strict limits (e.g., login attempts).
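
The four steps above translate almost directly into code. This sketch keeps the timestamps in a per-user array instead of a Redis ZSET and assumes timestamps arrive in non-decreasing order; the class name `SlidingWindowLog` is illustrative.

```javascript
// Sliding window log: stores one timestamp per accepted request.
class SlidingWindowLog {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.logs = new Map(); // userId -> ascending array of timestamps (ms)
  }

  allow(userId, nowMs) {
    const log = this.logs.get(userId) ?? [];
    // 1. Remove timestamps that have fallen out of the window.
    while (log.length > 0 && log[0] <= nowMs - this.windowMs) log.shift();
    // 2-4. Count what remains; record and allow only while under the limit.
    const allowed = log.length < this.limit;
    if (allowed) log.push(nowMs);
    this.logs.set(userId, log);
    return allowed;
  }
}

// Usage: at most 3 login attempts per minute.
const loginLimiter = new SlidingWindowLog(3, 60_000);
console.log(loginLimiter.allow('user:123', 0)); // true
```

The memory cost mentioned above is visible here: the array grows with every accepted request, whereas the counter algorithms keep one or two integers.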

8. Advanced: Rate Limiting in Microservices (Redis Cluster)

In a huge system, a single Redis instance might become the bottleneck. You can use a Redis Cluster, or shard the load yourself with consistent hashing on User ID. Either way, make sure each Lua script runs on the shard that holds the user's key; in Redis Cluster, every key a script touches must hash to the same slot. Also consider Local Caching (in-memory) for extremely hot keys (such as a single IP flooding the whole system), syncing with Redis asynchronously to save network calls.
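
When a script needs more than one key per user (for example the current and previous counters of a Sliding Window Counter), Redis Cluster hash tags keep them on the same shard: only the part of the key inside `{...}` is hashed. The key layout below is an assumption made for illustration, not a naming scheme this guide prescribes.

```javascript
// Wraps the user id in {...} so every key for that user maps to the same slot.
function userKey(userId, suffix) {
  return `rate_limit:{user:${userId}}:${suffix}`;
}

console.log(userKey(123, 'current'));  // rate_limit:{user:123}:current
console.log(userKey(123, 'previous')); // rate_limit:{user:123}:previous
```

Both keys share the hash tag `user:123`, so a single Lua script can read and update them together.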

9. Scenarios: Rate Limiting in GraphQL

GraphQL is tricky because one HTTP request can query the entire database. "100 requests per minute" doesn't work if one request has a complexity cost of 10,000. Solution: Cost Analysis Rate Limiting.

Assign points to each field (e.g., User = 1 point, Posts = 5 points).
Calculate the Total Complexity Score of the incoming query.
Deduct this score from the user's bucket.
Block if the bucket is empty.
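
A simplified version of the cost calculation might look like this. The query is represented as a plain nested object rather than a real GraphQL AST, the point values for User and Posts come from the list above while Comments = 10 is an added example value, and `queryCost` and `tryCharge` are hypothetical names. Unlisted scalar fields cost 0 in this sketch.

```javascript
// Points per field; anything not listed costs 0 in this sketch.
const FIELD_COST = new Map([
  ['user', 1],
  ['posts', 5],
  ['comments', 10],
]);

// selection: { fieldName: true | nestedSelection }
function queryCost(selection, costs = FIELD_COST) {
  let total = 0;
  for (const [field, sub] of Object.entries(selection)) {
    total += costs.get(field) ?? 0;
    // Recurse into nested selections to catch deeply nested queries.
    if (sub !== null && typeof sub === 'object') total += queryCost(sub, costs);
  }
  return total;
}

// Deducts the cost from the user's point bucket; false means "block".
function tryCharge(bucket, cost) {
  if (bucket.points < cost) return false;
  bucket.points -= cost;
  return true;
}

const query = { user: { name: true, posts: { title: true, comments: { body: true } } } };
console.log(queryCost(query)); // 16 (user 1 + posts 5 + comments 10)
```

Real cost analysis often also multiplies a list field's cost by the requested page size; that is left out here for brevity.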

10. Business Strategy: Dynamic Rate Limits

Rate Limiting isn't just for security; it's a business model.

Guest: 10 req/min (Prevent abuse)
Free User: 100 req/min (Standard usage)
Premium User: 1000 req/min (Paid perk)
Whitelisted Partner: Unlimited (B2B contract)

Implement this by checking the User's Role/Plan in your middleware before checking the Redis counter.
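
The plan lookup can be a small table consulted before the counter check. In this sketch the `user.plan` field and the fallback to the Free limit for unknown plans are assumptions; the numbers are the ones listed above.

```javascript
// Requests per minute for each plan.
const PLAN_LIMITS = new Map([
  ['guest', 10],
  ['free', 100],
  ['premium', 1000],
  ['partner', Infinity], // whitelisted partners are never limited
]);

// user: null/undefined for anonymous visitors, else an object with a `plan`.
function limitFor(user) {
  if (!user) return PLAN_LIMITS.get('guest');
  return PLAN_LIMITS.get(user.plan) ?? PLAN_LIMITS.get('free');
}

// Compares the count read from the shared counter with the plan's limit.
function isOverLimit(user, requestsThisMinute) {
  return requestsThisMinute >= limitFor(user);
}

console.log(limitFor({ plan: 'premium' })); // 1000
```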

11. Common Mistakes to Avoid

Blocking SEO Bots: Don't rate limit Googlebot or Bingbot. Check their User-Agent (and verify IP).
Blocking Static Assets: Don't rate limit styles.css or logo.png. Only limit API endpoints.
Shared IP Issues: Be careful when limiting by IP only. Users behind a Corporate NAT or University Wifi might share one IP. If you block that IP, you block 1000 users. Use IP + UserID if possible.
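
Two of these mistakes can be avoided with a pair of small helpers. The `/api/` path prefix and the key layout are assumptions for this sketch; adjust them to your own routing.

```javascript
// Only API endpoints are rate limited; static assets are skipped.
function shouldRateLimit(path) {
  return path.startsWith('/api/');
}

// Logged-in users get a per-user bucket even behind a shared IP;
// anonymous traffic falls back to the IP alone.
function rateLimitKey(ip, userId) {
  return userId
    ? `rate_limit:ip:${ip}:user:${userId}`
    : `rate_limit:ip:${ip}`;
}

console.log(shouldRateLimit('/styles.css'));   // false
console.log(rateLimitKey('203.0.113.7', 123)); // rate_limit:ip:203.0.113.7:user:123
```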
