Serverless Architecture: The Complete Guide
1. Prologue: "No Server Management?"
"Deploy code without managing servers?" When I first heard about Serverless, I was skeptical.
Anyone who's used EC2 knows the drill: pick the instance type, set CPU and memory, apply OS patches, configure auto-scaling. More time goes into infrastructure than code. When the server dies, you reboot it yourself, dig through logs, find the cause. That frustration is what got me looking into Serverless.
2. The Confusion: What "Serverless" Actually Means
The Name Fooled Me
When I first heard "Serverless", I completely misunderstood it. "No servers? Then where does the code run?" I was confused. Later, I learned it doesn't mean "no servers" but rather "servers you don't manage." Servers definitely exist. The key is that developers don't have to think about them.
It took me a while to wrap my head around this. Anyone who's used EC2 knows you always have to worry about "instance type," "CPU cores," "memory capacity," "OS patches," etc. But with serverless, you hand all that to AWS. Developers only think about "what does my code do?"
The Metaphor That Clicked: Car Ownership (EC2) vs Uber (Lambda)
This concept finally clicked when someone explained it this way:
- EC2 (IaaS): You buy a car. You pay insurance and taxes even when it's parked. You have to change the oil (patch management). You need a parking spot (static IP). If the car breaks down (server crashes), you have to fix it yourself.
- Lambda (FaaS): You use Uber. You only pay when you ride. The driver (AWS) handles maintenance. If 1,000 people need rides simultaneously, 1,000 Ubers show up. If nobody rides, you pay $0.
This metaphor won me over. "Ah, so it's perfect for services with irregular traffic." For a service where users only flood in at 9 AM daily, EC2 forces you to run expensive instances 24/7, but Lambda only executes at 9 AM and costs $0 the rest of the time.
3. Core Architecture: FaaS and Event-Driven Design
The heart of serverless is FaaS (Function as a Service). Instead of running a giant monolithic server, you deploy individual functions to the cloud.
How It Works (Event-Driven)
Functions sit dormant until an event wakes them up. Like firefighters waiting at the station until a fire alarm goes off.
- HTTP Request: User calls API → API Gateway wakes Lambda.
- DB Change: Data saved to DynamoDB → Stream wakes Lambda.
- File Upload: Photo uploaded to S3 → Lambda wakes.
- Time: Every night at midnight (Cron) → Lambda wakes.
"No server runs 24/7. When an event occurs, it executes. When done, it vanishes."
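When I understood this philosophy, I realized why I'd been suffering with server monitoring. EC2 always runs, so you have to watch if it's alive or dead. But a Lambda function only exists while it handles an event, so there is no host to keep alive. You still watch errors, duration, and throttling, but the "is the server up?" question disappears.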
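To make the trigger list above concrete, here is a minimal sketch of how one function could tell which kind of event woke it up. The helper name `detectEventSource` and its return labels are made up for illustration. The field names it checks (`Records[].eventSource`, `source`, `httpMethod`, `requestContext.http`) are the ones AWS puts in S3, DynamoDB Streams, EventBridge schedule, and API Gateway events.

```javascript
// Hypothetical helper: guesses which AWS service produced a Lambda event
// by looking at well-known fields in the event payload.
function detectEventSource(event) {
  if (!event || typeof event !== 'object') return 'unknown';

  // S3 and DynamoDB Streams both deliver a Records array
  if (Array.isArray(event.Records) && event.Records.length > 0) {
    const source = event.Records[0].eventSource;
    if (source === 'aws:s3') return 's3';
    if (source === 'aws:dynamodb') return 'dynamodb-stream';
  }

  // EventBridge (CloudWatch Events) scheduled rules
  if (event.source === 'aws.events') return 'schedule';

  // API Gateway REST proxy events carry httpMethod;
  // HTTP API (payload v2) events carry requestContext.http
  if (event.httpMethod) return 'http';
  if (event.requestContext && event.requestContext.http) return 'http';

  return 'unknown';
}

console.log(detectEventSource({ Records: [{ eventSource: 'aws:s3' }] })); // prints "s3"
console.log(detectEventSource({ httpMethod: 'POST', body: '{}' }));       // prints "http"
```

In practice each function usually has exactly one trigger, so a check like this mostly helps in tests and logs.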
Real Service Example: API Gateway + Lambda + DynamoDB
Take a signup API as an example. The natural first instinct is to deploy an Express.js server on EC2. But signup requests only come in a few times a day — running a server 24/7 for that is wasteful. Switching to Lambda looks like this:
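// Lambda Function: Handle Signup
// Note: aws-sdk (v2) is preinstalled only on Node.js 16 and earlier runtimes.
// On nodejs18.x and later, bundle it with the function or use AWS SDK v3.
const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB.DocumentClient();
const bcrypt = require('bcryptjs');
exports.handler = async (event) => {
  // 1. Parse and validate the request body
  let body;
  try {
    body = JSON.parse(event.body || '{}');
  } catch (err) {
    return {
      statusCode: 400,
      body: JSON.stringify({ error: 'Invalid JSON body.' })
    };
  }
  const { email, password, name } = body;
  if (!email || !password) {
    return {
      statusCode: 400,
      body: JSON.stringify({ error: 'email and password are required.' })
    };
  }
  // 2. Hash password
  const hashedPassword = await bcrypt.hash(password, 10);
  // 3. Save to DynamoDB. The condition rejects a duplicate email atomically,
  //    which avoids the race between a separate read and a later write.
  try {
    await dynamodb.put({
      TableName: 'Users',
      Item: {
        email,
        password: hashedPassword,
        name,
        createdAt: new Date().toISOString()
      },
      ConditionExpression: 'attribute_not_exists(email)'
    }).promise();
  } catch (err) {
    if (err.code === 'ConditionalCheckFailedException') {
      return {
        statusCode: 400,
        body: JSON.stringify({ error: 'Email already exists.' })
      };
    }
    throw err;
  }
  return {
    statusCode: 201,
    body: JSON.stringify({ message: 'Signup successful' })
  };
};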
API Gateway Connection:
POST /signup → Lambda (signup function) → DynamoDB
For low-traffic APIs with this pattern, cost reductions from $50/month down to $2/month are reportedly common. The less regular your traffic, the more serverless works in your favor.
4. Real Lab: Automatic Thumbnail Generator
This is the most classic serverless example. Building this made me really understand "ah, this is serverless."
Scenario
When users upload profile pictures, automatically generate small thumbnails.
Traditional Approach (EC2)
- Image processing server must run 24/7.
- Costs money even when no users.
- If 10,000 users upload simultaneously, server crashes.
Serverless Approach (Lambda + S3)
Code (Node.js):
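const sharp = require('sharp');
const aws = require('aws-sdk');
const s3 = new aws.S3();
exports.handler = async (event) => {
  // One event can carry several records, so handle each of them
  for (const record of event.Records) {
    // 1. Extract file info from event (S3 URL-encodes keys and uses "+" for spaces)
    const bucket = record.s3.bucket.name;
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));
    // Skip our own output so a trigger without a prefix filter cannot loop forever
    if (key.startsWith('thumbnails/')) continue;
    // 2. Download original from S3
    const image = await s3.getObject({ Bucket: bucket, Key: key }).promise();
    // 3. Resize (in-memory processing)
    const resized = await sharp(image.Body).resize(200, 200).toBuffer();
    // 4. Save back to S3 under the thumbnails/ prefix
    await s3.putObject({
      Bucket: bucket,
      Key: `thumbnails/${key}`,
      Body: resized
    }).promise();
  }
};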
Results:
- Cost: About $0.0002 per image. (If no uploads, $0).
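- Scalability: Lambda processes uploads in parallel up to your account's concurrency limit (1,000 concurrent executions per region by default, and you can request an increase). S3 invokes the function asynchronously, so events beyond that limit are queued and retried instead of taking a server down.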
When I first deployed this code, what amazed me was I never launched a server. I just uploaded code, and when files hit S3, it ran automatically. "Ah, this is event-driven," I understood.
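One detail of the S3 event is easy to miss: object keys arrive URL-encoded, with spaces turned into "+". The sketch below shows one way to decode a key and derive a thumbnail key from it. The helper names and the uploads/ to thumbnails/ naming scheme are assumptions for illustration, not something S3 or Lambda prescribes.

```javascript
// S3 event notifications URL-encode object keys and replace spaces with "+".
// A literal "+" in a key arrives as %2B, so the "+" swap must happen first.
function decodeS3Key(rawKey) {
  return decodeURIComponent(rawKey.replace(/\+/g, ' '));
}

// Hypothetical naming scheme: uploads/a/b.jpg -> thumbnails/a/b.jpg
function thumbnailKeyFor(key) {
  return key.replace(/^uploads\//, 'thumbnails/');
}

// Only originals are processed; this guards against the function
// re-triggering itself on the thumbnails it writes.
function shouldProcess(key) {
  return key.startsWith('uploads/');
}

const key = decodeS3Key('uploads/my+photo%281%29.jpg');
console.log(key);                  // prints "uploads/my photo(1).jpg"
console.log(thumbnailKeyFor(key)); // prints "thumbnails/my photo(1).jpg"
```

Writing thumbnails under a separate prefix, combined with a prefix filter on the trigger, is what keeps the function from invoking itself in a loop.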
5. Deployment Automation: SAM vs Serverless Framework
At first, I manually created Lambda functions in the AWS Console. But when functions grew to 10, then 20, it became hell. So I adopted Infrastructure as Code (IaC).
AWS SAM (Serverless Application Model)
AWS's official tool. Think of it as an extension of CloudFormation.
template.yaml:
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Resources:
  ThumbnailFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: index.handler
      Runtime: nodejs18.x
      MemorySize: 512
      Timeout: 60
      Events:
        S3Event:
          Type: S3
          Properties:
            Bucket: !Ref ImageBucket
            Events: s3:ObjectCreated:*
            Filter:
              S3Key:
                Rules:
                  - Name: prefix
                    Value: uploads/
  ImageBucket:
    Type: AWS::S3::Bucket
Deploy:
sam build
sam deploy --guided
What I appreciated about SAM:
- Manage Lambda, API Gateway, DynamoDB in one YAML file.
- Test locally with sam local start-api.
- CloudFormation-based, so perfect fit with AWS.
Serverless Framework (Third-party)
More concise and supports multi-cloud (AWS, GCP, Azure).
serverless.yml:
service: thumbnail-service
provider:
  name: aws
  runtime: nodejs18.x
  region: ap-northeast-2
functions:
  thumbnail:
    handler: handler.thumbnail
    events:
      - s3:
          bucket: my-image-bucket
          event: s3:ObjectCreated:*
          rules:
            - prefix: uploads/
resources:
  Resources:
    ImageBucket:
      Type: AWS::S3::Bucket
Deploy:
serverless deploy
I started with SAM, then switched to Serverless Framework when multi-cloud support became necessary. Both are good. If you're AWS-only, SAM works. For multi-cloud, Serverless Framework made sense to me.
6. The Fatal Flaw: Cold Start
Nothing's free. Serverless's biggest weakness is Cold Start. I pulled my hair out over this problem for a while.
What is Cold Start?
When a function hasn't run for a while, AWS reclaims its execution environment (the container) to save resources. When a request arrives in this state:
- Prepare new container (boot)
- Download code
- Initialize runtime (Node, Python)
- Execute code
This process typically takes 0.5 to 3 seconds, depending on the runtime and the size of the deployment package. In the worst case, users click and stare blankly for 3 seconds.
If you build a login API with Lambda, the first user to log in after a quiet period has to wait 3 seconds. They think "Is the server broken?" and refresh. Cold Start directly hurts UX — that's the trade-off you're making.
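A practical consequence of the steps above: code at module scope runs once per execution environment (that is, once per cold start), while the handler body runs on every invocation. The sketch below makes that visible with a flag. The names `isColdStart` and `environmentAgeMs` are illustrative, and in a real function you would assign the handler to `exports.handler`.

```javascript
// Module scope: executed once, when a new execution environment starts.
// Expensive setup (SDK clients, config parsing, DB connections) belongs here
// so that warm invocations can reuse it.
let isColdStart = true;
const initializedAt = Date.now();

// In a real Lambda function: exports.handler = handler;
const handler = async () => {
  const wasCold = isColdStart;
  isColdStart = false; // later invocations in this environment are warm
  return {
    coldStart: wasCold,
    environmentAgeMs: Date.now() - initializedAt
  };
};

// Simulate two invocations landing on the same environment
handler()
  .then((first) => {
    console.log(first.coldStart); // prints true
    return handler();
  })
  .then((second) => {
    console.log(second.coldStart); // prints false
  });
```

Logging a flag like this is a cheap way to see how often real users hit a cold start before paying for any mitigation.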
Solution 1: Provisioned Concurrency (Paid)
Pay extra to "always keep one warm." Cheaper than EC2 but not $0.
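# Provisioned concurrency attaches to a published version or alias, not $LATEST.
# "prod" is a placeholder alias name.
aws lambda put-provisioned-concurrency-config \
  --function-name my-function \
  --qualifier prod \
  --provisioned-concurrent-executions 1
This keeps one execution environment warm at all times. Cost is about $0.015/hour per GB of configured memory (roughly $10/month for a 1 GB function).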
Solution 2: Keep-Alive Ping (Free)
Run a bot that pokes the function every 5 minutes to keep it awake.
// CloudWatch Events (EventBridge) runs every 5 minutes
exports.handler = async () => {
console.log('Keep-alive ping');
return 'OK';
};
This method is free but not perfect. A single ping keeps only one execution environment warm, so when traffic needs several at once, the extra ones still start cold.
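If the scheduled rule pings the real function instead of a dummy one, the handler should return early on pings so the warm-up does not run business logic. Here is a minimal sketch, assuming the ping comes from an EventBridge scheduled rule (those events carry `source: 'aws.events'` and `detail-type: 'Scheduled Event'`). The helper name `isWarmupPing` and the response shapes are illustrative.

```javascript
// True when the event looks like an EventBridge (CloudWatch Events) schedule tick
function isWarmupPing(event) {
  return Boolean(event) &&
    event.source === 'aws.events' &&
    event['detail-type'] === 'Scheduled Event';
}

// In a real Lambda function: exports.handler = handler;
const handler = async (event) => {
  if (isWarmupPing(event)) {
    // Keep the environment warm, but skip the real work
    return { warmed: true };
  }
  // ... real request handling would go here ...
  return { statusCode: 200, body: JSON.stringify({ message: 'handled request' }) };
};

console.log(isWarmupPing({ source: 'aws.events', 'detail-type': 'Scheduled Event' })); // prints true
console.log(isWarmupPing({ httpMethod: 'GET' }));                                      // prints false
```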
Solution 3: Language Choice
Java is very slow due to JVM loading. Node.js, Go, and Python are better for Cold Start.
| Language | Average Cold Start |
|---|---|
| Node.js | ~200ms |
| Python | ~250ms |
| Go | ~150ms |
| Java | ~2000ms |
Seeing this table, I abandoned Java for Node.js. Language choice is also a cost issue, I realized.
7. The Stateless Principle
Lambda does not guarantee that anything held in memory survives between invocations. A warm container may be reused, but you can never count on it. I really struggled at first not knowing this.
Bad (code I actually wrote):
let count = 0;
exports.handler = async () => {
count++; // Expected: 1, 2, 3... Actual: might be 1 every time (new container starts at 0)
return count;
};
When I deployed this code and tested it, the first request returned 1, and the second also returned 1. "What?" I was confused, but turns out the second request ran in a new container. Lambda doesn't reuse the same container every time.
Good (store in DynamoDB):
const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB.DocumentClient();
exports.handler = async () => {
  // Atomically add 1 in a single request. A separate read followed by a write
  // would lose updates when two invocations run at the same time.
  const result = await dynamodb.update({
    TableName: 'Counter',
    Key: { id: 'global' },
    UpdateExpression: 'ADD #count :one',
    ExpressionAttributeNames: { '#count': 'count' }, // "count" is a DynamoDB reserved word
    ExpressionAttributeValues: { ':one': 1 },
    ReturnValues: 'UPDATED_NEW'
  }).promise();
  // ADD creates the item and attribute on first use, starting from 0
  return result.Attributes.count;
};
Now it returns the correct number every time. I learned the hard way: state must be stored in external storage (DynamoDB, Redis, S3).
8. Complex Workflows: Orchestration with Step Functions
Individual functions are simple, but connecting multiple functions gets complex. Consider an "order processing" workflow:
- Validate payment (Lambda 1)
- Check inventory (Lambda 2)
- Request shipping (Lambda 3)
- Send email (Lambda 4)
How do you connect these? At first, I had Lambda 1 call Lambda 2, Lambda 2 call Lambda 3, and so on. But error handling becomes hell. If Lambda 2 fails after Lambda 1 has already charged the customer, how do you roll back?
So I adopted AWS Step Functions.
State machine definition:
{
  "Comment": "Order processing workflow",
  "StartAt": "ValidatePayment",
  "States": {
    "ValidatePayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:validate-payment",
      "Next": "CheckInventory",
      "Catch": [{
        "ErrorEquals": ["PaymentError"],
        "Next": "PaymentFailed"
      }]
    },
    "CheckInventory": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:check-inventory",
      "Next": "RequestShipping",
      "Catch": [{
        "ErrorEquals": ["OutOfStock"],
        "Next": "RefundPayment"
      }]
    },
    "RequestShipping": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:request-shipping",
      "Next": "SendEmail"
    },
    "SendEmail": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:send-email",
      "End": true
    },
    "PaymentFailed": {
      "Type": "Fail",
      "Error": "PaymentError",
      "Cause": "Payment validation failed"
    },
    "RefundPayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:refund",
      "Next": "OrderFailed"
    },
    "OrderFailed": {
      "Type": "Fail",
      "Error": "OrderError",
      "Cause": "Order processing failed"
    }
  }
}
With Step Functions, you can visualize workflows and automatically trigger rollback logic when errors occur. Using this made me realize "ah, serverless can handle complex logic too."
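For a Catch clause such as "ErrorEquals": ["OutOfStock"] to fire, the Lambda function has to fail with an error whose name matches that string. Below is a minimal sketch of what the check-inventory function could look like. The in-memory `stock` table, the `assertInStock` helper, and the event fields `sku` and `quantity` are all hypothetical stand-ins for a real inventory lookup.

```javascript
// Custom error whose name matches the ErrorEquals entry in the state machine
class OutOfStockError extends Error {
  constructor(message) {
    super(message);
    this.name = 'OutOfStock';
  }
}

// Hypothetical in-memory stock table standing in for a real inventory lookup
const stock = { 'sku-1': 3, 'sku-2': 0 };

function assertInStock(sku, quantity) {
  const available = stock[sku] || 0;
  if (available < quantity) {
    throw new OutOfStockError(`${sku}: wanted ${quantity}, have ${available}`);
  }
  return { sku, quantity, reserved: true };
}

// In a real Lambda function: exports.handler = handler;
// A rejected promise is reported to Step Functions as a task failure.
const handler = async (event) => assertInStock(event.sku, event.quantity);

try {
  assertInStock('sku-2', 1);
} catch (err) {
  console.log(err.name); // prints "OutOfStock"
}
```

Keeping the error names in one shared module helps, because a typo on either side silently turns a handled failure into an unhandled one.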
9. The Cost Bomb Danger (DDoS)
Serverless's "pay what you use" advantage is also its "pay whatever gets used" weakness. I almost got hit with a billing bomb from this.
Real-World Case: Bot Spamming the API
There are documented cases of this happening. A bot hits an API 1,000 times per second. Lambda faithfully spins up 1,000 functions, and by the end of the day the bill is $500 — for a service that normally costs about $10/month.
If it were EC2, the server would just crash. Lambda treats every request as legitimate and keeps scaling up to your account's concurrency limit, charging for every invocation along the way. That's the other side of near-infinite scalability.
Defense: API Gateway Throttling
You must set rate limiting (Throttling) on API Gateway.
# serverless.yml
provider:
  apiGateway:
    throttle:
      rateLimit: 100 # Max 100 requests per second
      burstLimit: 200 # Max burst 200 requests
After this setting, no matter how much the bot called, only 100/second were processed and the rest got 429 Too Many Requests. Billing returned to normal.
The takeaway: Serverless can scale almost without limit, which means almost unlimited billing is possible too. You must set limits.
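The two numbers in the throttling config work together as a token bucket, which is the algorithm API Gateway documents for throttling: the burst limit is the bucket size and the rate limit is how fast it refills. The sketch below is a simplified local model of that idea, not API Gateway's actual implementation. The class name `TokenBucket` and its methods are made up for illustration.

```javascript
// Simplified token bucket: capacity = burst limit, refill speed = rate limit
class TokenBucket {
  constructor(ratePerSecond, burst, nowMs = 0) {
    this.rate = ratePerSecond;
    this.capacity = burst;
    this.tokens = burst; // start full
    this.last = nowMs;
  }

  // Returns true if the request is allowed, false if it would get a 429
  tryRemove(nowMs) {
    const elapsedSeconds = Math.max(0, nowMs - this.last) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.rate);
    this.last = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// Sends n requests at the same instant and counts how many pass
function countAccepted(bucket, n, nowMs) {
  let accepted = 0;
  for (let i = 0; i < n; i++) {
    if (bucket.tryRemove(nowMs)) accepted++;
  }
  return accepted;
}

const bucket = new TokenBucket(100, 200); // rateLimit 100, burstLimit 200
console.log(countAccepted(bucket, 300, 0));    // prints 200 (the burst), the rest are rejected
console.log(countAccepted(bucket, 150, 1000)); // prints 100 (one second of refill)
```

A bot sending 1,000 requests per second against this bucket gets about 100 per second through after the initial burst, which is what caps the bill.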
10. Cost Comparison: EC2 vs Lambda (Real Case)
Here's a cost comparison for an "image resizing API" running on EC2 vs Lambda.
Traffic Pattern
- Normal: 1,000 requests/day
- Marketing events: 100,000 requests/day (3 days per month)
EC2 Cost (t3.small, $0.023/hour)
- Base cost: $0.023 × 24 hours × 30 days = $16.56/month
- Event response: Scaling up to 10 instances for 3 days adds 9 instances × 72 hours × $0.023 = $14.90
- Total: $31.46/month
Lambda Cost
- Normal: 1,000 requests × 27 days = 27,000 requests
- Events: 100,000 requests × 3 days = 300,000 requests
- Total: 327,000 requests
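- Request charge: 327,000 requests is under the 1 million/month free tier, so $0 (about $0.07 without the free tier)
- Execution time: 200ms average, 512MB memory → 327,000 × 0.2 s × 0.5 GB = 32,700 GB-seconds
- Compute charge: 32,700 GB-seconds × $0.0000166667 ≈ $0.55 (also $0 within the 400,000 GB-second monthly free tier)
- Total: about $0.61/month at list price, $0 within the free tier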
For services with this kind of irregular traffic pattern, Lambda is overwhelmingly better on cost. The more uneven the traffic spikes, the stronger the case for serverless.
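The Lambda side of this comparison is simple enough to put into a small calculator. This is a rough sketch that assumes the published x86 list prices ($0.20 per million requests, $0.0000166667 per GB-second) and the monthly free tier (1 million requests, 400,000 GB-seconds). It ignores API Gateway, data transfer, and storage. The function name `lambdaMonthlyCost` and its parameters are made up for illustration.

```javascript
// Assumed list prices and free tier; check current pricing for your region
const PRICE_PER_MILLION_REQUESTS = 0.20;
const PRICE_PER_GB_SECOND = 0.0000166667;
const FREE_REQUESTS = 1000000;
const FREE_GB_SECONDS = 400000;

function lambdaMonthlyCost({ requests, avgDurationMs, memoryMb, applyFreeTier = true }) {
  // Compute is billed in GB-seconds: duration × memory, summed over requests
  const gbSeconds = requests * (avgDurationMs / 1000) * (memoryMb / 1024);

  const billableRequests = applyFreeTier ? Math.max(0, requests - FREE_REQUESTS) : requests;
  const billableGbSeconds = applyFreeTier ? Math.max(0, gbSeconds - FREE_GB_SECONDS) : gbSeconds;

  const requestCost = (billableRequests / 1000000) * PRICE_PER_MILLION_REQUESTS;
  const computeCost = billableGbSeconds * PRICE_PER_GB_SECOND;
  return { gbSeconds, requestCost, computeCost, total: requestCost + computeCost };
}

// The traffic pattern from this section: 327,000 requests, 200ms, 512MB
const workload = { requests: 327000, avgDurationMs: 200, memoryMb: 512 };
console.log(lambdaMonthlyCost(workload).total);                                        // prints 0
console.log(lambdaMonthlyCost({ ...workload, applyFreeTier: false }).total.toFixed(2)); // prints "0.61"
```

Plugging in your own request count and duration is a quick way to find the traffic level where an always-on instance becomes cheaper.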
11. Lambda@Edge: Run Functions at CDN Edge
The last thing I want to introduce is Lambda@Edge. This runs Lambda at CloudFront (CDN) edge locations.
Use Case: A/B Testing
When users visit the website, I wanted 50% to see version A and 50% to see version B.
Lambda@Edge Code:
exports.handler = async (event) => {
  const request = event.Records[0].cf.request;
  const headers = request.headers;
  // If no variant cookie, randomly assign one and pass it to the origin as a header.
  // The origin must set the variant cookie in its response for the choice to stick;
  // otherwise every request is assigned again.
  if (!headers.cookie || !headers.cookie[0].value.includes('variant=')) {
    const variant = Math.random() < 0.5 ? 'A' : 'B';
    request.headers['x-variant'] = [{ key: 'X-Variant', value: variant }];
  }
  return request;
};
Connect this code to a CloudFront distribution and it runs at the edge location, before the request reaches the origin server. Because it executes close to the user, the added latency is small.
What struck me: "Lambda isn't just in a central region, it can be distributed worldwide."
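The cookie check in the A/B example can be made a little stricter. A plain substring test for 'variant=' also matches unrelated cookies such as myvariant=A, and it does not tell you which variant the user already has. The sketch below parses the Cookie header value instead. The function name `pickVariant` and the injectable `random` parameter (there to make testing deterministic) are illustrative.

```javascript
// Reads the variant from a Cookie header value, or assigns a new one.
// `random` is injectable so the assignment can be tested deterministically.
function pickVariant(cookieHeader, random = Math.random) {
  // Match "variant=A" or "variant=B" only as a whole cookie, not as a suffix
  const match = /(?:^|;\s*)variant=(A|B)(?:;|$)/.exec(cookieHeader || '');
  if (match) {
    return { variant: match[1], isNew: false };
  }
  return { variant: random() < 0.5 ? 'A' : 'B', isNew: true };
}

console.log(pickVariant('session=1; variant=B'));   // prints { variant: 'B', isNew: false }
console.log(pickVariant('myvariant=A', () => 0.9)); // prints { variant: 'B', isNew: true }
```

When `isNew` is true, the response should set the cookie so the same user keeps seeing the same version.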
12. Container-Based Lambda: Docker Support
In 2020, AWS added Container Image Support for Lambda. You can now deploy Lambda functions as Docker containers up to 10GB.
Why This Matters
Before this, Lambda had strict limits on deployment package size (250MB unzipped). For ML models or apps with heavy dependencies, this was a nightmare. Now you can package everything in a Docker image.
Example: Running a PyTorch Model in Lambda
FROM public.ecr.aws/lambda/python:3.9
# Install dependencies
COPY requirements.txt .
RUN pip install -r requirements.txt
# Copy model and code
COPY model.pth .
COPY app.py .
CMD ["app.handler"]
app.py:
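import json
import torch
# Load once per execution environment so warm invocations reuse the model.
# Assumes model.pth holds a fully pickled nn.Module, not just a state_dict
# (PyTorch 2.6 and later also need weights_only=False for that).
model = torch.load('model.pth', map_location='cpu')
model.eval()
def handler(event, context):
    # Assumes the request body is a JSON array (or nested arrays) of numbers
    input_data = json.loads(event['body'])
    inputs = torch.tensor(input_data, dtype=torch.float32)
    with torch.no_grad():
        prediction = model(inputs)
    return {
        'statusCode': 200,
        'body': json.dumps({'prediction': prediction.tolist()})
    }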
Deploy:
# Authenticate Docker to ECR first (the repository must already exist)
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
docker build -t my-ml-function .
docker tag my-ml-function:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-ml-function:latest
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-ml-function:latest
aws lambda create-function --function-name my-ml-function --package-type Image --code ImageUri=123456789012.dkr.ecr.us-east-1.amazonaws.com/my-ml-function:latest --role arn:aws:iam::123456789012:role/lambda-role
This opened up serverless for ML inference workloads. I was skeptical at first (won't Cold Start kill performance?), but with Provisioned Concurrency, it works surprisingly well.
13. Wrap-Up: Serverless Isn't a Silver Bullet
After studying serverless and building with it, my conclusion is this:
Serverless is Great For:
- Services with irregular traffic (event ticketing, marketing campaigns)
- Background jobs (image processing, data ETL)
- Microservices architecture
- Fast prototyping
EC2 is Better For:
- Always-steady traffic (Netflix streaming)
- Long-running connections (WebSocket)
- Tasks taking over 15 minutes
- Real-time services where Cold Start is fatal (game servers)
A hybrid approach makes the most sense. APIs on Lambda, WebSocket chat on EC2, data processing on Lambda, ML training on EC2. Mixing them based on workload characteristics is how you get the best of both on cost and performance.
This was the lesson: You don't need "everything on serverless." Use the right tool for the job.
14. Glossary
- FaaS (Function as a Service): Cloud service model where you deploy and run individual functions. (AWS Lambda, Google Cloud Functions).
- BaaS (Backend as a Service): Services providing backend features (DB, Auth) as APIs. (Firebase, Supabase).
- Cold Start: Delay caused by initialization when a dormant function runs for the first time.
- Vendor Lock-in: State where you're too dependent on specific cloud (AWS) features to easily migrate to other clouds (Azure, GCP).
- Provisioned Concurrency: Feature to pre-warm containers to prevent Cold Start (paid).
- Idempotency: Same request sent multiple times produces same result. (Important because network errors might cause Lambda to execute twice).
- Step Functions: Orchestration service connecting multiple Lambda functions in workflows.
- Lambda@Edge: Lambda functions running at CloudFront edge locations.
15. FAQ
- Q: Does Netflix use serverless?
- A: Yes, but mainly as "glue". Video encoding and main streaming servers run on EC2, while file management, log processing, backups use Lambda heavily. You don't need everything serverless. Hybrid is the answer.
- Q: What's the 15-minute limit?
- A: AWS Lambda has a maximum 15-minute execution time per invocation. It forcibly terminates after 15 minutes. So it's unsuitable for long video transcoding or deep learning training. (Use AWS Batch or EC2 instead).
- Q: Can you use Docker containers in Lambda?
- A: Yes, since 2020 Container Image Support was added. You can upload images up to 10GB. Useful for complex dependencies (ML libraries, etc).
- Q: How does pricing actually work?
- A: You pay for number of requests (first 1 million/month free, then $0.20 per million) and compute time (charged per 1ms of execution, price varies by memory allocated). For example, a 512MB function running 200ms uses 0.1 GB-seconds, which costs about $0.0000017 per invocation at $0.0000166667 per GB-second.