The End of "It Works on Localhost"
The most feared phrase in development might be "It works on localhost," but I found one that's scarier: "It works on Vercel."
I encountered this nightmare while migrating a Next.js service from Vercel to AWS. In the Vercel days, development was smooth. git push was all we needed for deployment, domain connection, SSL, and CDN. No dedicated DevOps engineer required.
Next.js's killer feature, ISR (Incremental Static Regeneration), felt like magic. It kept the speed of a static site while automatically serving fresh pages when data changed, the way a dynamic site would.
export async function getStaticProps() {
  const res = await fetch('https://api.example.com/posts');
  const posts = await res.json();
  return {
    props: { posts },
    // Regenerate at most once every 60 seconds, triggered by the next request
    revalidate: 60,
  };
}
Just this one line, revalidate: 60, was enough. Time to focus solely on business logic.
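To make the behavior of that one line concrete, here is a toy model of time-based revalidation for a single page on a single server. It is a sketch of the stale-while-revalidate idea, not Next.js internals: createIsrPage, the fake clock, and the render callback are all made-up names, and the real regeneration runs in the background rather than inline.

```javascript
// Toy model of time-based ISR for one page. Not Next.js internals.
function createIsrPage(render, revalidateSeconds, now) {
  // The "build" produces the first cached copy.
  let entry = { html: render(), generatedAt: now() };

  return function handleRequest() {
    const ageSeconds = (now() - entry.generatedAt) / 1000;
    const served = entry.html; // always answer from the cache first
    if (ageSeconds >= revalidateSeconds) {
      // Next.js regenerates in the background; the toy does it inline,
      // after the response for this request has already been chosen.
      entry = { html: render(), generatedAt: now() };
    }
    return served;
  };
}

// Usage with a fake clock and a fake data source (both hypothetical).
let clockMs = 0;
let title = 'Maintenance Notice';
const page = createIsrPage(() => `<h1>${title}</h1>`, 60, () => clockMs);

title = 'Maintenance Complete'; // data changes after the initial build
clockMs = 30000;
const withinWindow = page();    // still inside the 60 s window: old copy
clockMs = 61000;
const firstAfter = page();      // stale copy is served, regeneration fires
const secondAfter = page();     // the regenerated copy
console.log(withinWindow, firstAfter, secondAfter);
```

Note that the first visitor after the window expires still gets the old page. Only the request after that sees the new one.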
But as the service grew and traffic spiked, the Vercel bill became heavy. Programs like AWS Activate offer generous credits for startups, which is a natural trigger for the "let's move to AWS to save costs and own our infra!" decision.
EC2 instances, Docker images, Application Load Balancer (ALB), CI/CD pipelines. Everything looked perfect.
Until a critical report arrived 3 hours before launch.
"I updated the announcement in the admin panel, but the homepage still shows the old content. If I keep refreshing, sometimes the new content appears, and sometimes the old one returns."
Excuse me? Sometimes it works, sometimes it doesn't? This wasn't a DB issue or a frontend bug. It was a classic "Ghost" bug.
Hunting the Culprit: The Vanishing Update
At first, I thought it was a browser cache issue. I checked 'Disable cache' in DevTools. No change. Then I suspected the CDN (CloudFront) and ran an Invalidation. Still the same.
I checked the server logs. The ISR regeneration was definitely triggered (Generating static pages...). The DB query returned the correct new data. So why were users seeing data from 3 days ago?
To understand this phenomenon, I had to dig into the "Physical Reality" of how Next.js handles ISR.
The Magic of Vercel: Serverless & Global Cache
When using Vercel, ISR utilizes a "Global Cache."
- User requests a page.
- Vercel Edge Network (CDN) receives it.
- If the cache is stale (revalidate time passed), a Background Serverless Function runs.
- It generates new HTML/JSON.
- This result overwrites Vercel's Centralized Data Store.
- Edge Nodes worldwide pull the new file from this central store.
In Vercel, "State is Centralized." Developers don't need to worry about it.
The Reality of Docker: Local File System
But running a Docker container on AWS EC2 is a different story. When you run a Next.js server (npm start) inside Docker, it defaults to using the Local File System for caching.
Specifically, the .next/cache/ and .next/server/pages/ directories.
When ISR triggers, the Node.js process fetches data from the DB, bakes a new HTML file, and writes it to its own hard disk.
Here lies the critical difference we overlooked.
The Tragedy of Distributed Systems: "Servers with Split Memories"
For High Availability (HA), we had 2 servers in an Auto Scaling Group running behind an ALB.
Let's reconstruct the scenario.
- 1:00 PM: I change the announcement title from "Maintenance Notice" to "Maintenance Complete".
- 1:01 PM: User A visits. The Load Balancer sends traffic to Server 1.
  - Server 1: "Cache expired (60s passed). Fetching from DB and baking." (ISR runs)
  - Server 1's Disk: index.html (Title: "Maintenance Complete") -> Updated ✅
  - User A: "Oh, maintenance is over."
- 1:02 PM: User B visits. The Load Balancer sends traffic to Server 2 this time (Round Robin).
  - Server 2: "My cache on disk is still valid." (Or maybe it hasn't been triggered yet, so it holds the old file).
  - Server 2's Disk: index.html (Title: "Maintenance Notice") -> Old Data ❌
  - User B: "Still under maintenance?"
What if User A refreshes?
- Refresh 1 -> Hits Server 1 -> "Maintenance Complete"
- Refresh 2 -> Hits Server 2 -> "Maintenance Notice"
From the user's perspective, they are stuck in a time loop, jumping between the past and present. This was the true identity of the "sometimes works, sometimes doesn't" ghost bug.
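The time loop is easy to reproduce in a few lines. This is a toy simulation, not our production setup: the Server class, the db object, and the modulo-based round robin are all stand-ins I made up for illustration.

```javascript
// Toy model: each server keeps its own ISR cache, like a container's
// local .next directory. Nothing is shared between them.
class Server {
  constructor(name, db) {
    this.name = name;
    this.db = db;
    this.localCache = { title: db.title }; // build-time snapshot
  }
  regenerate() {
    // ISR: re-read the data source and overwrite this server's copy only.
    this.localCache = { title: this.db.title };
  }
  serve() {
    return this.localCache.title;
  }
}

const db = { title: 'Maintenance Notice' };
const servers = [new Server('server-1', db), new Server('server-2', db)];

db.title = 'Maintenance Complete'; // admin edits the announcement
servers[0].regenerate();           // only server-1's ISR has fired so far

// Round-robin load balancer: request i goes to server i % 2.
const responses = [];
for (let i = 0; i < 4; i++) {
  responses.push(servers[i % servers.length].serve());
}
console.log(responses);
// [ 'Maintenance Complete', 'Maintenance Notice',
//   'Maintenance Complete', 'Maintenance Notice' ]
```

Four refreshes, two different realities, and neither server is doing anything wrong on its own.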
There is a worse scenario: Deployment.
Docker-based deployments usually follow an Immutable Infrastructure strategy. A new version kills the old container and starts a new one from a fresh image.
When the container dies, the file system (.next/cache) inside it evaporates.
All the fresh data generated by ISR via user visits? Reset to zero. Back to the Build Time snapshot.
Exploring Solutions: "Sharing the Diary"
The core problem is "Servers are keeping separate diaries (caches)." The solution is simple: "Use a Shared Diary (Shared Storage)."
We needed an external storage accessible by all servers.
Attempt 1: AWS EFS (Elastic File System) - FAIL
First thought was EFS, a network file system. Multiple EC2s can share a folder. Why not mount .next/cache to EFS?
I tried it.
Result? Too slow. EFS works over the network, so I/O latency is higher than local disk. Next.js build and runtime performance dropped significantly. Plus, configuration is messy.
Attempt 2: Sticky Session - FAIL
Next idea: configure the Load Balancer to "Send User A always to Server 1." User A sees the fresh version. But this isn't a root fix. Server 2 might never update. If Server 1 dies, User A is thrown into the past (Server 2).
Attempt 3: Redis (SUCCESS)
The answer was Redis, an In-Memory Data Store. Fast, accessible by all servers, and its Key-Value structure is perfect for caching.
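Fortunately, Next.js (13.4+) provides an official API called CacheHandler to customize internal cache logic. As far as I can tell, it shipped first as the experimental incrementalCacheHandlerPath option and became the stable cacheHandler option in 14.1, so check which one your version expects. We can use this to tell Next.js: "Read and write to Redis instead of the file system."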
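For a feel of what such a handler looks like, here is a minimal sketch modeled on the example in the Next.js docs: a class with get, set, and revalidateTag. The class name SketchCacheHandler is made up, and a module-level Map stands in for Redis so the snippet runs on its own. Treat the stored entry shape as an assumption taken from that docs example rather than a guaranteed contract.

```javascript
// A Map shared by every handler instance in this process.
// In a real setup this would be an external store such as Redis.
const store = new Map();

class SketchCacheHandler {
  constructor(options) {
    this.options = options;
  }

  // Called when Next.js looks up a cached page or fetch result.
  async get(key) {
    return store.get(key);
  }

  // Called when Next.js has generated something worth caching.
  async set(key, data, ctx) {
    store.set(key, {
      value: data,
      lastModified: Date.now(),
      tags: (ctx && ctx.tags) || [],
    });
  }

  // Called for tag-based invalidation; accepts one tag or an array.
  async revalidateTag(tags) {
    const wanted = [tags].flat();
    for (const [key, entry] of store) {
      if (entry.tags.some((tag) => wanted.includes(tag))) {
        store.delete(key);
      }
    }
  }
}

// In a real project the class would be exported from the file that the
// Next.js config points at, for example: module.exports = SketchCacheHandler;
```

The point is where the data lives: because get and set go to one store instead of each container's disk, every server reads what any server wrote.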
Implementation: Next.js CacheHandler with Redis
No need to write from scratch. There is an excellent open-source library @neshca/cache-handler.
1. Install Packages
npm install @neshca/cache-handler redis
2. Configure Handler (cache-handler.mjs)
Create this file in your project root.
import { CacheHandler } from '@neshca/cache-handler';
// The Redis handler factory ships as a separate entry point of the library.
import createRedisHandler from '@neshca/cache-handler/redis-strings';
import { createClient } from 'redis';

CacheHandler.onCreation(async () => {
  // Get Redis URL from env. Default to localhost for dev.
  const redisUrl = process.env.REDIS_URL ?? 'redis://localhost:6379';
  const client = createClient({
    url: redisUrl,
  });

  client.on('error', (err) => {
    console.error('Redis connection error:', err);
  });

  // Connect to Redis
  await client.connect();
  console.log('Redis connected for Next.js ISR Cache');

  // Create Redis Handler
  const redisHandler = await createRedisHandler({
    client,
    keyPrefix: 'nextjs-isr-cache:', // Prefix to identify keys
    timeoutMs: 5000,
  });

  return {
    handlers: [redisHandler],
    // In distributed systems, turn off memory cache or keep it very short for consistency.
  };
});

export default CacheHandler;
3. Next.js Config (next.config.js)
Now tell Next.js to use this handler. The most critical option here is cacheMaxMemorySize: 0.
/** @type {import('next').NextConfig} */
const nextConfig = {
  // ... other configs

  // Point to our handler file
  cacheHandler: require.resolve('./cache-handler.mjs'),

  // IMPORTANT!! Disable Memory Cache (0 bytes)
  // If not disabled, even if Redis has fresh data,
  // a server might serve stale data from its own RAM.
  cacheMaxMemorySize: 0,
};

module.exports = nextConfig;
With this setup, the architecture changes:
- Server 1 triggers ISR -> Saves result (HTML) to Redis.
- Server 2 receives request -> Ignores local disk -> Fetches data from Redis.
- Server 2 serves fresh data immediately.
Even after deployment (restart), Redis data persists (if configured for persistence), so cache doesn't vanish.
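The new flow can be sketched the same way as the bug. In this toy model a Map plays the role of Redis, and createServer and db are hypothetical names. The servers keep no cache of their own, so it does not matter which one regenerated the page or whether a container was replaced in between.

```javascript
// Toy model: servers hold no cache of their own and read/write one
// shared store. The Map is a stand-in for Redis.
const sharedStore = new Map();

function createServer(db) {
  return {
    regenerate(path) {
      sharedStore.set(path, db.title);
    },
    serve(path) {
      if (!sharedStore.has(path)) this.regenerate(path); // cold miss
      return sharedStore.get(path);
    },
  };
}

const db = { title: 'Maintenance Notice' };
let server1 = createServer(db);
const server2 = createServer(db);
server1.serve('/'); // warms the shared cache with the old title

db.title = 'Maintenance Complete';
server1.regenerate('/');                // ISR fires on server 1 only
const fromServer2 = server2.serve('/'); // server 2 still sees the update

server1 = createServer(db);             // redeploy: fresh container, same store
const afterRedeploy = server1.serve('/');
console.log(fromServer2, afterRedeploy);
// Maintenance Complete Maintenance Complete
```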
Deep Dive: The Trap of On-Demand Revalidation
It's not over. Next.js supports On-Demand Revalidation (res.revalidate('/path')) via API calls, not just time-based.
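For reference, a webhook endpoint for this usually looks something like the sketch below. In a Pages Router project the function would be the default export of an API route file. The function name, the REVALIDATE_SECRET environment variable, and the path query parameter are assumptions for illustration. Only res.revalidate itself comes from Next.js.

```javascript
// Sketch of an on-demand revalidation endpoint (Pages Router style).
// In a real project this would be the default export of an API route file.
async function revalidateHandler(req, res) {
  // REVALIDATE_SECRET is a hypothetical env var shared with the CMS webhook.
  const expected = process.env.REVALIDATE_SECRET;
  if (!expected || req.query.secret !== expected) {
    return res.status(401).json({ message: 'Invalid token' });
  }

  const path = req.query.path; // e.g. "/products/1"
  if (typeof path !== 'string' || !path.startsWith('/')) {
    return res.status(400).json({ message: 'Invalid path' });
  }

  try {
    // Regenerates the page and writes the result to whatever cache
    // this server is configured to use.
    await res.revalidate(path);
    return res.json({ revalidated: true });
  } catch (err) {
    return res.status(500).send('Error revalidating');
  }
}
```

Whichever server the load balancer picks for the webhook is the one that runs the regeneration, which is exactly why the cache location matters.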
Suppose saving a post in the CMS triggers a webhook to update the page.
Webhook hits Server 1.
Server 1 runs res.revalidate('/products/1'). It writes fresh data to Redis.
But what if we hadn't set cacheMaxMemorySize: 0?
Server 2 still holds the old data in its RAM (L1 Cache). It doesn't know Redis updated.
To solve this perfectly, you need a Pub/Sub model.
- Server 1 updates.
- Publishes "I updated /products/1" to Redis Pub/Sub channel.
- Server 2 (Subscriber) receives message -> Evicts that key from its memory cache.
@neshca/cache-handler supports this experimentally, but it's complex.
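To show what that model involves, here is a stripped-down sketch of the idea. It is not how @neshca/cache-handler implements it: the Bus class is a tiny in-process stand-in for a Redis Pub/Sub channel, one Map stands in for Redis, and createServer is a hypothetical helper.

```javascript
// Minimal in-process message bus standing in for a Redis Pub/Sub channel.
class Bus {
  constructor() {
    this.subscribers = [];
  }
  subscribe(fn) {
    this.subscribers.push(fn);
  }
  publish(message) {
    this.subscribers.forEach((fn) => fn(message));
  }
}

const shared = new Map(); // stand-in for Redis (the L2 cache)
const bus = new Bus();

function createServer() {
  const memory = new Map(); // per-process RAM cache (L1)
  // Every server evicts the path from its own RAM when told to.
  bus.subscribe((path) => memory.delete(path));
  return {
    serve(path) {
      if (!memory.has(path)) memory.set(path, shared.get(path));
      return memory.get(path);
    },
    revalidate(path, html) {
      shared.set(path, html);
      bus.publish(path); // notify all servers, including this one
    },
  };
}

shared.set('/products/1', 'old');
const s1 = createServer();
const s2 = createServer();
s2.serve('/products/1');             // s2 now holds "old" in RAM
s1.revalidate('/products/1', 'new'); // webhook lands on s1
const seenByS2 = s2.serve('/products/1');
console.log(seenByS2); // new
```

Even in toy form there are now three moving parts to keep correct (L1, L2, and the channel), and a real channel can drop or delay messages.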
The pragmatic alternative is simply disabling memory cache (cacheMaxMemorySize: 0). Redis is fast enough (typically sub-millisecond responses when it sits in the same VPC as the servers). No need to double-cache in RAM. Accepting the tiny Network RTT cost solves all consistency headaches.
Conclusion: Comfort Has a Cost
On Vercel, I didn't have to think about this for a second. For $20/month, I enjoyed infrastructure optimized by Vercel engineers.
Moving to AWS gave us "Control over Infrastructure," but also "Responsibility to Maintain Infrastructure."
We added Redis costs (about $15/mo for ElastiCache t4g.micro) and spent my expensive engineering hours debugging and setting up CacheHandler.
But through this journey, I understood Next.js to its core. I realized that features I thought were "Magic" were actually sophisticated combinations of File I/O and Caching Strategies.
Advice to anyone running Next.js outside Vercel: "If you plan to use ISR, bring Redis. Or be prepared for the Time Loop."