
Why ISR Breaks on AWS/Docker (and How to Fix It)
ISR works perfectly on Vercel, but fails on AWS/Docker. Let's dig into the file system cache trap and how to solve it.

The most feared phrase in development might be "It works on localhost," but I found one that's scarier: "It works on Vercel."
I encountered this nightmare while migrating a Next.js service from Vercel to AWS. In the Vercel days, development was smooth: a single git push handled deployment, and domain connection, SSL, and CDN came for free. No dedicated DevOps engineer required.
Next.js's killer feature, ISR (Incremental Static Regeneration), felt like magic. It kept the speed of a Static Site while automatically serving fresh pages when data changed (Dynamic).
export async function getStaticProps() {
  const res = await fetch('https://api.example.com/posts');
  const posts = await res.json();

  return {
    props: { posts },
    revalidate: 60, // Refresh every 60 seconds!
  };
}
Just this one line, revalidate: 60, was enough, leaving us free to focus solely on business logic.
But as the service grew and traffic spiked, the Vercel bill became heavy. Programs like AWS Activate offer generous credits for startups — a natural trigger for the "let's move to AWS to save costs and own our infra!" decision.
EC2 instances, Docker images, Application Load Balancer (ALB), CI/CD pipelines. Everything looked perfect.
Until a critical report arrived 3 hours before launch: "I updated the announcement in the admin panel, but the homepage still shows the old content. If I keep refreshing, sometimes the new content appears, and sometimes the old one returns."
Excuse me? Sometimes it works, sometimes it doesn't? This wasn't a DB issue or a frontend bug. It was a classic "Ghost" bug.
At first, I thought it was a browser cache issue. I checked 'Disable cache' in DevTools. No change. Then I suspected the CDN (CloudFront) and ran an Invalidation. Still the same.
I checked the server logs. The ISR regeneration was definitely triggered (Generating static pages...). The DB query returned the correct new data. So why were users seeing data from 3 days ago?
To understand this phenomenon, I had to dig into the "Physical Reality" of how Next.js handles ISR.
When using Vercel, ISR utilizes a "Global Cache."
Generated pages live in a centralized store shared by every serverless instance. When a cached page goes stale (the revalidate time has passed), a Background Serverless Function regenerates it and updates that central store. In Vercel, "State is Centralized." Developers don't need to worry about it.
But running a Docker container on AWS EC2 is a different story. When you run a Next.js server (npm start) inside Docker, it defaults to using the Local File System for caching.
Specifically, the .next/cache/ and .next/server/pages/ directories.
When ISR triggers, the Node.js process fetches data from the DB, bakes a new HTML file, and writes it to its own hard disk.
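Conceptually, the default file-system handler does something like the following. This is a simplified sketch for illustration, not the actual Next.js source; the function and file names are mine, and the paths follow the Pages Router layout.

import fs from 'node:fs/promises';
import path from 'node:path';

// Simplified sketch: how an ISR regeneration lands on local disk.
// Illustrative only; not the real Next.js implementation.
async function writeIsrPage(route, html, pageData) {
  const pagesDir = path.join(process.cwd(), '.next', 'server', 'pages');
  // The baked HTML that gets served to browsers
  await fs.writeFile(path.join(pagesDir, `${route}.html`), html);
  // The props JSON used for client-side navigations
  await fs.writeFile(path.join(pagesDir, `${route}.json`), JSON.stringify(pageData));
}

The write goes to this one process's own disk, and nowhere else.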
Here lies the critical difference we overlooked.
For High Availability (HA), we had 2 servers running behind an ALB (Auto Scaling Group).
Let's reconstruct the scenario.
User A's request lands on Server 1 just as the revalidate timer expires, so Server 1 regenerates the page. Now the two servers disagree:
Server 1: index.html (Title: "Maintenance Complete") -> Updated ✅
Server 2: index.html (Title: "Maintenance Notice") -> Old Data ❌
What if User A refreshes? The ALB's round-robin routing may send the next request to Server 2, which happily serves its stale copy.
From the user's perspective, they are stuck in a time loop, jumping between the past and present. This was the identity of the "sometimes works, sometimes doesn't" ghost bug.
There is a worse scenario: Deployment.
Docker-based deployments usually follow an Immutable Infrastructure strategy. A new version kills the old container and starts a new one from a fresh image.
When the container dies, the file system (.next/cache) inside it evaporates.
All the fresh data generated by ISR via user visits? Reset to zero. Back to the Build Time snapshot.
The core problem is "Servers are keeping separate diaries (caches)." The solution is simple: "Use a Shared Diary (Shared Storage)."
We needed an external storage accessible by all servers.
My first thought was EFS (Amazon Elastic File System), a network file system. Multiple EC2 instances can share a folder. Why not mount .next/cache to EFS?
I tried it.
Result? Too slow. EFS works over the network, so I/O latency is far higher than local disk, and Next.js build and runtime performance dropped significantly. Plus, the configuration is messy.
The next idea was Sticky Sessions: configuring the Load Balancer to always send User A to Server 1. Now User A consistently sees the fresh version. But this isn't a root fix. Server 2 might never update, and if Server 1 dies, User A is thrown into the past (Server 2).
The answer was Redis, an In-Memory Data Store. Fast, accessible by all servers, and its Key-Value structure is perfect for caching.
Fortunately, Next.js provides an official hook for customizing its internal cache logic: the cacheHandler config (stable since 14.1, available experimentally as incrementalCacheHandlerPath in 13.x). We can use it to tell Next.js: "Read and write to Redis instead of the file system."
No need to write from scratch. There is an excellent open-source library @neshca/cache-handler.
npm install @neshca/cache-handler redis
Then create the handler file, cache-handler.mjs, in your project root.
import { CacheHandler } from '@neshca/cache-handler';
// redis-strings works with plain Redis (e.g., ElastiCache);
// redis-stack requires the RedisJSON module on the server.
import createRedisHandler from '@neshca/cache-handler/redis-strings';
import { createClient } from 'redis';

CacheHandler.onCreation(async () => {
  // Get Redis URL from env. Default to localhost for dev.
  const redisUrl = process.env.REDIS_URL ?? 'redis://localhost:6379';

  const client = createClient({
    url: redisUrl,
  });

  client.on('error', (err) => {
    console.error('Redis connection error:', err);
  });

  // Connect to Redis
  await client.connect();
  console.log('Redis connected for Next.js ISR Cache');

  // Create Redis Handler
  const redisHandler = await createRedisHandler({
    client,
    keyPrefix: 'nextjs-isr-cache:', // Prefix to identify keys
    timeoutMs: 5000, // Fail fast if Redis is unreachable
  });

  return {
    handlers: [redisHandler],
    // In distributed systems, turn off memory cache or keep it very short for consistency.
  };
});

export default CacheHandler;
Next, in next.config.js, tell Next.js to use this handler. The most critical option here is cacheMaxMemorySize: 0.
/** @type {import('next').NextConfig} */
const nextConfig = {
  // ... other configs

  // Point to our handler file
  cacheHandler: require.resolve('./cache-handler.mjs'),

  // IMPORTANT!! Disable Memory Cache (0 bytes)
  // If not disabled, even if Redis has fresh data,
  // a server might serve stale data from its own RAM.
  cacheMaxMemorySize: 0,
};

module.exports = nextConfig;
With this setup, the architecture changes: every server now reads from and writes to the same Redis store, so a page regenerated by any one server is immediately fresh on all of them.
Even after deployment (restart), Redis data persists (if configured for persistence), so cache doesn't vanish.
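To sanity-check that pages are actually landing in Redis, a small throwaway script can list the cache entries. This is a hypothetical helper of mine, not part of any library; it assumes the keyPrefix configured above.

// check-cache.mjs - hypothetical sanity-check script (run with node check-cache.mjs)
import { createClient } from 'redis';

const client = createClient({
  url: process.env.REDIS_URL ?? 'redis://localhost:6379',
});
await client.connect();

// List entries written by the handler with keyPrefix 'nextjs-isr-cache:'
// (KEYS is fine for a one-off check; avoid it in hot production paths)
console.log(await client.keys('nextjs-isr-cache:*'));

await client.quit();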
It's not over yet. Beyond time-based regeneration, Next.js also supports On-Demand Revalidation (res.revalidate('/path')) triggered by API calls.
Suppose saving a post in the CMS triggers a webhook to update the page.
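For concreteness, here is a minimal sketch of such a webhook endpoint in the Pages Router. The route path and the REVALIDATION_SECRET variable are my assumptions, not from the original setup.

// pages/api/revalidate.js - minimal sketch of an on-demand revalidation webhook.
// Route name and REVALIDATION_SECRET are illustrative assumptions.
export default async function handler(req, res) {
  // Reject callers that don't present the shared secret
  if (req.query.secret !== process.env.REVALIDATION_SECRET) {
    return res.status(401).json({ message: 'Invalid token' });
  }
  try {
    // Regenerate the page; with the Redis CacheHandler in place,
    // the fresh entry is written where every server can read it.
    await res.revalidate('/products/1');
    return res.json({ revalidated: true });
  } catch (err) {
    return res.status(500).send('Error revalidating');
  }
}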
Webhook hits Server 1.
Server 1 runs res.revalidate('/products/1'). It writes fresh data to Redis.
But if we didn't set cacheMaxMemorySize: 0?
Server 2 still holds the old data in its RAM (L1 Cache). It doesn't know Redis updated.
To solve this perfectly, you need a Pub/Sub model.
When a server revalidates a page, it publishes a message like "invalidate /products/1" to a Redis Pub/Sub channel; every other server subscribes to that channel and evicts the path from its local memory cache. @neshca/cache-handler supports this experimentally, but it's complex.
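In sketch form, the idea looks roughly like this. The channel name and helper functions are hypothetical illustrations, not @neshca/cache-handler's actual API.

import { createClient } from 'redis';

// Hypothetical sketch of Pub/Sub invalidation; names are illustrative.
const CHANNEL = 'nextjs-cache-invalidation';

// Publisher side: after writing fresh data to Redis, announce the path.
export async function publishInvalidation(client, path) {
  await client.publish(CHANNEL, path);
}

// Subscriber side: each server evicts the announced path from its own RAM.
export async function subscribeToInvalidations(localMemoryCache) {
  const subscriber = createClient({ url: process.env.REDIS_URL });
  await subscriber.connect();
  await subscriber.subscribe(CHANNEL, (path) => {
    localMemoryCache.delete(path); // drop the stale L1 entry
  });
}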
The pragmatic alternative is simply disabling memory cache (cacheMaxMemorySize: 0). Redis is fast enough (sub-ms response). No need to double-cache in RAM. Accepting the tiny Network RTT cost solves all consistency headaches.
On Vercel, I didn't have to think about this for a second. For $20/month, I enjoyed infrastructure optimized by Vercel engineers.
Moving to AWS gave us "Control over Infrastructure," but also "Responsibility to Maintain Infrastructure."
We added Redis costs (about $15/mo for an ElastiCache t4g.micro) and spent expensive engineering hours debugging the issue and setting up the CacheHandler.
But through this journey, I understood Next.js to its core. I realized that features I thought were "Magic" were actually sophisticated combinations of File I/O and Caching Strategies.
Advice to anyone running Next.js outside Vercel: "If you plan to use ISR, bring Redis. Or be prepared for the Time Loop."