
Page Replacement Algorithms: FIFO, LRU, LFU
Desk is full. Which book to throw away? Oldest? Least used? OS's decision making.


I used to think page replacement algorithms were overhyped. "If RAM is full, just fetch from disk, right?" Big mistake.
Think about your office desk. The desk (RAM) is tiny, but project documents pile up like mountains. You keep only what you need right now on the desk, and stuff the rest into a cabinet (HDD). But when you realize you need that document you just threw away, it's incredibly frustrating. Walking back and forth to the cabinet wastes so much time.
The OS faces the exact same problem. RAM is limited, but programs demand tons of memory pages. When a new page needs to be loaded (Page Fault) and there's no space, someone has to be sacrificed (Swap Out). "Who should we kick out?" That's the core of page replacement algorithms.
The goal is simple: "Don't evict the page we'll immediately need again." If we throw out a page and instantly need it back, disk I/O happens and the system grinds to a halt. Minimizing this is everything.
College textbooks and blog posts list algorithms like FIFO, LRU, and LFU. But honestly, my first impression was "How are these different?" Just evict whatever isn't being used, right?
Belady's Anomaly in particular made no sense. The book said FIFO has a paradoxical property: increasing the number of memory frames can actually *increase* Page Faults. This felt counterintuitive. Common sense says a bigger desk (more Frames) should mean fewer replacements, no?
And LRU (Least Recently Used) is called the "de facto standard," but why do other algorithms even exist? If LRU is so good, why not just use LRU everywhere?
The "aha moment" came when I encountered the refrigerator metaphor.
Your fridge (RAM) is full. To add new ingredients, you must throw something away. What do you do?
FIFO (First-In, First-Out): the logic is simple. Keep items in a queue in the order they arrived, and when space is needed, evict the one at the front. Easy to implement.
But there's a fatal flaw. Say you have milk from 2 weeks ago and cake from yesterday. FIFO always tosses the milk first. But maybe you only looked at the cake once, while milk is your daily essential. You throw it away and immediately need to buy it again.
Even funnier is Belady's Anomaly. You get a bigger fridge (more Frames), but somehow make more trips to the grocery store (more Page Faults). I understood this only after simulating it myself.
```python
def fifo_page_replacement(pages, frame_count):
    """FIFO page replacement simulation"""
    frames = []
    page_faults = 0
    for page in pages:
        if page not in frames:
            page_faults += 1
            if len(frames) < frame_count:
                frames.append(page)
            else:
                # Remove oldest page (front of queue)
                frames.pop(0)
                frames.append(page)
    return page_faults

# Belady's Anomaly experiment
pages = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(f"3 Frames: {fifo_page_replacement(pages, 3)} Page Faults")  # 9
print(f"4 Frames: {fifo_page_replacement(pages, 4)} Page Faults")  # 10 (more!)
```
I was shocked by the results. Four frames caused more Page Faults than three. "More space makes it worse?" That's FIFO's stupidity.
LRU (Least Recently Used) is way more rational. In your fridge, you drank milk yesterday and today, but haven't touched the cake in 2 weeks. Toss the cake. This is the Temporal Locality principle: what you used recently will likely be used again soon, and what you haven't touched in a while probably won't be needed.
Real program behavior follows this pattern. Loops repeatedly reference the same variables, and in a function call stack, the most recent functions are most likely to be called again. That's why LRU performs best in practice.
```python
from collections import OrderedDict

def lru_page_replacement(pages, frame_count):
    """LRU page replacement simulation"""
    frames = OrderedDict()
    page_faults = 0
    for page in pages:
        if page not in frames:
            page_faults += 1
            if len(frames) >= frame_count:
                # Remove least recently used (front of OrderedDict)
                frames.popitem(last=False)
            frames[page] = True
        else:
            # Already exists, mark as recently used
            frames.move_to_end(page)
    return page_faults

# Same test
pages = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(f"3 Frames (LRU): {lru_page_replacement(pages, 3)} Page Faults")  # 10
print(f"4 Frames (LRU): {lru_page_replacement(pages, 4)} Page Faults")  # 8 (normal decrease)
```
LRU properly reduces Page Faults when you add frames. Makes sense.
But the downside is cost. You need to track "last access time" for every page. Every memory access updates a timestamp, creating significant overhead. It's like putting and changing date stickers on every fridge item constantly. Tedious.
LFU (Least Frequently Used) takes a different angle: if you've consumed milk 100 times and cake once, isn't the cake less important? So use the reference count (Frequency) as the eviction criterion.
But this creates unfair situations. Fresh ingredients just added to the fridge have 0-1 reference counts. They can get evicted immediately even though old stuff eaten 100 times might be spoiled now. "New pages get instantly kicked out unfairly."
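To see that unfairness concretely, here is a minimal LFU sketch of my own (not from the original post). It naively evicts the resident page with the lowest count, breaking ties by frame order, and drops a page's counter on eviction; real LFU implementations age counts more carefully.

```python
from collections import defaultdict

def lfu_page_replacement(pages, frame_count):
    """Naive LFU simulation: evict the lowest-count page, ties broken by frame order."""
    frames = []
    counts = defaultdict(int)  # reference count per resident page
    page_faults = 0
    for page in pages:
        if page in frames:
            counts[page] += 1
            continue
        page_faults += 1
        if len(frames) >= frame_count:
            # Evict the resident page with the lowest reference count.
            # A just-loaded page (count 1) is often the first victim -- the unfairness above.
            victim = min(frames, key=lambda p: counts[p])
            frames.remove(victim)
            del counts[victim]
        frames.append(page)
        counts[page] = 1
    return page_faults

pages = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(f"3 Frames (LFU): {lfu_page_replacement(pages, 3)} Page Faults")
```

On this particular reference string LFU behaves no better than LRU, which matches the observation below that practice overwhelmingly favors LRU.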
In practice, LRU is overwhelmingly preferred over LFU. Redis and cache servers default to LRU variants.
Here's another realization: "LRU is good, but too expensive to implement in actual hardware."
The CPU hates updating timestamps on every memory access. So real OSes (Linux, Windows) use a hack called the Clock Algorithm (Second-Chance Algorithm).
Put a sticker on each fridge item: "Recently eaten: ✅" or "Not eaten: ❌"
Not perfect LRU. But similar behavior, and 1/10th the cost. Instead of timestamp updates, just manage a 1-bit flag (Reference Bit).
```python
def clock_algorithm(pages, frame_count):
    """Clock (Second-Chance) algorithm simulation"""
    frames = [None] * frame_count
    reference_bits = [0] * frame_count
    pointer = 0
    page_faults = 0
    for page in pages:
        # If already resident, just set the Reference Bit
        if page in frames:
            idx = frames.index(page)
            reference_bits[idx] = 1
            continue
        # Page Fault occurred
        page_faults += 1
        # If an empty frame is available, insert directly
        if None in frames:
            idx = frames.index(None)
            frames[idx] = page
            reference_bits[idx] = 1
            continue
        # Clock hand sweep: find a victim
        while True:
            if reference_bits[pointer] == 0:
                # Reference Bit is 0, replace
                frames[pointer] = page
                reference_bits[pointer] = 1
                pointer = (pointer + 1) % frame_count
                break
            else:
                # Reference Bit is 1, give a Second Chance (clear it, move on)
                reference_bits[pointer] = 0
                pointer = (pointer + 1) % frame_count
    return page_faults

pages = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(f"Clock Algorithm: {clock_algorithm(pages, 3)} Page Faults")  # Similar to LRU
```
This is what the Linux kernel actually uses. Not perfect, but the "price-to-performance ratio" makes it the de facto standard.
Studying further, I found the Optimal Algorithm (Belady's Algorithm). It theoretically guarantees the fewest Page Faults. The god-tier algorithm.
"Evict the page that won't be used for the longest time in the future." An algorithm that sees the future. Among your fridge items, it knows perfectly which food you absolutely won't eat this week, and tosses that. Obviously optimal.
But there's a fatal flaw: "How do you know the future?"
You'd need to know in advance which pages the program will reference. Impossible in reality. So Optimal Algorithm is only used for "benchmarking". When evaluating other algorithms, it serves as the "theoretical maximum" baseline for comparison.
```python
def optimal_algorithm(pages, frame_count):
    """Optimal (Belady's) algorithm simulation -- assumes future knowledge"""
    frames = []
    page_faults = 0
    for i, page in enumerate(pages):
        if page not in frames:
            page_faults += 1
            if len(frames) < frame_count:
                frames.append(page)
            else:
                # Look into the future and find the page used farthest away
                future_uses = {}
                for frame in frames:
                    try:
                        # When will this page be used next?
                        future_uses[frame] = pages[i + 1:].index(frame)
                    except ValueError:
                        # Never used again = infinity
                        future_uses[frame] = float('inf')
                # Remove the page with the farthest future use
                victim = max(future_uses, key=future_uses.get)
                frames[frames.index(victim)] = page
    return page_faults

pages = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(f"Optimal: {optimal_algorithm(pages, 3)} Page Faults")  # 7 (the fewest possible)
```
This code cheats by looking at pages[i+1:] to see future references. Impossible in reality. But since it's theoretically optimal, we use it for comparisons like "our LRU achieves 90% of Optimal efficiency."
To understand why page replacement matters, you need to know how slow a Page Fault is. The CPU traps into the kernel, the OS locates the page on disk, picks a victim frame (writing it back first if it's dirty), reads the new page into RAM, updates the page table, and only then resumes the process. This entire round trip takes milliseconds, while a plain memory access takes nanoseconds, roughly a million times faster. That's why minimizing Page Faults is critical for performance.
When I first deployed Redis for my service, the config file had a maxmemory-policy option. The list included values like allkeys-lru and allkeys-lfu.
"Why are LRU and LFU here?" I wondered. Turns out, it's the exact same logic as page replacement algorithms.
Redis also needs to evict someone when memory fills up. Deciding which Key to delete is the Eviction Policy.
| Policy | Description | Page Replacement Equivalent |
|---|---|---|
| `allkeys-lru` | Remove least recently used among all keys | LRU |
| `allkeys-lfu` | Remove least frequently used among all keys | LFU |
| `volatile-lru` | LRU among keys with TTL | LRU (conditional) |
| `allkeys-random` | Random eviction | Similar to FIFO |
That was it. "Cache and pages both face the same problem: choosing the optimal victim from limited space." The essence is identical.
In production, allkeys-lru is most common. The reason matches page replacement: temporal locality fits real workload patterns well.
Looking at Linux kernel code, page replacement uses a Two-List Strategy (Active/Inactive List). A variation of the Clock algorithm.
When a page is referenced, it moves to the Active List. Over time, it drops to the Inactive List. Replacement victims are chosen from the Inactive List.
More sophisticated than simple Clock with one Reference Bit. It gives "Multiple Chances" instead of "Second Chance." Frequently used pages stay in Active List, only truly unused pages fall to Inactive and get sacrificed.
```c
// Simplified Linux kernel pseudocode
struct page {
    unsigned long flags;  // PG_active, PG_referenced, etc.
    // ...
};

// On page access
mark_page_accessed(page) {
    if (PageReferenced(page)) {
        // Already Referenced, promote to the Active List
        activate_page(page);
    } else {
        // First reference, just mark Referenced
        SetPageReferenced(page);
    }
}

// Select a replacement candidate (from the Inactive List)
shrink_inactive_list() {
    for_each_page_in_inactive_list(page) {
        if (PageReferenced(page)) {
            // Second Chance
            ClearPageReferenced(page);
            continue;
        }
        // Referenced bit is 0, victim found
        evict_page(page);
    }
}
```
This structure lets Linux approach LRU performance while minimizing overhead.
No matter how well you design a page replacement algorithm, Thrashing will kill the system.
Thrashing is "Page Faults happen so frequently that more time is spent replacing pages than doing actual work." It occurs when RAM is smaller than the process's working set.
In fridge terms: Tonight's dinner needs 10 ingredients, but the fridge only fits 5. Take ingredient out → put another in → need the one you just put away → repeat infinitely. No cooking gets done, just endless grocery trips.
The solution is simple: "Add more RAM, or reduce the number of processes." Algorithm improvements can't fix it. The fundamental problem is insufficient memory.
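The "RAM smaller than the working set" condition can be sketched numerically: the working set W(t, Δ) is the number of distinct pages referenced in the last Δ references. If the frame count is below W, thrashing is likely. A minimal illustration (the window size Δ = 5 is an arbitrary choice for the example):

```python
def working_set_size(pages, t, delta):
    """Number of distinct pages referenced in the window of `delta` references ending at time t."""
    window = pages[max(0, t - delta + 1): t + 1]
    return len(set(window))

pages = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
# Working set over the final 5 references: {1, 2, 3, 4, 5}
print(working_set_size(pages, len(pages) - 1, 5))  # 5 -> with only 3 frames, expect heavy faulting
```

This is exactly the fridge scenario: a dinner (working set) of 5 distinct ingredients simply does not fit in 3 slots, no matter how cleverly you swap.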
Studying page replacement algorithms taught me "there's no perfect algorithm."
In practice, we use "reasonably good and cheaply implementable" Clock/LRU variants. We choose realistic compromises over theoretical perfection.
And understanding that Redis and cache servers apply these same principles made me realize OS concepts aren't disconnected from real work. Ultimately, the essence is the same: deciding what to sacrifice from limited resources.
Next, I want to dive deeper into Working Set and Thrashing. I'm curious how to adjust process count when RAM is insufficient.