
Memory Management: Contiguous vs Non-Contiguous Allocation
Why does my server crash? The OS's desperate struggle to manage limited memory, and the war against fragmentation.

Consider a scenario where everything runs on a single AWS t2.micro instance (1GB RAM): web server, database, log collector—all on one machine. The outcome is predictable: the server crashes.
The logs would show something like this:
Out of memory: Killed process 1234 (node) total-vm:850MB, anon-rss:780MB
The Out of Memory (OOM) Killer massacred the processes. But the math seems off. Web server: 100MB, DB: 300MB, Logger: 50MB. That's 450MB total. Why does 1GB run out?
This leads to a deeper question: "How does 8GB RAM run 100 Chrome tabs? Each tab uses about 200MB on average—that's 20GB right there." It turned out the problem isn't 'how much' but 'how to fit'. Memory isn't an infinite bucket; it's a very strict Tetris board.
Early computers used simple memory management: "Give contiguous space equal to program size." My mental model was the same. A 100MB program arrives, OS finds a 100MB empty block, and shoves it in.
It's like shelving books in a library: each program is a multi-volume set that must sit in consecutive slots on the shelf. Program A takes the first slots, Program B goes right next to it, and so on.
Clean, right? And easy to implement. Each program only needs to remember its starting address (Base) and its size (Limit). When the CPU asks to "read address 5," the OS calculates: "Program A starts at 1, so 1 + 5 = 6," and the Limit is there to reject any address outside the program's own space.
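Here is a minimal sketch of that base-and-limit check in plain C (a toy user-space function, not real OS code; the base and limit values are made up for illustration):
#include <stdio.h>
#include <stdlib.h>

// Hypothetical base/limit translation: logical address -> physical address.
// base  = where the program starts in physical memory
// limit = how much memory the program owns
size_t translate(size_t logical, size_t base, size_t limit) {
    if (logical >= limit) {                  // out of bounds: trap
        fprintf(stderr, "fault: address %zu is outside limit %zu\n", logical, limit);
        exit(1);
    }
    return base + logical;                   // in bounds: just add the base
}

int main(void) {
    // Program A starts at slot 1 and owns 100 slots
    printf("logical 5 -> physical %zu\n", translate(5, 1, 100));
    return 0;
}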
Now, Program A terminates. Slots 1-3 are free. Then Program D (4 books thick) arrives.
Current state: [slots 1-3: free] [slots 4-10: occupied by other programs] [slots 11-100: free].
Total free space: 93 slots, far more than we need. Program D does fit in the big 11-100 hole, but the 3-slot hole at 1-3 is too small for any 4-slot program, so it just sits there wasted. Repeat this arrive-and-terminate cycle long enough and memory ends up riddled with little holes that nothing can use.
This is External Fragmentation. Plenty of free memory, but it's fragmented into unusable pieces. The "Free Memory Available but Allocation Failed" error in my logs was exactly this scenario.
At first, I didn't get it. "Just move Program B backward! Put D in 1-4, move B to 5-9..." But in practice, this is insanely expensive. To move a program in memory, you must pause execution, copy hundreds of MBs, and recalculate every internal pointer. This is called Compaction, and it's so slow it's impractical.
External fragmentation isn't the only problem. There's also Internal Fragmentation. OS usually allocates memory in fixed-size chunks (e.g., 4KB). But programs rarely need exactly 4KB.
For example, a 3.2KB program arrives. The OS gives it a full 4KB block. 0.8KB is wasted. This is Internal Fragmentation—wasted space inside allocated blocks.
My mental model: External fragmentation is "gaps between books on a shelf." Internal fragmentation is "empty space in a shelf slot because the book is smaller." Both waste space, but the causes differ.
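A quick back-of-the-envelope helper makes the internal waste concrete (just illustrative arithmetic with an assumed 4KB allocation unit, not an OS API):
#include <stdio.h>

#define PAGE_SIZE 4096   // allocate in fixed 4KB chunks

int main(void) {
    size_t request = 3276;                                   // a ~3.2KB program
    size_t pages   = (request + PAGE_SIZE - 1) / PAGE_SIZE;  // round up to whole pages
    size_t wasted  = pages * PAGE_SIZE - request;            // internal fragmentation
    printf("request %zu bytes -> %zu page(s), %zu bytes wasted\n", request, pages, wasted);
    return 0;
}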
To solve this, we need a paradigm shift. "Why keep the program in one piece? Can't we rip it apart and stuff it anywhere?"
Imagine ripping the pages out of a book (sorry, just a metaphor) and storing each page in whatever slot happens to be free.
Now, if any slot is free, we can run the program. External fragmentation disappears. This "rip and stuff" technique is the core of modern OS: Paging and Segmentation.
But a new problem arises: "If the program is scattered, how does the CPU find address 5?" The OS creates a Page Table—an address book.
| Logical Page | Physical Frame (Slot) |
|---|---|
| Page 0 | Slot 1 |
| Page 1 | Slot 2 |
| Page 2 | Slot 3 |
| Page 3 | Slot 11 |
When the CPU touches an address in page 3, the hardware walks the page table the OS set up: "Oh, page 3 actually lives in physical slot 11." This is Address Translation.
This concept was confusing at first, but thinking of it as a "phone book" clicked for me. You look up "John" (logical address) in the phone book (page table) and get "555-1234" (physical address).
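A minimal sketch of the mechanics (a toy page table in user space with assumed 4KB pages; the kernel's real structures are multi-level and far more involved): split the logical address into a page number and an offset, look up the frame, and glue the offset back on.
#include <stdio.h>
#include <stdint.h>

#define PAGE_SIZE   4096            // 4KB pages
#define PAGE_SHIFT  12              // log2(4096)
#define PAGE_MASK   (PAGE_SIZE - 1)

// Toy page table: index = logical page, value = physical frame (matches the table above)
static const uint32_t page_table[] = { 1, 2, 3, 11 };

uint32_t translate(uint32_t logical) {
    uint32_t page   = logical >> PAGE_SHIFT;   // which page?
    uint32_t offset = logical & PAGE_MASK;     // where inside the page?
    uint32_t frame  = page_table[page];        // look up the frame
    return (frame << PAGE_SHIFT) | offset;     // rebuild the physical address
}

int main(void) {
    uint32_t logical = 3 * PAGE_SIZE + 100;    // address 100 inside page 3
    printf("logical 0x%x -> physical 0x%x\n", logical, translate(logical));
    return 0;
}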
Before non-contiguous allocation took over, the OS still needed a strategy for choosing which hole to fill when several gaps could hold the incoming program. These are the classic placement algorithms: First Fit, Best Fit, and Worst Fit.
First Fit: use the first hole that's big enough. The fastest approach: scan from the start, stop at the first gap that fits, and place the program there.
Memory: [10 free][A][5 free][B][20 free]
Need 15 -> Allocate in 20-slot hole (first sufficient space)
Fast, but small holes accumulate at the front. Like Tetris—filling from the bottom leaves a messy top.
Best Fit: find the hole closest in size to the request. Check all the holes and pick the one with the least leftover: "I need 15, and there's a 16-slot hole."
Memory: [10 free][A][16 free][B][20 free]
Need 15 -> Allocate in 16-slot hole (tightest fit)
Theoretically space-efficient, but in practice, it generates tons of tiny useless holes. Putting 15 in 16 leaves a 1-slot hole—almost useless later.
Worst Fit: use the largest hole. The idea: if I always leave a big remainder, that remainder stays large enough to reuse later.
Memory: [10 free][A][16 free][B][20 free]
Need 15 -> Allocate in 20-slot hole (largest)
Counterintuitive, but performance is surprisingly decent. The remaining space (5 slots) is big enough to reuse.
I initially thought "Best Fit must be optimal, right?" But experimental results showed First Fit and Best Fit perform similarly, and Worst Fit isn't bad either. I accepted that theory and practice differ. Memory management isn't a math problem—it's statistics and probability.
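For intuition, here is a toy version of the three policies over a list of free holes (plain arrays and the hole sizes from the examples above; real allocators keep free lists, not arrays):
#include <stdio.h>

// Pick a hole index for a request of `need` slots, or -1 if nothing fits.
// strategy: 0 = First Fit, 1 = Best Fit, 2 = Worst Fit
int pick_hole(const int *holes, int n, int need, int strategy) {
    int chosen = -1;
    for (int i = 0; i < n; i++) {
        if (holes[i] < need) continue;                       // too small
        if (strategy == 0) return i;                         // First Fit: take it immediately
        if (chosen == -1 ||
            (strategy == 1 && holes[i] < holes[chosen]) ||   // Best Fit: tightest hole
            (strategy == 2 && holes[i] > holes[chosen]))     // Worst Fit: largest hole
            chosen = i;
    }
    return chosen;
}

int main(void) {
    int holes[] = { 10, 16, 20 };   // the free gaps from the examples above
    for (int s = 0; s < 3; s++)     // (assumes at least one hole fits the request)
        printf("strategy %d picks the hole of size %d\n", s, holes[pick_hole(holes, 3, 15, s)]);
    return 0;
}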
Modern Linux kernels use the Buddy System. This was genuinely brilliant.
Memory is managed only in power-of-2 sizes: 1KB, 2KB, 4KB, 8KB, 16KB... A 6KB program arrives? Give it 8KB. 2KB is wasted as internal fragmentation, but management becomes insanely simple.
How it works: round the request up to the next power of two. If a free block of exactly that size exists, hand it out. If not, take the smallest larger free block and split it in half, again and again, until a block of the right size appears. The two halves produced by each split are "buddies" of each other.
Deallocation is also easy. When a 32KB block is freed, check if its "buddy" (the other 32KB from the same parent) is also free. If both are free, coalesce them into a 64KB block. Repeat recursively.
Why is this good? Fragmentation automatically reduces. Small blocks merge into big blocks. Like Tetris—when a line completes, it disappears.
I initially wondered, "Why powers of two?" Then I realized bit operations make calculations extremely fast. Address calculations are just bit shifts. That was the key: performance optimization.
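A rough sketch of why powers of two pay off (illustrative bit tricks, not the kernel's actual allocator code):
#include <stdio.h>
#include <stdint.h>

// Round a request up to the next power of two (32-bit sizes).
uint32_t round_up_pow2(uint32_t n) {
    n--;
    n |= n >> 1;  n |= n >> 2;  n |= n >> 4;
    n |= n >> 8;  n |= n >> 16;
    return n + 1;
}

// A block's buddy differs only in the bit corresponding to the block size,
// so a single XOR gives the buddy's offset instantly.
uint32_t buddy_of(uint32_t offset, uint32_t block_size) {
    return offset ^ block_size;
}

int main(void) {
    printf("6KB request -> %u KB block\n", round_up_pow2(6 * 1024) / 1024);      // 8 KB
    printf("buddy of the 32KB block at offset 32KB is at offset %u KB\n",
           buddy_of(32 * 1024, 32 * 1024) / 1024);                               // 0 KB
    return 0;
}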
Another memory management trick is Swapping. Just before my server crashed, I set up a swapfile to put out the fire.
RAM (desk) is full, but you need to open a new book? Move an unused book to the bookshelf (hard disk). This is Swap Out. Bring it back when needed (Swap In).
Linux uses an LRU-style (Least Recently Used) policy to pick victims: swap out the page that hasn't been touched for the longest time. Makes sense; you lose the least by evicting something nobody is using.
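To see the idea, here is a toy LRU simulation (three frames and a short page-reference string; nothing like the kernel's actual active/inactive lists, just the principle):
#include <stdio.h>

#define FRAMES 3   // pretend RAM holds only 3 pages

int main(void) {
    int frame[FRAMES], last_used[FRAMES];
    for (int i = 0; i < FRAMES; i++) { frame[i] = -1; last_used[i] = -1; }
    int accesses[] = { 1, 2, 3, 1, 4, 2 };        // the order pages are touched
    int n = sizeof(accesses) / sizeof(accesses[0]);
    for (int t = 0; t < n; t++) {
        int page = accesses[t], slot = -1;
        for (int i = 0; i < FRAMES; i++)          // is the page already in RAM?
            if (frame[i] == page) slot = i;
        if (slot == -1) {                         // page fault: pick the LRU victim
            slot = 0;
            for (int i = 1; i < FRAMES; i++)
                if (last_used[i] < last_used[slot]) slot = i;
            if (frame[slot] != -1)
                printf("fault on page %d -> evict page %d (least recently used)\n",
                       page, frame[slot]);
            frame[slot] = page;
        }
        last_used[slot] = t;                      // remember when it was last touched
    }
    return 0;
}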
Disks are orders of magnitude slower than RAM: a RAM access takes on the order of 100 nanoseconds, an SSD read tens to hundreds of microseconds, and a spinning disk several milliseconds. A swapped-out page can easily cost a thousand to a hundred thousand times more than a RAM access.
Swapping means you stop studying, run to the archive, swap books, and run back. The server becomes extremely slow. Better than crashing, but not healthy.
When swapping happens frequently on my server, top shows a skyrocketing wa (I/O wait) value. That's my signal: "RAM is insufficient."
Theory alone didn't click. Only after writing C code did I feel like I understood it.
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    // Dynamic allocation: ask for space for 100 ints (100 * sizeof(int) bytes)
    int *arr = (int *)malloc(100 * sizeof(int));
    if (arr == NULL) {
        printf("Memory allocation failed!\n");
        return 1;
    }
    // Use the memory
    for (int i = 0; i < 100; i++) {
        arr[i] = i * i;
    }
    // Free the memory: the space becomes a reusable hole again
    free(arr);
    return 0;
}
Calling malloc() makes the allocator find space in the heap (and, when the heap runs low, the allocator asks the OS for more). Internally it uses placement strategies in the spirit of First Fit or Best Fit. Calling free() marks that space as a "free hole" again.
What if you forget free()?
#include <stdlib.h>

int main(void) {
    for (int i = 0; i < 1000000; i++) {
        int *leak = (int *)malloc(1024);   // grab 1KB every iteration
        // free(leak);  <- forgot this
    }
    return 0;
}
This program allocates about 1GB (1024 bytes × 1,000,000 iterations) and never returns it. From the OS's perspective, this program is holding 1GB, even though it never actually uses it. This is a Memory Leak.
Java and Python have garbage collectors to clean up, but C/C++ or poorly written Node.js code can leak. OS can't help. It's like Tetris blocks piling up forever.
My favorite tool is valgrind. It automatically detects memory leaks.
gcc -g memory_leak.c -o leak
valgrind --leak-check=full ./leak
Output:
==12345== HEAP SUMMARY:
==12345== in use at exit: 1,024,000,000 bytes in 1,000,000 blocks
==12345== total heap usage: 1,000,000 allocs, 0 frees, 1,024,000,000 bytes allocated
==12345==
==12345== LEAK SUMMARY:
==12345== definitely lost: 1,024,000,000 bytes in 1,000,000 blocks
It clearly tells me "1GB lost." Without this tool, I would've spent weeks debugging.
In production, I monitor memory in real-time with top or htop.
htop
Important columns: VIRT (virtual memory the process has reserved), RES (resident memory actually sitting in physical RAM), SHR (memory shared with other processes), and %MEM (share of physical RAM).
For example, Chrome might show VIRT=2GB but RES=500MB: it has reserved 2GB of address space but is actually using only 500MB. This is possible because of Memory Overcommit.
Linux defaults to "Promise first, check later." Chrome asks for "10GB," OS says "OK"—even if physical RAM is only 8GB.
Why? Most programs don't use all allocated memory. Like hotels overbooking rooms by 120%. "Not everyone will check in, right?" It's a gamble.
/proc/sys/vm/overcommit_memory has three settings: 0 (heuristic, the default: obviously impossible requests are refused, everything else is allowed), 1 (always say yes), and 2 (never promise more than swap plus a configured fraction of RAM).
I use 0 for production servers. Mode 1 risks OOM Killer rampages, and mode 2 can refuse allocations even when memory is effectively available.
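As a rough demonstration (the outcome depends on your RAM, swap, and the overcommit mode, so treat it as a sketch): with overcommit enabled, reserving far more memory than you ever touch typically succeeds, because pages are only backed by real RAM when they are first written.
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    // Reserve 16 chunks of 1GB without touching them.
    // On a typical Linux box with overcommit mode 0 or 1, most or all of these
    // malloc calls succeed even if physical RAM is much smaller, because no
    // page is actually allocated until it is written to.
    for (int i = 0; i < 16; i++) {
        void *p = malloc(1024UL * 1024 * 1024);
        printf("chunk %2d: %s\n", i, p ? "promised" : "refused");
        // Writing to all of this memory would force real allocation
        // and could invite the OOM Killer, so we deliberately don't.
    }
    return 0;
}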
Modern OS uses clever tricks to save memory. Copy-on-Write (COW) is genius.
When you fork() a process in Linux, the child process shares the parent's memory. Only when one tries to write does the OS make a copy. "If you're just reading, why waste memory copying?"
#include <stdio.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();              // child shares the parent's pages, copy-on-write
    if (pid == 0) {
        printf("I'm the child\n");   // nothing is copied until someone writes
    }
    return 0;
}
This is why you can run 100 Chrome tabs. Each tab is a separate process, but they share the Chrome executable code. Only per-tab data (DOM, JavaScript variables) is separate.
Another trick: mmap(). Instead of loading a file into RAM, map it directly to the address space. "If you need page 5 of the file, OS loads only that page."
#include <fcntl.h>
#include <sys/mman.h>

int fd = open("bigfile.dat", O_RDONLY);
char *data = mmap(NULL, filesize, PROT_READ, MAP_PRIVATE, fd, 0);  // filesize: the file's size, e.g. from fstat()
// 'data' now points at the file; pages are read from disk only when first touched
This is one way databases and storage engines can work with files far larger than RAM without loading them wholesale.
On multi-CPU servers, memory isn't uniform. NUMA (Non-Uniform Memory Access) means each CPU has its own local memory. Accessing local memory is fast, but accessing another CPU's memory is slow.
CPU0 --fast--> RAM0 (local)
CPU0 --slow--> RAM1 (remote)
CPU1 --slow--> RAM0 (remote)
CPU1 --fast--> RAM1 (local)
If a process running on CPU0 accesses RAM1, performance tanks. Linux's scheduler tries to keep processes on the same CPU to maximize locality. This is why numactl exists—to pin processes to specific CPUs and memory.
Running a high-performance database without NUMA awareness can significantly hurt performance. Pinning processes with numactl to local memory is a known optimization that can improve throughput noticeably.
When setting up Linux, allocating Swap equal to 2x physical RAM is a good safety net.
# Create 2GB swap file
sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
# Persist after reboot
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
This saved me from 3 AM wakeup calls. Even if OOM Killer arrives, swap buys time.
I send Slack alerts when memory usage exceeds 80%.
#!/bin/bash
USAGE=$(free | grep Mem | awk '{print ($3/$2) * 100}')
THRESHOLD=80

if (( $(echo "$USAGE > $THRESHOLD" | bc -l) )); then
    curl -X POST -H 'Content-type: application/json' \
        --data "{\"text\":\"⚠️ Memory usage: ${USAGE}%\"}" \
        YOUR_SLACK_WEBHOOK_URL
fi
I run this via cron every 5 minutes. Catch problems before the server crashes.
When OOM happens, check /var/log/syslog or dmesg.
dmesg | grep -i "killed process"
Output:
Out of memory: Killed process 5678 (java) total-vm:2048MB
From this log, I know: "Oh, the Java process ate 2GB and got killed."
Key takeaways from my memory management journey:
Contiguous Allocation: Easy to implement, but external fragmentation (wasted gaps) kills efficiency. Like a clean bookshelf, but you can't use the gaps.
Non-Contiguous Allocation: Break programs into pieces and stuff them anywhere. (Paging, Segmentation). The modern OS standard. Like ripping Tetris blocks into individual squares and filling gaps.
Allocation Algorithms: First Fit, Best Fit, Worst Fit each have pros/cons, but in practice, differences are marginal. Theory and practice diverge.
Buddy System: Managing in power-of-2 sizes reduces fragmentation and speeds calculations. There's a reason Linux kernels use it.
Swapping: Borrow HDD when RAM runs out. (Slow but better than crashing). But frequent swapping destroys performance.
Memory Leaks: OS can't help if code doesn't free(). Use tools like Valgrind to catch them.
Memory Overcommit: Linux defaults to "promise first." If memory runs out later, OOM Killer shows up.
Copy-on-Write: Share memory until someone writes. This is why 100 Chrome tabs don't need 20GB.
Memory-Mapped Files: Load files on-demand, page by page. Databases use this to handle huge files.
NUMA: On multi-CPU servers, local memory is fast, remote memory is slow. Pinning processes matters.
Memory management is the art of efficiently slicing limited resources. When RAM is tight, understanding these principles helps you survive. Now I fully grasp how 100 Chrome tabs run on 8GB RAM. It all clicked.