
Memory Management: Contiguous vs Non-Contiguous Allocation
Why does my server crash? The OS's desperate struggle to manage limited memory, and the war against fragmentation.

Consider a scenario where everything runs on a single AWS t2.micro instance (1GB RAM): web server, database, log collector—all on one machine. The outcome is predictable: the server crashes.
The logs would show something like this:
Out of memory: Killed process 1234 (node) total-vm:850MB, anon-rss:780MB
The Out of Memory (OOM) Killer massacred the processes. But the math seems off. Web server: 100MB, DB: 300MB, Logger: 50MB. That's 450MB total. Why does 1GB run out?
This leads to a deeper question: "How does 8GB RAM run 100 Chrome tabs? Each tab uses about 200MB on average—that's 20GB right there." It turned out the problem isn't 'how much' but 'how to fit'. Memory isn't an infinite bucket; it's a very strict Tetris board.
Early computers used simple memory management: "Give contiguous space equal to program size." My mental model was the same. A 100MB program arrives, OS finds a 100MB empty block, and shoves it in.
It's like shelving books in a library: each program is a multi-volume set that must sit in consecutive slots on the shelf. Program A takes the first slots, Program B goes right next to it, and so on.
Clean, right? And easy to implement. Each program only needs to remember its starting address (Base) and its size (Limit). When the CPU asks to "read address 5," the OS calculates: "Program A starts at 1, so 1 + 5 = 6," and the Limit is there to reject any address outside the program's own space.
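Here is a minimal sketch of that base-and-limit check in plain C (a toy user-space function, not real OS code; the base and limit values are made up for illustration):
#include <stdio.h>
#include <stdlib.h>

// Hypothetical base/limit translation: logical address -> physical address.
// base  = where the program starts in physical memory
// limit = how much memory the program owns
size_t translate(size_t logical, size_t base, size_t limit) {
    if (logical >= limit) {                  // out of bounds: trap
        fprintf(stderr, "fault: address %zu is outside limit %zu\n", logical, limit);
        exit(1);
    }
    return base + logical;                   // in bounds: just add the base
}

int main(void) {
    // Program A starts at slot 1 and owns 100 slots
    printf("logical 5 -> physical %zu\n", translate(5, 1, 100));
    return 0;
}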
Now, Program A terminates. Slots 1-3 are free. Then Program D (4 books thick) arrives.
Current state: [slots 1-3: free] [slots 4-10: occupied by other programs] [slots 11-100: free].
Total free space: 93 slots, far more than we need. Program D does fit in the big 11-100 hole, but the 3-slot hole at 1-3 is too small for any 4-slot program, so it just sits there wasted. Repeat this arrive-and-terminate cycle long enough and memory ends up riddled with little holes that nothing can use.
This is External Fragmentation. Plenty of free memory, but it's fragmented into unusable pieces. The "Free Memory Available but Allocation Failed" error in my logs was exactly this scenario.
At first, I didn't get it. "Just move Program B backward! Put D in 1-4, move B to 5-9..." But in practice, this is insanely expensive. To move a program in memory, you must pause execution, copy hundreds of MBs, and recalculate every internal pointer. This is called Compaction, and it's so slow it's impractical.
External fragmentation isn't the only problem. There's also Internal Fragmentation. OS usually allocates memory in fixed-size chunks (e.g., 4KB). But programs rarely need exactly 4KB.
For example, a 3.2KB program arrives. The OS gives it a full 4KB block. 0.8KB is wasted. This is Internal Fragmentation—wasted space inside allocated blocks.
My mental model: External fragmentation is "gaps between books on a shelf." Internal fragmentation is "empty space in a shelf slot because the book is smaller." Both waste space, but the causes differ.
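A quick back-of-the-envelope helper makes the internal waste concrete (just illustrative arithmetic with an assumed 4KB allocation unit, not an OS API):
#include <stdio.h>

#define PAGE_SIZE 4096   // allocate in fixed 4KB chunks

int main(void) {
    size_t request = 3276;                                   // a ~3.2KB program
    size_t pages   = (request + PAGE_SIZE - 1) / PAGE_SIZE;  // round up to whole pages
    size_t wasted  = pages * PAGE_SIZE - request;            // internal fragmentation
    printf("request %zu bytes -> %zu page(s), %zu bytes wasted\n", request, pages, wasted);
    return 0;
}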
To solve this, we need a paradigm shift. "Why keep the program in one piece? Can't we rip it apart and stuff it anywhere?"
Imagine ripping the pages out of a book (sorry, just a metaphor) and storing each page in whatever slot happens to be free.
Now, if any slot is free, we can run the program. External fragmentation disappears. This "rip and stuff" technique is the core of modern OS: Paging and Segmentation.
But a new problem arises: "If the program is scattered, how does the CPU find address 5?" The OS creates a Page Table—an address book.
| Logical Page | Physical Frame (Slot) |
|---|---|
| Page 0 | Slot 1 |
| Page 1 | Slot 2 |
| Page 2 | Slot 3 |
| Page 3 | Slot 11 |
When the CPU touches an address in page 3, the hardware walks the page table the OS set up: "Oh, page 3 actually lives in physical slot 11." This is Address Translation.
This concept was confusing at first, but thinking of it as a "phone book" clicked for me. You look up "John" (logical address) in the phone book (page table) and get "555-1234" (physical address).
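A minimal sketch of the mechanics (a toy page table in user space with assumed 4KB pages; the kernel's real structures are multi-level and far more involved): split the logical address into a page number and an offset, look up the frame, and glue the offset back on.
#include <stdio.h>
#include <stdint.h>

#define PAGE_SIZE   4096            // 4KB pages
#define PAGE_SHIFT  12              // log2(4096)
#define PAGE_MASK   (PAGE_SIZE - 1)

// Toy page table: index = logical page, value = physical frame (matches the table above)
static const uint32_t page_table[] = { 1, 2, 3, 11 };

uint32_t translate(uint32_t logical) {
    uint32_t page   = logical >> PAGE_SHIFT;   // which page?
    uint32_t offset = logical & PAGE_MASK;     // where inside the page?
    uint32_t frame  = page_table[page];        // look up the frame
    return (frame << PAGE_SHIFT) | offset;     // rebuild the physical address
}

int main(void) {
    uint32_t logical = 3 * PAGE_SIZE + 100;    // address 100 inside page 3
    printf("logical 0x%x -> physical 0x%x\n", logical, translate(logical));
    return 0;
}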
Before non-contiguous allocation took over, the OS still needed a strategy for choosing which hole to fill when several gaps could hold the incoming program. These are the classic placement algorithms: First Fit, Best Fit, and Worst Fit.
First Fit: use the first hole that's big enough. The fastest approach: scan from the start, stop at the first gap that fits, and place the program there.
Memory: [10 free][A][5 free][B][20 free]
Need 15 -> Allocate in 20-slot hole (first sufficient space)
Fast, but small holes accumulate at the front. Like Tetris—filling from the bottom leaves a messy top.
Best Fit: find the hole closest in size to the request. Check all the holes and pick the one with the least leftover: "I need 15, and there's a 16-slot hole."
Memory: [10 free][A][16 free][B][20 free]
Need 15 -> Allocate in 16-slot hole (tightest fit)
Theoretically space-efficient, but in practice, it generates tons of tiny useless holes. Putting 15 in 16 leaves a 1-slot hole—almost useless later.
Worst Fit: use the largest hole. The idea: if I always leave a big remainder, that remainder stays large enough to reuse later.
Memory: [10 free][A][16 free][B][20 free]
Need 15 -> Allocate in 20-slot hole (largest)
Counterintuitive, but performance is surprisingly decent. The remaining space (5 slots) is big enough to reuse.
I initially thought "Best Fit must be optimal, right?" But experimental results showed First Fit and Best Fit perform similarly, and Worst Fit isn't bad either. I accepted that theory and practice differ. Memory management isn't a math problem—it's statistics and probability.
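For intuition, here is a toy version of the three policies over a list of free holes (plain arrays and the hole sizes from the examples above; real allocators keep free lists, not arrays):
#include <stdio.h>

// Pick a hole index for a request of `need` slots, or -1 if nothing fits.
// strategy: 0 = First Fit, 1 = Best Fit, 2 = Worst Fit
int pick_hole(const int *holes, int n, int need, int strategy) {
    int chosen = -1;
    for (int i = 0; i < n; i++) {
        if (holes[i] < need) continue;                       // too small
        if (strategy == 0) return i;                         // First Fit: take it immediately
        if (chosen == -1 ||
            (strategy == 1 && holes[i] < holes[chosen]) ||   // Best Fit: tightest hole
            (strategy == 2 && holes[i] > holes[chosen]))     // Worst Fit: largest hole
            chosen = i;
    }
    return chosen;
}

int main(void) {
    int holes[] = { 10, 16, 20 };   // the free gaps from the examples above
    for (int s = 0; s < 3; s++)     // (assumes at least one hole fits the request)
        printf("strategy %d picks the hole of size %d\n", s, holes[pick_hole(holes, 3, 15, s)]);
    return 0;
}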
Modern Linux kernels use the Buddy System. This was genuinely brilliant.
Memory is managed only in power-of-2 sizes: 1KB, 2KB, 4KB, 8KB, 16KB... A 6KB program arrives? Give it 8KB. 2KB is wasted as internal fragmentation, but management becomes insanely simple.
How it works: round the request up to the next power of two. If a free block of exactly that size exists, hand it out. If not, take the smallest larger free block and split it in half, again and again, until a block of the right size appears. The two halves produced by each split are "buddies" of each other.
Deallocation is also easy. When a 32KB block is freed, check if its "buddy" (the other 32KB from the same parent) is also free. If both are free, coalesce them into a 64KB block. Repeat recursively.
Why is this good? Fragmentation automatically reduces. Small blocks merge into big blocks. Like Tetris—when a line completes, it disappears.
I initially wondered, "Why powers of two?" Then I realized bit operations make calculations extremely fast. Address calculations are just bit shifts. That was the key: performance optimization.
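A rough sketch of why powers of two pay off (illustrative bit tricks, not the kernel's actual allocator code):
#include <stdio.h>
#include <stdint.h>

// Round a request up to the next power of two (32-bit sizes).
uint32_t round_up_pow2(uint32_t n) {
    n--;
    n |= n >> 1;  n |= n >> 2;  n |= n >> 4;
    n |= n >> 8;  n |= n >> 16;
    return n + 1;
}

// A block's buddy differs only in the bit corresponding to the block size,
// so a single XOR gives the buddy's offset instantly.
uint32_t buddy_of(uint32_t offset, uint32_t block_size) {
    return offset ^ block_size;
}

int main(void) {
    printf("6KB request -> %u KB block\n", round_up_pow2(6 * 1024) / 1024);      // 8 KB
    printf("buddy of the 32KB block at offset 32KB is at offset %u KB\n",
           buddy_of(32 * 1024, 32 * 1024) / 1024);                               // 0 KB
    return 0;
}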
Another memory management trick is Swapping. Just before my server crashed, I set up a swapfile to put out the fire.
RAM (desk) is full, but you need to open a new book? Move an unused book to the bookshelf (hard disk). This is Swap Out. Bring it back when needed (Swap In).
Linux uses an LRU-style (Least Recently Used) policy to pick victims: swap out the page that hasn't been touched for the longest time. Makes sense; you lose the least by evicting something nobody is using.
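To see the idea, here is a toy LRU simulation (three frames and a short page-reference string; nothing like the kernel's actual active/inactive lists, just the principle):
#include <stdio.h>

#define FRAMES 3   // pretend RAM holds only 3 pages

int main(void) {
    int frame[FRAMES], last_used[FRAMES];
    for (int i = 0; i < FRAMES; i++) { frame[i] = -1; last_used[i] = -1; }
    int accesses[] = { 1, 2, 3, 1, 4, 2 };        // the order pages are touched
    int n = sizeof(accesses) / sizeof(accesses[0]);
    for (int t = 0; t < n; t++) {
        int page = accesses[t], slot = -1;
        for (int i = 0; i < FRAMES; i++)          // is the page already in RAM?
            if (frame[i] == page) slot = i;
        if (slot == -1) {                         // page fault: pick the LRU victim
            slot = 0;
            for (int i = 1; i < FRAMES; i++)
                if (last_used[i] < last_used[slot]) slot = i;
            if (frame[slot] != -1)
                printf("fault on page %d -> evict page %d (least recently used)\n",
                       page, frame[slot]);
            frame[slot] = page;
        }
        last_used[slot] = t;                      // remember when it was last touched
    }
    return 0;
}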
Disks are orders of magnitude slower than RAM: a RAM access takes on the order of 100 nanoseconds, an SSD read tens to hundreds of microseconds, and a spinning disk several milliseconds. A swapped-out page can easily cost a thousand to a hundred thousand times more than a RAM access.
Swapping means you stop studying, run to the archive, swap books, and run back. The server becomes extremely slow. Better than crashing, but not healthy.
When swapping happens frequently on my server, top shows a skyrocketing wa (I/O wait) value. That's my signal: "RAM is insufficient."
Theory alone didn't click. Only after writing C code did I feel like I understood it.
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    // Dynamic allocation: ask for space for 100 ints (100 * sizeof(int) bytes)
    int *arr = (int *)malloc(100 * sizeof(int));
    if (arr == NULL) {
        printf("Memory allocation failed!\n");
        return 1;
    }
    // Use the memory
    for (int i = 0; i < 100; i++) {
        arr[i] = i * i;
    }
    // Free the memory: the space becomes a reusable hole again
    free(arr);
    return 0;
}
Calling malloc() makes the allocator find space in the heap (and, when the heap runs low, the allocator asks the OS for more). Internally it uses placement strategies in the spirit of First Fit or Best Fit. Calling free() marks that space as a "free hole" again.
What if you forget free()?
#include <stdlib.h>

int main(void) {
    for (int i = 0; i < 1000000; i++) {
        int *leak = (int *)malloc(1024);   // grab 1KB every iteration
        // free(leak);  <- forgot this
    }
    return 0;
}
This program allocates about 1GB (1024 bytes × 1,000,000 iterations) and never returns it. From the OS's perspective, this program is holding 1GB, even though it never actually uses it. This is a Memory Leak.
Java and Python have garbage collectors to clean up, but C/C++ or poorly written Node.js code can leak. OS can't help. It's like Tetris blocks piling up forever.
My favorite tool is valgrind. It automatically detects memory leaks.
gcc -g memory_leak.c -o leak
valgrind --leak-check=full ./leak
Output:
==12345== HEAP SUMMARY:
==12345== in use at exit: 1,024,000,000 bytes in 1,000,000 blocks
==12345== total heap usage: 1,000,000 allocs, 0 frees, 1,024,000,000 bytes allocated
==12345==
==12345== LEAK SUMMARY:
==12345== definitely lost: 1,024,000,000 bytes in 1,000,000 blocks
It clearly tells me "1GB lost." Without this tool, I would've spent weeks debugging.
In production, I monitor memory in real-time with top or htop.
htop
Important columns: VIRT (virtual memory the process has reserved), RES (resident memory actually sitting in physical RAM), SHR (memory shared with other processes), and %MEM (share of physical RAM).
For example, Chrome might show VIRT=2GB but RES=500MB: it has reserved 2GB of address space but is actually using only 500MB. This is possible because of Memory Overcommit.
Linux defaults to "Promise first, check later." Chrome asks for "10GB," OS says "OK"—even if physical RAM is only 8GB.
Why? Most programs don't use all allocated memory. Like hotels overbooking rooms by 120%. "Not everyone will check in, right?" It's a gamble.
/proc/sys/vm/overcommit_memory has three settings: 0 (heuristic, the default: obviously impossible requests are refused, everything else is allowed), 1 (always say yes), and 2 (never promise more than swap plus a configured fraction of RAM).
I use 0 for production servers. Mode 1 risks OOM Killer rampages, and mode 2 can refuse allocations even when memory is effectively available.
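As a rough demonstration (the outcome depends on your RAM, swap, and the overcommit mode, so treat it as a sketch): with overcommit enabled, reserving far more memory than you ever touch typically succeeds, because pages are only backed by real RAM when they are first written.
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    // Reserve 16 chunks of 1GB without touching them.
    // On a typical Linux box with overcommit mode 0 or 1, most or all of these
    // malloc calls succeed even if physical RAM is much smaller, because no
    // page is actually allocated until it is written to.
    for (int i = 0; i < 16; i++) {
        void *p = malloc(1024UL * 1024 * 1024);
        printf("chunk %2d: %s\n", i, p ? "promised" : "refused");
        // Writing to all of this memory would force real allocation
        // and could invite the OOM Killer, so we deliberately don't.
    }
    return 0;
}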
Modern OS uses clever tricks to save memory. Copy-on-Write (COW) is genius.
When you fork() a process in Linux, the child process shares the parent's memory. Only when one tries to write does the OS make a copy. "If you're just reading, why waste memory copying?"
#include <stdio.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();              // child shares the parent's pages, copy-on-write
    if (pid == 0) {
        printf("I'm the child\n");   // nothing is copied until someone writes
    }
    return 0;
}
This is why you can run 100 Chrome tabs. Each tab is a separate process, but they share the Chrome executable code. Only per-tab data (DOM, JavaScript variables) is separate.
Another trick: mmap(). Instead of loading a file into RAM, map it directly to the address space. "If you need page 5 of the file, OS loads only that page."
#include <fcntl.h>
#include <sys/mman.h>

int fd = open("bigfile.dat", O_RDONLY);
char *data = mmap(NULL, filesize, PROT_READ, MAP_PRIVATE, fd, 0);  // filesize: the file's size, e.g. from fstat()
// 'data' now points at the file; pages are read from disk only when first touched
This is one way databases and storage engines can work with files far larger than RAM without loading them wholesale.
On multi-CPU servers, memory isn't uniform. NUMA (Non-Uniform Memory Access) means each CPU has its own local memory. Accessing local memory is fast, but accessing another CPU's memory is slow.
CPU0 --fast--> RAM0 (local)
CPU0 --slow--> RAM1 (remote)
CPU1 --slow--> RAM0 (remote)
CPU1 --fast--> RAM1 (local)
If a process running on CPU0 accesses RAM1, performance tanks. Linux's scheduler tries to keep processes on the same CPU to maximize locality. This is why numactl exists—to pin processes to specific CPUs and memory.
Running a high-performance database without NUMA awareness can significantly hurt performance. Pinning processes with numactl to local memory is a known optimization that can improve throughput noticeably.
When setting up Linux, allocating Swap equal to 2x physical RAM is a good safety net.
# Create 2GB swap file
sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
# Persist after reboot
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
This saved me from 3 AM wakeup calls. Even if OOM Killer arrives, swap buys time.
I send Slack alerts when memory usage exceeds 80%.
#!/bin/bash
USAGE=$(free | grep Mem | awk '{print ($3/$2) * 100}')
THRESHOLD=80

if (( $(echo "$USAGE > $THRESHOLD" | bc -l) )); then
    curl -X POST -H 'Content-type: application/json' \
        --data "{\"text\":\"⚠️ Memory usage: ${USAGE}%\"}" \
        YOUR_SLACK_WEBHOOK_URL
fi
I run this via cron every 5 minutes. Catch problems before the server crashes.
When OOM happens, check /var/log/syslog or dmesg.
dmesg | grep -i "killed process"
Output:
Out of memory: Killed process 5678 (java) total-vm:2048MB
From this log, I know: "Oh, the Java process ate 2GB and got killed."
Key takeaways from my memory management journey:
Contiguous Allocation: Easy to implement, but external fragmentation (wasted gaps) kills efficiency. Like a clean bookshelf, but you can't use the gaps.
Non-Contiguous Allocation: Break programs into pieces and stuff them anywhere. (Paging, Segmentation). The modern OS standard. Like ripping Tetris blocks into individual squares and filling gaps.
Allocation Algorithms: First Fit, Best Fit, Worst Fit each have pros/cons, but in practice, differences are marginal. Theory and practice diverge.
Buddy System: Managing in power-of-2 sizes reduces fragmentation and speeds calculations. There's a reason Linux kernels use it.
Swapping: Borrow HDD when RAM runs out. (Slow but better than crashing). But frequent swapping destroys performance.
Memory Leaks: OS can't help if code doesn't free(). Use tools like Valgrind to catch them.
Memory Overcommit: Linux defaults to "promise first." If memory runs out later, OOM Killer shows up.
Copy-on-Write: Share memory until someone writes. This is why 100 Chrome tabs don't need 20GB.
Memory-Mapped Files: Load files on-demand, page by page. Databases use this to handle huge files.
NUMA: On multi-CPU servers, local memory is fast, remote memory is slow. Pinning processes matters.
Memory management is the art of efficiently slicing limited resources. When RAM is tight, understanding these principles helps you survive. Now I fully grasp how 100 Chrome tabs run on 8GB RAM. It all clicked.