
Paging: Splitting Memory into Fixed-Size Blocks
Slice memory into identical fixed-size blocks: the standard technique for solving external fragmentation.


When I first studied memory management, the biggest question was: "How do we load programs of different sizes into memory?" Segmentation seemed logical at first—splitting programs into code, data, and stack segments made sense. But it led to external fragmentation.
Imagine trying to load a 100MB program when you have 50MB free here and 50MB free there, but no contiguous 100MB block. It's like a parking lot where cars are parked randomly—plenty of space, but no room for a new car. Compaction could solve this by rearranging memory, but copying entire programs around is too expensive.
That's when I understood paging. The idea is brutally simple: "Cut memory into fixed chunks, cut programs into the same-sized chunks, and stick them wherever there's space." My first reaction was skepticism. If you tear a program apart, how does it even run? The CPU needs to read instructions sequentially. If pieces are scattered everywhere, won't it crash?
Turns out, this approach became the standard for modern operating systems. Once I grasped how it works, I had that "aha, so that's why paging" moment.
The hardest concept to grasp was the difference between logical address and physical address. Textbooks said "the CPU sees logical addresses, but RAM stores physical addresses." Honestly, I thought they were the same thing.
When I write int a = 10; in my code, a gets stored somewhere in memory. Isn't that address the "real" address? Why split it into "logical" and "physical"?
The breakthrough came when I realized: programs think they own all of memory. Process A thinks its memory starts at address 0. Process B also thinks its memory starts at address 0. If both actually accessed RAM address 0, they'd collide and corrupt each other's data.
So the OS gives each process a "virtual address space." Process A's logical address 0 maps to physical frame 52 in RAM. Process B's logical address 0 maps to physical frame 150. This way, programs live in peaceful isolation, each believing they start at address 0.
Who does this translation? The MMU (Memory Management Unit).
When the CPU says "read logical address 1024," the MMU intercepts it. The MMU consults a map called the page table.
The page table looks like this:
Logical Page → Physical Frame
0 → 52
1 → 150
2 → 37
3 → 200
...
When the CPU requests logical address 1024, the MMU splits it into two parts: a page number and an offset within that page. With 4KB (4,096-byte) pages, the page number is 1024 ÷ 4096 = 0 and the offset is 1024.
Then it looks up "logical page 0 maps to physical frame 52." The final physical address is 52 × 4096 + 1024 = 214,016.
The CPU has no idea this happened. It just asked for address 1024, and the MMU fetched data from address 214,016. This is the core of paging.
Here's the metaphor that clicked for me: checking out a book at the library. I ask for "Harry Potter Volume 1" (logical address). The librarian checks the catalog (page table) and finds "It's on floor 3, shelf B-52" (physical address), then retrieves it. I don't need to know where the book actually lives. The librarian (MMU) handles it.
How big should a page be? Most operating systems use 4KB as the default. Why exactly 4KB?
What if pages were too small? Say we used 1KB pages. A 1GB program would be split into 1,048,576 pages. The page table would have over a million entries, eating up tons of memory just to store the map itself.
What if pages were too large? Say we used 1MB pages. A tiny 10KB program would occupy an entire 1MB page, wasting 1,014KB, over 99% of the page. This is internal fragmentation.
4KB is the sweet spot. Memory waste is minimal, and the page table size is manageable.
Recently, with gigabytes of RAM becoming common, Huge Pages (2MB, 1GB) are used more often. Programs like databases and virtual machines that consume massive amounts of memory benefit from Huge Pages because the page table shrinks, speeding up lookups.
In Linux, you might have seen Transparent Huge Pages (THP) tuning. MongoDB docs often recommend disabling THP. Why? Because MongoDB manages memory in small chunks, and if the OS merges them into huge pages, performance actually degrades.
Even with 4KB pages, on a 64-bit system, the virtual address space per process is enormous (theoretically 2^64 bytes). A single-level page table would be gigantic.
That's why real systems use multi-level page tables. It's like a hierarchical table of contents.
For a 2-level page table (the classic 32-bit scheme), the logical address is split like this:
Logical Address: | Top 10 bits   | Next 10 bits  | Low 12 bits |
                 | Level 1 Index | Level 2 Index | Page Offset |
This way, unused address space doesn't require intermediate tables. It saves tons of memory.
Modern x86-64 uses 4-level page tables (PML4 → PDPT → PD → PT). Sounds complex, but it's efficient because tables are only created for actively used memory regions.
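Here's a minimal sketch of that bit-splitting in Python, matching the 10/10/12 split in the diagram above (the frame numbers are made up for illustration):

PAGE_SHIFT = 12   # 4KB pages → low 12 bits are the offset
L2_BITS = 10      # next 10 bits index the level-2 table

# Sparse tables: only address regions actually in use get a level-2 table.
level1 = {
    0: {0: 52, 1: 150},   # level-1 index 0 → one small level-2 table
}

def translate_2level(addr):
    l1_index = addr >> (PAGE_SHIFT + L2_BITS)               # top 10 bits
    l2_index = (addr >> PAGE_SHIFT) & ((1 << L2_BITS) - 1)  # next 10 bits
    offset = addr & ((1 << PAGE_SHIFT) - 1)                 # low 12 bits
    level2 = level1[l1_index]   # a missing entry here would mean a page fault
    frame = level2[l2_index]
    return (frame << PAGE_SHIFT) | offset

print(translate_2level(1024))   # page 0 → frame 52 → 214016

A flat table for the full 32-bit space would need 2^20 entries; here a single small level-2 table suffices because only one region is in use.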
Every memory access requires a page table lookup, and with multi-level tables that's up to four extra memory reads just for the translation. Wouldn't that slow things down?
Yes, so there's the TLB (Translation Lookaside Buffer). Think of it as a cache for the page table.
Before consulting the page table, the MMU checks the TLB. "What's the physical frame for logical page 5?" If it's in the TLB (TLB Hit), translation is instant. If not (TLB Miss), the MMU walks the page table, finds the answer, and caches it in the TLB.
Programs exhibit locality. If you access a memory location once, you'll likely access nearby locations soon. That's why TLB hit rates are typically over 90%. Most translations never touch the page table.
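Here's a toy model of that flow, assuming a tiny fully associative TLB with FIFO eviction (real TLBs use hardware-specific sizes and replacement policies):

from collections import OrderedDict

page_table = {0: 52, 1: 150, 2: 37, 3: 200}  # the map from earlier
TLB_SIZE = 2
tlb = OrderedDict()   # page number → frame number, in load order

def translate_with_tlb(page_number):
    if page_number in tlb:
        return tlb[page_number], "hit"   # TLB hit: no table walk
    frame = page_table[page_number]      # TLB miss: walk the page table
    if len(tlb) >= TLB_SIZE:
        tlb.popitem(last=False)          # evict the oldest entry (FIFO)
    tlb[page_number] = frame             # cache the translation
    return frame, "miss"

# Locality in action: repeated accesses to a page hit after one miss.
for page in [0, 0, 0, 1, 1, 2, 0]:
    frame, result = translate_with_tlb(page)
    print(f"page {page} → frame {frame} ({result})")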
On Linux, cat /proc/cpuinfo | grep -i tlb shows your CPU's TLB info on some processors. My MacBook has a 64-entry L1 data TLB and a 128-entry instruction TLB.
Writing code to simulate logical-to-physical address translation made it crystal clear.
PAGE_SIZE = 4096  # 4KB

# Page table: logical page → physical frame
page_table = {
    0: 52,
    1: 150,
    2: 37,
    3: 200,
}

def translate_address(logical_address):
    # Split logical address into page number and offset
    page_number = logical_address // PAGE_SIZE
    offset = logical_address % PAGE_SIZE

    # Look up page table
    if page_number not in page_table:
        raise Exception(f"Page Fault: Page {page_number} not in memory")
    frame_number = page_table[page_number]

    # Calculate physical address
    physical_address = frame_number * PAGE_SIZE + offset
    return physical_address

# Test
logical = 1024
physical = translate_address(logical)
print(f"Logical {logical} → Physical {physical}")
# Output: Logical 1024 → Physical 214016

logical = 5000  # Inside logical page 1
physical = translate_address(logical)
print(f"Logical {logical} → Physical {physical}")
# Output: Logical 5000 → Physical 615304 (150 * 4096 + 904)
Seeing this code made me realize: "Oh, it's just division and multiplication." The MMU is this logic implemented in hardware.
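One aside: because the page size is a power of two, real hardware never actually divides. Splitting the address is just dropping and masking bits, which is one reason translation is so fast:

logical_address = 5000
page_number = logical_address >> 12   # same as // 4096: drop the low 12 bits
offset = logical_address & 0xFFF      # same as % 4096: keep the low 12 bits
print(page_number, offset)            # 1 904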
In the code above, requesting a page not in page_table throws an exception. In real operating systems, this is called a page fault.
When a page fault occurs, the OS steps in:
1. The MMU traps into the kernel.
2. The kernel checks whether the access is valid (a legal page that just isn't in RAM) or a bug (then it's a segmentation fault).
3. It finds a free frame, evicting another page to disk if none is free.
4. It loads the needed page from disk into that frame and updates the page table.
5. The interrupted instruction restarts, and this time the translation succeeds.
This is the foundation of virtual memory. It's why you can run programs larger than RAM. Frequently used pages stay in RAM; unused pages get swapped to disk.
Of course, page faults involve disk I/O, so they're slow. Too many page faults cause thrashing—the program spends more time swapping pages than doing actual work. This is why your computer crawls when RAM runs out.
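Here's a toy demand-paging simulation that follows those steps, assuming a three-frame RAM and FIFO eviction (real kernels use much smarter replacement policies):

RAM_FRAMES = 3     # pretend physical RAM holds only 3 frames
resident = {}      # page → frame, for pages currently in RAM
fifo = []          # load order, for FIFO eviction
page_faults = 0

def access(page):
    global page_faults
    if page in resident:
        return                        # page is in RAM: fast path
    page_faults += 1                  # page fault: the "OS" steps in
    if len(fifo) >= RAM_FRAMES:
        victim = fifo.pop(0)          # evict the oldest page...
        frame = resident.pop(victim)  # ...and reuse its frame (page → swap)
    else:
        frame = len(resident)         # grab a still-unused frame
    resident[page] = frame            # load the page, update the page table
    fifo.append(page)

for page in [0, 1, 2, 0, 3, 0, 4]:
    access(page)
print(f"{page_faults} page faults in 7 accesses")   # 6 faults

With only three frames and a working set of five pages, six of seven accesses fault. That's thrashing in miniature.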
When I call malloc(1024) in C, memory gets allocated. How does this connect to paging?
malloc is a user-space library function. For small requests, it divides up a pre-allocated heap. For large requests, it issues system calls.
On Linux, mmap() or brk() system calls are invoked. The OS finds free pages in the process's virtual address space and says "you can use these." At this point, no physical memory is allocated yet.
Physical memory is only allocated when you first write to those pages. This is called lazy allocation (demand paging). A related trick, copy-on-write, defers copying pages shared after fork() until one side writes to them. Since many programs allocate memory they never use, the OS delays physical allocation until it's actually needed.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define SIZE (100 * 1024 * 1024)  // 100MB

int main() {
    printf("Process started. PID: %d\n", getpid());
    getchar();  // Before allocation

    // Request 100MB
    char *ptr = malloc(SIZE);
    if (ptr == NULL) {
        perror("malloc");
        return 1;
    }
    printf("100MB malloc done\n");
    getchar();  // After malloc (no physical memory yet)

    // Actually write
    for (int i = 0; i < SIZE; i++) {
        ptr[i] = 'A';
    }
    printf("100MB write done\n");
    getchar();  // After writing (physical memory used)

    free(ptr);
    return 0;
}
Run this program and check ps aux | grep [process] in another terminal. Right after malloc, RSS (resident set size, actual physical memory) barely increases. Once writing starts, RSS jumps.
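On Linux you can also read the kernel's own accounting directly: grep VmRSS /proc/<PID>/status prints the resident set size in kB (substitute the PID the program printed). Check it once after malloc and again after the write loop to watch the jump.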
When I first encountered an OOM (Out of Memory) error on a server, I thought "Wait, there's plenty of memory, why did it crash?" Turns out, swap space was also full, so no new pages could be allocated.
Running Docker containers, you can limit memory with --memory. This uses cgroups to restrict the number of pages a container can use.
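For example, docker run --memory=512m ... caps the container at 512MB of physical memory; if its processes touch more pages than that, the cgroup's OOM killer terminates them.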
In Kubernetes, when monitoring Pod memory usage, there's a metric called working set. This represents the actively used pages, not the total allocated memory. It's what really matters.
Understanding paging gave me clarity on what to check when facing "out of memory" errors and where to tune.
The common critique of paging is internal fragmentation. The last page might not be fully used, wasting space.
A 10KB program occupies 4KB × 3 = 12KB, wasting 2KB. That's 20%. Seems serious.
But think about it. Modern programs are MB or GB in size. For a 100MB program, the average waste is 2KB. That's 0.002%. Negligible.
In contrast, segmentation's external fragmentation was brutal. If you have 100MB free split into two 50MB chunks, a 70MB program can't load at all. That was the real problem.
That's why modern operating systems chose paging. "Not perfect, but the most practical choice." I accepted this trade-off.
Paging splits memory into fixed-size chunks (frames) and programs into same-sized chunks (pages).
The core insight is separating logical and physical addresses. Programs believe they start at address 0, oblivious to where they're actually stored in RAM. The MMU translates addresses using the page table.
Page size is typically 4KB. Too small bloats the page table; too large increases internal fragmentation. Nowadays, Huge Pages (2MB, 1GB) are used for large programs.
The TLB caches page table lookups, speeding up translation. Thanks to locality, hit rates are high.
Understanding paging illuminated how malloc works, why OOM errors happen, and how to tune memory. All OS memory management revolves around paging. I can't do systems programming without understanding this. I've accepted that.