User Mode vs Kernel Mode: Dual Protection

Memories of Blue Screen

Let me recall the Windows 95 era. Blue Screens appeared daily.

The reason was simple. A buggy application accidentally touched OS memory areas, and that one app's mistake took down the whole computer. Unsaved documents? Gone.

After experiencing this disaster, I understood it this way. Without boundaries between programs, the system collapses.

To prevent this catastrophe, CPUs implemented Modes at the hardware level. Not in software, but security mechanisms etched into silicon.

Difference in Power: Citizen vs Police

I didn't understand this concept initially. "Why can't programs directly read the disk?" I thought imposing restrictions was inefficient.

But this analogy clicked for me.

Imagine a city.

Regular citizens can do anything in their own homes. Cook, watch TV, write code. But they cannot break into the police station and directly modify the criminal record database. To do that, they must go to the police station counter and follow "official procedures."

1. User Mode

Status: Civilian.
Power: Limited. Can only touch its own allocated memory.
Trait: All code we write (Hello World, Web Browsers, VS Code, Games) runs here.
Restriction: Absolutely NO direct access to hardware (Disk, Network Card, USB).
Ring Level: Ring 3 (lowest privilege)

2. Kernel Mode

Status: Police / Administrator.
Power: Unlimited (Privileged). Can execute all CPU instructions and access all memory.
Trait: Only the OS kernel and the code loaded into it (device drivers, kernel modules) run here.
Privileged Instructions: HLT (halt CPU), CLI (disable interrupts), I/O port control
Ring Level: Ring 0 (highest privilege)

This was it. Privilege Separation. Physically isolating untrusted code from trusted code.
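
The restriction is easy to observe. Below is a minimal sketch, assuming Linux on an x86 CPU and a GCC or Clang compiler (the function name `signal_from_privileged_hlt` is made up for this illustration). It runs the privileged HLT instruction in a child process at Ring 3 and reports which signal the kernel used to kill the child.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs HLT in a child process and returns the signal that killed the child.
 * Returns 0 if the child exited normally, -1 if the experiment cannot run
 * (fork failure or a non-x86 CPU). Hypothetical helper for illustration. */
int signal_from_privileged_hlt(void) {
#if defined(__x86_64__) || defined(__i386__)
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        __asm__ volatile("hlt"); /* privileged instruction: faults at Ring 3 */
        _exit(0);                /* reached only if HLT was allowed */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
#else
    return -1; /* the HLT instruction is x86-specific */
#endif
}
```

On a typical x86 Linux machine the CPU refuses the instruction with a general protection fault, and the kernel turns that into SIGSEGV for the child. The parent process is untouched, which is exactly the point of the separation.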

CPU Protection Rings: 4 Security Layers

Initially, I thought there were only two modes: "User Mode / Kernel Mode." I was wrong.

x86 CPUs actually define 4 Protection Rings.

Ring 0: Kernel (OS kernel)
  ↓
Ring 1: Device Drivers (theoretical, rarely used)
  ↓
Ring 2: Device Drivers (theoretical, rarely used)
  ↓
Ring 3: Applications (our programs)

Most modern OSes use only Ring 0 and Ring 3. Rings 1 and 2 are effectively abandoned territory, because the security gain wasn't worth the added complexity.

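But when hardware virtualization emerged, a new level appeared below Ring 0. It is informally called "Ring -1" (it is a separate CPU mode for the hypervisor, not a fifth ring).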

Ring -1: Hypervisor (VMware, KVM, Xen)
  ↓
Ring 0: Guest OS Kernel (Linux inside a VM)
  ↓
Ring 3: Apps inside VM

I summarized it this way. The lower the ring number, the closer to hardware and the greater the privilege. Closer to 0 means god-like powers, closer to 3 means prisoner.
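
On x86, a program can even ask which ring it is running in: the low two bits of the CS segment register hold the current privilege level. A minimal sketch, assuming an x86 CPU and GCC/Clang inline assembly (the function name `current_privilege_level` is an assumption for this example):

```c
/* Returns the current privilege level (0-3) on x86, or -1 on other CPUs.
 * Hypothetical helper for illustration. */
int current_privilege_level(void) {
#if defined(__x86_64__) || defined(__i386__)
    unsigned short cs;
    __asm__ volatile("mov %%cs, %0" : "=r"(cs)); /* read the code segment selector */
    return cs & 3; /* low two bits = current privilege level */
#else
    return -1; /* the ring model described here is x86-specific */
#endif
}
```

Called from an ordinary program, this should report 3. The same code inside a kernel module would report 0.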

Mode Switching Mechanism: Trap, Interrupt, Exception

I initially mistook system calls as "function calls."

I was wrong.

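System calls are traps into the kernel. On classic 32-bit x86 Linux they are implemented as a software interrupt; modern 64-bit systems use the dedicated syscall instruction instead, but the idea is the same.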

Normal function call:

int result = add(3, 5); // executes at same privilege level

System call:

int fd = open("/etc/passwd", O_RDONLY); // CPU mode switch occurs

When you call the open() function, this happens internally.

mov eax, 5        ; syscall number (open = 5 in x86)
mov ebx, filename ; first argument
mov ecx, O_RDONLY ; second argument
int 0x80          ; <- This is key! Software interrupt

When the int 0x80 instruction executes:

CPU immediately switches from user mode to kernel mode
References the Interrupt Descriptor Table (IDT) to find the 0x80 handler
Jumps to the kernel's system_call() function
Kernel opens the file
Returns to user mode

This clicked for me. System calls are not jumps, they're "traps." Voluntarily entering prison to ask a favor from the warden (kernel).
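
The same trap can be triggered from C without writing assembly. This is a minimal sketch, assuming Linux with glibc, whose `syscall()` function takes a system call number plus arguments and executes the trap instruction for the current architecture. The two wrapper names are made up for this example.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Asks the kernel for our process ID, bypassing the getpid() wrapper. */
pid_t getpid_via_raw_syscall(void) {
    return (pid_t)syscall(SYS_getpid);
}

/* Opens a file read-only through the raw openat system call.
 * Returns a file descriptor, or -1 on error (errno is set). */
int open_via_raw_syscall(const char *path) {
    return (int)syscall(SYS_openat, AT_FDCWD, path, O_RDONLY);
}
```

`getpid_via_raw_syscall()` returns the same value as `getpid()`, because the libc function is only a thin wrapper around the same trap. The system call number is the only thing that tells the kernel which service is being requested.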

Trap vs Interrupt vs Exception

I often confused these three. I accepted it this way.

Type	Trigger	Example
Trap	Program intentionally triggers	System call (`int 0x80`, `syscall`)
Interrupt	Hardware triggers	Keyboard input, timer, network packet arrival
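Exception	CPU detects abnormal situation	Divide by zero, Page Fault, General Protection Fault (the kernel reports these to the process as signals such as SIGFPE or SIGSEGV, the familiar "Segmentation Fault")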

All cause kernel mode transition. The difference is "who pulls the trigger."
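
The exception row can be observed from user space. A minimal sketch, assuming Linux and POSIX signals (the names `write_to_protected_page`, `on_sigsegv`, and `fault_recovery` are made up for this example): the program writes to a read-only page, the CPU raises a page fault, and the kernel delivers it back to the process as SIGSEGV.

```c
#define _GNU_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_recovery;

/* Signal handler: jump back to the point saved by sigsetjmp(). */
static void on_sigsegv(int signo) {
    (void)signo;
    siglongjmp(fault_recovery, 1);
}

/* Returns 1 if the illegal write was reported as SIGSEGV,
 * 0 if the write unexpectedly succeeded, -1 on setup failure. */
int write_to_protected_page(void) {
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) return -1;

    /* Map one page that may be read but not written. */
    char *mem = mmap(NULL, (size_t)page, PROT_READ,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return -1;

    struct sigaction sa, old;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigsegv;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0) {
        munmap(mem, (size_t)page);
        return -1;
    }

    int result;
    if (sigsetjmp(fault_recovery, 1) == 0) {
        *(volatile char *)mem = 42; /* the CPU raises a page fault here */
        result = 0;
    } else {
        result = 1;                 /* we arrived here from the handler */
    }

    sigaction(SIGSEGV, &old, NULL); /* restore the previous handler */
    munmap(mem, (size_t)page);
    return result;
}
```

Nobody in this program asked to enter the kernel. The CPU detected the illegal write, switched to kernel mode on its own, and the kernel decided what to tell the process.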

Real System Calls: Tracing with strace

I knew the concept but had never seen it in action. So I tried strace.

strace ls

Output:

execve("/bin/ls", ["ls"], [/* env vars */]) = 0
brk(NULL)                               = 0x55b8f0a0e000
access("/etc/ld.so.preload", R_OK)     = -1 ENOENT
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=95788, ...}) = 0
mmap(NULL, 95788, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f8c9c0a0000
close(3)                                = 0
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0"..., 832) = 832
...
write(1, "file1.txt\nfile2.txt\n", 20)  = 20
close(1)                                = 0
exit_group(0)                           = ?
+++ exited with 0 +++

I was shocked. A single ls command makes dozens of system calls.

openat(), fstat(), mmap(), read(), write(), close()... each one causes a user → kernel → user transition.

I understood it this way. Programs don't talk directly to hardware. They only talk through the kernel.

Cost of Mode Switching: System Call Overhead

Now I understood why system calls are expensive.

One round trip through the kernel typically costs hundreds of CPU cycles, and can reach thousands once the kernel does real work.

Why?

Save Registers: Back up the user mode CPU registers to the kernel stack
Switch Page Tables: With Kernel Page Table Isolation (KPTI) enabled, switch to the kernel's page tables (without it, the kernel is already mapped and merely inaccessible from Ring 3)
Cache Pollution: Kernel code and data evict some of the program's CPU cache (L1, L2) and TLB entries
Permission Check: Kernel verifies "Does this process have permission to open this file?"
Return Process: Restore registers when returning to user mode
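
The cost can be measured rather than taken on faith. This is a rough sketch, assuming Linux with glibc (the function names `ns_per_syscall` and `elapsed_ns` are made up for this example). It times a loop of `getpid` system calls, chosen because the kernel does almost no work for it, so the result is mostly the price of crossing the boundary.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Nanoseconds between two timestamps. */
static double elapsed_ns(struct timespec a, struct timespec b) {
    return (double)(b.tv_sec - a.tv_sec) * 1e9 +
           (double)(b.tv_nsec - a.tv_nsec);
}

/* Average nanoseconds per getpid system call, or -1.0 for bad input. */
double ns_per_syscall(long iterations) {
    struct timespec start, end;
    if (iterations <= 0) return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; i++) {
        syscall(SYS_getpid); /* one user -> kernel -> user round trip */
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    return elapsed_ns(start, end) / (double)iterations;
}
```

The number depends heavily on the CPU, the kernel version, and which security mitigations are enabled, so treat it as a ballpark figure for one machine rather than a constant.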

Doing this once is fine. But doing it 1 million times per second?

// Inefficient code
for (int i = 0; i < 1000000; i++) {
    write(fd, &i, sizeof(i)); // 1 million system calls!
}

This code is terribly slow because every loop iteration causes a user → kernel → user transition.

Solution: Buffered I/O

// Efficient code
char buffer[4096];
size_t pos = 0;
for (int i = 0; i < 1000000; i++) {
    memcpy(&buffer[pos], &i, sizeof(i));
    pos += sizeof(i);
    if (pos >= sizeof(buffer)) {
        write(fd, buffer, pos); // system call only when buffer fills
        pos = 0;
    }
}
if (pos > 0) {
    write(fd, buffer, pos); // flush the final, partially filled buffer
}

I summarized it this way. Reducing system calls is key to performance optimization.
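
To make the difference concrete, the two strategies can be wrapped in functions that count how many `write()` calls they issue. A minimal sketch, assuming a POSIX system; the names `write_unbuffered`, `write_buffered`, and `BUF_SIZE` are made up for this example.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

enum { BUF_SIZE = 4096 };

/* One write() per integer. Returns the number of system calls, or -1 on error. */
long write_unbuffered(int fd, int count) {
    long calls = 0;
    for (int i = 0; i < count; i++) {
        if (write(fd, &i, sizeof(i)) != (ssize_t)sizeof(i)) return -1;
        calls++;
    }
    return calls;
}

/* Collects integers in a user-space buffer and writes only when it is full.
 * Returns the number of system calls, or -1 on error. */
long write_buffered(int fd, int count) {
    char buffer[BUF_SIZE];
    size_t pos = 0;
    long calls = 0;

    for (int i = 0; i < count; i++) {
        if (pos + sizeof(i) > sizeof(buffer)) { /* no room left: flush first */
            if (write(fd, buffer, pos) != (ssize_t)pos) return -1;
            calls++;
            pos = 0;
        }
        memcpy(buffer + pos, &i, sizeof(i));
        pos += sizeof(i);
    }
    if (pos > 0) { /* flush whatever is left at the end */
        if (write(fd, buffer, pos) != (ssize_t)pos) return -1;
        calls++;
    }
    return calls;
}
```

With 4-byte integers and `count = 10000`, the unbuffered version makes 10,000 system calls and the buffered version makes 10 (nine full 4096-byte buffers plus one final flush). This is the same trick `fwrite()` and other buffered I/O libraries perform behind the scenes.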

Why Do Docker Containers Run in User Space?

While using Docker, I wondered: "Containers are isolated environments, so why are they faster than VMs?"

The answer was they share the kernel.

[VM Structure]
App A (Ring 3)
  ↓
Guest OS Kernel (Ring 0)
  ↓
Hypervisor (Ring -1)
  ↓
Host OS Kernel (Ring 0)
  ↓
Hardware

[Container Structure]
App A (Ring 3)
  ↓
Host OS Kernel (Ring 0) <- direct system call
  ↓
Hardware

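With hardware virtualization, ordinary guest code runs directly on the CPU, but privileged operations and I/O have to cross an extra boundary (Guest → Hypervisor → Host), and every VM boots and runs its own full kernel. Containers have only one boundary (App → Host Kernel).

That's why Docker is lightweight and starts fast. But the isolation is weaker, because if there's a kernel vulnerability, container escape is possible.

I accepted it this way. Docker's isolation isn't a separate machine, it's an "illusion" the kernel creates using namespaces and cgroups. The containers actually share the same kernel.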
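
One way to see the shared kernel is to ask for the kernel release from inside and outside a container. A minimal sketch, assuming a POSIX system with `uname()` (the function name `kernel_release` is made up for this example):

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Copies the running kernel's release string into `out`.
 * Returns 0 on success, -1 on failure. */
int kernel_release(char *out, size_t out_size) {
    struct utsname info;
    if (out == NULL || out_size == 0) return -1;
    if (uname(&info) != 0) return -1;
    snprintf(out, out_size, "%s", info.release);
    return 0;
}
```

Run on the host and inside a container on that host, this should report the same release string, whatever distribution the container image is based on. Inside a VM it reports the guest's own kernel instead.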

Danger of Kernel Modules: Ring 0 Code Injection

I tried creating a Linux kernel module for the first time.

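// hello.c - kernel module
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>

MODULE_LICENSE("GPL"); // license tag expected by the kernel module build

// Runs once when the module is loaded (insmod)
static int __init hello_init(void) {
    printk(KERN_INFO "Hello Kernel!\n");
    return 0; // 0 means the module loaded successfully
}

// Runs once when the module is unloaded (rmmod)
static void __exit hello_exit(void) {
    printk(KERN_INFO "Bye Kernel!\n");
}

module_init(hello_init);
module_exit(hello_exit);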

After compilation:

sudo insmod hello.ko

At this moment, my code executes at Ring 0.

I am now god. I can read and write all memory, kill any process, intercept keyboard input.

One wrong line:

*(int*)0 = 42; // NULL pointer dereference

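Kernel Oops. Depending on where it happens and how the kernel is configured, that escalates to a Kernel Panic and the entire system dies.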

I understood it this way. Kernel mode is absolute power, and absolute power is absolutely dangerous.

That's why Linux requires root privileges (typically via sudo) to load kernel modules. Without administrator privileges, you cannot inject Ring 0 code.

Spectre and Meltdown: Mode Boundary Collapse

In 2018, I saw news about Spectre and Meltdown vulnerabilities. Initially, I didn't understand.

"What does a CPU bug matter?"

But this was serious. With Meltdown, user mode programs could read kernel memory.

I simplified the principle.

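// Meltdown attack example (simplified, conceptual, not a working exploit)
char kernel_memory[4096]; // stand-in for a kernel-only address (inaccessible from user mode)
int secret = kernel_memory[0]; // <- should raise exception here

// But the CPU's speculative (out-of-order) execution already ran the load
// and the instructions depending on it before the exception was delivered,
// so a cache line that depends on the secret is already loaded

// Attacker can infer the secret value through cache timing attacks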

CPUs predict and execute ahead for performance. Later they realize "Oh, I shouldn't have executed this" and roll back, but traces remain in CPU cache.

This was it. Hardware optimization became a security hole.

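Solution: KPTI (Kernel Page Table Isolation), the mitigation for Meltdown. While the CPU is in user mode, almost all kernel memory is unmapped from the page tables, so there is nothing left to read speculatively. The price is a page table switch on every system call: reported slowdowns ranged from negligible up to around 30%, depending on how syscall-heavy the workload is. Spectre needed separate mitigations.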

I accepted it this way. Security and performance are a trade-off.

/proc/interrupts: Traces of Interrupts

I was curious how often the kernel does mode switching.

cat /proc/interrupts

Output:

           CPU0       CPU1       CPU2       CPU3
  0:        142          0          0          0   IO-APIC   2-edge      timer
  1:          9          0          0          0   IO-APIC   1-edge      i8042
  8:          0          0          0          0   IO-APIC   8-edge      rtc0
  9:          0          0          0          0   IO-APIC   9-fasteoi   acpi
 12:        155          0          0          0   IO-APIC  12-edge      i8042
...
NMI:        123        456        789        101   Non-maskable interrupts
LOC:    5234567    5234568    5234569    5234570   Local timer interrupts

Look at LOC (Local timer interrupts). Over 5 million times.

Assuming the timer ticks every 1ms (1000 Hz; many kernels are built with 100 or 250 Hz, and tickless kernels skip ticks on idle CPUs), 5.2 million ticks means the computer has been up for about 5,200 seconds (roughly 1.5 hours).

Up to 1000 times per second, forced transition to kernel mode. Whether the program wants it or not.

I understood it this way. Interrupts are forced summons to the CPU.
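
The arithmetic above can be automated by summing the per-CPU counters on one line of /proc/interrupts. A minimal sketch, assuming the line layout shown in the output above (a label ending in a colon, one counter per CPU, then free-form description text); the function name `sum_interrupt_counts` is made up for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Sums the per-CPU counters on a single /proc/interrupts line.
 * Parsing stops at the first token that is not a plain number,
 * such as "IO-APIC" or "Local". Returns 0 if the line has no label. */
unsigned long long sum_interrupt_counts(const char *line) {
    const char *p = strchr(line, ':');
    if (p == NULL) return 0;
    p++; /* skip past the label, e.g. "LOC:" */

    unsigned long long total = 0;
    for (;;) {
        char *end;
        unsigned long long n = strtoull(p, &end, 10);
        if (end == p) break; /* no digits: the description text begins */
        if (*end != ' ' && *end != '\t' && *end != '\n' && *end != '\0') {
            break;           /* token like "2-edge" is not a counter */
        }
        total += n;
        p = end;
    }
    return total;
}
```

For the LOC line in the sample output, the four counters add up to 20,938,274 timer interrupts across all CPUs. Dividing a single CPU's counter by the tick rate gives the uptime estimate used above.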

Summary: This Was It

I summarized user mode and kernel mode this way.

User Mode (Ring 3): Prisoner. Monitored and can only act in restricted space.
Kernel Mode (Ring 0): Warden. Can open all doors, control all prisoners.
System Call: Prisoner submitting a request to the warden. Expensive but safe.
Mode Switch Cost: Consumes hundreds of cycles due to register saving, cache and TLB pollution, and (with KPTI) page table switching.
Buffered I/O: Improve performance by reducing system call count.
Docker: Fast because it shares kernel, but exposed to kernel vulnerabilities.
Kernel Module: Injecting Ring 0 code. One line mistake kills entire system.
Spectre/Meltdown: CPU speculative execution became a security hole. Patches degraded performance.

This was it. Computers are a hierarchy of trust. Ring 3 trusts Ring 0, Ring 0 trusts hardware, and hardware... trusts its designers.
