
System Call: How to Ask the Kernel for Favors
Developers can't control the hard disk directly. Instead, they must 'ask' the Kernel via an API. That request is the System Call.


When I started my first startup, I built an image upload feature with Node.js. I knew fs.readFile() would read a file, but I had no idea what happened inside. I just assumed "Node handles it" and moved on.
Then production hit. File reads became so slow they bottlenecked the entire system. Why? I learned the hard way that fs.readFile() isn't just grabbing data from memory. It's asking the kernel for a favor, and that favor has a cost.
That favor mechanism is the System Call.
Imagine if every program could move the hard drive's read head directly. Program A says "read here!" and Program B immediately moves the head somewhere else yelling "no, me first!" Data gets corrupted. Security becomes zero.
So operating systems split CPU privilege modes into two levels.
1. User Mode: Where normal applications run. Limited sandbox. No direct hardware access. Can only touch your own memory.
2. Kernel Mode: Where the OS kernel runs. Full control. Hardware manipulation, total memory access, I/O device control.
Think of it like a hotel system. Guests (User Mode) only have keys to their rooms. To use shared facilities, they call the front desk (Kernel). The staff (Kernel Mode) holds master keys to everything.
That "phone call" is the System Call.
System calls are implemented with special CPU instructions.
- x86-64: syscall (modern) or int 0x80 (legacy)
- ARM: svc (supervisor call)

When you execute this instruction, a Trap occurs. The CPU declares "switching to kernel mode now." At the hardware level, the privilege level changes and the CPU jumps to a predefined kernel address.
In Linux, this entry point is entry_SYSCALL_64 in assembly. Here, the kernel asks "what system call did you request?"
The kernel maintains a System Call Table, an array mapping numbers to function pointers. Like a restaurant menu, each number points to a specific function.
// Linux kernel's system call table (simplified)
const sys_call_ptr_t sys_call_table[] = {
[0] = sys_read,
[1] = sys_write,
[2] = sys_open,
[3] = sys_close,
[57] = sys_fork,
[59] = sys_execve,
// ... 300+ more
};
When a user program invokes a system call, it puts the system call number into a register (on x86-64, that's rax). The kernel uses this number as an index to find and execute the corresponding function.
write() System Call
// C program
printf("Hello");
1. printf() internally calls the write() library function
2. write(), a wrapper provided by glibc, loads the registers:
   - rax = 1 (sys_write number)
   - rdi = 1 (file descriptor, stdout)
   - rsi = address of "Hello"
   - rdx = 5 (character count)
3. The syscall instruction switches from User Mode to Kernel Mode
4. sys_call_table[1] → runs sys_write()
5. The sysret instruction returns from Kernel Mode to User Mode

This entire round trip is a mode switch (often loosely called a context switch). The CPU's state (registers, stack pointer, etc.) must be saved and restored, which costs cycles.
Linux has over 300 system calls, but the frequently used ones are predictable.
int fd = open("/tmp/data.txt", O_RDWR | O_CREAT, 0644);
write(fd, "Hello", 5);
read(fd, buffer, 100);
close(fd);
open(): Opens a file, returns a file descriptor (integer)
read(): Reads data from the file into a buffer
write(): Writes buffer data to the file
close(): Closes the file descriptor
pid_t pid = fork(); // Clone current process
if (pid == 0) {
// Child process
execve("/bin/ls", args, env); // Replace with new program
} else {
// Parent process
wait(NULL); // Wait for child to finish
}
fork(): Duplicates the current process to create a child
exec(): Replaces the current process with a different program
wait(): Waits for child process termination
void* ptr = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
// Allocate a 4KB memory page
munmap(ptr, 4096); // Free the memory
mmap(): Maps files into memory or allocates anonymous memory
brk()/sbrk(): Adjusts heap size (used internally by malloc)
int fd = open("/dev/ttyUSB0", O_RDWR);
ioctl(fd, TIOCMGET, &status); // Read serial port status
ioctl(): Sends device-specific commands (a universal control interface)
The problem is that every OS has different system calls. Linux's open() and Windows's CreateFile() are completely different functions.
So the POSIX (Portable Operating System Interface) standard emerged. Unix-like OSes (Linux, macOS, BSD) agreed to provide the same system call interface.
For example, POSIX specifies the signature and behavior of functions like open(), read(), write(), fork(). Thanks to this, C code written on Linux can be recompiled on macOS and just work.
But Windows doesn't follow POSIX. Windows uses its own Win32 API system.
| POSIX (Linux/Mac) | Win32 API (Windows) |
|---|---|
| open() | CreateFile() |
| read() | ReadFile() |
| fork() | CreateProcess() |
| execve() | CreateProcess() |
That's why cross-platform programs typically go through libraries like libc to call OS-specific system calls.
When we use printf() in C, we don't manually put system call numbers into registers. Instead, we call functions provided by libc (the C standard library).
libc provides wrapper functions around system calls.
// glibc's write() wrapper (simplified)
ssize_t write(int fd, const void *buf, size_t count) {
    ssize_t result;
    asm volatile (
        "syscall"
        : "=a" (result)           // rax: return value
        : "0" (1),                // rax = 1 (sys_write number)
          "D" (fd),               // rdi = fd
          "S" (buf),              // rsi = buf
          "d" (count)             // rdx = count
        : "rcx", "r11", "memory"  // syscall clobbers rcx and r11
    );
    return result;
}
Why use wrappers?
- A normal C function interface: no assembly, no knowledge of register conventions
- Error handling: the kernel returns a negative value on failure, which the wrapper converts to -1 and stores in errno
- Portability: the same function name works across architectures with different syscall conventions

Curious which system calls your program makes? Use strace.
strace ls
Output:
execve("/bin/ls", ["ls"], 0x7ffd...) = 0
brk(NULL) = 0x55a1b2000000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=123456, ...}) = 0
mmap(NULL, 123456, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f8a...
close(3) = 0
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3...", 832) = 832
...
write(1, "file1.txt\nfile2.txt\n", 20) = 20
exit_group(0) = ?
Each line is a system call invocation. Running just the ls command triggers dozens of system calls.
To trace specific system calls only:
strace -e trace=open,read,write cat file.txt
You can even trace Node.js applications:
strace -e trace=read,write node app.js
That's how I first discovered that fs.readFile() internally calls openat(), fstat(), read(), and close() in sequence.
What happens when you call fs.readFile() in Node.js?
const fs = require('fs');
fs.readFile('/tmp/data.txt', 'utf8', (err, data) => {
console.log(data);
});
Internal flow:
1. JavaScript calls fs.readFile()
2. Node.js's C++ binding layer (binding.cc) takes over
3. libc wrappers are invoked: open(), fstat(), read(), close()
4. Each one executes the syscall instruction and enters the kernel
5. The kernel runs sys_openat(), sys_read(), etc. for the actual file I/O

In other words, a simple fs.readFile() traverses multiple layers and ultimately resolves to kernel system calls. This process involves at least 4 User Mode ↔ Kernel Mode switches (open, fstat, read, close).
System calls aren't free. Every User Mode to Kernel Mode transition involves saving and restoring CPU state (registers, stack pointer), switching privilege levels, and potentially evicting useful entries from CPU caches and the TLB.
Typically, a single system call takes hundreds of nanoseconds. That's 100+ times slower than a function call (a few nanoseconds).
That's why high-performance applications minimize system calls.
Bad example:
for (int i = 0; i < 1000000; i++) {
write(fd, &data[i], 1); // Write 1 byte at a time → 1 million syscalls
}
Good example:
write(fd, data, 1000000); // Write all at once → 1 syscall
This is exactly why buffering matters.
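Here's a minimal sketch of that idea: collect small writes in a userspace buffer and flush them with a single write() when the buffer fills. (stdio's fwrite() does essentially this for you; the helper names below are my own.)

```c
#include <string.h>  // memcpy
#include <unistd.h>  // write

#define BUF_SIZE 4096

static char buf[BUF_SIZE];
static size_t used = 0;

// Flush the accumulated bytes with one write() system call
static void flush_buf(int fd) {
    if (used > 0) {
        write(fd, buf, used);
        used = 0;
    }
}

// Append to the buffer (assumes len <= BUF_SIZE);
// only crosses into the kernel when the buffer fills
static void buffered_write(int fd, const char *data, size_t len) {
    if (used + len > BUF_SIZE)
        flush_buf(fd);
    memcpy(buf + used, data, len);
    used += len;
}

int main(void) {
    for (int i = 0; i < 1000000; i++)
        buffered_write(1, "x", 1);  // ~245 syscalls instead of 1,000,000
    flush_buf(1);                   // don't forget the tail
    return 0;
}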
Linux introduced vDSO (virtual Dynamic Shared Object) to reduce overhead for frequently used system calls.
vDSO is a small shared library mapped by the kernel into userspace. It contains implementations of lightweight system calls like gettimeofday(), clock_gettime(), getcpu().
These functions execute directly in userspace without kernel mode switching. The kernel periodically updates data like current time in a memory region, and user programs just read it.
Regular system call:
User Mode → syscall → Kernel Mode → sysret → User Mode
vDSO system call:
User Mode → read memory → User Mode (no switching!)
The speed difference is 10x or more.
Windows doesn't follow POSIX and has its own system call architecture.
For example, reading a file:
// Win32 API
HANDLE hFile = CreateFile("file.txt", GENERIC_READ, ...);
DWORD bytesRead;
ReadFile(hFile, buffer, 100, &bytesRead, NULL);
CloseHandle(hFile);
Internally:
1. ReadFile() calls NtReadFile() in ntdll.dll (the Native API)
2. The syscall instruction enters the kernel
3. The kernel executes its NtReadFile() handler

Windows doesn't publicly document system call numbers, and they can change between versions, so direct invocation is discouraged. Always go through the Win32 API.
When I first used fs.readFile(), I just thought "it's a function that reads files." But underneath, there are multiple layers of abstraction stacked on top of each other: the Node.js binding layer, libc wrappers, the syscall instruction, and the kernel's handlers.
Once I understood this, I started seeing why bottlenecks happen and where optimization is possible: reading 100 files separately versus batching them, or cutting system call counts with buffering.
System calls aren't just "how to talk to the kernel." They're the first gateway to understanding how your code actually moves hardware. And once you pass through that gateway, 90% of performance issues become explainable.