
File System: Organizing Data
HDD is just a giant field of 0s and 1s. Magic that creates 'File' and 'Folder'.

When I first tried to understand file systems, I realized I'd been taking something completely for granted. The act of "saving a file" is the result of incredibly complex abstraction layers that I never questioned.
Without a file system, how would we store data? We'd have to do something like: "Write to sector 10,234,512 on the hard disk, 500 bytes." No filenames. No folders. Just raw physical addresses. Terrifying. That's the reality of a raw disk.
I truly understood file systems when I was studying Docker. I encountered OverlayFS, which was described as "layering one file system on top of another," and my mind was blown. That's when it clicked. File systems are just interfaces for data access—they're abstraction layers that can be stacked multiple levels deep.
Three things really tripped me up when learning about file systems.
First: Is a file system part of the OS or part of the disk? The answer is both. The disk has the file system's "format" (metadata structure) physically written to it, and the OS has a "driver" that reads and interprets it. When you plug an NTFS-formatted disk into Linux, the Linux kernel's NTFS driver reads it. The disk is the storage medium, the OS is the translator.
Second: What's the difference between a block and a sector? A sector is a hardware unit (usually 512 bytes), while a block is a file system unit (usually 4KB). File systems don't work with sectors directly—they bundle multiple sectors into blocks for efficiency. Even a 1-byte file takes up at least one block (4KB). This is what causes internal fragmentation.
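You can watch internal fragmentation happen: write a few bytes, then compare the file's logical size with the space it actually occupies on disk. The output below is illustrative, from an ext4 volume with 4KB blocks.
# A 3-byte file still occupies a full 4KB block
$ echo "hi" > tiny.txt
$ stat -c 'size: %s bytes, allocated: %b blocks of %B bytes' tiny.txt
size: 3 bytes, allocated: 8 blocks of 512 bytes
$ du -h tiny.txt
4.0K    tiny.txt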
Third: What the hell is an inode? The explanation "file metadata" was too abstract for me at first. Here's how I understood it: an inode is a library catalog card. The book title (filename) is in the directory, and the catalog card contains "which shelf it's on" (block addresses), "how many pages" (file size), and "who can check it out" (permissions). The actual book content is on the shelves (blocks).
To truly understand file systems, you need to nail down three concepts.
Disks are divided into fixed-size units called blocks. In ext4, they're typically 4KB. File contents are stored in these blocks. Small files fit in one block, large files span multiple blocks.
The question is: "Which blocks should we use?" There are three approaches.
Contiguous Allocation: Store files in consecutive blocks. Fast, but causes severe external fragmentation. Old-school method.
Linked Allocation: Each block points to the address of the next block. FAT32 uses this. No fragmentation, but random access is slow.
Indexed Allocation: Store a list of block addresses in the inode. ext4 and NTFS use this. Fast and flexible. Most modern systems use this approach.
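On ext4 you can see indexed allocation at work: filefrag (from e2fsprogs) prints the extent map, i.e. which physical block ranges the file's inode points to. The file name and numbers below are made up for illustration.
# Show the extent map of a file (logical -> physical block ranges)
$ filefrag -v bigfile.bin
Filesystem type is: ef53
File size of bigfile.bin is 104857600 (25600 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..   25599:      98304..    123903:  25600:             last,eof
bigfile.bin: 1 extent found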
An inode contains all information about a file: size, permissions, owner, timestamps, and most importantly, the list of data block addresses.
The critical thing: inodes don't know the filename. Filenames are managed by directories. This is how hard links work. Multiple filenames can point to the same inode.
In Linux, you can see inode numbers with ls -i.
$ ls -i myfile.txt
1234567 myfile.txt
1234567 is the inode number. This number is used to locate the file's actual metadata.
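Because the name lives in the directory and not in the inode, a hard link is just a second directory entry pointing at the same inode. A quick experiment (inode numbers will differ on your machine):
# Create a hard link and confirm both names share one inode
$ ln myfile.txt hardlink.txt
$ ls -1i myfile.txt hardlink.txt
1234567 hardlink.txt
1234567 myfile.txt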
A directory is actually a special type of file. Its contents are a "filename → inode number" mapping table.
. -> inode 2
.. -> inode 1
myfile.txt -> inode 1234567
subdir -> inode 1234568
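You can peek at this table yourself: ls with -i and -a prints the inode number in front of every entry, including . and .. (same made-up numbers as above):
$ ls -1ia
      2 .
      1 ..
1234567 myfile.txt
1234568 subdir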
When opening a file, the OS does this:
To open /home/user/myfile.txt, the OS walks the path one directory at a time:
1. Read the root directory (/) inode and look up "home" to get its inode
2. Read the home directory and look up "user" to get its inode
3. Read the user directory and look up "myfile.txt" to get its inode
This process is so slow that the OS uses a dentry cache (directory entry cache) of recently resolved names.
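On Linux you can see the dentry cache's bookkeeping; the first two numbers are roughly "dentries allocated" and "dentries cached but unused" (values are illustrative):
$ cat /proc/sys/fs/dentry-state
82764   61309   45      0       0       0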
To save a file, we need to find empty blocks. How does a file system track free space? Two main approaches.
Bitmap: One bit per block. 0 means free, 1 means used. Simple and fast. ext4 uses this.
Blocks: 0    1    2    3    4    5    6    7
Bits:   1    1    0    1    0    0    0    1
        used used free used free free free used
Linked List: Manage free blocks as a linked list. The first free block stores the address of the next free block. Simple but slow. Old-school approach.
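On ext4 the bitmap approach is literal: every block group has its own block bitmap, and dumpe2fs will tell you where each one lives. The device name and block numbers below are examples.
# Show where the first few block bitmaps are stored
$ sudo dumpe2fs /dev/sda1 | grep "Block bitmap" | head -3
  Block bitmap at 1057 (+1057)
  Block bitmap at 1058 (bg #0 + 1058)
  Block bitmap at 1059 (bg #0 + 1059)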
As you create and delete files, free space gets scattered into fragments all over the place. If you want to store a 100MB file but your 200MB of free space is scattered into twenty 10MB chunks, you can't store it. That's external fragmentation.
Old Windows systems had to do "disk defragmentation" periodically. It physically reorganized files to consolidate free space. Modern SSDs don't need this because random access is fast.
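ext4 rarely needs defragmentation either, but if you're curious, e4defrag (also from e2fsprogs) has a check-only mode that reports a fragmentation score. The target path is hypothetical and the output is abridged and illustrative.
# -c only measures fragmentation, it doesn't move anything
$ sudo e4defrag -c /data
 Total/best extents                 53151/50112
 Fragmentation score                1
 This directory (/data) does not need defragmentation.
 Done.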
What happens if the power goes out while writing a file? If the data blocks are written but the inode isn't updated yet? The file system ends up corrupted.
Journaling prevents this. Before writing data, it first writes "I'm about to do this" to a log (journal). After a crash and reboot, it reads the journal and either completes or rolls back incomplete operations.
ext4, NTFS, and APFS all support journaling. Same principle as WAL (Write-Ahead Logging) in databases.
# Check ext4 journal status
$ sudo dumpe2fs /dev/sda1 | grep "Filesystem features"
Filesystem features: has_journal ext_attr resize_inode
Let me break down the file systems I encounter most often.
FAT32: Released in 1996. Ancient, but every OS supports it. Almost all USB drives are FAT32. The downsides are clear: a 4GB maximum file size, no journaling, and no permission or ownership metadata.
But USB drives and SD cards still use FAT32 because they need to work with Windows, Mac, Linux, TVs, game consoles—everything.
NTFS: New Technology File System. Been around since Windows NT.
The problem is that write support on Mac and Linux is unstable. Linux's ntfs-3g driver is slow. For cross-platform external drives, exFAT is better.
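If you do go the exFAT route for a shared external drive, formatting it on Linux is a one-liner. This assumes the exfatprogs package is installed and that /dev/sdc1 is your USB partition; it wipes whatever is on it.
# Format a USB partition as exFAT for cross-platform use
$ sudo mkfs.exfat /dev/sdc1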
ext4: Extended File System 4. De facto Linux standard. Most common on servers.
When I set up a Linux server, I always use ext4. Battle-tested stability is king.
# Format as ext4
$ sudo mkfs.ext4 /dev/sdb1
# Check file system info
$ df -Th
Filesystem Type Size Used Avail Use% Mounted on
/dev/sda1 ext4 20G 12G 7.2G 63% /
APFS: Apple File System. Released in 2017. Optimized for SSDs.
The boot speed improvement on Macs is thanks to APFS. Metadata access is way faster than HFS+.
XFS: Optimized for large files. Used in video editing servers and big data platforms. Red Hat pushes it.
Btrfs: Copy-on-Write, snapshots, built-in RAID. Called "the next-gen Linux file system" but still has stability concerns. SUSE uses it by default.
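The headline Btrfs feature is snapshots: thanks to Copy-on-Write, cloning a subvolume is nearly instant because nothing is copied up front. The paths below are made up.
# Snapshot a subvolume before a risky change
$ sudo btrfs subvolume snapshot /data /data/.snap-before-upgrade
Create a snapshot of '/data' in '/data/.snap-before-upgrade'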
I haven't used Btrfs in production yet. ext4 is just too stable.
Theory without practice is meaningless. Here are commands I use regularly.
# Check overall file system capacity
$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 20G 12G 7.2G 63% /
/dev/sdb1 100G 45G 55G 45% /data
# Check specific directory size
$ du -sh /var/log
2.3G /var/log
# Subdirectory sizes
$ du -h --max-depth=1 | sort -hr
12G .
5.2G ./node_modules
3.8G ./build
2.1G ./data
# Check mounted file systems
$ mount | grep "^/dev"
/dev/sda1 on / type ext4 (rw,relatime)
/dev/sdb1 on /data type ext4 (rw,relatime)
# Mount new disk
$ sudo mount /dev/sdc1 /mnt/backup
# Auto-mount on boot (edit fstab)
$ sudo vim /etc/fstab
# Add a line like this (without the leading #):
# /dev/sdc1  /mnt/backup  ext4  defaults  0  2
# Format as ext4
$ sudo mkfs.ext4 /dev/sdb1
# Format as XFS
$ sudo mkfs.xfs /dev/sdb1
# Check and repair file system
$ sudo fsck /dev/sdb1
# Check inode usage (important when you have tons of files)
$ df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/sda1 1310720 234567 1076153 18% /
Studying Docker taught me the real power of file systems. Docker uses OverlayFS.
OverlayFS stacks multiple file systems to make them appear as one. The lower layer is read-only, the upper layer is read-write. When you modify a file, the lower layer stays intact and only the changes are saved to the upper layer. That's Copy-on-Write.
This explained why Docker images are stacked in layers and why base images can be shared. It's all thanks to file system abstraction.
# Check Docker image layers
$ docker image inspect nginx:latest | jq '.[0].RootFS.Layers'
[
"sha256:abc123...",
"sha256:def456...",
"sha256:ghi789..."
]
# Check actual OverlayFS mount
$ mount | grep overlay
overlay on /var/lib/docker/overlay2/abc123/merged type overlay
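You don't even need Docker to try this; the kernel's overlay driver can be mounted by hand. The /tmp directories below are placeholders you create yourself.
# Merge a read-only lower dir and a writable upper dir into one view
$ mkdir /tmp/lower /tmp/upper /tmp/work /tmp/merged
$ sudo mount -t overlay overlay -o lowerdir=/tmp/lower,upperdir=/tmp/upper,workdir=/tmp/work /tmp/merged
# Files written under /tmp/merged land in /tmp/upper; /tmp/lower is never touched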
Understanding this made it crystal clear why containers are fast and disk-efficient.
What I learned from studying file systems is this: all abstraction is trade-off.
FAT32 is simple and compatible but lacks features. NTFS has tons of features but poor cross-OS support. ext4 is stable but lacks modern features (snapshots, CoW). Btrfs has great features but isn't mature yet.
There's no single answer. Choosing based on the use case is the key.
File systems are the most fundamental OS component, but also the most complex and critical. They're the magic that transforms a massive blob of numbers (raw disk) into the familiar concept of "files." Thanks to this magic, we don't have to memorize sector numbers.
Now every time I run df -h, I think about the blocks, inodes, and journals behind those numbers. And I'm grateful I don't have to live in a world without file systems.