
Journaling File System: Safe Writing
What if you pull the plug while copying files? The OS's habit of logging to prevent corruption.

Physical server environments have a well-known vulnerability: unexpected power loss. Datacenter incident reports follow a familiar pattern. The UPS batteries drain, servers die mid-operation, and on the next boot the filesystem is corrupted. The machine won't even start. Backups help, but any work in progress since the last checkpoint is gone.
The question is: "Isn't saving a file just... writing to disk? Why does a power loss corrupt everything?" It turns out saving a single file involves multiple steps across different disk locations. Interrupt the process midway, and you get inconsistent state. The solution modern filesystems use is called journaling, borrowed directly from database transaction logs.
Journaling is essentially Write-Ahead Logging (WAL) applied to filesystems. Before doing the actual work, you write a log saying "I'm about to do this." It's like making a photocopy of a contract before signing the original. If something goes wrong, you have a record of what you were trying to do.
When I first learned about filesystems, I naively thought write() was atomic. Just call it once, done. But reality is messier.
Creating a file and writing data involves several discrete steps:
1. Update the inode bitmap to allocate a new inode
2. Write the inode itself (permissions, size, block pointers)
3. Update the data bitmap to mark data blocks as in use
4. Write the actual data blocks
5. Add a directory entry pointing to the new inode
Each of these is a separate disk write. Disks guarantee atomic sector writes, but file creation spans multiple sectors. If power dies during step 3, you get an inode that exists and a data bitmap that may claim blocks, but no data on disk and no directory entry pointing to the file.
Contradictory state. Filesystem corruption.
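To make this concrete, here is a tiny user-space sketch (notes.txt is just a placeholder name); the comments describe the kind of on-disk updates an ext-style filesystem typically performs behind these calls:
#include <fcntl.h>
#include <unistd.h>

int main(void) {
    /* Allocating an inode and adding a directory entry for notes.txt */
    int fd = open("notes.txt", O_WRONLY | O_CREAT, 0644);
    if (fd < 0) return 1;
    /* Allocating data block(s), writing them, updating the inode's size */
    write(fd, "hello\n", 6);
    /* Forcing the cached data and metadata out to disk */
    fsync(fd);
    close(fd);
    return 0;
}
Three library calls, but at least five distinct on-disk structures get touched, and nothing guarantees they land in one atomic step.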
Old filesystems (ext2, FAT32) had one solution: fsck (File System Check). This tool scans the entire disk, checking consistency between inodes, bitmaps, and directory structures. The problem? It's slow. A 1TB disk could take hours to check during every boot.
# Run fsck on ext2 filesystem (happens automatically on boot)
# Time scales with disk size - can take hours
fsck.ext2 /dev/sda1
I felt frustrated when I first encountered this. Filesystems seemed fundamentally fragile. But then I noticed: modern filesystems like ext4 and NTFS boot in seconds, even after crashes. How?
One day I was reading about databases and stumbled on Write-Ahead Logging (WAL). Before changing data, databases write a log describing what they're about to change. If a crash happens mid-transaction, they replay the log to recover.
The lightbulb moment: filesystems can do the same thing. Before performing file operations, write them to a journal - a special disk area that acts like a diary of "work I'm about to do."
Here's the journaling workflow (sketched in code right after this list):
1. Start a transaction: write a record to the journal describing the pending changes
2. Write the modified blocks (their new contents) into the journal
3. Write a commit record to the journal - the transaction is now durable
4. Checkpoint: write the same changes to their final locations on disk
5. Mark the transaction as complete so its journal space can be reused
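Here is a minimal user-space sketch of that ordering, using two ordinary files (journal.log and data.img are placeholder names) in place of the real journal area and data blocks; real filesystems do this with raw blocks inside the kernel, but the ordering is the point:
#include <fcntl.h>
#include <unistd.h>

int main(void) {
    const char payload[] = "new block contents";

    /* Steps 1-2: describe the pending change and write it to the journal */
    int jfd = open("journal.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (jfd < 0) return 1;
    const char intent[] = "TX 1234 block=7 data=new block contents\n";
    write(jfd, intent, sizeof(intent) - 1);
    fsync(jfd);                               /* journal entry is durable */

    /* Step 3: commit record - only now does the transaction count */
    const char commit[] = "COMMIT 1234\n";
    write(jfd, commit, sizeof(commit) - 1);
    fsync(jfd);

    /* Step 4: checkpoint - apply the change to its final location */
    int dfd = open("data.img", O_WRONLY | O_CREAT, 0644);
    if (dfd < 0) return 1;
    pwrite(dfd, payload, sizeof(payload) - 1, 7 * 4096);  /* "block 7" */
    fsync(dfd);

    /* Step 5: the journal space for TX 1234 could now be reclaimed */
    close(dfd);
    close(jfd);
    return 0;
}
The fsync() calls are the heart of it: the commit record only goes out after the journaled blocks, and the final write only starts after the commit is on disk.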
If power dies after step 3 but during step 4? On reboot, the filesystem reads the journal. "Ah, transaction #1234 committed but wasn't applied yet." It replays the journal entries (Redo). If power died during step 2 before commit? No commit record exists, so it ignores the incomplete transaction (Undo).
The genius part: recovery time is independent of disk size. fsck scans the entire disk, but journaling only reads the journal (typically hundreds of MB). Recovery takes seconds, not hours.
# Check ext4 journal information
sudo dumpe2fs /dev/sda1 | grep -i journal
# Output:
# Journal inode: 8
# Journal backup: inode blocks
# Journal size: 128M
As I dug deeper, I learned there are different flavors of journaling. ext4 supports three modes:
Journal mode (data=journal): logs both metadata and actual data to the journal. You write data twice: once to the journal, once to the final location. Maximum safety, but a performance hit.
# Mount ext4 in journal mode
sudo mount -o data=journal /dev/sda1 /mnt
Ordered mode (data=ordered): only logs metadata to the journal, not data, but enforces ordering: data is written first, then the metadata is logged. This ensures metadata never points to garbage. If metadata recovery succeeds, file contents are consistent.
# Mount ext4 in ordered mode (default)
sudo mount -o data=ordered /dev/sda1 /mnt
Writeback mode (data=writeback): only logs metadata, with no ordering guarantees. Data might be written after metadata. After a crash, metadata might recover but point to garbage blocks. Best performance, worst safety.
# Mount ext4 in writeback mode
sudo mount -o data=writeback /dev/sda1 /mnt
The journal operates like a circular buffer. When full, it overwrites the oldest entries (already checkpointed). Each journal entry looks roughly like this:
// Simplified journal transaction layout (illustrative, not the real JBD2 on-disk format)
struct block_update {
    uint32_t block_number;          // Which block to modify
    char     data[4096];            // New block contents (full copy)
};

struct journal_transaction {
    uint32_t transaction_id;        // Unique transaction ID
    uint32_t sequence_num;          // Sequence number in journal
    uint32_t num_blocks;            // Number of blocks to modify
    struct block_update blocks[];   // The actual changes (flexible array, must come last)
};

// A separate commit record follows the blocks on disk;
// its presence marks the transaction as committed.
NTFS uses similar journaling via a special file called $LogFile. Unlike ext4, NTFS only journals metadata, never data.
There are two recovery strategies: redo logging records the new contents of each block and replays committed transactions forward after a crash, while undo logging records the old contents and rolls back transactions that never committed.
Most journaling filesystems use redo logging because it's simpler: just check if a commit record exists.
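Here is a minimal, self-contained sketch of redo-style replay. The journal and the "disk" are just in-memory arrays, and the commit record is modeled as a simple committed flag; real filesystems read these structures from a dedicated disk area:
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE   16          /* tiny blocks keep the demo readable */
#define NUM_BLOCKS   8
#define MAX_UPDATES  4

struct block_update {
    uint32_t block_number;               /* which block to modify */
    char     data[BLOCK_SIZE];           /* new block contents (full copy) */
};

struct journal_transaction {
    uint32_t            transaction_id;
    uint32_t            num_blocks;
    struct block_update blocks[MAX_UPDATES];
    int                 committed;       /* stands in for the commit record */
};

static char disk[NUM_BLOCKS][BLOCK_SIZE];    /* the "final locations" */

/* Redo recovery: replay committed transactions, skip uncommitted ones */
static void replay_journal(const struct journal_transaction *journal, int n) {
    for (int t = 0; t < n; t++) {
        if (!journal[t].committed)
            continue;                                 /* no commit record: ignore */
        for (uint32_t i = 0; i < journal[t].num_blocks; i++) {
            const struct block_update *u = &journal[t].blocks[i];
            memcpy(disk[u->block_number], u->data, BLOCK_SIZE);   /* redo */
        }
    }
}

int main(void) {
    struct journal_transaction journal[2] = {
        { .transaction_id = 1234, .num_blocks = 1, .committed = 1,  /* fully committed */
          .blocks = { { .block_number = 2, .data = "committed" } } },
        { .transaction_id = 1235, .num_blocks = 1, .committed = 0,  /* crashed before commit */
          .blocks = { { .block_number = 5, .data = "lost" } } },
    };
    replay_journal(journal, 2);
    printf("block 2: '%s', block 5: '%s'\n", disk[2], disk[5]);
    /* prints: block 2: 'committed', block 5: '' */
    return 0;
}
Replaying an already-applied block is harmless, which is why redo recovery can simply restart from the beginning of the journal after a crash.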
Newer filesystems like ZFS and Btrfs use Copy-on-Write (COW) instead of journaling. Instead of modifying data in place, they write new copies and atomically update pointers. No journal needed - the old version stays valid until the pointer switches.
# Create ZFS filesystem (COW-based)
zpool create mypool /dev/sdb
zfs create mypool/data
COW enables free snapshots and clones, but can cause fragmentation.
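The core idea fits in a few lines. This is an in-memory illustration of the pointer switch, not how ZFS or Btrfs actually lay out blocks on disk:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE 32

static char *current_block;   /* stands in for an on-disk pointer, e.g. in a tree root */

/* Never modify the live block in place: write a fresh copy, then switch
 * one pointer. Until the switch, readers (and a crash) still see the old,
 * fully valid version. */
static void cow_update(const char *new_contents) {
    char *copy = malloc(BLOCK_SIZE);                  /* 1. allocate a new block */
    if (copy == NULL) return;
    snprintf(copy, BLOCK_SIZE, "%s", new_contents);   /* 2. write the new version */
    char *old = current_block;
    current_block = copy;                             /* 3. atomic pointer switch */
    free(old);                                        /* 4. old block becomes free space */
}

int main(void) {
    current_block = strdup("version 1");
    cow_update("version 2");
    printf("%s\n", current_block);   /* prints "version 2" */
    free(current_block);
    return 0;
}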
Now when I set up production servers, I always consider journaling modes. Most Linux distros default to ext4 ordered mode, but for critical workloads like databases, I consider journal mode.
Interestingly, databases have their own WAL, so you get double logging: filesystem journal + database WAL. Slight performance hit, but maximum safety. PostgreSQL has pg_wal/, MySQL has ib_logfile*.
# Check PostgreSQL WAL directory
ls -lh /var/lib/postgresql/14/main/pg_wal/
# Contains 16MB WAL segment files
# Check MySQL redo logs
ls -lh /var/lib/mysql/ib_logfile*
In cloud environments, even block storage like EBS uses internal journaling. Modern systems have journaling at multiple layers.
The speed difference between fsck and journal recovery is massive. Running fsck on a corrupted 1TB ext2 disk can take 6 hours; after migrating to ext4, a post-crash boot takes around 10 seconds.
One practical consideration: databases often recommend turning off filesystem journaling (or using writeback mode) because they handle consistency themselves. The double-write overhead isn't worth it when the database already guarantees ACID. But for general-purpose servers, I keep ordered mode enabled.
Another lesson: journaling isn't free. The journal area uses disk space (usually 128MB-1GB), and writing to the journal before actual data adds latency. For write-heavy workloads, this can be 5-10% slower. But the tradeoff is worth it - crash recovery that takes seconds instead of hours means better uptime.
I've also learned to monitor journal health. A corrupted journal is bad news. On ext4, you can check journal status:
# Check filesystem state and journal-related fields
sudo tune2fs -l /dev/sda1 | grep -iE 'journal|filesystem state'
# Look for the has_journal feature and "Filesystem state: clean"
If the journal itself gets corrupted, you're back to full fsck. Thankfully, journals are small and written sequentially, so they rarely fail.
Journaling taught me a fundamental systems design principle: log your intent before taking action. This simple idea revolutionized filesystem reliability.
As a founder, the lesson I took away is that complexity is worth it for resilience. Journaling makes writes slightly slower, but recovery goes from hours to seconds. That's a massive win for business continuity.
Even though I mostly use cloud storage now, the journaling concept remains relevant. We apply it everywhere: database design, distributed system logging, even application-level operations. The pattern of "log before acting, recover from logs" is universal.
Understanding journaling also made me more confident deploying systems. I know that unexpected power loss won't destroy data, because modern filesystems keep careful diaries. It's one less thing to worry about, and in a startup, reducing operational anxiety is valuable.
The filesystem's diary habit became my own habit: always log what you're about to do, especially in production systems. It's saved me countless times.