
B-Tree: The Algorithm Behind Database Indexes
Binary Tree is for RAM. Disk is slow. B-Tree minimizes Disk I/O by being short and fat. Difference between B-Tree and B+Tree, and why databases love them.

Binary Tree is for RAM. Disk is slow. B-Tree minimizes Disk I/O by being short and fat. Difference between B-Tree and B+Tree, and why databases love them.
Why does my server crash? OS's desperate struggle to manage limited memory. War against Fragmentation.

Two ways to escape a maze. Spread out wide (BFS) or dig deep (DFS)? Who finds the shortest path?

Fast by name. Partitioning around a Pivot. Why is it the standard library choice despite O(N²) worst case?

Establishing TCP connection is expensive. Reuse it for multiple requests.

I still remember my first internship project. I was dealing with a database of about a million orders.
In my local environment with 1,000 rows, queries ran in 0.005 seconds. But on the production server with 1,000,000 rows, the same query took 2.5 seconds.
It was 500 times slower.
I urgently added an index:
CREATE INDEX idx_user_id ON orders(user_id);
The result? 0.003 seconds. It became magically fast.
What on earth did CREATE INDEX do?
Behind this magic lies a data structure called B-Tree.
Why don't databases use simple Arrays or Binary Search Trees (BST)?
O(1)), but inserting data requires shifting all elements. Write performance is terrible.O(log N)), but relies on pointers in memory.The critical problem is the physical limitation of the Hard Disk (HDD/SSD).
If you use a standard Binary Tree on a Disk, accessing a node means a disk seek. If the tree height is 20, you need 20 disk seeks. That's 0.2 seconds per query just for seeking. In a world where 0.003 seconds is the target, 0.2 seconds is unacceptable.
To solve the disk slowness, B-Tree adopts a strategy to minimize disk I/O.
A B-Tree with 1,000,000 items might only be depth 3. (Level 1: Root, Level 2: Branches, Level 3: Leaves). Finding any data takes only 3 Disk I/Os. This efficiency is why databases are fast.
Modern databases (MySQL InnoDB, Oracle) rarely use pure B-Trees. They use B+Trees. Understanding this distinction is often a "pass/fail" question in developer interviews.
| Feature | B-Tree | B+Tree (Winner) |
|---|---|---|
| Data Storage | Internal Nodes & Leaves | Leaves Only |
| Internal Nodes | Heavy (Key + Data) | Light (Key only) |
| Range Search | Slow (Tree traversal) | Fast (Linked List at leaves) |
O(log N)), and then just scan the list sideways (O(K)).This is practical knowledge you must know to optimize DB performance.
CREATE INDEX.Optimization Tip: Covering Index. If your query only asks for columns contained in the index (e.g., SELECT age FROM users WHERE age > 20), the DB doesn't need to look up the main table. It serves data directly from the index.
B-Trees are great for reads but expensive for writes because they require updating the sorted structure immediately (Random I/O). What if you need to handle massive write throughput, like chat logs or sensor data? Enter LSM Tree (used by Cassandra, HBase, Kafka).
Verdict:
Why does B-Tree try so hard to avoid random I/Os? Because HDD physics is brutal.
An HDD is like a vinyl record player. To read data:
To support transactions (ACID) without waiting for slow B-Tree updates, databases use WAL. Changes are first appended to a log file (Sequential Write = Fast). The complex B-Tree reorganization happens asynchronously (Checkpointing). This ensures both durability and performance.
SSDs have no moving parts, so random access is much faster than HDDs. However, SSDs still write in Pages. Writing 1 byte requires rewriting a whole 4KB page (Write Amplification). Thus, the B-Tree's page-aligned design remains highly efficient and relevant even in the SSD era.
B-Trees sort from left to right.
If you have a composite index on (Name, Age):
WHERE Name = 'Alice' -> Hit.WHERE Name = 'Alice' AND Age = 20 -> Hit.WHERE Age = 20 -> Miss.
You cannot look up a word in a dictionary by its second letter. The order matters.This is an optimization in modern MySQL (5.6+).
Ideally, the DB engine pushes the WHERE condition down to the storage engine. Even if an index can't fully sort the result (like ZIP_CODE LIKE '%99'), the storage engine checks the index content and filters out non-matches before reading the full table row. This significantly reduces disk I/O.
B-Tree is the spine of modern databases. It is an algorithm designed out of necessity—to bridge the million-fold speed gap between RAM and Disk. Understanding B-Tree isn't just about passing an interview; it's about understanding the physical reality of the hardware your software runs on. When your query is slow, don't just blame the server. Close your eyes and imagine the B-Tree. Are you traversing it efficiently?