SSD vs HDD: The Mechanics of Storage
"I tuned the query because the DB was slow"
Back when I was a junior, my service slowed down. My senior asked, "What storage is the DB running on?" "Probably... just a hard drive?" He looked at me with pity and told me to switch to an SSD. Like magic, all problems vanished. I had spent all night tuning SQL queries, but it was just hardware lag.
What is the difference between SSD and HDD that creates such a gap? It's the difference between a Vinyl Record and a USB Thumb Drive.
1. HDD fetches data by "walking" like a librarian
At first I thought HDD was just "a slower SSD." Same kind of storage, different speed. I was wrong. The mechanism itself was completely different.
HDD is a Vinyl Record (LP).
- Structure: A magnetic platter spins at 7,200 RPM (or 10,000 RPM). A needle (Head) reads the data.
- Problems:
- Must wait for the platter to spin (Rotational Latency).
- Must wait for the needle to move (Seek Time).
- To read random data, the needle has to dance frantically.
Because of this "Physical Movement," there is a definite speed limit.
I thought of a librarian analogy. An HDD is a librarian walking to fetch books from shelves. The farther the book, the slower. If 10 books are scattered across different shelves? The librarian has to walk back and forth 10 times.
That's why using an HDD for a DB server is a disaster. Databases often "read 1,000 tiny pieces of data one at a time." The HDD needle goes crazy trying to find all 1,000 locations.
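To put a number on that limit, here is a rough back-of-the-envelope sketch. It assumes the average rotational latency is half a revolution and takes the seek time as an input. The 4 ms used below is an illustrative figure, not a spec for any particular drive, and hdd_iops is a hypothetical helper name.

```shell
# Estimate random-read IOPS for a spinning disk.
# Assumptions: average rotational latency = half a revolution;
# seek time is supplied by the caller (4 ms here is illustrative).
hdd_iops() {
  rpm=$1
  seek_ms=$2
  awk -v rpm="$rpm" -v seek="$seek_ms" 'BEGIN {
    rot_ms = 60000 / rpm / 2            # half a revolution, in milliseconds
    printf "%d\n", 1000 / (rot_ms + seek)
  }'
}

hdd_iops 7200 4     # prints 122
hdd_iops 10000 4    # prints 142
```

Under these assumptions a 7,200 RPM drive manages around 122 random reads per second, so 1,000 scattered reads take roughly 8 seconds no matter how fast the CPU is.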
2. SSD teleports
SSD is a USB Thumb Drive.
- Structure: Traps electrons in semiconductor chips (Flash Memory).
- Traits: No moving parts at all.
- Pros:
- Access speed is the same regardless of location (Random Access).
- No need to wait for a needle. Reads are purely electronic, so latency is measured in microseconds instead of the milliseconds an HDD seek takes.
I imagined a teleporting librarian. The moment the librarian knows where a book is, they teleport there and grab it. 10 scattered books? No problem. Just teleport 10 times.
This is when it clicked. "SSD wins big on random access; HDD only keeps up on sequential access."
- Sequential Read: Reading a 1GB movie file from start to finish → HDD is OK (needle moves in one direction)
- Random Read: Reading 1,000 scattered database records → HDD hell, SSD heaven
3. NAND Flash: SSD isn't perfect either
SSD is not magic. It traps electrons in semiconductor chips, but this "electron prison" isn't eternal.
NAND Flash Types: How many bits per cell?
NAND Flash, the core of SSDs, comes in several types, distinguished by how many bits each cell stores.
| Type | Bits/Cell | Lifespan (P/E Cycles) | Speed | Price | Use Case |
|---|---|---|---|---|---|
| SLC (Single-Level Cell) | 1 | 100,000 | Fastest | Very expensive | Enterprise servers, industrial |
| MLC (Multi-Level Cell) | 2 | 10,000 | Fast | Expensive | High-end laptops, workstations |
| TLC (Triple-Level Cell) | 3 | 3,000 | Moderate | Affordable | Consumer SSDs (most of us) |
| QLC (Quad-Level Cell) | 4 | 1,000 | Slow | Cheap | High-capacity storage (read-heavy) |
P/E Cycles means Program/Erase Cycles. Simply put, "how many times can you write and erase?"
When I first saw these numbers, I panicked. "TLC can only handle 3,000 writes? Won't it break soon?"
Turns out Wear Leveling makes it fine.
Wear Leveling: Spread the wear evenly
SSD controllers are smart. They don't keep writing to the same cells. They use all cells evenly.
For example, say you write and delete a 10GB file daily on a 256GB SSD. The controller writes this file to a different physical location each time. It rotates through the entire 256GB.
Result: A TLC SSD (3,000 P/E Cycles, 256GB) can theoretically write 256GB × 3,000 = 768TB. If you write 10GB per day? You can use it for 210 years.
Now I get it. SSDs last way longer than I thought.
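The arithmetic above is easy to replay for other drives. This is a minimal sketch that assumes perfect wear leveling and ignores write amplification, so treat the result as an upper bound. endurance_years is a hypothetical helper name.

```shell
# Theoretical drive lifetime in whole years under ideal wear leveling.
# Ignores write amplification, so real-world endurance will be lower.
endurance_years() {
  capacity_gb=$1   # drive capacity in GB
  pe_cycles=$2     # rated program/erase cycles per cell
  daily_gb=$3      # data written per day in GB
  total_gb=$((capacity_gb * pe_cycles))
  echo $((total_gb / daily_gb / 365))
}

endurance_years 256 3000 10   # TLC example from the text: prints 210
endurance_years 256 1000 10   # same drive with QLC-class cells: prints 70
```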
Write Amplification: You write more than you think
But SSDs have one more trap. If you write 1GB, the drive may physically write more than that, say 1.5GB.
Why? SSDs can't overwrite in place.
- HDD: Overwrites the same spot directly.
- SSD: Must erase existing data first. But "erase" only works at block level.
A block contains multiple pages.
- Write/Read: Page level (4KB ~ 16KB)
- Erase: Block level (256KB ~ 4MB)
To modify a 4KB file?
- Read the entire block (256KB).
- Modify only the changed part.
- Erase the block.
- Write the entire thing back.
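You modified 4KB but wrote 256KB. This is Write Amplification. (This is the worst case. Real controllers usually write the updated page to a free spot and clean up the old block later, but that cleanup still copies data around.)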
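For the read-modify-write case described above, the amplification factor is just the ratio of block size to changed data. A minimal sketch, with write_amplification as a hypothetical helper name:

```shell
# Write amplification when changing a small region forces
# a whole erase block to be rewritten.
write_amplification() {
  block_kb=$1     # erase block size in KB
  changed_kb=$2   # data the host actually modified, in KB
  echo $((block_kb / changed_kb))
}

write_amplification 256 4    # 256KB block, 4KB change: prints 64
write_amplification 4096 4   # 4MB block, 4KB change: prints 1024
```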
This problem is mitigated by the TRIM command.
TRIM: OS tells the SSD "this data can be erased"
When you delete a file, the OS doesn't actually erase the data. It just marks the space as "free" in the file table.
Problem: The SSD doesn't know this. It still thinks valid data exists there. When you try to write new data to that space? You go through the "read → modify → erase → write" process.
TRIM command tells the SSD: "These blocks are no longer used. You can erase them in advance."
The SSD erases these blocks during idle time. Writing new data later becomes much faster.
Checking if TRIM is enabled on Linux:
# Check if TRIM schedule is active
systemctl status fstrim.timer
# Manually run TRIM (when SSD performance drops)
sudo fstrim -v /
I once had an SSD slow down after a few months because I didn't know this. One TRIM run brought it back to full speed.
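Before relying on TRIM, it helps to confirm the device accepts discard requests at all. The sketch below assumes the Linux sysfs layout, where /sys/block/<dev>/queue/discard_max_bytes is 0 when discards are not accepted. supports_trim is a hypothetical helper and "sda" is a placeholder device name.

```shell
# Report whether a block device advertises discard (TRIM) support.
# Assumption: Linux sysfs, where discard_max_bytes is 0 when the
# device does not accept discard requests.
supports_trim() {
  sysfs_file=$1
  if [ ! -r "$sysfs_file" ]; then
    echo "unknown"
  elif [ "$(cat "$sysfs_file")" -gt 0 ]; then
    echo "yes"
  else
    echo "no"
  fi
}

# "sda" is a placeholder; substitute your own device (e.g. nvme0n1).
supports_trim /sys/block/sda/queue/discard_max_bytes
```

If this reports "no", fstrim has nothing to work with on that device.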
4. Real World: IOPS (Input/Output Operations Per Second)
One of the most critical metrics in server performance is IOPS (Input/Output Operations Per Second). Simply put: "How many times can you read/write per second?"
HDD vs SSD Comparison
- HDD (7200 RPM): Approx. 100 ~ 200 IOPS
- SSD (SATA): Approx. 50,000 ~ 100,000 IOPS
- NVMe SSD: Approx. 500,000+ IOPS
Do you see it? 100 vs 50,000. Running a DB (which reads/writes tiny data constantly) on an HDD is like putting a Ferrari engine (CPU) on a car with millstones for wheels (HDD).
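To make the gap concrete, this small sketch converts IOPS into wall-clock time for a batch of random operations. It uses the approximate figures from the list above, and io_time_ms is a hypothetical helper name.

```shell
# Milliseconds needed to complete a number of random I/O operations
# at a given IOPS rate.
io_time_ms() {
  ops=$1
  iops=$2
  awk -v ops="$ops" -v iops="$iops" 'BEGIN { printf "%.0f\n", ops * 1000 / iops }'
}

io_time_ms 1000 100      # HDD: prints 10000 (10 seconds)
io_time_ms 1000 50000    # SATA SSD: prints 20
io_time_ms 1000 500000   # NVMe SSD: prints 2
```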
AWS EBS Volume Types: Buying IOPS with money
When you run a server on AWS, you have to choose an EBS (Elastic Block Store) volume. At first I didn't know what this was, so I just used the default (gp2).
Later I realized IOPS affects pricing dramatically.
| Type | IOPS | Throughput | Use Case | Price (approx.) |
|---|---|---|---|---|
| gp3 (General Purpose SSD) | 3,000 ~ 16,000 (configurable) | 125 ~ 1,000 MB/s | Most workloads | Cheap (default) |
| gp2 (Old General Purpose SSD) | 100 ~ 16,000 (scales with size) | 250 MB/s | Old default | More expensive than gp3 |
| io1 (Provisioned IOPS) | Up to 64,000 | 1,000 MB/s | DB servers, high performance | Very expensive |
| io2 (Next-gen io1) | Up to 64,000 | 1,000 MB/s | Mission-critical DB | Similar to io1 (99.999% durability) |
| st1 (HDD) | 500 | 500 MB/s | Logs, big data | Cheap |
| sc1 (Cold HDD) | 250 | 250 MB/s | Archives, backups | Cheapest |
Key insight:
- gp3 lets you configure IOPS and throughput separately. gp2 automatically sets IOPS based on volume size.
- io1/io2 let you provision a specific IOPS level. You pay more, but performance is far more consistent.
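The gp2 size scaling can be sketched as below. It assumes the documented gp2 rule of 3 baseline IOPS per GiB, clamped to the 100 ~ 16,000 range from the table. gp2_baseline_iops is a hypothetical helper name, and burst credits are ignored.

```shell
# Baseline IOPS of a gp2 volume from its size.
# Assumption: 3 IOPS per GiB, minimum 100, maximum 16,000 (burst ignored).
gp2_baseline_iops() {
  size_gib=$1
  iops=$((size_gib * 3))
  if [ "$iops" -lt 100 ]; then
    iops=100
  elif [ "$iops" -gt 16000 ]; then
    iops=16000
  fi
  echo "$iops"
}

gp2_baseline_iops 20     # prints 100
gp2_baseline_iops 100    # prints 300
gp2_baseline_iops 6000   # prints 16000
```

A 100 GiB gp2 volume gets a baseline of only 300 IOPS, while gp3 starts at 3,000 regardless of size.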
In my experience, gp3 was enough for most cases. Unless your DB server handles tens of thousands of queries per second.
If RDS performance is slow, first check IOPS usage (the ReadIOPS and WriteIOPS metrics) in CloudWatch. On an EC2 instance you can look at the disk directly:
# Monitor disk I/O on EC2 instance
iostat -x 1
# Check which process is using disk the most
sudo iotop
I didn't know these commands at first. When my server slowed down, I only checked CPU and memory, then discovered disk I/O was the bottleneck way too late.
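If iostat is not installed, the same kind of number can be derived by hand from the kernel's counters. The sketch below assumes the Linux /proc/diskstats layout, where field 4 is reads completed and field 8 is writes completed. diskstats_iops is a hypothetical helper, and the sample lines are made-up counters.

```shell
# Combined read+write IOPS from two /proc/diskstats lines for the
# same device, taken a known number of seconds apart.
# Field 4 = reads completed, field 8 = writes completed.
diskstats_iops() {
  before=$1
  after=$2
  seconds=$3
  printf '%s\n%s\n' "$before" "$after" | awk -v s="$seconds" '
    NR == 1 { reads = $4; writes = $8 }
    NR == 2 { printf "%d\n", (($4 - reads) + ($8 - writes)) / s }'
}

# Made-up sample lines standing in for two real snapshots 2 seconds apart.
t0="   8       0 sda 1000 0 8000 50 2000 0 16000 80"
t1="   8       0 sda 1600 0 12800 90 2400 0 19200 100"
diskstats_iops "$t0" "$t1" 2   # prints 500

# On a live Linux box ("sda" is a placeholder device name):
#   t0=$(grep ' sda ' /proc/diskstats); sleep 1
#   t1=$(grep ' sda ' /proc/diskstats); diskstats_iops "$t0" "$t1" 1
```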
Disk Benchmarking: Measuring IOPS with fio
"Is my server's SSD actually fast?" I sometimes wonder, especially on cloud servers, where you never see the hardware.
You can benchmark directly with fio (Flexible I/O Tester).
# Install fio (Ubuntu)
sudo apt install fio
# Random read IOPS test (4KB blocks)
fio --name=random-read \
--ioengine=libaio \
--iodepth=64 \
--rw=randread \
--bs=4k \
--direct=1 \
--size=1G \
--numjobs=4 \
--runtime=60 \
--group_reporting
# Random write IOPS test (4KB blocks)
fio --name=random-write \
--ioengine=libaio \
--iodepth=64 \
--rw=randwrite \
--bs=4k \
--direct=1 \
--size=1G \
--numjobs=4 \
--runtime=60 \
--group_reporting
Interpreting results:
- IOPS: Look for a line like "read: IOPS=12345" in the output.
- If HDD: 100 ~ 200 IOPS
- If SATA SSD: 10,000 ~ 100,000 IOPS
- If NVMe SSD: 100,000+ IOPS
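fio abbreviates large values (for example IOPS=12.3k), so comparing against the ranges above takes a small conversion. This is a sketch that assumes fio's default human-readable summary line. fio_iops and classify_iops are hypothetical helpers, and the cutoffs simply mirror the ranges listed above.

```shell
# Extract the IOPS value from a fio summary line on stdin and expand
# the k / M suffix into a plain integer.
fio_iops() {
  sed -n 's/.*IOPS=\([0-9.]*\)\([kM]\{0,1\}\),.*/\1 \2/p' | awk '{
    v = $1
    if ($2 == "k") v *= 1000
    else if ($2 == "M") v *= 1000000
    printf "%.0f\n", v
  }'
}

# Map an IOPS number onto the rough ranges from the list above.
classify_iops() {
  iops=$1
  if [ "$iops" -le 200 ]; then echo "HDD range"
  elif [ "$iops" -lt 10000 ]; then echo "between HDD and SATA SSD"
  elif [ "$iops" -lt 100000 ]; then echo "SATA SSD range"
  else echo "NVMe SSD range"
  fi
}

# Sample summary line with made-up numbers, in fio's usual format.
line="  read: IOPS=12.3k, BW=48.0MiB/s (50.4MB/s)(2886MiB/60001msec)"
measured=$(echo "$line" | fio_iops)
echo "$measured IOPS: $(classify_iops "$measured")"   # prints 12300 IOPS: SATA SSD range
```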
I ran this test once and realized: "Oh, my local MacBook SSD is genuinely fast. AWS gp3 is slower than I thought."
Cloud block storage is network-attached and shared, so if a neighboring server hammers the disk, my performance can drop too. That's why provisioned IOPS (io1/io2) is expensive. You are paying for consistent performance.
5. Hot Data vs Cold Data: Right storage for the right data
So is SSD always the answer? The problem is Cost. HDD is overwhelmingly cheaper per gigabyte.
- HDD: ~$20 ~ $30 per TB
- SSD: ~$100 ~ $150 per TB
Roughly a 5x difference.
So you have to separate storage based on "how often you use the data."
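A quick sketch of why the split pays off. The per-TB prices are midpoints of the rough ranges above and the 100 TB workload is hypothetical, so treat every number as a placeholder. tiered_cost is a hypothetical helper name.

```shell
# Storage bill in dollars for a hot/cold split.
# Prices per TB are midpoints of the rough ranges above (placeholders).
SSD_PER_TB=125
HDD_PER_TB=25

tiered_cost() {
  hot_tb=$1    # data kept on SSD
  cold_tb=$2   # data kept on HDD
  echo $((hot_tb * SSD_PER_TB + cold_tb * HDD_PER_TB))
}

tiered_cost 100 0   # everything on SSD: prints 12500
tiered_cost 10 90   # 10% hot, 90% cold: prints 3500
```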
Hot Data: Frequently used data
- Examples: DB, Cache (Redis), OS, log files being analyzed
- Storage: Must use SSD
- Why: Heavy random access, needs high IOPS
Warm Data: Occasionally used data
- Examples: Last 3 months of logs, backup files (restored weekly)
- Storage: SSD (gp3) or slower SSD
- Why: Not used frequently, but needs fast reads when accessed
Cold Data: Rarely used data
- Examples: 3-year-old logs, CCTV footage, legally required archive data
- Storage: HDD or archive storage (AWS st1/sc1, S3 Glacier)
- Why: Large capacity, almost never read
How I applied this in practice:
1. RDS DB Server
- Main server: io2 (Provisioned IOPS, 64,000 IOPS guaranteed)
- Read replicas: gp3 (cheaper)
2. Log Storage
- Last 7 days: EBS gp3 (Elasticsearch indexing)
- 8 ~ 90 days: S3 Standard
- 91+ days: S3 Glacier (legal retention)
3. Backups
- Daily backups (last 30 days): S3 Standard (fast recovery)
- Monthly backups (1 year): S3 Glacier (cheap)
This cut storage costs by 60%.
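The log retention rule above boils down to a lookup by age. A minimal sketch using the same day boundaries, where log_tier is a hypothetical helper name and the tier labels are plain strings, not API identifiers:

```shell
# Pick a storage tier for a log file from its age in days,
# following the 7 / 90 day boundaries described above.
log_tier() {
  age_days=$1
  if [ "$age_days" -le 7 ]; then
    echo "EBS gp3"
  elif [ "$age_days" -le 90 ]; then
    echo "S3 Standard"
  else
    echo "S3 Glacier"
  fi
}

log_tier 3     # prints EBS gp3
log_tier 45    # prints S3 Standard
log_tier 400   # prints S3 Glacier
```

In practice the S3 side of this would normally be handled by a bucket lifecycle rule instead of a script.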
6. Summary: Rotating Things Are Slow
| Type | HDD | SSD |
|---|---|---|
| Analogy | Vinyl Record (LP) | USB Thumb Drive |
| Mechanism | Physical Rotation (Motor) | Semiconductor (Chips) |
| IOPS | 100 ~ 200 | 50,000+ |
| Use Case | Backups, Video Archives | OS, DB, Web Server |
"Anything with moving parts is slow and prone to failure."
Remembering just this will prevent half of your server outages.
And one more thing. "Separate storage based on how often you use the data."
HDD is slow but cheap. Perfect for Cold Data. SSD is fast but expensive. Use it only for Hot Data to avoid cost explosion.
From my trial-and-error experience, following these two rules eliminates most storage problems.