
Moore's Law: The Prophecy of Semiconductor Evolution and Aftermath
The era of 'free lunch' where messy code ran fast thanks to hardware upgrades is over. I discuss the impact of the end of Moore's Law and new survival strategies for developers.

"Just wait 2 years, and it gets faster automatically."
I heard this story over drinks from a senior developer with 20 years of experience whom I deeply respect.
"Back in my day, when code ran a bit slow, we didn't pull all-nighters optimizing it. We just told the boss, 'The server's getting old,' and held out for 2 years. When we bought a new server 2 years later, the speed really doubled like magic."
It sounds like a joke, but this was real life under the greatest prophecy, and blessing, in computing history: an era when hardware dragged our code along no matter how inefficiently we wrote it. The golden age ruled by Moore's Law.
But sadly, here in 2025, the party's over and the bill has arrived. Today, I want to share what I've learned about this great law's rise and fall, and how it's affecting my salary and work right now.
In 1965, Gordon Moore, who would go on to co-found Intel, wrote a short paper predicting the semiconductor industry's future.
"The number of transistors that can be integrated on a single semiconductor chip will double approximately every two years."
(Strictly speaking, his 1965 paper predicted a doubling every year; he relaxed it to every two years in 1975, and that is the version everyone quotes today.)
This sentence looks simple, but it contains tremendous implications.
Companies like Intel, Samsung Electronics, and TSMC bled to turn this prediction into the curve below.
graph LR
    Year1970[1970s] --> Tech1["Thousands (Intel 4004)"]
    Year1990[1990s] --> Tech2["Millions (Pentium)"]
    Year2010[2010s] --> Tech3["Billions (Core i7)"]
    Year2020[2020s] --> Tech4["Tens to hundreds of billions (Apple M-series, NVIDIA)"]
    Tech1 -.-> Tech2
    Tech2 -.-> Tech3
    Tech3 -.-> Tech4
As a result, the 1-million-won iPhone in your hand right now is thousands of times faster than a computer that cost 3 million won in the 1990s. No other industry in human history has doubled its efficiency every couple of years for 50 years straight. There's a joke that if cars had developed according to Moore's Law, a car would now cost 100 won and travel at Mach 10.
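Just to sanity-check that claim, here is a minimal back-of-envelope sketch. The only inputs are the commonly cited ~2,300 transistors of the Intel 4004 and the two-year doubling rule; everything else is plain arithmetic.

```python
# A back-of-envelope check: start from the Intel 4004's ~2,300 transistors
# in 1971 and double every two years.
count, year = 2300, 1971
while year < 2023:
    count *= 2
    year += 2
print(f"{year}: ~{count:,} transistors")
# Prints 2023: ~154,350,387,200 -- the same order of magnitude as the biggest
# Apple M-series and NVIDIA chips actually shipping today.
```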
However, this seemingly eternal law ran into a limit. From the mid-2000s, CPU clock speeds (GHz) stopped climbing.
Why? Because shrinking transistors ran into the wall of physics.
Eventually, in 2005, Herb Sutter (then at Microsoft) published his famous essay "The Free Lunch Is Over" in Dr. Dobb's Journal.
"Hardware advances no longer automatically make software faster. From now on, performance improvements must come not as gifts from hardware engineers, but from software developers' own sweat and effort (concurrent programming)."
At first, this "free lunch" talk didn't click for me. Moore's Law is about increasing transistor count, so why couldn't clock speeds keep rising? More transistors should mean faster speeds, right?
The concept that solved this puzzle for me was Dennard Scaling.
In 1974, IBM's Robert Dennard proposed a law. In summary:
"If you make transistors about 30% smaller, you can make them 40% faster with the same power."
While Moore's Law says "the count doubles every two years," Dennard Scaling says "smaller transistors give better performance per watt." So every shrink delivered more transistors on the same die, higher clock speeds, and roughly the same power density, all at the same time.
When these two laws worked together, CPU manufacturers were literally eating a "free lunch." Just make things smaller and you get more transistors, higher speeds, and lower power consumption.
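To make that concrete, here is a tiny sketch of the ideal Dennard-scaling math, using the usual ~0.7x shrink per generation and the textbook dynamic-power relation P ≈ C·V²·f. The starting values are arbitrary illustrative units, not real chip numbers.

```python
# A sketch of ideal Dennard scaling: each generation shrinks dimensions and
# voltage by ~0.7x. Dynamic power follows P ~ C * V^2 * f. Values are in
# arbitrary illustrative units, not real chip numbers.
k = 0.7  # shrink factor per generation

C, V, f, area = 1.0, 1.0, 1.0, 1.0
for gen in range(3):
    power_density = (C * V**2 * f) / area
    print(f"gen {gen}: frequency x{f:.2f}, power density {power_density:.2f}")
    # next generation: capacitance, voltage, and area shrink; frequency rises
    C, V, f, area = C * k, V * k, f / k, area * k**2
# Frequency keeps climbing while power density stays flat -- the "free lunch."
```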
The problem: around 2005, Dennard Scaling broke. Making transistors smaller no longer proportionally reduced power consumption.
Why? Because of "leakage current" from physics class.
As the gate's insulating oxide layer thinned to just a few atoms, electrons started tunneling through it even when the transistor was switched off. Like a faucet that drips even when completely shut.
This leakage current generated enormous heat. Intel's Pentium 4 Prescott series pushed clocks to 3.8GHz but had to abandon the "NetBurst" architecture due to heat issues.
Eventually, the path to going faster was blocked. This is when it clicked for me. Even if Moore's Law lives on, once Dennard Scaling dies, the "free lunch" is over.
Even with Dennard Scaling dead, the semiconductor industry desperately tried to keep Moore's Law alive. That's the "5nm", "3nm" process race we hear about in the news.
At first, I had no idea what this meant. "5 nanometers? Atoms are roughly 0.2nm across, so are transistors only a couple dozen atoms wide?" Not exactly: these days the node name is mostly a marketing label rather than the size of any single feature (more on that below), but the smallest structures on the 3nm-class chips TSMC and Samsung build today really are only tens of atoms across. They are the finest structures humans have ever mass-produced.
| Year | Process Node | Major Products | Notes |
|---|---|---|---|
| 2011 | 32nm | Intel Sandy Bridge | PC golden age |
| 2014 | 14nm | Intel Broadwell | Mobile era begins |
| 2017 | 10nm | Apple A11 | TSMC starts overtaking Intel |
| 2019 | 7nm | AMD Ryzen 3000, Apple A13 | AMD's comeback begins |
| 2020 | 5nm | Apple M1, A14 | ARM-based revolution |
| 2022 | 3nm | TSMC N3 enters volume production (first big product: Apple A17 Pro, 2023) | Samsung struggles with yield |
| 2024 | 2nm | In development (TSMC) | Approaching physical limits |
This table shows something interesting. Intel got stuck at 10nm, while TSMC kept pushing down from 7nm. Samsung claimed 3nm production but struggled with yield issues.
Why such differences? This is when I realized "process node numbers are marketing."
For example, TSMC's 7nm and Intel's 10nm have similar actual density. Intel measured conservatively and called it 10nm, while TSMC aggressively called theirs 7nm. It was a numbers game.
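One way to cut through the naming game is to compute effective density yourself from numbers vendors do publish: transistor count divided by die area. The figures below are widely reported ballpark values, not official specs, so treat the output as approximate.

```python
# Back-of-envelope density check from published transistor counts and die sizes.
# These are widely reported ballpark figures, not official specs.
chips = {
    "Apple A13 (TSMC '7nm', 2019)": (8.5e9, 98.5),   # transistors, die area in mm^2
    "Apple M1  (TSMC '5nm', 2020)": (16e9, 119.0),
}
for name, (transistors, area_mm2) in chips.items():
    mtr_per_mm2 = transistors / area_mm2 / 1e6  # million transistors per mm^2
    print(f"{name}: ~{mtr_per_mm2:.0f} MTr/mm^2")
# What matters is the measured density, not whether the label says 7 or 5.
```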
But the results were clear. TSMC swept up major customers like Apple, AMD, and NVIDIA, while Intel kept struggling to get 10nm into high-volume production until around 2021. By then, TSMC had long since ramped 5nm and was heading toward 3nm.
CPU manufacturers changed strategy. "Since there's a limit to making one guy smarter, let's put in several guys even if they're dumber."
This is the beginning of Multi-Core. Since cranking a single core's clock ever higher was no longer an option, manufacturers started putting in two (dual), four (quad), and eight (octa) slower cores, say 2GHz each, instead.
This hardware change gave us developers enormous homework.
In the past, my sequential (single-threaded) code ran on a 3GHz core. Then I bought a new computer with four 2GHz cores. My code is still single-threaded, so it uses only one of them. Result: my program got slower on the new machine! (3GHz -> 2GHz)
From this point, scary words started appearing in developer job postings. #Concurrency #Parallelism #Async #Thread-safety
Now, if developers don't split code and distribute it evenly across 4 cores, we live in an era where we can only use 25% of computer performance.
Words alone don't convey the impact. So I ran a simple performance test myself.
# Single thread: finding prime numbers
import time

def is_prime(n):
    if n < 2:
        return False
    for i in range(2, int(n ** 0.5) + 1):
        if n % i == 0:
            return False
    return True

def find_primes_single(start, end):
    return [n for n in range(start, end) if is_prime(n)]

# Find primes from 1 to 100,000 (single thread)
if __name__ == "__main__":
    start = time.time()
    primes = find_primes_single(1, 100000)
    end = time.time()
    print(f"Single thread: {end - start:.2f}s, Prime count: {len(primes)}")
# Result: ~3.2 seconds

# Multi-process: the same task distributed across 4 cores
# (uses is_prime / find_primes_single from above)
from concurrent.futures import ProcessPoolExecutor
import time

def find_primes_multi(start, end, workers=4):
    # Split the range into `workers` chunks; the last chunk absorbs the remainder
    chunk_size = (end - start) // workers
    ranges = [(start + i * chunk_size,
               end if i == workers - 1 else start + (i + 1) * chunk_size)
              for i in range(workers)]
    # Pass a top-level function (not a lambda) so it can be pickled across processes
    with ProcessPoolExecutor(max_workers=workers) as executor:
        results = executor.map(find_primes_single,
                               [r[0] for r in ranges],
                               [r[1] for r in ranges])
    return [p for chunk in results for p in chunk]

# Find primes from 1 to 100,000 (4 cores); the __main__ guard is required
# for process pools on macOS/Windows
if __name__ == "__main__":
    start = time.time()
    primes = find_primes_multi(1, 100000, workers=4)
    end = time.time()
    print(f"Multi-process (4 cores): {end - start:.2f}s, Prime count: {len(primes)}")
# Result: ~0.9 seconds
The results were stunning.
About 3.5x faster. In theory it should be 4x, but process startup and inter-process communication overhead eat into the gain.
After running this code, it finally hit me. This is what "free lunch is over" means. Before, hardware would automatically turn 3.2 seconds into 1.6 seconds, but now I have to split the code to get 0.9 seconds.
I hit this same problem when running a Node.js server.
Node.js runs your JavaScript on a single thread. My server ran on an AWS EC2 c5.xlarge (4 vCPUs), but a single Node process uses only one core; the other three sit idle.
So I used PM2 Cluster mode to replicate the process 4 times.
# Run with PM2 cluster mode
pm2 start app.js -i 4 # Create 4 processes
# Check PM2 status
pm2 status
# ┌─────┬────────┬─────────┬──────┬───────┬────────┐
# │ id │ name │ mode │ ↺ │ cpu │ memory │
# ├─────┼────────┼─────────┼──────┼───────┼────────┤
# │ 0 │ app │ cluster │ 0 │ 25% │ 120MB │
# │ 1 │ app │ cluster │ 0 │ 25% │ 118MB │
# │ 2 │ app │ cluster │ 0 │ 25% │ 121MB │
# │ 3 │ app │ cluster │ 0 │ 25% │ 119MB │
# └─────┴────────┴─────────┴──────┴───────┴────────┘
This pushed CPU usage from 25% to 100%, and throughput nearly tripled.
But honestly, I feel a bit cheated that this became "the developer's responsibility." Before, I just wrote code. Now I have to worry about clustering, load balancing, and session sharing.
Process-level scaling has its limits, and multi-core itself has limits too: Amdahl's Law says the serial portion of a program caps the total speedup no matter how many cores you add (even with 100 cores, most programs can't use them; the sketch below shows why). So where does more performance come from now?
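Here is a minimal sketch of that ceiling, assuming (purely for illustration) a program that is 90% parallelizable.

```python
# A minimal sketch of Amdahl's Law: a fixed serial fraction caps the speedup,
# no matter how many cores you add. The 90% parallel fraction is an assumption.
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    return 1 / ((1 - parallel_fraction) + parallel_fraction / cores)

for cores in (2, 4, 8, 16, 100, 1000):
    print(f"{cores:>5} cores -> {amdahl_speedup(0.9, cores):.1f}x")
# 2 cores -> 1.8x, 4 -> 3.1x, 16 -> 6.4x, 100 -> 9.2x, 1000 -> 9.9x:
# the last 900 cores buy almost nothing.
```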
The semiconductor industry came up with new solutions: Chiplet Architecture and Heterogeneous Computing.
In 2017, AMD was getting crushed by Intel. Market share was in the low teens. But AMD had a genius idea.
"Making one big chip gives low yield and high cost. What if we make several small chips and connect them?"
This is the Chiplet strategy.
Traditional CPUs were one giant die. The problem: the bigger the die, the more likely a defect lands on it, so yield falls off sharply with area. One speck of dust on the wafer and the whole die underneath it is scrap.
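A simple Poisson defect model shows how brutal this gets. The defect density of 0.1 defects/cm² and the die sizes below are illustrative assumptions, not real TSMC or AMD numbers.

```python
import math

# A rough sketch of why big dies hurt yield, using a simple Poisson defect model.
# The defect density (0.1 defects/cm^2) and die sizes are illustrative assumptions.
def poisson_yield(die_area_mm2: float, defects_per_cm2: float = 0.1) -> float:
    return math.exp(-defects_per_cm2 * die_area_mm2 / 100)

print(f"600 mm^2 monolithic die: {poisson_yield(600):.0%} yield")  # ~55%
print(f" 80 mm^2 chiplet:        {poisson_yield(80):.0%} yield")   # ~92%
# Harvesting eight good 80 mm^2 chiplets wastes far less silicon than
# hoping one 600 mm^2 monster comes out flawless.
```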
AMD did this:
graph TD
    IOD["I/O Die<br/>14nm"] --> CCD1["CCD 1<br/>7nm<br/>8 Cores"]
    IOD --> CCD2["CCD 2<br/>7nm<br/>8 Cores"]
    IOD --> Memory["DDR4/DDR5 Memory"]
    IOD --> PCIe["PCIe Connection"]
    CCD1 -.-> CCD2
This strategy's advantages were tremendous: small dies yield far better, the I/O die can stay on a cheap, mature 14nm process while only the compute dies (CCDs) use expensive 7nm, and core counts scale by simply adding more CCDs.
As a result, the 16-core AMD Ryzen 9 5950X at a 105W TDP crushed Intel's 10-core flagship, and it was cheaper too.
When I learned this strategy, I had an epiphany. "Engineering is ultimately about trade-offs. Rather than one perfect thing, connecting several good-enough things is the practical solution."
Apple went one step further. Announcing the M1 chip in 2020, they wrote the textbook on "heterogeneous computing."
One M1 chip contains:
- 4 high-performance CPU cores plus 4 high-efficiency CPU cores
- an integrated GPU (7 or 8 cores)
- a 16-core Neural Engine (NPU) for machine-learning work
- dedicated media encode/decode blocks and an image signal processor
- unified memory shared by all of the above
The key philosophy here: "Each does only what they're best at."
For example, when I watch YouTube on my MacBook while coding in VS Code:
- video decoding runs on the dedicated media engine
- the UI and animations are rendered by the GPU
- VS Code and its background indexing sit on the efficiency cores
- only compile bursts wake up the performance cores
All of this runs simultaneously, yet total power consumption stays under 20W. An Intel laptop would have its fans screaming under the same workload.
Once I understood this architecture, I finally got why the fanless M1 MacBook Air can outrun Intel i7 laptops. It's not "one fast general-purpose core" but "multiple specialists, each in the right place."
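I can't rebuild Apple's scheduler in a blog post, but the same "right tool for the right job" idea shows up even at the Python level. This is only a software-level analogy, not how Apple silicon actually dispatches work; the URL and the worker counts are placeholder assumptions.

```python
# A software-level analogy only: route IO-bound work to cheap threads and
# CPU-bound work to separate processes, instead of one pool for everything.
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import urllib.request

def fetch(url: str) -> int:
    # IO-bound: mostly waiting on the network, so threads are enough
    with urllib.request.urlopen(url) as resp:
        return len(resp.read())

def crunch(n: int) -> int:
    # CPU-bound: needs real cores, so processes sidestep the GIL
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=8) as io_pool, \
         ProcessPoolExecutor(max_workers=4) as cpu_pool:
        page_size = io_pool.submit(fetch, "https://example.com")  # placeholder URL
        square_sum = cpu_pool.submit(crunch, 10_000_000)
        print(page_size.result(), square_sum.result())
```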
Tech talk alone is boring, so let's talk money. One real reason Moore's Law ended is "economics".
| Process Node | Fab Construction Cost | Major Players |
|---|---|---|
| 28nm (2011) | $3 billion | Samsung, TSMC, Intel |
| 14nm (2014) | $5 billion | Samsung, TSMC, Intel |
| 7nm (2018) | $10 billion | TSMC, Samsung |
| 5nm (2020) | $15 billion | TSMC, Samsung |
| 3nm (2022) | $20 billion | TSMC (Samsung struggling) |
| 2nm (planned) | $28 billion estimated | TSMC |
Building one TSMC 3nm fab costs about $20 billion, roughly 26 trillion won, which is in the same ballpark as Hyundai Motor's entire market cap. For one factory.
Plus, this factory's lifespan is 5-7 years. When the next generation process comes out, it becomes obsolete.
Because of this, leading-edge semiconductor manufacturing is now a game only TSMC, Samsung Electronics, and Intel can play. GlobalFoundries gave up at 7nm, and even Intel now outsources some of its own production to TSMC's foundry.
The U.S. government passed the CHIPS and Science Act in 2022, pouring $52.7 billion (about 70 trillion won) into the semiconductor industry.
Why? Because the overwhelming majority of semiconductor production, and virtually all of the most advanced chips, is concentrated in Asia (Korea, Taiwan, China). The automotive chip shortage during COVID-19 that forced GM and Ford to halt production lines was the final straw.
TSMC is investing $40 billion in Arizona fabs (initially announced for 5nm, since expanded to 4nm and 3nm), and Samsung Electronics invested $17 billion in a fab in Taylor, Texas.
Watching this news, I realized something. Moore's Law ended not just from technical limits, but economic limits too. Beyond 2nm, even if technically possible, I wonder if any company will pour 30 trillion into one factory.
Sometimes looking at old code, I miss the simplicity of handling everything with one while loop. But we now live in the era of complex distributed systems.
Moore's Law is over, but developer value has actually risen. The work hardware used to do for free must now be solved with our architecture design skills.
I've come to see Moore's Law's end differently now. It's not sad—rather, it's an era where developers must truly understand hardware.
Is your code fully utilizing multi-cores right now? Are you running GPU-capable tasks on CPU? It's time to think about how to fill your plate at the buffet where the "free lunch" is over.