Moore's Law: The Prophecy of Semiconductor Evolution and Its Aftermath

"Just wait 2 years, and it gets faster automatically."

I heard this story over drinks from a senior developer with 20 years of experience whom I deeply respect.

"Back in my day, when code ran a bit slow, we didn't pull all-nighters optimizing it. We just told the boss, 'The server's getting old,' and held out for 2 years. When we bought a new server 2 years later, the speed really doubled like magic."

It sounds like a joke, but this was real life during the greatest prophecy and blessing in computer history. A time when hardware carried us by the collar no matter how inefficiently we wrote code. The golden age ruled by Moore's Law.

But sadly, here in 2025, the party's over and the bill has arrived. Today, I want to share what I've learned about this great law's rise and fall, and how it's affecting my salary and work right now.

Gordon Moore's Prophecy: Exponential Semiconductor Growth

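In 1965, Gordon Moore, then at Fairchild Semiconductor and later a co-founder of Intel, wrote a short article predicting the semiconductor industry's future. He first projected a doubling every year, then slowed the pace in a 1975 revision. The version everyone quotes today goes like this:

"The number of transistors that can be integrated on a single semiconductor chip will double approximately every 24 months (2 years)."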

This sentence looks simple, but it contains tremendous implications.

Performance Doubles: Twice as many workers (transistors) fit into a chip of the same size, so performance explodes.
Price Halves: As more chips can be printed from one wafer, the unit cost per chip drops.
Size Shrinks: Computers shrink from room-sized (ENIAC) to desk-sized (PC) to palm-sized (smartphone).

The 50-Year Miracle

Companies like Samsung Electronics, Intel, and TSMC bled to make this graph a reality.

graph LR
    Year1970[1970s] --> Tech1["Thousands (Intel 4004)"]
    Year1990[1990s] --> Tech2["Millions (Pentium)"]
    Year2010[2010s] --> Tech3["Billions (Core i7)"]
    Year2020[2020s] --> Tech4["Tens/Hundreds of Billions (Apple M, NVIDIA)"]

    Tech1 -.-> Tech2
    Tech2 -.-> Tech3
    Tech3 -.-> Tech4

As a result, the 1 million won iPhone in your hand right now is thousands of times faster than a computer that cost 3 million won in the 1990s. No other industry in human history has doubled its efficiency every 2 years for 50 years. There's a joke that if cars had developed according to Moore's Law, a car would cost 100 won now and travel at Mach 10.
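To get a feel for what "doubling every 2 years" compounds to, here is a minimal sketch. The function name `projected_transistors` is my own, the starting point is the Intel 4004 with its roughly 2,300 transistors in 1971, and the fixed 2-year doubling period is the idealized rule, not a measurement of any real chip.

```python
def projected_transistors(start_count, start_year, year, doubling_years=2.0):
    """Idealized Moore's Law projection: the count doubles every `doubling_years`."""
    doublings = (year - start_year) / doubling_years
    return start_count * 2 ** doublings

# Assumed starting point: Intel 4004 (1971), roughly 2,300 transistors.
for year in (1971, 1991, 2011, 2021):
    print(year, f"{projected_transistors(2300, 1971, year):,.0f}")
```

Fifty years is 25 doublings, a factor of about 33 million. That is how a few thousand transistors turn into tens of billions.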

The Free Lunch Is Over

However, this seemingly eternal law ran into a limit. From the mid-2000s, CPU clock speed (GHz) growth stalled.

Why? Because making things too small hit the wall of physics.

Heat Problem: Packing transistors too tightly meant that passing electricity through them made chips hot enough to melt. (They were about to become fried eggs)
Quantum Tunneling Effect: As the insulating barriers inside transistors thinned to just a few atoms, electrons began tunneling straight through them, causing leakage current.

Eventually, around 2005, Microsoft's Herb Sutter wrote his famous column "The Free Lunch Is Over". Its message, in short:

"Hardware advances no longer automatically make software faster. Now, performance improvements must be created not by gifts from hardware engineers, but by software developers' sweat and effort (concurrency programming)."

Dennard Scaling's Collapse: What I Finally Understood

At first, this "free lunch" talk didn't click for me. Moore's Law is about increasing transistor count, so why couldn't clock speeds keep rising? More transistors should mean faster speeds, right?

The concept that solved this puzzle for me was Dennard Scaling.

In 1974, IBM's Robert Dennard proposed a scaling law. In summary:

"If you make transistors about 30% smaller, you can run them about 40% faster without increasing the power drawn per unit of chip area."

While Moore's Law says "count doubles every 2 years," Dennard Scaling means "smaller transistors give better performance per watt." So making transistors smaller meant:

You can fit more of them (Moore's Law)
Power consumption drops while speed increases (Dennard Scaling)

When these two laws worked together, CPU manufacturers were literally eating a "free lunch." Just make things smaller and you get more transistors, higher speeds, and lower power consumption.

So Why Did It End?

The problem: around 2005, Dennard Scaling broke. Making transistors smaller no longer proportionally reduced power consumption.

Why? Because of "leakage current" from physics class.

As the gate's insulating oxide layer thinned to just a few atoms, electrons leaked through even when the transistor was switched off. Like a faucet that drips even when completely shut.

This leakage current generated enormous heat. Intel's Pentium 4 Prescott series pushed clocks to 3.8GHz but had to abandon the "NetBurst" architecture due to heat issues.

Eventually, the path to going faster was blocked. This is when it clicked for me. Even if Moore's Law lives on, once Dennard Scaling dies, the "free lunch" is over.
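What made it click for me was running the numbers. The sketch below uses the textbook dynamic power relation P ~ C x V^2 x f with everything normalized to 1.0. The function `scale_transistor` and its `voltage_scales` flag are my own illustration, and leakage power is deliberately left out, so treat it as a rough model under those assumptions.

```python
def scale_transistor(k, voltage_scales=True):
    """Relative changes when linear dimensions shrink by a factor of k (k > 1).

    Uses dynamic power P ~ C * V^2 * f, with every quantity normalized
    to 1.0 before scaling. Leakage power is not modeled.
    """
    capacitance = 1 / k                          # smaller transistor, less charge to move
    voltage = 1 / k if voltage_scales else 1.0   # False: voltage can no longer drop
    frequency = k                                # shorter distances, faster switching
    area = 1 / k ** 2                            # each transistor takes less space
    power = capacitance * voltage ** 2 * frequency
    return {
        "frequency": frequency,
        "power": power,
        "power_density": power / area,
    }

k = 1 / 0.7  # "about 30% smaller"
print("Dennard era:  ", scale_transistor(k))
print("Voltage stuck:", scale_transistor(k, voltage_scales=False))
```

While voltage shrinks along with the transistor, power density stays at 1.0 even as frequency rises about 1.4x. Once voltage stops dropping, the same shrink roughly doubles the heat per unit area, and that is before counting leakage.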

The Process War: The Nanometer Battle Reality

Even with Dennard Scaling dead, the semiconductor industry desperately tried to keep Moore's Law alive. That's the "5nm", "3nm" process race we hear about in the news.

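At first, I had no idea what this meant. "5 nanometers? Atoms are roughly 0.1-0.2nm across, so transistors are only a few dozen atoms in size?" Not quite, as it turns out. Node names stopped matching any physical dimension years ago, and the actual features on the "3nm" chips TSMC and Samsung make today measure tens of nanometers. Even so, they are among the smallest structures humans have ever mass-produced.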

Process Race Timeline

Year	Process Node	Major Products	Notes
2011	32nm	Intel Sandy Bridge	PC golden age
2014	14nm	Intel Broadwell	Mobile era begins
2017	10nm	Apple A11	TSMC starts overtaking Intel
2019	7nm	AMD Ryzen 3000, Apple A13	AMD's comeback begins
2020	5nm	Apple M1, A14	ARM-based revolution
2022	3nm	Samsung (first production), TSMC N3 (Apple A17 Pro, M3 in 2023)	Samsung struggles with yield
2024	2nm	In development (TSMC)	Approaching physical limits

This table shows something interesting. Intel got stuck at 10nm, while TSMC kept pushing down from 7nm. Samsung claimed 3nm production but struggled with yield issues.

Why such differences? This is when I realized "process node numbers are marketing."

For example, TSMC's 7nm and Intel's 10nm have similar actual density. Intel measured conservatively and called it 10nm, while TSMC aggressively called theirs 7nm. It was a numbers game.

But the results were clear. TSMC swept up major customers like Apple, AMD, and NVIDIA, while Intel's 10nm slipped for years: it reached laptops in volume in 2019 and desktops only in late 2021. By then, TSMC was already shipping 5nm and preparing 3nm.

Paradigm Shift: Not Faster, But More

CPU manufacturers changed strategy. "Since there's a limit to making one guy smarter, let's put in several guys even if they're dumber."

This is the beginning of Multi-Core. Since pushing a single core far past 4GHz was impractical, they started putting in two (dual), four (quad), and eight (octa) cores running at more modest clocks such as 2GHz.

The Bolt That Hit Developers

This hardware change gave us developers enormous homework.

In the past, my sequential (single-threaded) code ran on a 3GHz core. Then I bought a new computer with four 2GHz cores. My code is still single-threaded, so it uses only 1 core. Result: my program got slower on the new computer! (3GHz -> 2GHz)

From this point, scary words started appearing in developer job postings. #Concurrency #Parallelism #Async #Thread-safety

Now, if developers don't split code and distribute it evenly across 4 cores, we live in an era where we can only use 25% of computer performance.

How Much Difference Does It Actually Make?

Words alone don't convey the impact. So I ran a simple performance test myself.

# Single thread: Finding prime numbers
import time

def is_prime(n):
    if n < 2:
        return False
    for i in range(2, int(n ** 0.5) + 1):
        if n % i == 0:
            return False
    return True

def find_primes_single(start, end):
    return [n for n in range(start, end) if is_prime(n)]

# Find primes from 1 to 100,000 (single thread)
if __name__ == "__main__":  # keeps the benchmark from re-running if worker processes import this file
    start = time.time()
    primes = find_primes_single(1, 100000)
    end = time.time()
    print(f"Single thread: {end - start:.2f}s, Prime count: {len(primes)}")
    # Result: ~3.2 seconds

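# Multi-process: Same task distributed across 4 cores
from concurrent.futures import ProcessPoolExecutor
import time

def find_primes_in_range(bounds):
    # Must be a top-level function: a lambda cannot be pickled and sent to worker processes
    start, end = bounds
    return find_primes_single(start, end)

def find_primes_multi(start, end, workers=4):
    chunk_size = (end - start) // workers
    # The last chunk runs to `end` so the remainder of the division is not skipped
    ranges = [(start + i * chunk_size,
               end if i == workers - 1 else start + (i + 1) * chunk_size)
              for i in range(workers)]

    with ProcessPoolExecutor(max_workers=workers) as executor:
        results = executor.map(find_primes_in_range, ranges)
        # Flatten the per-chunk lists into one list of primes
        return [p for chunk in results for p in chunk]

# Find primes from 1 to 100,000 (4 cores)
if __name__ == "__main__":  # required on platforms that spawn workers (Windows, macOS)
    start = time.time()
    primes = find_primes_multi(1, 100000, workers=4)
    end = time.time()
    print(f"Multi-process (4 cores): {end - start:.2f}s, Prime count: {len(primes)}")
    # Result: ~0.9 seconds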

The results were stunning.

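Single thread: 3.2 seconds
Multi-process (4 cores): 0.9 seconds

About 3.5x faster. In theory it should be 4x, but starting worker processes and shipping results back between them costs time, and the chunks aren't equally expensive (checking larger numbers takes longer), so the cores don't all finish together. I used processes rather than threads because standard CPython's global interpreter lock keeps threads from running Python code on several cores at once.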
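To put a number on that missing half-core, I tried the Karp-Flatt metric, which estimates how much of a run behaved as if it were serial, given a measured speedup. The helper name `serial_fraction` is mine, and the inputs are just the two timings above, so this is a back-of-the-envelope check rather than a proper profile.

```python
def serial_fraction(speedup, workers):
    """Karp-Flatt metric: serial fraction implied by a measured speedup."""
    return (1 / speedup - 1 / workers) / (1 - 1 / workers)

speedup = 3.2 / 0.9  # single-process time / 4-process time
print(f"speedup: {speedup:.2f}x")
print(f"parallel efficiency: {speedup / 4:.0%}")
print(f"estimated serial fraction: {serial_fraction(speedup, 4):.1%}")
```

With these timings, the speedup is about 3.56x, efficiency is about 89%, and roughly 4% of the work behaves as if it could not be parallelized. That 4% is the overhead I have to own now.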

After running this code, it finally hit me. This is what "free lunch is over" means. Before, hardware would automatically turn 3.2 seconds into 1.6 seconds, but now I have to split the code to get 0.9 seconds.

Node.js and PM2 Cluster Mode

I hit this same problem when running a Node.js server.

Node.js runs your JavaScript on a single thread. My server ran on AWS EC2 c5.xlarge (4 vCPUs), but one Node process keeps only one of them busy. The other 3 sit idle.

So I used PM2 Cluster mode to replicate the process 4 times.

# Run with PM2 cluster mode
pm2 start app.js -i 4  # Create 4 processes

# Check PM2 status
pm2 status
# ┌─────┬────────┬─────────┬──────┬───────┬────────┐
# │ id  │ name   │ mode    │ ↺    │ cpu   │ memory │
# ├─────┼────────┼─────────┼──────┼───────┼────────┤
# │ 0   │ app    │ cluster │ 0    │ 25%   │ 120MB  │
# │ 1   │ app    │ cluster │ 0    │ 25%   │ 118MB  │
# │ 2   │ app    │ cluster │ 0    │ 25%   │ 121MB  │
# │ 3   │ app    │ cluster │ 0    │ 25%   │ 119MB  │
# └─────┴────────┴─────────┴──────┴───────┴────────┘

This pushed CPU usage from 25% to 100%, and throughput nearly tripled.

But honestly, I feel a bit cheated that this became "the developer's responsibility." Before, I just wrote code. Now I have to worry about clustering, load balancing, and session sharing.

After Moore's Law: The Era of Chiplets and Heterogeneous Computing

Process scaling hit limits, and multi-core has limits too. (Even with 100 cores, most programs can't use them) So how do we increase performance now?
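The "100 cores won't save you" part is Amdahl's Law. Here is a minimal sketch; the 95% parallel fraction is an assumption I picked for illustration, not a measurement of any real program.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's Law: best-case speedup when only part of the work can run in parallel."""
    serial_fraction = 1 - parallel_fraction
    return 1 / (serial_fraction + parallel_fraction / cores)

# Assumed: 95% of the program parallelizes perfectly, 5% stays serial.
for cores in (4, 16, 100):
    print(f"{cores:>3} cores -> {amdahl_speedup(0.95, cores):.1f}x")
```

Even with 95% of the work parallelized, 100 cores give less than 17x, and no number of cores can ever beat 20x. The serial 5% sets the ceiling.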

The semiconductor industry came up with new solutions: Chiplet Architecture and Heterogeneous Computing.

AMD's Comeback: The Chiplet Strategy

In 2017, AMD was getting crushed by Intel. Market share was in the low teens. But AMD had a genius idea.

"Making one big chip gives low yield and high cost. What if we make several small chips and connect them?"

This is the Chiplet strategy.

Traditional CPUs were one giant die. The problem: the bigger the die, the more likely it contains a defect, and yield falls off roughly exponentially with area. One speck of dust in the wrong place and the entire die it landed on must be scrapped.

AMD did this:

Make the CPU core part (CCD, Core Complex Die) small and numerous
Make the I/O part (IOD, I/O Die) separately
Connect them with a high-speed interconnect called Infinity Fabric

graph TD
    IOD[I/O Die<br/>14nm] --> CCD1[CCD 1<br/>7nm<br/>8 Cores]
    IOD --> CCD2[CCD 2<br/>7nm<br/>8 Cores]
    IOD --> Memory[DDR4/DDR5 Memory]
    IOD --> PCIe[PCIe Connection]

    CCD1 -.-> CCD2

This strategy's advantages were tremendous.

Improved Yield: Small chips have lower defect rates. If only some chips are defective, just scrap those.
Flexibility: High-end products use 2 CCDs, budget products use 1 CCD. Diverse lineup from same components.
Cost Reduction: I/O Die can use older 14nm process. Latest processes are expensive, so only use 7nm/5nm for cores.
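The yield argument is easy to check with the simple Poisson yield model, where the chance that a die has zero defects is exp(-area x defect density). The die sizes and the defect density below are numbers I made up for illustration. They are not AMD's or any foundry's actual figures.

```python
import math

def die_yield(area_mm2, defects_per_cm2):
    """Poisson yield model: probability that a die contains zero defects."""
    area_cm2 = area_mm2 / 100
    return math.exp(-area_cm2 * defects_per_cm2)

D0 = 0.1  # assumed defect density (defects per cm^2), illustration only

monolithic = die_yield(600, D0)  # one hypothetical 600 mm^2 die
chiplet = die_yield(75, D0)      # one hypothetical 75 mm^2 chiplet

print(f"600 mm^2 monolithic die: {monolithic:.0%} come out good")
print(f" 75 mm^2 chiplet:        {chiplet:.0%} come out good")
```

Under these assumptions, about 45% of the big dies are scrapped versus about 7% of the chiplets. A defect now costs 75 mm^2 of silicon instead of 600.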

As a result, the 16-core AMD Ryzen 9 5950X, at a 105W TDP, crushed Intel's 10-core CPUs in multi-threaded work. And the chiplet approach kept those 16 cores economical to manufacture.

When I learned this strategy, I had an epiphany. "Engineering is ultimately about trade-offs. Rather than one perfect thing, connecting several good-enough things is the practical solution."

Apple Silicon: Heterogeneous Computing Pushed to the Limit

Apple went one step further. Announcing the M1 chip in 2020, they wrote the textbook on "heterogeneous computing."

One M1 chip contains:

High-performance cores (Firestorm) 4x: Heavy workload specialists
High-efficiency cores (Icestorm) 4x: Light tasks, power saving
GPU 7-8 cores: Graphics, machine learning
Neural Engine: AI inference only (16 cores)
Unified Memory: CPU/GPU share memory

The key philosophy here: "Each does only what they're best at."

For example, when I watch YouTube on my MacBook while coding in VS Code:

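Video decoding: Dedicated video decode hardware, with the GPU drawing the frames
Background music playback: Efficiency cores (0.5W)
Code compilation: Performance cores (15W)
On-device machine learning (Core ML workloads): Neural Engine (Copilot's suggestions themselves are computed in the cloud)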

All running simultaneously, but total power consumption is under 20W. An Intel laptop would have fans screaming for this workload.

Understanding this architecture, I finally got why the M1 MacBook Air (with no fan!) is faster than Intel i7. It's not "one fast general-purpose core" but "multiple specialists in the right places."

The Money Problem: Semiconductor Fab Economics

Tech talk alone is boring, so let's talk money. One real reason Moore's Law ended is "economics".

Semiconductor Fab Construction Costs

Process Node	Fab Construction Cost	Major Players
28nm (2011)	$3 billion	Samsung, TSMC, Intel
14nm (2014)	$5 billion	Samsung, TSMC, Intel
7nm (2018)	$10 billion	TSMC, Samsung
5nm (2020)	$15 billion	TSMC, Samsung
3nm (2022)	$20 billion	TSMC (Samsung struggling)
2nm (planned)	$28 billion estimated	TSMC

Building one TSMC 3nm fab costs $20 billion (about 26 trillion won). In Korean money, 26 trillion is similar to Hyundai Motor's market cap. For one factory.

Plus, this factory's lifespan is 5-7 years. When the next generation process comes out, it becomes obsolete.
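To see what that lifespan does to the math, here is a rough straight-line depreciation sketch. The $20 billion and the 5-7 year lifespan come from the figures above. The 100,000 wafers per month is a hypothetical round number of mine, and operating costs are ignored entirely.

```python
def depreciation_per_year(fab_cost_usd, lifespan_years):
    """Straight-line depreciation of the construction cost."""
    return fab_cost_usd / lifespan_years

def depreciation_per_wafer(fab_cost_usd, lifespan_years, wafers_per_month):
    """Construction cost spread over every wafer the fab ever produces."""
    total_wafers = lifespan_years * 12 * wafers_per_month
    return fab_cost_usd / total_wafers

FAB_COST = 20e9             # 3nm fab construction cost from the table above
WAFERS_PER_MONTH = 100_000  # hypothetical output, illustration only

for years in (5, 7):
    per_year = depreciation_per_year(FAB_COST, years)
    per_wafer = depreciation_per_wafer(FAB_COST, years, WAFERS_PER_MONTH)
    print(f"{years} years: ${per_year / 1e9:.2f}B per year, ${per_wafer:,.0f} per wafer")
```

Under these assumptions, the building alone adds a few thousand dollars to every wafer before a single engineer is paid. No wonder so few companies can play.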

Because of this, leading-edge semiconductor manufacturing is now a game only TSMC, Samsung Electronics, and Intel can play. GlobalFoundries gave up on 7nm in 2018, and even Intel now outsources some of its production to TSMC's foundry.

America's CHIPS Act: The Semiconductor War

The U.S. government passed the CHIPS and Science Act in 2022, pouring $52.7 billion (about 70 trillion won) into the semiconductor industry.

Why? Because chip manufacturing is heavily concentrated in East Asia (Korea, Taiwan, China), and by one widely cited estimate about 92% of the most advanced logic chips come from Taiwan alone. The automotive chip shortage during COVID-19 that forced GM and Ford to halt production was the final straw.

TSMC raised its planned Arizona investment to $40 billion, covering fabs for 4nm and 3nm-class processes. Samsung Electronics announced a $17 billion fab in Texas.

Watching this news, I realized something. Moore's Law ended not just from technical limits, but economic limits too. Beyond 2nm, even if it's technically possible, I wonder if any company will pour 30 trillion won into one factory.

Final Thoughts: We Must Get Smarter

Sometimes looking at old code, I miss the simplicity of handling everything with one while loop. But we now live in the era of complex distributed systems.

Because boosting single-server performance is blocked, scale-out (adding 10 servers) became standard.
Because making one core faster is blocked, parallel programming distributing work across 16 cores became essential.
From an era of doing everything with one general-purpose CPU, we've moved to combining specialists like GPU/NPU/TPU.

Moore's Law is over, but developer value has actually risen. The work hardware used to do for free must now be solved with our architecture design skills.

I've come to see Moore's Law's end differently now. It's not sad. Rather, it's an era where developers must truly understand hardware.

Is your code fully utilizing multi-cores right now? Are you running GPU-capable tasks on CPU? It's time to think about how to fill your plate at the buffet where the "free lunch" is over.