I Realized Computers Can Only Add
When I first studied computer architecture, the most shocking fact was this: computers can actually only do addition. They show YouTube, run 3D games, and draw pictures with AI, but if you look at the deepest core, there's only one operation: Addition.
How do they subtract? They take the two's complement and add ($A + (-B)$). Multiply? Repeat addition ($A \times 3 = A + A + A$). Divide? Repeat subtraction, which is a variant of addition. Even complex operations like exponents or logarithms? Approximate with Taylor series and convert to addition and multiplication.
Ultimately, the most critical component determining computer performance is the Adder - the part that decides "who can add faster." In the CPU's heart, the ALU (Arithmetic Logic Unit), the adder takes up the most space.
When I first learned this, I wondered "How do you add with just a few logic gates?" But when I dug deeper, I found it was a truly sophisticated work of art.
Logic Gates: Tools for Cooking 0s and 1s
To design an adder, you need to perfectly understand three basic components of logic circuits. At first, I thought these were just abstract concepts, but they're actually physical circuits made from a few transistors.
XOR Gate: The Core of Addition
The most important gate. As the name suggests, it's "exclusive." "If you and I are different, output 1; if same, output 0".
- 0 XOR 0 = 0 (same, so 0)
- 1 XOR 1 = 0 (same, so 0)
- 0 XOR 1 = 1 (different, so 1)
At first I thought "Why is this important?" but when I realized this perfectly matches the "Sum" of 1-bit addition, it clicked. 1+1=0 (carry occurs, 1's place is 0) is exactly the XOR result!
AND Gate: The Core of Carry
A simple rule: "Both must be 1 to output 1." 1 AND 1 = 1, everything else is 0. This matches the "Carry" in addition. Only when both are 1 does a 1 carry to the upper digit.
OR Gate: Final Combination
"If either is 1, output 1." Used later to combine carries.
With just these three gates, we can build an addition circuit. I didn't believe it at first, but when I actually built it, it really worked.
Half Adder: Half a Calculator
Let's think about the simplest addition: adding two 1-bit numbers (A, B). If we make a truth table:
| A | B | Sum | Carry | Explanation |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 + 0 = 0 |
| 0 | 1 | 1 | 0 | 0 + 1 = 1 |
| 1 | 0 | 1 | 0 | 1 + 0 = 1 |
| 1 | 1 | 0 | 1 | 1 + 1 = 2 (binary 10) |
Looking at this truth table, the pattern is clear. Sum is 1 only when A and B are different, so we use an XOR gate. Carry is 1 only when both A and B are 1, so we use an AND gate.
graph LR
InputA[Input A] --> XOR
InputB[Input B] --> XOR
InputA --> AND
InputB --> AND
XOR --> Sum[Sum]
AND --> Carry[Carry]
This is called a Half Adder. But why "Half"? At first I thought the name was strange, but there's a fatal flaw. There's no input pin to receive the "Carry In" from the lower digit.
Think about when we do decimal addition 15 + 28. From the ones place (5+8=13), a 1 carries over. When calculating the tens place (1+2), we need to add that carried 1 to get 1+1+2=4. But the half adder can't handle this "carried 1." So it's only useful for calculating the very last digit (LSB).
Full Adder: I Built a Real Addition Circuit
To be a true adder, you need 3 inputs: A (the number being added to), B (the number being added), C_in (the carry from the lower digit). You need to be able to handle all three to be a real adder.
There are 8 cases ($2^3$), and the key is "the number of 1s." If there's one 1 (e.g., 1+0+0): Sum=1, Carry=0. If there are two 1s (e.g., 1+1+0): Sum=0, Carry=1. If there are three 1s (e.g., 1+1+1): Sum=1, Carry=1 (result 3, binary 11).
The full adder isn't built from scratch - it's made by connecting two half adders. This was a truly genius idea.
- Step 1: Add A and B (Half Adder 1) → generates
intermediate sum,intermediate carry 1 - Step 2: Add
intermediate sumandC_in(Half Adder 2) →final Sumcomplete!, generatesintermediate carry 2 - Step 3: Combine
intermediate carry 1andintermediate carry 2with an OR gate →final Carry (C_out)complete!
With this one block, we can finally perform 1-bit addition perfectly. Now we just need to connect them.
Ripple Carry Adder: The Beauty of Connection, and Its Limits
To do 4-bit addition (1011 + 0011), we just line up 4 full adders. Adder 0 calculates the first bit and tosses the carry ($C_1$) to Adder 1. Adder 1 receives it, calculates, and tosses the carry ($C_2$) to Adder 2... and so on.
Like throwing a stone in a calm lake, the carry ripples through like a wave, which is why it's called a "Ripple Carry Adder (RCA)". The name is quite poetic.
Fatal Problem: Speed
This method is easy to build, but has a fatal flaw that makes it absolutely unusable in modern computers: Propagation Delay.
Imagine a 64-bit CPU. 64 full adders standing in a row. The very last 64th adder can't do anything until the 63rd adder gives it the carry. The 63rd waits for the 62nd, the 62nd for the 61st... Eventually, calculations from 1 to 64 must complete sequentially.
If one full adder takes 1 nanosecond (ns) to calculate, a 64-bit addition takes 64ns. Modern CPUs (3GHz) tick every 0.33ns, so if one addition takes 64ns, the CPU must wait about 200 clock cycles. Computer speed would regress to the 1980s.
When I first learned this, I thought "So how do they solve it?" Engineers found a genius solution.
Carry Lookahead Adder: The Speed Revolution
Engineers pondered: "Is there a way to know in advance if a carry will occur, without waiting for the previous digit's calculation to finish?"
A genius idea emerged: the CLA (Carry Lookahead Adder). Just by looking at each digit's input values (A, B), you can know the fate of the carry without actually calculating.
Two Fates
-
Generate (G): "I definitely create a carry."
- If $A=1, B=1$, regardless of what comes from below, it's definitely $1+1=2$ or more, so a carry occurs
- $G = A \cdot B$ (AND gate)
-
Propagate (P): "If a carry comes, I pass it along."
- If $A=1, B=0$ or $A=0, B=1$, the sum is 1. If a carry 1 comes from below, it becomes $1+1=2$, generating a carry. In other words, it tosses (propagates) what it received
- $P = A \oplus B$ (XOR gate)
The Magic Formula
Using this logic, you can predict the carry for the 4th digit ($C_4$) in one shot with a mathematical formula without going through the 1st, 2nd, and 3rd adders.
$$C_4 = G_3 + P_3(G_2 + P_2(G_1 + P_1(G_0 + P_0C_0)))$$
It looks complex, but the key is "parallel processing". Without waiting, all digit carries are calculated simultaneously. Of course, the circuit becomes incredibly complex and uses more power, but the speed increases dramatically.
Modern Intel/AMD CPUs use even more advanced Prefix Adder (Kogge-Stone, Brent-Kung) algorithms to complete 64-bit addition in about 1 clock cycle. I was truly amazed when I first learned this.
Two's Complement: How Computers Subtract
If you tear apart a CPU design, there's no such thing as a Subtractor component. So how does it calculate 5 - 3? Computers convert subtraction to "adding a negative": 5 + (-3).
This is where the genius concept of Two's Complement comes in. At first I didn't understand why it worked, but when I calculated it myself, it was really amazing.
Calculating 5 - 3 (4-bit basis)
- 3 in binary:
0011 - One's complement (flip):
1100(invert all bits) - Two's complement (add 1):
1101(this is what the computer recognizes as -3) - Perform addition:
0101 (5) + 1101 (-3) ------- 10010 - Discard overflow: Since we're a 4-bit computer, the leading
1has no storage space and disappears - Final result:
0010(decimal 2)
Isn't it amazing? Without needing to build a separate subtraction circuit, just by adding a "NOT gate" to the existing adder, subtraction is handled for free. This is the efficiency of computer engineering.
Hardware Engineer's Coding: Verilog
When you think "circuit design," do you imagine soldering? 21st century hardware design is coding. Write code in languages like Verilog or VHDL, and the compiler automatically converts it to transistor circuit diagrams (Synthesis).
Full Adder code example:
module FullAdder (
input a, b, cin,
output sum, cout
);
// Get sum with two XORs
assign sum = a ^ b ^ cin;
// Carry occurs if (A&B) or (C_in & (A^B))
assign cout = (a & b) | (cin & (a ^ b));
endmodule
While software developers write if-else, hardware engineers assign the flow of bits. It was unfamiliar at first, but logically it's the same.
Quantum Computer Addition: Can It Be Reversed?
As a side note, let's talk about the future technology of Quantum Computers. Traditional AND, OR, XOR gates have a fatal characteristic: Information Destruction (Irreversible).
Looking only at the result A AND B = 0, can you tell what A and B originally were? It could be (0,0), (0,1), or (1,0), but you can't know exactly. Information was lost. Information loss means heat generation (Landauer's principle).
In quantum computers, information can't be destroyed to maintain superposition. So they use Reversible Gates like the Toffoli Gate, where you can 100% backtrack the input values just by looking at the output. This is a completely different dimension of design from traditional semiconductors.
AI Era Adders: Trading Precision for Speed
Recently, in AI-specific chips like NVIDIA H100, an interesting phenomenon is happening. While traditional CPUs obsessed over 64-bit precision (double), AI chips are regressing to 8-bit (INT8, FP8) addition.
Why Regress?
Deep learning models (like GPT-4) have hundreds of billions of parameters. Calculating with 64 bits is slow and memory-intensive. Research found that AI neural network operations "can be approximate". No need for 15 decimal places of accuracy - as long as the big picture is right, it won't mistake a cat photo for a dog.
The Emergence of FP8
So the latest GPUs are loaded with dedicated 8-bit floating point (FP8) adders. They can process 4 times more data at once than 32-bit (4x throughput increase), and power consumption decreases. As a result, inference speed for large language models like ChatGPT increases dramatically.
The history of adders goes 4-bit (Intel 4004) → 64-bit (Modern CPU) → 8-bit (AI GPU) in a circle. Of course, it's only 8-bit on the surface - internally thousands of cores run in parallel in a Systolic Array structure.
This trend toward low-precision computation interestingly connects to Neuromorphic Computing. It's similar to how the human brain processes synaptic signals using appropriate analog signal sums rather than perfect digital accuracy. Ultimately, AI hardware pursuing extreme efficiency is increasingly resembling the structure of the human brain.
Wrapping Up: The Art of 1 Nanosecond
That single line of code we comfortably write: total = price + tax. The moment we press Enter, billions of transistors inside the CPU simultaneously open and close gates, moving electrons. Thanks to the fierce relay race of full adders, or the genius parallel prediction of CLA, we get accurate results in 0.000000001 seconds.
The adder isn't just a simple calculator. It's the most beautiful crystallization of engineering, where humanity implemented logic as physical reality. At first I thought "It's just an addition circuit," but the deeper I dug, the more I realized what a sophisticated work of art it is.
One XOR gate, one AND gate come together to create the magic of 1+1=2, and this repeats billions of times to power all the digital devices we use. This simplicity within complexity, complexity within simplicity - I think that's the beauty of computer engineering.