
CAP Theorem: The Impossible Trinity of Distributed Systems
Why you can't have it all. Consistency vs Availability in the face of Partitions. Explaining CP (MongoDB) vs AP (Cassandra) and the extended PACELC theorem.


When studying distributed systems, you eventually run into a deceptively simple question.
"In a system where Seoul and New York servers need to stay in sync in real-time—what happens when the network goes down?"
At first it sounds abstract. But dig into it, and you realize it cuts to the heart of distributed system design. In fact, since 2008, Google has officially acknowledged shark-related undersea cable damage as a real-world failure mode. Network partitions aren't hypothetical—they happen.
When they do, the system must make a choice.
Option 1: Shut it down. If Seoul and New York have different balance data, there's a risk of double-withdrawal fraud. So the system halts all transactions until the network is restored. Customers see "System Maintenance" at ATMs.
Option 2: Keep going. Each node processes transactions with whatever balance data it has locally. New York's withdrawals might not reflect in Seoul, but the service stays up. Data reconciles when the network comes back.
For a bank? Most choose Option 1. For social media? Option 2 every time.
This is the CAP Theorem. And the hardest truth it forces you to accept is: a perfect distributed system is physically impossible.
When I first encountered CAP theorem, honestly, it didn't click.
"The data must always be accurate (Consistency), the service can never go down (Availability), and networks will obviously fail so we need to handle that (Partition Tolerance)... why can't we have all three? Can't we just add more servers?"
The explanations felt abstract. But when I worked through what actually happens when a network goes down—step by step—it finally made sense. The moment the network splits, the system has to sacrifice either consistency or availability. There's no way to keep both.
In 2000, UC Berkeley's Eric Brewer proposed the CAP theorem. In 2002, MIT's Gilbert and Lynch proved it mathematically. Here's what it comes down to:
A distributed system can guarantee at most 2 out of 3: Consistency, Availability, Partition Tolerance.
"All nodes see the same data at the same time."
Like twin brothers sharing identical memories. If I deposit $1,000 in Seoul, then 0.001 seconds later someone checks in New York, they must see exactly that $1,000 increase.
If even one node hasn't updated? Throw an error. Better to say "can't verify right now" than give wrong information.
// Strong Consistency example
async function withdraw(accountId, amount) {
  // Synchronously write to all replicas
  const results = await Promise.all([
    db.primary.update(accountId, -amount),
    db.replica1.update(accountId, -amount),
    db.replica2.update(accountId, -amount)
  ]);
  // If any write fails, roll back everything
  if (results.some(r => !r.success)) {
    await rollback(accountId, amount);
    throw new Error("Consistency violation - aborting");
  }
  return { success: true, balance: results[0].newBalance };
}
"Living nodes always respond."
Restaurant analogy: if the head chef is sick, you don't close the restaurant. You bring in the sous chef and serve food anyway. Might taste slightly different, but customers don't go hungry.
In distributed systems, even if certain nodes die or networks fail, surviving nodes must always respond to client requests. That response might not be the latest data. It's like saying "this is what I know—can't promise it's current."
// High Availability example
async function getLikeCount(postId) {
  try {
    // Try the primary DB
    return await db.primary.query(postId);
  } catch (primaryError) {
    console.warn("Primary down, using replica");
    try {
      // Try replica 1
      return await db.replica1.query(postId);
    } catch (replica1Error) {
      // Try replica 2 (might be stale)
      return await db.replica2.query(postId);
    }
  }
  // Only errors if ALL nodes fail
  // → but at that point, availability is broken anyway
}
"System keeps running even when the network splits."
Seoul-New York connection is severed. But Seoul users should still access Seoul DB, New York users their local DB. Each operates like an isolated island, then merges when the bridge is restored.
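Here's a minimal sketch of that island-and-bridge behavior. Everything in it (the Region class, the merge rule) is illustrative, not any real database's API:
# Partition tolerance sketch (illustrative, not a real database API)
class Region:
    def __init__(self, name):
        self.name = name
        self.writes = []  # writes accepted locally while the link is down

    def handle_write(self, record):
        # Keep serving local users even though the other region is unreachable
        self.writes.append(record)
        return f"{self.name}: accepted {record}"

def heal_partition(region_a, region_b):
    # When the link comes back, both sides converge on the union of their writes
    merged = region_a.writes + region_b.writes
    region_a.writes = list(merged)
    region_b.writes = list(merged)
    return merged

seoul, new_york = Region("Seoul"), Region("New York")
seoul.handle_write({"user": "kim", "action": "post"})      # link is down
new_york.handle_write({"user": "jane", "action": "like"})  # link is down
heal_partition(seoul, new_york)                            # bridge restored, data merges
How conflicting writes to the same record get merged is exactly the problem the AP-side techniques later in this article (LWW, CRDTs) exist to solve.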
Critical point: in distributed systems, partitions aren't a "might happen"—they WILL happen. Undersea cable cuts, router failures, even a GC pause that freezes a node for hundreds of milliseconds: from another node's perspective, that's a network partition.
So here's the reality:
P is not optional; you must accept it. Which means the real choice is between CP and AP.
Imagine withdrawing money from an ATM. Your balance was $1,000, but a recent withdrawal brought it down to $500. Due to replication lag, one ATM still thinks it's $1,000 while another knows it's $500. Withdraw $1,000 at the stale ATM? Your balance goes negative.
To prevent this, CP systems sacrifice availability. If data sync is uncertain, they shut down entirely.
# CP system example (MongoDB-like behavior)
def write_to_distributed_db(key, value):
    # Write to the primary node
    primary_success = primary_node.write(key, value)
    if not primary_success:
        raise Exception("Primary unreachable - refusing write")
    # Need a majority of replica ACKs
    replicas_ack = 0
    for replica in replica_nodes:
        if replica.is_reachable():
            if replica.write(key, value):
                replicas_ack += 1
    # If we can't get a majority, roll back and fail
    if replicas_ack < len(replica_nodes) // 2:
        primary_node.rollback(key)
        raise Exception("Cannot guarantee consistency - aborting")
    return "Write successful with strong consistency"
CP System Examples: MongoDB is the classic case; its majority-acknowledged writes behave much like the sketch above.
Now the other side. If my Instagram post shows 1,000 likes on Seoul's server but 995 on New York's server? No big deal. They'll sync in 30 seconds. But if users see "Service Unavailable"? They bounce.
AP systems sacrifice consistency. Each node accepts writes independently, reconciling data later.
# AP system example (Cassandra-like behavior)
def write_to_ap_db(key, value):
    # Send writes to multiple nodes asynchronously
    futures = []
    for node in all_nodes:
        # Don't block on the response (though not pure fire-and-forget)
        future = node.async_write(key, value)
        futures.append(future)
    # Even one success is OK (real systems use tunable quorum settings)
    success_count = 0
    for future in futures:
        try:
            if future.get(timeout=0.1):  # Wait only 100 ms
                success_count += 1
        except TimeoutError:
            pass  # Skip slow nodes
    if success_count > 0:
        return "Write accepted (eventual consistency)"
    else:
        raise Exception("All nodes unreachable")
AP System Examples: Cassandra, and DynamoDB in its default configuration.
The most important criterion I learned:
"Does data inconsistency cause financial loss or legal issues?"CAP had one problem: it only considers failure scenarios (Partitions).
But our systems run normally 99.9% of the time. Even during peace, trade-offs exist. That's where PACELC comes in.
IF Partition occurs → choose Availability vs Consistency ELSE (normal state) → choose Latency vs Consistency
Say Seoul, New York, London are perfectly connected. A user writes data in Seoul.
Option 1: Synchronous replication. Seoul write → wait for New York → wait for London → all done → tell the user OK.
Option 2: Asynchronous replication. Seoul write → immediately tell the user OK → replicate to New York and London in the background.
Even in peaceful times, you're choosing "speed vs accuracy". That's the ELC part of PACELC.
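To make the latency difference tangible, here's a small simulation. The replica delays are made-up numbers, and synchronous_write / asynchronous_write are illustrative stand-ins, not any particular database's API:
# Latency vs. consistency in the ELC case (illustrative simulation)
import asyncio
import time

REPLICA_DELAYS = {"new_york": 0.15, "london": 0.25}  # made-up one-way latencies in seconds

async def replicate(region, delay):
    await asyncio.sleep(delay)  # stand-in for a cross-region write

async def synchronous_write():
    # Choose C: the user waits until every replica has acknowledged
    await asyncio.gather(*(replicate(r, d) for r, d in REPLICA_DELAYS.items()))

async def asynchronous_write():
    # Choose L: acknowledge immediately, let replication finish in the background
    return [asyncio.create_task(replicate(r, d)) for r, d in REPLICA_DELAYS.items()]

async def main():
    t0 = time.perf_counter()
    await synchronous_write()
    print(f"sync write acknowledged after {time.perf_counter() - t0:.2f}s")   # ~0.25s

    t0 = time.perf_counter()
    pending = await asynchronous_write()
    print(f"async write acknowledged after {time.perf_counter() - t0:.4f}s")  # ~0s
    await asyncio.gather(*pending)  # replication still completes, just after the user moved on

asyncio.run(main())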
AP systems sometimes need minimal consistency guarantees. That's where Quorum comes in.
Example with N=3, W=2, R=2 (since W + R > N, every read quorum overlaps the latest successful write in at least one node):
# Cassandra-style Quorum read/write
def quorum_write(key, value, N=3, W=2):
    """
    Must succeed on W of the N nodes (2 out of 3 by default)
    """
    nodes = get_replica_nodes(key)  # [node1, node2, node3]
    success_count = 0
    for node in nodes:
        try:
            if node.write(key, value, timeout=0.5):
                success_count += 1
                if success_count >= W:
                    return True  # Return as soon as W acknowledgments arrive
        except TimeoutError:
            continue
    return False  # Failed to collect W acknowledgments

def quorum_read(key, N=3, R=2):
    """
    Read from R of the N nodes (2 out of 3 by default), pick the latest
    """
    nodes = get_replica_nodes(key)
    responses = []
    for node in nodes:
        try:
            data = node.read(key, timeout=0.5)
            responses.append(data)
            if len(responses) >= R:
                break  # Got enough responses
        except TimeoutError:
            continue
    # Fail if we couldn't reach a read quorum, rather than return possibly-stale data
    if len(responses) < R:
        raise Exception("Could not reach a read quorum")
    # Pick the newest value by timestamp
    return max(responses, key=lambda x: x.timestamp)
| W | R | Characteristics | Use Case |
|---|---|---|---|
| 1 | N | Fast writes, slow reads | Log collection |
| N | 1 | Slow writes, fast reads | Read-heavy cache |
| N/2+1 | N/2+1 | Balanced | General services |
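A quick way to sanity-check any row in that table: read and write quorums are guaranteed to overlap in at least one node exactly when W + R > N. The helper below is just for illustration:
# Quorum overlap check (illustrative helper)
def quorums_overlap(N, W, R):
    # Any W-node write set and any R-node read set must share a node when W + R > N,
    # so a read is guaranteed to see at least one copy of the latest write
    return W + R > N

print(quorums_overlap(N=3, W=2, R=2))  # True  -> the balanced default above
print(quorums_overlap(N=3, W=1, R=1))  # False -> fast, but reads can be stale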
In 2012, Google published the Spanner paper claiming a "practically CA system." How?
The key is the TrueTime API. Much of the inconsistency in CAP fundamentally stems from each server's clock being different: if Seoul thinks it's 10:00:00.100 and New York thinks it's 10:00:00.050, which event came first is ambiguous.
Google's solution:
Instead of a single timestamp, TrueTime returns an [earliest, latest] interval, and a transaction waits until latest has passed to guarantee ordering.
# Spanner TrueTime concept (pseudocode)
def spanner_transaction(key, value):
    # TrueTime returns an interval, not a single point in time
    tt_now = TrueTime.now()
    # tt_now = [10:00:00.100, 10:00:00.107]  # 7 ms of uncertainty
    # Safely wait out the uncertainty until 'latest' has passed
    wait_until(tt_now.latest)
    # Now this transaction's timestamp is definitely after all past transactions
    commit_with_timestamp(key, value, tt_now.latest)
But strictly speaking, Spanner is still CP. During network partitions, if it can't get majority, writes fail. It just looks like CA with 99.999% availability.
Techniques AP systems use to "reconcile later":
1. Last-Write-Wins (LWW): keep whichever write carries the latest timestamp and discard the rest (a small sketch follows the counter example below).
2. CRDTs (Conflict-free Replicated Data Types): data structures that merge concurrent updates deterministically, for example a counter where each node keeps its own tally, {Seoul: 3, New York: 2, London: 1}, and the value is the sum.
// CRDT counter example
class GCounter {
  constructor(nodeId) {
    this.nodeId = nodeId;
    this.counts = {}; // e.g. {node1: 5, node2: 3}
  }
  increment() {
    this.counts[this.nodeId] = (this.counts[this.nodeId] || 0) + 1;
  }
  // Merge with another node's counter: take the max per node
  merge(other) {
    for (let node in other.counts) {
      this.counts[node] = Math.max(
        this.counts[node] || 0,
        other.counts[node]
      );
    }
  }
  // The counter's value is the sum of every node's count
  value() {
    return Object.values(this.counts).reduce((a, b) => a + b, 0);
  }
}
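And here's the LWW side from item 1, as a minimal sketch. The timestamps are supplied by hand to keep it deterministic; real systems take them from (imperfect) node clocks, which is exactly why LWW can silently drop a concurrent write:
# Last-Write-Wins register (minimal sketch; node names and values are illustrative)
class LWWRegister:
    def __init__(self):
        self.value = None
        self.timestamp = 0.0

    def write(self, value, timestamp):
        # Keep whichever write carries the newer timestamp
        if timestamp >= self.timestamp:
            self.value = value
            self.timestamp = timestamp

    def merge(self, other):
        # Reconciliation after the partition heals: newer timestamp wins,
        # the losing write is discarded without any error
        if other.timestamp > self.timestamp:
            self.value = other.value
            self.timestamp = other.timestamp

seoul, new_york = LWWRegister(), LWWRegister()
seoul.write("nickname=kim_v2", timestamp=100.0)    # written during the partition
new_york.write("nickname=kim_v3", timestamp=99.5)  # also written during the partition
seoul.merge(new_york)    # Seoul keeps its own value (100.0 > 99.5)
new_york.merge(seoul)    # New York converges to Seoul's value; the v3 edit is lost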
"Our database guarantees perfect consistency!" — Many NoSQL vendors' marketing claims.
In 2013, Kyle Kingsbury created Jepsen, a testing framework to verify these claims.
; Jepsen test pseudocode
(deftest partition-test
  ; 1. Randomly split the network
  (partition-network [node1 node2] [node3])
  ; 2. Write to both sides simultaneously
  (parallel
    (write! node1 :x 1)
    (write! node3 :x 2))
  ; 3. Heal the network
  (heal-network)
  ; 4. Verify consistency
  (assert (= (read! node1 :x) (read! node3 :x))))
Lesson: "Trust but verify." Check Jepsen reports before production—that's what pros do.
When I first encountered CAP theorem, it felt like a constraint. Now I understand it's reality.
The speed of light is finite. Networks will fail. Clocks can't sync perfectly. Faced with these physical laws, we must choose:
"What matters more for my system? Accuracy or availability?"A financial system would choose CP. Wrong balances causing money to vanish is worse than a brief service interruption.
A social platform would choose AP. Like counts updating a second late is less critical than the service going down entirely.
Here's what it comes down to: CAP theorem isn't about "forcing a choice"—it's a tool to help you choose wisely.
Understanding your system's essence and making the right trade-off. That's the core insight this theorem gives you.
A few questions that tend to come up:
Q1: Can a single-server database avoid this trade-off?
A: A single server, yes; but the moment you add replication, you're under CAP.
Q2: Is Redis CP or AP?
A: Depends on configuration. With default asynchronous replication it leans AP; with min-replicas-to-write 2 it behaves like CP (refuses writes it can't replicate).
Q3: Is blockchain CP or AP?
A: Typically AP. It allows forks (branches) and later picks the longest chain (eventual consistency). But Bitcoin, with its 10-minute confirmation waits, effectively has finality, which is a CP-like trait.
Q4: Is DynamoDB CP or AP?
A: Default is AP. But the ConsistentRead=true option makes it behave like CP: "tunable consistency" (see the sketch after this FAQ).
Q5: Is a true CA system possible?
A: No. Even Spanner is strictly CP; it just achieves 99.999% availability, which makes it look like CA. You can't beat physics.
Q6: How does this apply to microservices?
A: Apply it differently per service: anything that handles money leans CP, while feed- or like-count-style services lean AP.
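To make Q4 concrete, here's a minimal boto3 sketch of that tunable consistency. The table name "Posts" and key "postId" are hypothetical:
# DynamoDB tunable read consistency (table and key names are hypothetical)
import boto3

dynamodb = boto3.client("dynamodb")

# Default read: eventually consistent (AP-leaning), may return slightly stale data
stale_ok = dynamodb.get_item(
    TableName="Posts",
    Key={"postId": {"S": "123"}},
)

# ConsistentRead=True: strongly consistent read (CP-leaning), higher latency
# and double the read-capacity cost
latest = dynamodb.get_item(
    TableName="Posts",
    Key={"postId": {"S": "123"}},
    ConsistentRead=True,
)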