
Replication: Availability and Read Distribution
Understanding high availability and read performance improvement through database replication

What happens if you only have one DB and it goes down? The whole service stops. That simple question is what led me to study Replication.
Reading through post-mortems and incident reports, the same pattern kept appearing: "We had a single DB setup. When it failed, the service was down for hours." The follow-up questions were obvious. Can't you just run multiple DBs? How do you keep the data in sync? Who takes over if the master dies?
The answer to all of them is Replication. The master DB's data is automatically copied to slave DBs, and if the master dies, a slave is promoted to master (Failover). On top of that, distributing read queries to slaves reduces the master's load.
When I first encountered replication, the most confusing part was "How does data synchronize?" When you write data to the master, does it automatically copy to slaves? Then how do you handle network latency?
Another confusion was the difference between "synchronous vs asynchronous replication." Synchronous seems safer, so why use asynchronous? For performance?
I was also curious: "How does failover happen automatically?" Who detects that the master died, and who promotes a slave to master?
The decisive analogy that helped me understand replication was "Meeting Minutes."
During a meeting, one person (Master) writes the minutes. Others (Slaves) have copies of those minutes.
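Synchronous Replication: the writer waits until everyone has copied the new line before moving on. Nobody's copy is ever behind, but every write is as slow as the slowest copier.
Asynchronous Replication: the writer keeps going and the others copy when they can. Writes are fast, but a copy can briefly be out of date, and the newest lines are lost if the writer's notes disappear before they are copied.
Most production systems use Asynchronous for performance, accepting the slight lag risk.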
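To make the difference concrete, here is a minimal in-memory sketch, not a real database driver. The `Replica` and `Master` classes and the `writeSync`, `writeAsync`, and `flush` methods are hypothetical names invented for illustration; replication lag is simulated by a queue that only becomes visible when `flush()` runs.

```typescript
// Hypothetical in-memory replica. Changes wait in a queue until flush()
// is called, which stands in for network and apply delay.
class Replica {
  readonly data = new Map<string, string>();
  private pending: Array<[string, string]> = [];

  enqueue(key: string, value: string): void {
    this.pending.push([key, value]);
  }

  flush(): void {
    for (const [key, value] of this.pending) this.data.set(key, value);
    this.pending = [];
  }
}

// Hypothetical master that can replicate in either mode.
class Master {
  readonly data = new Map<string, string>();
  private replicas: Replica[];

  constructor(replicas: Replica[]) {
    this.replicas = replicas;
  }

  // Synchronous: return only after every replica has applied the change.
  writeSync(key: string, value: string): void {
    this.data.set(key, value);
    for (const r of this.replicas) {
      r.enqueue(key, value);
      r.flush();
    }
  }

  // Asynchronous: return right after the local write; replicas apply later.
  writeAsync(key: string, value: string): void {
    this.data.set(key, value);
    for (const r of this.replicas) r.enqueue(key, value);
  }
}

const replica = new Replica();
const master = new Master([replica]);

master.writeSync("name", "Alice");
const afterSync = replica.data.get("name"); // "Alice": replica is already current

master.writeAsync("name", "Bob");
const afterAsync = replica.data.get("name"); // still "Alice": replica is lagging

replica.flush();
const afterFlush = replica.data.get("name"); // "Bob": replica has caught up
```

The window between `writeAsync` returning and `flush()` running is the replication lag that asynchronous setups accept in exchange for faster writes.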
Master-Slave is the most basic and common structure: one Master takes all writes, and the Slaves serve reads.
┌─────────┐
│ Master  │ ← Write (INSERT, UPDATE, DELETE)
└────┬────┘
     │ Replication Stream
     ├───────────┬───────────┐
     ▼           ▼           ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Slave 1 │ │ Slave 2 │ │ Slave 3 │ ← Read (SELECT)
└─────────┘ └─────────┘ └─────────┘
Key Code Pattern:
// masterDb and slaves are assumed to be existing connection objects with a query() method.

// Write: Always to Master
async function createUser(userData) {
  return await masterDb.query('INSERT INTO users ...', userData);
}

// Read: Load balance across Slaves (Round Robin)
let slaveIndex = 0;
async function getUser(userId) {
  const slave = slaves[slaveIndex];
  // Advance and wrap around so the counter never grows without bound
  slaveIndex = (slaveIndex + 1) % slaves.length;
  return await slave.query('SELECT * FROM users WHERE id = ?', [userId]);
}
In a Master-Master setup, multiple masters accept writes and replicate to each other.
┌─────────┐          ┌─────────┐
│ Master1 │ ←──────→ │ Master2 │
└─────────┘          └─────────┘
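Pros: either master can accept writes, so writes keep working if one of them fails. Cons: the same record can be changed on both masters at once, so conflicts have to be resolved.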
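Leaderless replication is used by Dynamo-style databases such as Cassandra. There is no Master. Writes are sent to all replicas, either by the client directly or through a coordinator node.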
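The Quorum Formula (W + R > N): N is the number of replicas, W is the number of replicas that must acknowledge a write, and R is the number of replicas queried on a read. If W + R > N, every read set overlaps every write set in at least one replica (Pigeonhole Principle), so at least one of the R responses contains the latest acknowledged write.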
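The arithmetic is easy to check in code. The sketch below assumes N is the replica count, W the number of write acknowledgements required, and R the number of replicas read; `quorumOverlaps`, `minOverlap`, `pickLatest`, and the `Versioned` shape are hypothetical helpers, not part of any database client.

```typescript
// True when every read set must share at least one replica with every write set.
function quorumOverlaps(n: number, w: number, r: number): boolean {
  if (w < 1 || r < 1 || w > n || r > n) {
    throw new RangeError("W and R must each be between 1 and N");
  }
  return w + r > n;
}

// Smallest number of replicas that a write set and a read set can share.
function minOverlap(n: number, w: number, r: number): number {
  return Math.max(0, w + r - n);
}

// Hypothetical response shape: each replica returns a value plus a version number.
interface Versioned {
  value: string;
  version: number;
}

// A quorum read keeps the response with the highest version.
function pickLatest(responses: Versioned[]): Versioned {
  return responses.reduce((best, cur) => (cur.version > best.version ? cur : best));
}

const strict = quorumOverlaps(3, 2, 2); // true: 2 + 2 > 3
const loose = quorumOverlaps(3, 1, 1);  // false: 1 + 1 <= 3, stale reads are possible
const latest = pickLatest([
  { value: "old name", version: 1 },
  { value: "new name", version: 2 },
]); // the version 2 response wins
```

With N = 3, the common choice W = 2, R = 2 tolerates one unavailable replica for both reads and writes while still keeping the overlap.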
Pros: No Single Point of Failure (SPOF). Extreme availability. Cons: Complexity in handling conflicts (Read Repair).
Deep Dive: How to Resolve Conflicts? What if two users update the same record at the exact same millisecond?
One option is to keep both versions as siblings, [v1, v2], and on conflict ask the app to merge them.
Replication lag is the biggest headache in replication. A user updates their profile, refreshes the page, and still sees the old name. Why? Because the read went to a Slave that hasn't received the update yet.
If a user just wrote something, force their subsequent reads to go to the Master for a short duration.
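// masterDb and slaveDb are assumed to be existing connection objects with a query() method.
class Database {
  // One timestamp per instance, so create one Database per user session;
  // a shared instance would send everyone to the Master after any write.
  private lastWriteTime = 0;
  private readonly LAG_THRESHOLD = 2000; // 2 seconds; should exceed the typical replication lag

  async write(query: string, params: unknown[]) {
    const result = await masterDb.query(query, params);
    this.lastWriteTime = Date.now();
    return result;
  }

  async read(query: string, params: unknown[]) {
    const timeSinceWrite = Date.now() - this.lastWriteTime;
    if (timeSinceWrite < this.LAG_THRESHOLD) {
      // Wrote recently? Read from Master to ensure consistency.
      return await masterDb.query(query, params);
    }
    // No recent write from this session: reading from a Slave is acceptable.
    return await slaveDb.query(query, params);
  }
}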
If the Master server crashes, the service can't accept writes. We need to promote a Slave to be the new Master.
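Two decisions sit behind any promotion: deciding that the Master is really down, and deciding which Slave takes over. The sketch below illustrates one common approach under stated assumptions. `SlaveStatus`, `replicatedPosition`, `isMasterDown`, and `chooseNewMaster` are hypothetical names, and this is not how any particular failover tool is implemented.

```typescript
// Hypothetical view of a slave as seen by a failover monitor.
interface SlaveStatus {
  name: string;
  reachable: boolean;
  // Assumed: how far into the master's change log this slave has applied.
  replicatedPosition: number;
}

// Treat the master as down only after `threshold` consecutive failed health
// checks, so a single dropped ping does not trigger a failover.
function isMasterDown(recentChecks: boolean[], threshold = 3): boolean {
  if (recentChecks.length < threshold) return false;
  return recentChecks.slice(-threshold).every((ok) => !ok);
}

// Pick the reachable slave that has applied the most changes, which
// minimizes the writes lost under asynchronous replication.
function chooseNewMaster(slaves: SlaveStatus[]): SlaveStatus | null {
  let best: SlaveStatus | null = null;
  for (const s of slaves) {
    if (!s.reachable) continue;
    if (best === null || s.replicatedPosition > best.replicatedPosition) {
      best = s;
    }
  }
  return best;
}

const candidates: SlaveStatus[] = [
  { name: "slave-1", reachable: true, replicatedPosition: 1040 },
  { name: "slave-2", reachable: false, replicatedPosition: 1100 },
  { name: "slave-3", reachable: true, replicatedPosition: 1075 },
];

const masterDown = isMasterDown([true, false, false, false]); // true: last 3 checks failed
const promoted = chooseNewMaster(candidates); // slave-3: most up to date among reachable slaves
```

Note that slave-2 has the highest position but is skipped because it is unreachable; a slave that cannot be contacted cannot be promoted.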
Automatic Failover Steps (e.g., using Orchestrator):
1. Detect that the Master is no longer responding.
2. Choose a Slave to promote.
3. Run STOP SLAVE; RESET SLAVE ALL; on the chosen Slave so that it stops replicating and can act as the new Master.
4. Point the application's writes at the new Master.
Replication is also useful for analytics: run heavy GROUP BY queries on a dedicated Analytics Slave. This prevents slowing down the Master for live users.
Database replication is about copying data from a Master to Slaves to ensure High Availability and scale reads. While Asynchronous Replication is standard for performance, you must handle Replication Lag (e.g., with read-your-own-writes) and plan for Failover (promoting a Slave when the Master dies). It's the backbone of any scalable system.