
Designing a Payment System: The Weight of Code That Moves Money
Integrating a payment API is just the beginning. Idempotency, refund flows, and double-charge prevention make payment systems genuinely hard.

A user sent me an email. They'd been charged twice. A $50 product, but $100 had left their card. I dug through the logs. One order. Two payment records. They'd clicked "Pay Now" twice, and my code had accepted both clicks.
I couldn't sleep that night. While processing the refund, it hit me. Integrating a payment API is just the beginning. Code that moves money is different. Even if click events fire twice, even if the network drops, even if the server restarts, money must move exactly once.
Building a payment system taught me something beyond technology. It taught me the weight of responsibility. This code represents someone's salary, a founder's operating budget, a grandmother's gift for her grandchild. When bugs happen, trust collapses. That's why payment systems are hard.
At first, I thought: whether it's Stripe or Toss Payments, can't I just use the SDK? Read the docs, call the charge() function, done?
No. Payments are a state machine. An order is created, a payment intent is formed, a charge happens, it's confirmed, and sometimes it's refunded. Each step can fail, and each failure needs different handling.
// My naive initial code
async function processPayment(orderId: string, amount: number) {
  const result = await paymentClient.charge({
    amount,
    orderId,
  });
  if (result.success) {
    await db.updateOrder(orderId, { status: 'paid' });
  }
  return result;
}
Counting the problems in this code, I found at least five. Take just one: if charge() succeeds but updateOrder() fails, the money is gone but the order status is still unpaid.
Payment systems aren't simple CRUD. Like a conductor leading an orchestra, you must coordinate multiple systems, ensuring each instrument (orders, inventory, payment provider, notifications) plays at exactly the right moment.
The solution to duplicate charges was idempotency. It's a math term, but simply put: "doing the same operation multiple times produces the same result as doing it once."
In payments, this is implemented with idempotency keys. Each payment request gets a unique key, and if another request comes with the same key, instead of charging again, you return the previous result.
// Payment flow with idempotency
async function createPayment(orderId: string, amount: number) {
  // Derive the key from the order ID alone, so a second click on "Pay Now"
  // for the same order produces the same key (a timestamp would make every
  // click unique and defeat the whole point).
  // Note: retrying after a failed attempt needs a fresh key, e.g. one that
  // includes an attempt number.
  const idempotencyKey = `payment_${orderId}`;
  // Check if already processed BEFORE creating anything
  const existing = await db.findPaymentByIdempotencyKey(idempotencyKey);
  if (existing) {
    return existing; // Duplicate request returns existing result
  }
  // Create payment intent in DB. A unique constraint on idempotencyKey
  // makes a concurrent duplicate insert fail instead of slipping past the check.
  const paymentIntent = await db.createPaymentIntent({
    orderId,
    amount,
    idempotencyKey,
    status: 'pending',
  });
  try {
    // Actual payment provider API call
    const result = await paymentClient.charge({
      amount,
      idempotencyKey, // Both Stripe and Toss support this
      metadata: { orderId, paymentIntentId: paymentIntent.id },
    });
    // Request accepted by the provider; the webhook confirms completion later
    await db.updatePaymentIntent(paymentIntent.id, {
      status: 'processing',
      externalId: result.id,
    });
    return result;
  } catch (error) {
    // Record even on failure
    await db.updatePaymentIntent(paymentIntent.id, {
      status: 'failed',
      errorMessage: error instanceof Error ? error.message : String(error),
    });
    throw error;
  }
}
Idempotency keys are like receipt numbers. You order at a café and get a receipt. The coffee doesn't come, so you go back to the counter. Showing the receipt doesn't make them prepare the same order twice. They tell you it's already being made, or give you the finished coffee. Money is collected only once.
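Checking a store before charging still leaves a small window in which two simultaneous clicks both pass the check. A minimal single-process sketch of one way to close it: cache the *promise* of the first request under its key, so a duplicate that arrives while the first is still in flight waits for the same outcome. `IdempotencyStore`, `ChargeResult`, and `fakeCharge` are hypothetical names for illustration, not part of any SDK, and a real store would live in a shared database with an expiry rather than in memory.

```typescript
// Hypothetical result shape for illustration; not a real provider type
interface ChargeResult {
  chargeId: string;
  amount: number;
}

// Single-process sketch: maps idempotency key -> promise of the first request.
// Caching the promise (not just the final result) makes a duplicate that
// arrives while the first request is still in flight share its outcome.
class IdempotencyStore {
  private readonly entries = new Map<string, Promise<ChargeResult>>();

  run(key: string, operation: () => Promise<ChargeResult>): Promise<ChargeResult> {
    const existing = this.entries.get(key);
    if (existing) {
      return existing; // Same key: same result, no second charge
    }
    const pending = Promise.resolve()
      .then(operation)
      .catch((err: unknown) => {
        // Forget failed attempts so a later retry with the same key can run
        this.entries.delete(key);
        throw err;
      });
    this.entries.set(key, pending);
    return pending;
  }
}

// Usage: two clicks on "Pay Now" for the same order share one key
let providerCalls = 0;
async function fakeCharge(): Promise<ChargeResult> {
  providerCalls += 1;
  return { chargeId: `ch_${providerCalls}`, amount: 5000 };
}

const store = new IdempotencyStore();
Promise.all([
  store.run('payment_order_123', fakeCharge),
  store.run('payment_order_123', fakeCharge),
]).then(([first, second]) => {
  console.log(providerCalls, first === second); // 1 true
});
```

Dropping the entry on failure allows a retry, which is only safe because the provider call itself also carries the idempotency key: if a request timed out after the provider had actually charged, a retry with the same key returns the original charge instead of creating a second one.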
Payments go through multiple states. Each transition needs clear definitions of what can happen and where failures can occur.
type PaymentStatus =
  | 'pending' // Payment intent created
  | 'processing' // Request sent to payment provider
  | 'requires_action' // Additional auth needed (3D Secure, etc.)
  | 'completed' // Payment completed
  | 'failed' // Payment failed
  | 'refunding' // Refund in progress
  | 'refunded' // Refund completed
  | 'partial_refunded'; // Partial refund
interface PaymentStateMachine {
  current: PaymentStatus;
  allowedTransitions: Record<PaymentStatus, PaymentStatus[]>;
}
const paymentStateMachine: PaymentStateMachine['allowedTransitions'] = {
  pending: ['processing', 'failed'],
  processing: ['requires_action', 'completed', 'failed'],
  requires_action: ['processing', 'failed'],
  completed: ['refunding'],
  failed: ['pending'], // Allow retry
  refunding: ['refunded', 'partial_refunded', 'failed'],
  refunded: [], // Terminal state
  partial_refunded: ['refunding'], // Additional refunds possible
};
function canTransition(from: PaymentStatus, to: PaymentStatus): boolean {
  return paymentStateMachine[from].includes(to);
}
With a clear state machine, what needs to be recorded also became obvious. Log every state transition as an event:
interface PaymentEvent {
  id: string;
  paymentId: string;
  type: 'status_changed' | 'webhook_received' | 'refund_requested';
  fromStatus: PaymentStatus | null;
  toStatus: PaymentStatus;
  timestamp: Date;
  metadata: Record<string, any>;
  userId?: string;
}
// Always record events on status changes
async function transitionPaymentStatus(
  paymentId: string,
  toStatus: PaymentStatus,
  metadata: Record<string, any> = {}
) {
  await db.transaction(async (tx) => {
    // Read the current status INSIDE the transaction (ideally with a row lock,
    // e.g. SELECT ... FOR UPDATE) so two concurrent transitions can't both
    // pass the check below
    const payment = await tx.getPayment(paymentId);
    if (!canTransition(payment.status, toStatus)) {
      throw new Error(
        `Invalid transition: ${payment.status} -> ${toStatus}`
      );
    }
    // Update status
    await tx.updatePayment(paymentId, { status: toStatus });
    // Record event in the same transaction, so status and log never disagree
    await tx.createPaymentEvent({
      paymentId,
      type: 'status_changed',
      fromStatus: payment.status,
      toStatus,
      timestamp: new Date(),
      metadata,
    });
  });
}
This is the airplane's black box. When something goes wrong, you can precisely reconstruct what happened at what moment. When a user says "I paid but it didn't work," you can look at the event log and know exactly where it got stuck.
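To make the black-box idea concrete, here is a sketch that replays a payment's event log oldest-first and reports the last consistent status and where the chain breaks, if anywhere. It redeclares `PaymentStatus` and the transition table so it runs on its own; `StatusEvent` and `replayEvents` are hypothetical names, and in practice the events would be read back from the `PaymentEvent` records.

```typescript
type PaymentStatus =
  | 'pending' | 'processing' | 'requires_action' | 'completed'
  | 'failed' | 'refunding' | 'refunded' | 'partial_refunded';

// Same transition table as the payment state machine
const allowedTransitions: Record<PaymentStatus, PaymentStatus[]> = {
  pending: ['processing', 'failed'],
  processing: ['requires_action', 'completed', 'failed'],
  requires_action: ['processing', 'failed'],
  completed: ['refunding'],
  failed: ['pending'],
  refunding: ['refunded', 'partial_refunded', 'failed'],
  refunded: [],
  partial_refunded: ['refunding'],
};

// Minimal subset of a payment event needed for replay
interface StatusEvent {
  fromStatus: PaymentStatus | null;
  toStatus: PaymentStatus;
  timestamp: Date;
}

interface ReplayResult {
  status: PaymentStatus | null; // last consistent status
  brokenAt: number | null; // index (in time order) of the first bad event
}

function replayEvents(events: StatusEvent[]): ReplayResult {
  // Sort a copy by time; the log may be read back in any order
  const ordered = [...events].sort(
    (a, b) => a.timestamp.getTime() - b.timestamp.getTime()
  );
  let status: PaymentStatus | null = null;
  let index = 0;
  for (const event of ordered) {
    // Each event must start where the previous one ended...
    const chained = event.fromStatus === status;
    // ...and follow an allowed edge (the first event has no previous status)
    const allowed =
      status === null || allowedTransitions[status].includes(event.toStatus);
    if (!chained || !allowed) {
      return { status, brokenAt: index };
    }
    status = event.toStatus;
    index += 1;
  }
  return { status, brokenAt: null };
}

// Usage: "I paid but it didn't work" -> the log stops at 'processing',
// i.e. we are still waiting for the provider's result
const t = (sec: number) => new Date(Date.UTC(2024, 0, 1, 0, 0, sec));
const stuck = replayEvents([
  { fromStatus: 'pending', toStatus: 'processing', timestamp: t(1) },
  { fromStatus: null, toStatus: 'pending', timestamp: t(0) },
]);
console.log(stuck.status, stuck.brokenAt); // processing null
```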
Payments are asynchronous. Card issuer approval can be slow, and 3D Secure authentication takes time, so payment providers report results via webhooks.
The problem is that webhook delivery isn't reliable: networks drop, our server restarts, and the same webhook can arrive more than once.
// Webhook handler: idempotency is key
async function handlePaymentWebhook(
  payload: WebhookPayload,
  signature: string
) {
  // 1. Verify signature (critical! prevents forgery).
  //    Providers sign the raw request body, so in practice this check
  //    needs the raw bytes, not re-serialized JSON.
  const isValid = verifyWebhookSignature(payload, signature);
  if (!isValid) {
    throw new Error('Invalid webhook signature');
  }
  // 2. Check duplicates by webhook event ID
  const eventId = payload.id;
  const existing = await db.findWebhookEvent(eventId);
  if (existing) {
    // Already processed: return 200 OK (stops retries)
    return { status: 'already_processed' };
  }
  // 3. Handle by event type. If a handler throws, nothing is recorded,
  //    so the provider's retry gets another chance to process the event.
  switch (payload.type) {
    case 'payment.completed':
      await handlePaymentCompleted(payload.data);
      break;
    case 'payment.failed':
      await handlePaymentFailed(payload.data);
      break;
    case 'refund.completed':
      await handleRefundCompleted(payload.data);
      break;
    default:
      // Unknown event types are acknowledged and ignored
      break;
  }
  // 4. Record the webhook event only after successful handling
  //    (a unique key on the event ID rejects a concurrent duplicate insert)
  await db.createWebhookEvent({
    id: eventId,
    type: payload.type,
    data: payload.data,
    processedAt: new Date(),
  });
  return { status: 'processed' };
}
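async function handlePaymentCompleted(data: any) {
  const paymentId = data.paymentId;
  const payment = await db.getPayment(paymentId);
  // Update payment status. transitionPaymentStatus opens its own transaction
  // and records the event, so it is not nested inside another transaction.
  // The guard keeps a redelivered webhook from failing on completed -> completed.
  if (payment.status !== 'completed') {
    await transitionPaymentStatus(paymentId, 'completed', {
      source: 'webhook',
      externalId: data.id,
    });
  }
  // Complete order
  await db.updateOrder(payment.orderId, {
    status: 'paid',
    paidAt: new Date(),
  });
  // Deduct inventory, send emails, etc. only after the status change is
  // committed: side effects can't be rolled back with a DB transaction.
  // fulfillOrder should itself be idempotent, since webhooks can repeat.
  await fulfillOrder(payment.orderId);
}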
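The handler's first step calls `verifyWebhookSignature`, which isn't defined here. One common shape is sketched below under an assumed scheme: the provider sends a Unix timestamp plus a hex HMAC-SHA256 of `timestamp.rawBody`, keyed with a shared secret. This is modeled loosely on schemes like Stripe's, but real header formats differ, so in production use your provider's SDK. The parameter list is different from the call above, and `signPayload` and the secret value are hypothetical.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';
import { Buffer } from 'node:buffer';

// Assumed scheme: hex HMAC-SHA256 over `${timestamp}.${rawBody}`
function signPayload(secret: string, timestamp: number, rawBody: string): string {
  return createHmac('sha256', secret).update(`${timestamp}.${rawBody}`).digest('hex');
}

function verifyWebhookSignature(
  secret: string,
  rawBody: string, // the exact bytes received, not re-serialized JSON
  timestamp: number, // Unix seconds sent alongside the signature
  signatureHex: string,
  nowSeconds: number = Math.floor(Date.now() / 1000),
  toleranceSeconds = 300
): boolean {
  // Reject stale deliveries so a captured request can't be replayed later
  if (Math.abs(nowSeconds - timestamp) > toleranceSeconds) {
    return false;
  }
  const expected = Buffer.from(signPayload(secret, timestamp, rawBody), 'hex');
  const received = Buffer.from(signatureHex, 'hex');
  // timingSafeEqual throws on length mismatch, so check length first
  if (received.length !== expected.length) {
    return false;
  }
  // Constant-time comparison: response timing doesn't reveal matching bytes
  return timingSafeEqual(expected, received);
}

// Usage
const webhookSecret = 'example_secret'; // hypothetical shared secret
const rawBody = '{"id":"evt_1","type":"payment.completed"}';
const sentAt = Math.floor(Date.now() / 1000);
const header = signPayload(webhookSecret, sentAt, rawBody);
console.log(verifyWebhookSignature(webhookSecret, rawBody, sentAt, header)); // true
console.log(
  verifyWebhookSignature(webhookSecret, rawBody.replace('completed', 'failed'), sentAt, header)
); // false
```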
Webhook retry logic matters too. Stripe keeps retrying a failed delivery for up to 3 days. When our own system sends webhooks to other services, we can implement something similar:
// Queue for webhook retries (sender side)
// Note: setTimeout keeps pending retries in memory only, so they are lost
// if the process restarts; a production system would persist them in a job queue.
async function enqueueWebhookRetry(
  webhookUrl: string,
  payload: any,
  attempt: number = 0
) {
  const maxAttempts = 5;
  if (attempt >= maxAttempts) {
    // Final failure: send alert
    await notifyWebhookFailure(webhookUrl, payload);
    return;
  }
  const backoffMs = 2 ** attempt * 1000; // Exponential backoff: 1s, 2s, 4s, 8s, 16s
  setTimeout(async () => {
    try {
      await sendWebhook(webhookUrl, payload);
    } catch (error) {
      // Retry on failure
      await enqueueWebhookRetry(webhookUrl, payload, attempt + 1);
    }
  }, backoffMs);
}
Webhooks are like mail delivery. You send a letter (event) but nobody's home (server down), so they come back later. After several attempts, they hold it at the post office (Dead Letter Queue). And crucially, even if you receive the same letter multiple times, you must process it as if read only once (idempotency).
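With plain exponential backoff, deliveries that failed together also retry together. A common refinement is "full jitter": pick a random delay between zero and the exponential ceiling. The sketch below separates that decision from the timer and makes the dead-letter outcome explicit. `decideRetry`, the option names, and the 60-second cap are illustrative assumptions, and `random` is injectable so the schedule can be tested.

```typescript
interface RetryOptions {
  maxAttempts: number; // after this many retries, stop and dead-letter
  baseMs: number; // delay ceiling for the first retry
  capMs: number; // upper bound so delays don't grow forever
}

type RetryDecision =
  | { kind: 'retry'; delayMs: number }
  | { kind: 'dead_letter' };

// attempt: zero-based index of the retry being scheduled
function decideRetry(
  attempt: number,
  options: RetryOptions,
  random: () => number = Math.random
): RetryDecision {
  if (attempt >= options.maxAttempts) {
    // Park the event for inspection instead of retrying forever
    return { kind: 'dead_letter' };
  }
  // Exponential ceiling: base, 2x base, 4x base, ... capped at capMs
  const ceiling = Math.min(options.capMs, options.baseMs * 2 ** attempt);
  // Full jitter: a random delay in [0, ceiling) spreads retries out
  return { kind: 'retry', delayMs: Math.floor(random() * ceiling) };
}

// Usage with a fixed "random" value so the schedule is easy to read
const options: RetryOptions = { maxAttempts: 5, baseMs: 1000, capMs: 60_000 };
const schedule = [0, 1, 2, 3, 4, 5].map((attempt) => {
  const decision = decideRetry(attempt, options, () => 0.5);
  return decision.kind === 'retry' ? decision.delayMs : decision.kind;
});
console.log(schedule); // [ 500, 1000, 2000, 4000, 8000, 'dead_letter' ]
```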
Refunds can be more complex than payments. You need to consider full refunds, partial refunds, refund fees, and refund deadlines.
interface RefundRequest {
  paymentId: string;
  amount?: number; // undefined = full refund
  reason: 'requested_by_customer' | 'fraudulent' | 'duplicate' | 'other';
  metadata?: Record<string, any>;
}
async function refundPayment(request: RefundRequest) {
  const payment = await db.getPayment(request.paymentId);
  // Check refundable status: a partially refunded payment can be refunded again
  if (payment.status !== 'completed' && payment.status !== 'partial_refunded') {
    throw new Error('Only completed or partially refunded payments can be refunded');
  }
  // Validate refund amount against what is still refundable.
  // getTotalRefunded should also count pending refunds, so that two
  // concurrent requests can't together exceed the payment amount.
  const alreadyRefunded = await db.getTotalRefunded(request.paymentId);
  const remaining = payment.amount - alreadyRefunded;
  const refundAmount = request.amount ?? remaining; // Full refund = everything left
  if (refundAmount <= 0 || refundAmount > remaining) {
    throw new Error('Refund amount must be positive and within the refundable balance');
  }
  // Create refund intent
  const refund = await db.createRefund({
    paymentId: request.paymentId,
    amount: refundAmount,
    reason: request.reason,
    status: 'pending',
  });
  try {
    // Call payment provider API
    const result = await paymentClient.refund({
      paymentId: payment.externalId,
      amount: refundAmount,
      reason: request.reason,
      idempotencyKey: `refund_${refund.id}`,
    });
    // Always move to 'refunding', the only transition the state machine allows
    // from 'completed' or 'partial_refunded'. The refund webhook later settles
    // the payment as 'refunded' or 'partial_refunded'.
    await transitionPaymentStatus(request.paymentId, 'refunding', {
      refundId: refund.id,
    });
    await db.updateRefund(refund.id, {
      status: 'processing',
      externalId: result.id,
    });
    return refund;
  } catch (error) {
    await db.updateRefund(refund.id, {
      status: 'failed',
      errorMessage: error instanceof Error ? error.message : String(error),
    });
    throw error;
  }
}
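The amount checks are the easiest part of refunds to get subtly wrong, so it can help to pull them out as pure functions. A sketch, assuming amounts are integers in the currency's minor unit (for example cents) and that each refund record carries a status; `RefundRecord`, `planRefund`, and `statusAfterRefunds` are hypothetical names. The first validates a request; the second picks between the state machine's `refunded` and `partial_refunded` once a refund webhook confirms success.

```typescript
type RefundStatus = 'pending' | 'processing' | 'succeeded' | 'failed';

// Hypothetical refund record; amounts are integer minor units (e.g. cents)
interface RefundRecord {
  amount: number;
  status: RefundStatus;
}

function sumAmounts(refunds: RefundRecord[]): number {
  return refunds.reduce((total, r) => total + r.amount, 0);
}

// Validate a refund request and return the amount to refund.
// In-flight refunds count against the balance, so two concurrent
// requests cannot together exceed the original charge.
function planRefund(
  paymentAmount: number,
  refunds: RefundRecord[],
  requested?: number // undefined = refund whatever is left
): number {
  const committed = sumAmounts(refunds.filter((r) => r.status !== 'failed'));
  const remaining = paymentAmount - committed;
  const amount = requested ?? remaining;
  if (!Number.isInteger(amount) || amount <= 0) {
    throw new Error('Refund amount must be a positive integer');
  }
  if (amount > remaining) {
    throw new Error(`Refund of ${amount} exceeds refundable balance of ${remaining}`);
  }
  return amount;
}

// Payment status after the provider has confirmed at least one refund
function statusAfterRefunds(
  paymentAmount: number,
  refunds: RefundRecord[]
): 'refunded' | 'partial_refunded' {
  const refunded = sumAmounts(refunds.filter((r) => r.status === 'succeeded'));
  if (refunded <= 0) {
    throw new Error('No succeeded refunds yet');
  }
  return refunded >= paymentAmount ? 'refunded' : 'partial_refunded';
}

// Usage: a 5000-cent payment with 2000 already refunded
const history: RefundRecord[] = [{ amount: 2000, status: 'succeeded' }];
console.log(planRefund(5000, history)); // 3000
console.log(statusAfterRefunds(5000, history)); // partial_refunded
```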
Refunds are also processed asynchronously, with results delivered via webhooks. And importantly, refunds also need reconciliation.
The scariest thing in payment systems is "our DB says payment succeeded, but money never actually came in" or vice versa.
Reconciliation is the process of periodically comparing our DB records with the payment provider's:
import * as cron from 'node-cron';
import { addDays, startOfDay, subDays } from 'date-fns';
async function reconcilePayments(date: Date) {
  // 1. Get our payment records for the date. Intents that never reached the
  //    provider have no externalId and can't be matched, so skip them.
  const ourPayments = (await db.getPaymentsByDate(date)).filter(
    (p) => p.externalId
  );
  // 2. Get payment provider records for the same window
  //    (provider list APIs usually paginate, so fetch every page)
  const theirPayments = await paymentClient.listPayments({
    created: { gte: date, lt: addDays(date, 1) },
  });
  // 3. Match by provider payment ID
  const ourIds = new Set(ourPayments.map((p) => p.externalId));
  const theirIds = new Set(theirPayments.map((p) => p.id));
  // Only in our DB (not in payment provider)
  const onlyInOurDB = ourPayments.filter(
    (p) => !theirIds.has(p.externalId)
  );
  // Only in payment provider (not in our DB)
  const onlyInPaymentProvider = theirPayments.filter(
    (p) => !ourIds.has(p.id)
  );
  // 4. Report discrepancies
  const report = {
    date,
    totalOurs: ourPayments.length,
    totalTheirs: theirPayments.length,
    mismatches: {
      onlyInOurDB,
      onlyInPaymentProvider,
    },
  };
  // 5. Send alerts
  if (onlyInOurDB.length > 0 || onlyInPaymentProvider.length > 0) {
    await alertFinanceTeam(report);
  }
  return report;
}
// Daily reconciliation at midnight for the previous day.
// startOfDay makes the window begin exactly at 00:00:00, not at the
// moment the job happened to run.
cron.schedule('0 0 * * *', async () => {
  const yesterday = startOfDay(subDays(new Date(), 1));
  await reconcilePayments(yesterday);
});
Reconciliation is like bank settlement. Checking if the ledger balance matches the actual cash in the vault. If they don't match, you need to find where money leaked or records were missed.
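The matching step can also be written as a pure function, which makes it easy to test. This sketch goes one step beyond matching IDs and also flags payments that exist on both sides with different amounts, since a ledger can have every record present and still not balance. `LedgerEntry` and `diffLedgers` are hypothetical names, and `id` stands for the provider's payment ID (our `externalId`).

```typescript
// Hypothetical minimal record: id is the provider's payment ID
// (our externalId); amount is in integer minor units (e.g. cents)
interface LedgerEntry {
  id: string;
  amount: number;
}

interface AmountMismatch {
  id: string;
  ours: number;
  theirs: number;
}

interface LedgerDiff {
  onlyInOurs: LedgerEntry[];
  onlyInTheirs: LedgerEntry[];
  amountMismatches: AmountMismatch[];
}

function diffLedgers(ours: LedgerEntry[], theirs: LedgerEntry[]): LedgerDiff {
  const theirById = new Map<string, LedgerEntry>(
    theirs.map((e): [string, LedgerEntry] => [e.id, e])
  );
  const ourIds = new Set(ours.map((e) => e.id));
  const onlyInOurs: LedgerEntry[] = [];
  const amountMismatches: AmountMismatch[] = [];
  for (const entry of ours) {
    const match = theirById.get(entry.id);
    if (match === undefined) {
      // We think it happened; the provider has no record of it
      onlyInOurs.push(entry);
    } else if (match.amount !== entry.amount) {
      // Both sides have it, but the money doesn't agree
      amountMismatches.push({ id: entry.id, ours: entry.amount, theirs: match.amount });
    }
  }
  // The provider has it; we never recorded it
  const onlyInTheirs = theirs.filter((e) => !ourIds.has(e.id));
  return { onlyInOurs, onlyInTheirs, amountMismatches };
}

// Usage
const diff = diffLedgers(
  [{ id: 'pay_a', amount: 5000 }, { id: 'pay_b', amount: 2000 }, { id: 'pay_c', amount: 700 }],
  [{ id: 'pay_a', amount: 5000 }, { id: 'pay_b', amount: 2500 }, { id: 'pay_d', amount: 900 }]
);
console.log(diff.onlyInOurs.map((e) => e.id)); // [ 'pay_c' ]
console.log(diff.onlyInTheirs.map((e) => e.id)); // [ 'pay_d' ]
console.log(diff.amountMismatches); // [ { id: 'pay_b', ours: 2000, theirs: 2500 } ]
```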
PCI DSS (Payment Card Industry Data Security Standard) is the security standard that every system handling card payments must follow. The core idea is simple: card numbers should never be stored on, or even pass through, our server.
Instead, use tokenization:
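// Frontend sends card info directly to payment provider
// (doesn't go through our server)
const stripe = Stripe('pk_live_...');
const elements = stripe.elements();
const cardElement = elements.create('card');
cardElement.mount('#card-element'); // Assumes a <div id="card-element"> on the page
// Generate token (typically on form submit)
const { token, error } = await stripe.createToken(cardElement);
if (error) {
  throw new Error(error.message);
}
// Only send token to our server
const response = await fetch('/api/payments', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    orderId: 'order_123',
    paymentToken: token.id, // Token, not card number
  }),
});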
The server never sees the card number. It receives only the token and passes it to the payment provider:
async function chargeWithToken(orderId: string, token: string) {
  const order = await db.getOrder(orderId);
  const result = await paymentClient.charge({
    amount: order.amount,
    source: token, // Use token
    idempotencyKey: `payment_${orderId}`, // Same order, same key: no double charge
    metadata: { orderId },
  });
  // Store only payment result (no card info)
  await db.createPayment({
    orderId,
    amount: order.amount,
    externalId: result.id,
    last4: result.card.last4, // Only last 4 digits
    brand: result.card.brand, // Visa, MasterCard, etc.
  });
  return result;
}
This is like hiring a designated driver. We don't handle the car (card info) ourselves. We leave it to the professional driver (payment provider), and we just say "take me from A to B (process this payment)."
Building a payment system taught me a lot about technology: idempotency, state machines, webhooks, reconciliation. But what I really learned was the weight of responsibility.
The user who got charged twice after clicking a button twice. The customer who sent an angry email because the refund was delayed. The teammate who stayed up night after night because reconciliation didn't match. All of this was because of my code.
Code that moves money is different. You write more tests. You log more details. You think harder about edge cases. Not "this should be good enough," but "is this truly safe?"
Payment systems aren't a technology problem. They're a trust problem. Users trust us enough to enter their card numbers. Protecting that trust is the developer's job. That's why payment code is heavy. It must be heavy.