
News Feed System Design: Push vs Pull and the Fan-out Problem
Building a feed like Instagram sounds simple until a user with 1M followers posts. Understanding push vs pull models and fan-out strategies.

When I started building a social media project, the feed was the first thing I needed to tackle. "Just show recent posts from people I follow," I thought. Simple enough, right? But when I actually sat down to implement it, things got complicated fast.
My initial approach was naive: every time a user opens their feed, fetch all the posts from everyone they follow and sort by time. Done. And honestly, it worked fine while users followed only 10 or 100 people.
But then scale hit me. What if a celebrity with 1 million followers posts? What if a user follows 1,000 people? Suddenly, database queries crawled to a halt and servers started groaning under the load.
That's when it clicked. A feed isn't just a simple read operation—it's a system design problem about choosing the right trade-off between reads and writes.
My initial implementation was the Pull model. Fetch data in real-time whenever a user requests their feed.
-- Typical Pull model query
SELECT posts.*
FROM posts
WHERE user_id IN (
SELECT followed_id
FROM follows
WHERE follower_id = 'current_user_id'
)
ORDER BY created_at DESC
LIMIT 20;
Think of it like going to a library. You walk in with a list of your favorite authors, then manually search for their latest books on the shelves. Every single visit, you repeat the entire search process.
But as my user base grew, the cracks started showing.
// Feed loading time for user following 1000 people
const startTime = Date.now();
const feed = await getFeedPull(userId);
const loadTime = Date.now() - startTime;
console.log(`Load time: ${loadTime}ms`); // 3000ms... way too slow!
I realized Pull alone couldn't scale to Instagram or Twitter levels.
That's when I discovered the Push model. A completely different approach: pre-compute and insert posts into every follower's feed the moment they're created.
It's like newspaper delivery. When the press prints a new paper, they deliver it to every subscriber's mailbox in advance. Subscribers just open their mailbox—instant access.
// Push model: fan-out on write
async function createPost(userId: string, content: string) {
// 1. Create the post
const post = await db.posts.create({
user_id: userId,
content: content,
created_at: new Date()
});
// 2. Fan-out to all followers
const followers = await db.follows.findMany({
where: { followed_id: userId }
});
// 3. Insert into each follower's feed table
const feedInserts = followers.map(follower => ({
user_id: follower.follower_id,
post_id: post.id,
created_at: post.created_at
}));
await db.feeds.createMany({ data: feedInserts });
return post;
}
// Reading the feed is blazing fast
async function getFeedPush(userId: string) {
return await db.feeds.findMany({
where: { user_id: userId },
include: { post: true },
orderBy: { created_at: 'desc' },
take: 20
});
}
But implementing Push exposed me to the fan-out problem. This was a real headache.
// What if a celebrity with 1M followers posts?
const celebrity = { id: 'celeb_123', followers: 1_000_000 };
await createPost(celebrity.id, 'Hello everyone!');
// Need to insert 1 MILLION database records...
// This could take minutes!
Fan-out is when a single post "explodes" into thousands or millions of feed entries. The more followers someone has, the more expensive each post becomes to write.
Storage is another concern. If 1 million followers each store 20 posts in their feed, that's 20 million records—most of which will never even be read.
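One mitigation, even before changing the model, is to chunk the fan-out writes so a single post never turns into one giant insert. A minimal sketch, reusing the hypothetical db.feeds API from the examples above:
// Sketch: chunked fan-out writes (reuses the hypothetical db API from the examples above)
const FANOUT_BATCH_SIZE = 1_000;
async function fanOutInChunks(postId: string, createdAt: Date, followerIds: string[]) {
  for (let i = 0; i < followerIds.length; i += FANOUT_BATCH_SIZE) {
    const chunk = followerIds.slice(i, i + FANOUT_BATCH_SIZE);
    // Each iteration is a bounded insert instead of one million-row statement
    await db.feeds.createMany({
      data: chunk.map(followerId => ({
        user_id: followerId,
        post_id: postId,
        created_at: createdAt
      }))
    });
  }
}
Chunking keeps each statement bounded, but the total amount of work stays the same, which is what forced me to rethink the model itself.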
The real insight came here: "Why did I think I had to choose just one?"
Studying how Twitter actually works gave me the answer: they use different strategies for different user types.
// Hybrid model: strategy based on user type
async function createPostHybrid(userId: string, content: string) {
const post = await db.posts.create({
user_id: userId,
content: content,
created_at: new Date()
});
const user = await db.users.findUnique({ where: { id: userId } });
// Strategy depends on follower count
if (user.followers_count > 100_000) {
// Celebrity: no push, handle with pull
console.log('Celebrity post - will be pulled on demand');
return post;
}
// Regular user: push (fan-out)
const followers = await db.follows.findMany({
where: { followed_id: userId }
});
const feedInserts = followers.map(follower => ({
user_id: follower.follower_id,
post_id: post.id,
created_at: post.created_at
}));
await db.feeds.createMany({ data: feedInserts });
return post;
}
// Hybrid feed retrieval
async function getFeedHybrid(userId: string) {
// 1. Get pushed feed (posts from regular users)
const pushedFeed = await db.feeds.findMany({
where: { user_id: userId },
include: { post: { include: { user: true } } },
orderBy: { created_at: 'desc' },
take: 50
});
// 2. Pull celebrity posts in real-time
const celebritiesFollowed = await db.follows.findMany({
where: {
follower_id: userId,
followed: { followers_count: { gte: 100_000 } }
}
});
const celebrityPosts = await db.posts.findMany({
where: {
user_id: { in: celebritiesFollowed.map(f => f.followed_id) },
created_at: { gte: new Date(Date.now() - 7 * 24 * 60 * 60 * 1000) }
},
include: { user: true },
orderBy: { created_at: 'desc' }
});
// 3. Merge and sort
const merged = [...pushedFeed, ...celebrityPosts]
.sort((a, b) => b.created_at.getTime() - a.created_at.getTime())
.slice(0, 20);
return merged;
}
The key insight is the Pareto Principle. Most users have few followers; only a small fraction, the celebrities, have hundreds of thousands or more. So handle the 99% efficiently with Push, and the remaining 1% with Pull.
It's like a restaurant kitchen. Popular dishes are pre-prepped (Push), while special orders are made-to-order (Pull). You can't prep everything in advance, and you can't make everything to order.
// Feed table design
interface FeedItem {
id: string;
user_id: string; // Owner of this feed
post_id: string; // Post reference
author_id: string; // Post author
created_at: Date; // Post creation time
inserted_at: Date; // When inserted into feed
}
// Indexes are crucial
// CREATE INDEX idx_feeds_user_created ON feeds(user_id, created_at DESC);
Why store both inserted_at and created_at? Because you might need to regenerate feeds or adjust ordering later.
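As a concrete example of why regeneration matters, here is a rough sketch of rebuilding one user's feed from the follow graph, using the same hypothetical db API as the earlier examples:
// Sketch: rebuilding one user's feed from scratch (hypothetical db API, same as above)
async function rebuildFeed(userId: string) {
  // Throw away the precomputed feed rows
  await db.feeds.deleteMany({ where: { user_id: userId } });
  // Re-pull recent posts from everyone this user follows
  const follows = await db.follows.findMany({ where: { follower_id: userId } });
  const posts = await db.posts.findMany({
    where: { user_id: { in: follows.map(f => f.followed_id) } },
    orderBy: { created_at: 'desc' },
    take: 1000
  });
  // created_at keeps the original post time; inserted_at records when the rebuild happened
  await db.feeds.createMany({
    data: posts.map(post => ({
      user_id: userId,
      post_id: post.id,
      author_id: post.user_id,
      created_at: post.created_at,
      inserted_at: new Date()
    }))
  });
}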
Chronological order alone isn't enough. Like Instagram and Facebook, a mature feed also needs to factor in engagement.
// Simple ranking score
function calculateFeedScore(post: Post, user: User): number {
const ageInHours = (Date.now() - post.created_at.getTime()) / (1000 * 60 * 60);
// Recency: exponential decay over time
const recencyScore = Math.exp(-ageInHours / 24);
// Popularity: likes, comments, shares
const popularityScore =
post.likes_count * 1.0 +
post.comments_count * 2.0 +
post.shares_count * 3.0;
// Affinity: past interactions with author
const affinityScore = calculateAffinity(user, post.author);
// Weighted sum
return (
recencyScore * 0.5 +
Math.log(1 + popularityScore) * 0.3 +
affinityScore * 0.2
);
}
// Sort feed by score
async function getRankedFeed(userId: string) {
const rawFeed = await getFeedHybrid(userId);
const user = await db.users.findUnique({ where: { id: userId } });
return rawFeed
.map(item => ({
...item,
score: calculateFeedScore(item.post, user)
}))
.sort((a, b) => b.score - a.score)
.slice(0, 20);
}
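calculateAffinity is referenced above without a definition. One hypothetical way to compute it, assuming you track per-author interaction counts (the getRecentLikes and getRecentComments helpers here are purely illustrative):
// Sketch: one possible affinity heuristic (getRecentLikes/getRecentComments are hypothetical helpers)
function calculateAffinity(user: User, author: User): number {
  const interactions =
    getRecentLikes(user.id, author.id) * 1.0 +
    getRecentComments(user.id, author.id) * 2.0;
  // Squash into a 0..1 range so it can be weighted against recency and popularity
  return 1 - Math.exp(-interactions / 10);
}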
I initially used offset-based pagination and ran into problems.
// Offset approach (bad)
async function getFeedOffset(userId: string, page: number) {
return await db.feeds.findMany({
where: { user_id: userId },
orderBy: { created_at: 'desc' },
skip: page * 20,
take: 20
});
}
// Problem: if new posts arrive while user is viewing page 2,
// page 3 will have duplicates or missing items!
The solution was cursor-based pagination.
// Cursor approach (good)
async function getFeedCursor(userId: string, cursor?: string) {
const where: any = { user_id: userId };
if (cursor) {
// Only fetch items after the cursor (last item's created_at)
where.created_at = { lt: new Date(cursor) };
}
const items = await db.feeds.findMany({
where,
orderBy: { created_at: 'desc' },
take: 21 // Fetch one extra to check if more pages exist
});
const hasMore = items.length > 20;
const feed = items.slice(0, 20);
const nextCursor = hasMore ? feed[feed.length - 1].created_at.toISOString() : null;
return { feed, nextCursor, hasMore };
}
Feeds are perfect for caching, especially in the Push model.
// Redis-based feed caching
import Redis from 'ioredis';
const redis = new Redis();
async function getCachedFeed(userId: string): Promise<FeedItem[]> {
const cacheKey = `feed:${userId}`;
// 1. Check cache
const cached = await redis.get(cacheKey);
if (cached) {
return JSON.parse(cached);
}
// 2. Fetch from DB
const feed = await getFeedHybrid(userId);
// 3. Store in cache (5 min TTL)
await redis.setex(cacheKey, 300, JSON.stringify(feed));
return feed;
}
// Invalidate cache when new post is pushed
async function invalidateFeedCache(userId: string) {
await redis.del(`feed:${userId}`);
}
// Better approach: Redis Sorted Set
async function pushToFeedCache(userId: string, postId: string, timestamp: number) {
const cacheKey = `feed:${userId}`;
// Add to sorted set (timestamp as score)
await redis.zadd(cacheKey, timestamp, postId);
// Keep only latest 1000 items
await redis.zremrangebyrank(cacheKey, 0, -1001);
// Set TTL
await redis.expire(cacheKey, 3600);
}
async function getFeedFromCache(userId: string, limit: number = 20) {
const cacheKey = `feed:${userId}`;
// Get in reverse chronological order
const postIds = await redis.zrevrange(cacheKey, 0, limit - 1);
if (postIds.length === 0) {
return null; // Cache miss
}
// Batch fetch post data
const posts = await db.posts.findMany({
where: { id: { in: postIds } },
include: { user: true }
});
// Maintain original order
const postMap = new Map(posts.map(p => [p.id, p]));
return postIds.map(id => postMap.get(id)).filter(Boolean);
}
Using Redis Sorted Sets lets you manage feeds memory-efficiently. Store only post IDs, and batch-fetch actual data when needed.
Twitter famously moved from pure Pull to a hybrid approach (described publicly around 2012), and their engineering blog taught me a lot.
Interestingly, Twitter built a separate "Timeline Service" microservice. The feed generation logic became so complex they had to isolate it.
Instagram is a bit different. With photo-heavy content, media loading optimization is more critical.
What's interesting about Instagram is they manage "Stories" as a completely separate system. The 24-hour expiration requires different strategies.
The key lessons from building feed systems:
No perfect solution: Neither Pull nor Push is optimal for all cases. Hybrid is the pragmatic choice.
Design for scale: Start simple, but if you plan to grow, think about your migration path early.
Measurement matters: Monitor feed load times, write latency, cache hit rates to find bottlenecks.
Caching is essential: Memory is cheap. Aggressive Redis usage solves many problems.
Use cursor-based pagination: Offset-based pagination breaks with real-time feeds.
Make fan-out async: Post creation can't afford to be slow. Handle fan-out in background jobs (see the sketch below).
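To make that last point concrete, here is a minimal sketch of moving fan-out onto a job queue. I'm using BullMQ here, but any background job system works; the queue name and payload shape are illustrative, and db refers to the hypothetical client from the earlier examples.
// Sketch: asynchronous fan-out with BullMQ (queue name and payload are illustrative)
import { Queue, Worker } from 'bullmq';
const fanoutQueue = new Queue('feed-fanout');
// On post creation: enqueue a job and return immediately
async function createPostAsync(userId: string, content: string) {
  const post = await db.posts.create({ user_id: userId, content, created_at: new Date() });
  await fanoutQueue.add('fanout', {
    postId: post.id,
    authorId: userId,
    createdAt: post.created_at
  });
  return post; // the user never waits for the fan-out
}
// Worker process: do the heavy feed writes off the request path
new Worker('feed-fanout', async job => {
  const { postId, authorId, createdAt } = job.data;
  const followers = await db.follows.findMany({ where: { followed_id: authorId } });
  await db.feeds.createMany({
    data: followers.map(f => ({
      user_id: f.follower_id,
      post_id: postId,
      created_at: new Date(createdAt)
    }))
  });
});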
Ultimately, feed systems are about finding the balance between reads and writes: understand how your users behave, then decide where data should live and when it should be computed.
The realistic path is: start with Pull for simplicity, gradually introduce Push as traffic grows, and evolve to Hybrid as you scale further.
Now when I build a feed in my next project, I'll design it with confidence.