Expert Case Studies

Designing Instagram: A Deep Dive into Photo Sharing at Scale

Analyze how to design a photo-sharing platform like Instagram. Cover storage, CDN, feed ranking, and how to scale to billions of photos.


Why Study Instagram?

Publicly reported figures put Instagram at around 2 billion monthly active users, 500 million daily active users, and more than 95 million photos and videos uploaded per day. Understanding how such a system is built shows how to design social platforms at massive scale.

Approach: We'll design a simplified version. In interviews, focus on identifying the key scaling challenges and proposing reasonable solutions rather than reproducing all of Instagram.


Requirements Analysis

Functional Requirements

  1. User Management: Register, login, profile management
  2. Photo Upload: Upload photos with filters and captions
  3. Feed: See photos from followed users (ordered by recency or ranked algorithmically)
  4. Social Graph: Follow/unfollow users
  5. Engagement: Like, comment, share
  6. Search: Find users and hashtags

Non-Functional Requirements

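| Requirement | Target |
|---|---|
| Scale | ~2B monthly active users; on the order of 100M photo/video uploads per day |
| Storage | Petabytes of photo storage |
| Latency | Feed loads in < 500 ms |
| Availability | 99.9% uptime |
| Durability | No uploaded photo is ever lost |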

High-Level Architecture


Photo Upload Flow

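Step-by-Step

  1. Request: The client asks the API to start an upload; the API returns a short-lived presigned S3 URL.
  2. Upload: The client sends the photo directly to S3 with that URL, bypassing the API servers.
  3. Record: The API stores the post metadata (owner, caption, image URL) in PostgreSQL.
  4. Process: Async workers create resized versions and thumbnails without blocking the upload.
  5. Serve: Viewers load the images through CloudFront rather than from S3 directly.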

Key Design Decisions

| Decision | Choice | Rationale |
|---|---|---|
| Upload method | Direct to S3 | Reduces server load |
| Image storage | S3 + CloudFront | Durable, CDN-backed |
| Image processing | Async workers | Doesn't block upload |
| Metadata | PostgreSQL | Relational, ACID |
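The "async workers" row can be sketched as a queue consumer: the upload handler only enqueues a job and returns, and a background worker does the slow work. This is a minimal in-process sketch using the standard-library `queue` and `threading` modules as a stand-in for a real message queue. The variant names and widths (`VARIANT_WIDTHS`) are hypothetical, and the worker only computes target dimensions instead of actually resizing pixels.

```python
import queue
import threading
from dataclasses import dataclass

# Hypothetical variant widths in pixels; a real service chooses its own sizes.
VARIANT_WIDTHS = {"large": 1080, "medium": 640, "thumb": 150}


@dataclass
class ResizeJob:
    post_id: int
    width: int   # original image width in pixels
    height: int  # original image height in pixels


def variant_sizes(width: int, height: int) -> dict:
    """Target (width, height) per variant, keeping aspect ratio and never upscaling."""
    sizes = {}
    for name, target in VARIANT_WIDTHS.items():
        w = min(width, target)
        sizes[name] = (w, max(1, round(height * w / width)))
    return sizes


def worker(jobs: queue.Queue, results: dict) -> None:
    while True:
        job = jobs.get()
        if job is None:  # sentinel value: stop the worker
            jobs.task_done()
            return
        # A real worker would fetch the original from storage, resize it,
        # and write each variant back; here we only compute the target sizes.
        results[job.post_id] = variant_sizes(job.width, job.height)
        jobs.task_done()


jobs: queue.Queue = queue.Queue()
results: dict = {}
thread = threading.Thread(target=worker, args=(jobs, results))
thread.start()

# The upload handler only enqueues and returns; resizing happens off the request path.
jobs.put(ResizeJob(post_id=1, width=4032, height=3024))
jobs.put(None)
jobs.join()
thread.join()
print(results[1]["large"])  # (1080, 810)
```

Because the queue decouples upload from processing, a burst of uploads only lengthens the queue; it does not slow down the upload requests themselves.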

Why upload directly to S3? Routing every multi-megabyte upload through the API servers turns them into a bottleneck. With presigned URLs, clients upload directly to storage, and the API only coordinates: it issues the URL and records the metadata.
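The idea behind a presigned URL is that the API signs "this client may PUT this one key until time T", and storage verifies the signature without calling back to the API. The sketch below shows that idea with a toy HMAC-SHA256 scheme. It is not S3's actual format (S3 uses AWS Signature Version 4, which the AWS SDKs generate for you, e.g. boto3's `generate_presigned_url`). `SIGNING_KEY`, `STORAGE_ENDPOINT`, and the `uploads/{user_id}/...` key layout are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical values: a signing key shared by the API and the storage tier,
# and a placeholder storage endpoint.
SIGNING_KEY = b"replace-with-a-real-secret"
STORAGE_ENDPOINT = "https://storage.example.com"


def _signature(key: str, expires: int) -> str:
    message = f"PUT\n{key}\n{expires}".encode()
    return hmac.new(SIGNING_KEY, message, hashlib.sha256).hexdigest()


def presign_upload(user_id: int, upload_id: str, ttl_s: int = 900,
                   now: Optional[int] = None) -> str:
    """API side: issue a URL that permits one PUT to one key until it expires."""
    now = int(time.time()) if now is None else now
    key = f"uploads/{user_id}/{upload_id}.jpg"
    expires = now + ttl_s
    query = urlencode({"expires": expires, "sig": _signature(key, expires)})
    return f"{STORAGE_ENDPOINT}/{key}?{query}"


def verify_upload(url: str, now: Optional[int] = None) -> bool:
    """Storage side: accept the PUT only if the signature matches and has not expired."""
    now = int(time.time()) if now is None else now
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    key = parsed.path.lstrip("/")
    expires = int(params["expires"][0])
    if now > expires:
        return False
    return hmac.compare_digest(params["sig"][0], _signature(key, expires))


url = presign_upload(42, "a1b2", now=1_000)
print(verify_upload(url, now=1_500))                          # True
print(verify_upload(url, now=2_000))                          # False: expired at 1_900
print(verify_upload(url.replace("/42/", "/43/"), now=1_500))  # False: key was altered
```

Note that the API never touches the image bytes: its work per upload is one HMAC computation and one metadata write.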


Feed Design

The Fan-Out Problem

When you post a photo, it needs to appear in your followers' feeds. If you have 1 million followers, that's 1 million entries to write.

Push vs Pull Model

| Model | Description | Pros | Cons |
|---|---|---|---|
| Push (fan-out on write) | Write to all followers' feeds on post | Fast reads | Expensive for popular users |
| Pull (fan-out on read) | Compute feed when requested | Cheaper storage | Slow reads |
| Hybrid | Push for small accounts, pull for large | Balanced | Complex |
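To make the trade-off concrete, here is a toy in-memory sketch of both models. The graph (`followers`) and the post IDs are hypothetical data. Push costs one write per follower at post time; pull costs one write at post time but has to gather and sort posts from every followed account at read time.

```python
from collections import defaultdict

# Hypothetical toy graph: author -> followers.
followers = {"celeb": [f"fan{i}" for i in range(1_000)], "friend": ["me"]}
following = defaultdict(list)  # user -> authors they follow
for author, fans in followers.items():
    for fan in fans:
        following[fan].append(author)

feeds = defaultdict(list)            # push model: one materialized feed per user
posts_by_author = defaultdict(list)  # pull model: each post stored once, under its author


def publish_push(author: str, post_id: str) -> int:
    """Fan-out on write: append the post to every follower's feed."""
    fans = followers.get(author, [])
    for fan in fans:
        feeds[fan].append(post_id)
    return len(fans)  # number of feed writes


def publish_pull(author: str, post_id: str, ts: int) -> int:
    """Fan-out on read: a single write to the author's own post list."""
    posts_by_author[author].append((ts, post_id))
    return 1


def read_pull(user: str, limit: int = 10) -> list:
    """Build the feed at read time; cost grows with the number of accounts followed."""
    candidates = [p for author in following[user] for p in posts_by_author[author]]
    candidates.sort(reverse=True)  # newest first by timestamp
    return [post_id for _, post_id in candidates[:limit]]


print(publish_push("celeb", "p1"))     # 1000 feed writes
print(publish_pull("celeb", "p1", 1))  # 1 write
print(read_pull("fan0"))               # ['p1']
```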

How Instagram Handles It

Instagram is commonly described as using a hybrid approach:

  1. Celebrity accounts (above a follower threshold, e.g. >10K followers): Don't fan out on write; followers pull these posts at read time.
  2. Regular accounts: Fan out on write to each follower's feed.
  3. Cached feeds: Most feed requests are served from cache rather than computed fresh.

The feed retrieval checks the cache first. On a miss, it pulls posts from celebrity accounts and merges them with cached posts from regular follows. The combined posts are sorted by timestamp in descending order and paginated to the requested limit.
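The read path just described, together with the write-side threshold check, can be sketched in memory as follows. Everything here is illustrative: `CELEBRITY_THRESHOLD` reuses the 10K example figure, `CACHE_TTL_S` is a hypothetical TTL for assembled feeds, and plain dictionaries stand in for the feed store and cache.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 10_000  # illustrative cut-off (the 10K example figure)
CACHE_TTL_S = 60              # hypothetical TTL for assembled feeds

followers_of = defaultdict(set)   # author -> followers
following_of = defaultdict(set)   # user -> authors followed
author_posts = defaultdict(list)  # author -> [(ts, post_id)], read by the pull path
pushed_feed = defaultdict(list)   # user -> [(ts, post_id)], written by the push path
feed_cache = {}                   # user -> (expires_at, [post_id, ...])


def follow(user: str, author: str) -> None:
    followers_of[author].add(user)
    following_of[user].add(author)


def is_celebrity(author: str) -> bool:
    return len(followers_of[author]) > CELEBRITY_THRESHOLD


def publish(author: str, post_id: str, ts: int) -> None:
    author_posts[author].append((ts, post_id))
    if not is_celebrity(author):  # regular account: fan out on write
        for user in followers_of[author]:
            pushed_feed[user].append((ts, post_id))


def get_feed(user: str, limit: int = 20, now: float = 0.0) -> list:
    cached = feed_cache.get(user)
    if cached and cached[0] > now:  # cache hit
        return cached[1][:limit]
    # Cache miss: pull celebrity posts and merge them with pushed posts from regular follows.
    pulled = [p for a in following_of[user] if is_celebrity(a) for p in author_posts[a]]
    # The set drops duplicates, e.g. if an author crossed the threshold after some fan-outs.
    merged = sorted(set(pushed_feed[user]) | set(pulled), reverse=True)  # newest first
    post_ids = [post_id for _, post_id in merged]
    feed_cache[user] = (now + CACHE_TTL_S, post_ids)
    return post_ids[:limit]  # paginate to the requested limit


for i in range(10_001):
    follow(f"u{i}", "star")        # "star" now exceeds the threshold
follow("u0", "pal")
publish("pal", "pal-1", ts=100)    # regular account: pushed into u0's feed
publish("star", "star-1", ts=200)  # celebrity: not fanned out to 10,001 feeds
print(get_feed("u0"))              # ['star-1', 'pal-1']
```

The TTL bounds how stale a cached feed can get with respect to new celebrity posts, since those posts never invalidate follower caches on write.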


Database Schema

User Table

```sql
CREATE TABLE users (
    id BIGINT PRIMARY KEY,
    username VARCHAR(30) UNIQUE NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    password_hash VARCHAR(255) NOT NULL,
    profile_photo_url VARCHAR(500),
    bio TEXT,
    follower_count INT DEFAULT 0,
    following_count INT DEFAULT 0,
    created_at TIMESTAMP DEFAULT NOW()
);

-- The UNIQUE constraint on username already creates an index,
-- so a separate idx_username index would be redundant.
CREATE INDEX idx_follower_count ON users(follower_count);
```

Post Table

```sql
CREATE TABLE posts (
    id BIGINT PRIMARY KEY,
    user_id BIGINT NOT NULL REFERENCES users(id),
    image_url VARCHAR(500) NOT NULL,
    caption TEXT,
    like_count INT DEFAULT 0,
    comment_count INT DEFAULT 0,
    created_at TIMESTAMP DEFAULT NOW()
);

-- idx_user_created also serves lookups on user_id alone (leftmost column),
-- so a separate single-column idx_user_id index would be redundant.
CREATE INDEX idx_created_at ON posts(created_at DESC);
CREATE INDEX idx_user_created ON posts(user_id, created_at DESC);
```

Follow Table

```sql
CREATE TABLE follows (
    follower_id BIGINT NOT NULL REFERENCES users(id),
    following_id BIGINT NOT NULL REFERENCES users(id),
    created_at TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY (follower_id, following_id)
);

CREATE INDEX idx_following ON follows(following_id);
```
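The two queries these indexes exist for are the pull-model feed query ("newest posts from everyone I follow") and the push-model fan-out query ("who follows this author"). Below is a sketch that runs both against a simplified SQLite copy of the `posts` and `follows` tables using the standard-library `sqlite3` module. The SQLite types, integer timestamps, and sample rows are assumptions made so the example runs anywhere; it is not the production PostgreSQL setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified SQLite version of the posts/follows tables
# (integer timestamps, no users table or foreign keys).
conn.executescript("""
CREATE TABLE posts (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    image_url TEXT NOT NULL,
    created_at INTEGER NOT NULL
);
CREATE INDEX idx_user_created ON posts(user_id, created_at DESC);
CREATE TABLE follows (
    follower_id INTEGER NOT NULL,
    following_id INTEGER NOT NULL,
    PRIMARY KEY (follower_id, following_id)
);
CREATE INDEX idx_following ON follows(following_id);
""")
conn.executemany("INSERT INTO follows VALUES (?, ?)", [(1, 2), (1, 3), (5, 2)])
conn.executemany("INSERT INTO posts VALUES (?, ?, ?, ?)", [
    (10, 2, "https://cdn.example.com/10.jpg", 100),
    (11, 3, "https://cdn.example.com/11.jpg", 200),
    (12, 4, "https://cdn.example.com/12.jpg", 300),  # user 4 is not followed by user 1
])

# Pull model: newest posts from everyone user 1 follows
# (the primary key covers lookups by follower_id).
feed = [row[0] for row in conn.execute(
    """SELECT p.id FROM posts p
       JOIN follows f ON f.following_id = p.user_id
       WHERE f.follower_id = ?
       ORDER BY p.created_at DESC LIMIT ?""", (1, 20))]

# Push model: the fan-out list for a new post by user 2 (served by idx_following).
fans = [row[0] for row in conn.execute(
    "SELECT follower_id FROM follows WHERE following_id = ? ORDER BY follower_id", (2,))]

print(feed, fans)  # [11, 10] [1, 5]
```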

Storage Architecture

Photo Storage at Scale

Storage Estimates

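| Content | Avg size | Items/day | Storage/day | Storage/year |
|---|---|---|---|---|
| Photo (compressed) | 200 KB | 50M | 10 TB | ~3.65 PB |
| Thumbnail | 10 KB | 50M | 500 GB | ~183 TB |
| Video (avg) | 3 MB | 5M | 15 TB | ~5.5 PB |
| Metadata | 1 KB | 55M | 55 GB | ~20 TB |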

Total daily storage growth: ~25.5 TB/day (about 10 TB of photos, 15 TB of video, 0.5 TB of thumbnails, plus metadata)
After 1 year: ~9.3 PB (before replication, and without considering deduplication)
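The estimate is simple enough to check in a few lines. This sketch uses decimal units (1 KB = 1,000 bytes) and the per-item sizes and daily counts from the table; integer arithmetic keeps the totals exact.

```python
# Decimal units, as in the storage table.
KB, MB, GB, TB, PB = 10**3, 10**6, 10**9, 10**12, 10**15

# (average size in bytes, items per day), taken from the storage table.
ITEMS = {
    "photo":     (200 * KB, 50_000_000),
    "thumbnail": (10 * KB, 50_000_000),
    "video":     (3 * MB, 5_000_000),
    "metadata":  (1 * KB, 55_000_000),
}

daily = {name: size * count for name, (size, count) in ITEMS.items()}
total_daily = sum(daily.values())
total_yearly = total_daily * 365

for name, nbytes in daily.items():
    print(f"{name:<10} {nbytes / TB:8.3f} TB/day {nbytes * 365 / PB:7.3f} PB/year")
print(f"{'total':<10} {total_daily / TB:8.3f} TB/day {total_yearly / PB:7.3f} PB/year")
```

Video dominates the total even though it is only a tenth of the item count, which is why video is usually the first target for tiering and aggressive compression.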


Caching Strategy

Cache Hierarchy

Cache Keys

Cache keys use consistent naming conventions for different data types. Feed cache stores post IDs for a user's personalized feed, profile cache stores user objects, post cache stores individual posts, and timeline cache stores celebrity post IDs for pull-based retrieval.
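Here is a sketch of such a naming scheme plus a cache-aside read for profiles. The key prefixes (`feed:`, `user:`, `post:`, `timeline:`) are a hypothetical convention, and `TTLCache` is an in-memory stand-in for a cache such as Redis or Memcached; `load_user` represents the database read.

```python
from typing import Any, Callable, Dict, Optional, Tuple


# Hypothetical key formats; the exact prefixes are a naming choice, not a standard.
def feed_key(user_id: int) -> str:
    return f"feed:{user_id}"  # post IDs in a user's personalized feed


def profile_key(user_id: int) -> str:
    return f"user:{user_id}"  # user object


def post_key(post_id: int) -> str:
    return f"post:{post_id}"  # single post


def timeline_key(user_id: int) -> str:
    return f"timeline:{user_id}"  # a celebrity's post IDs, read by the pull path


class TTLCache:
    """In-memory stand-in for a shared cache with per-key expiry."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str, now: float) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None or entry[0] <= now:  # missing or expired
            return None
        return entry[1]

    def set(self, key: str, value: Any, ttl: float, now: float) -> None:
        self._data[key] = (now + ttl, value)


def get_profile(cache: TTLCache, load_user: Callable[[int], dict],
                user_id: int, now: float, ttl: float = 300) -> dict:
    """Cache-aside read: try the cache, fall back to the database, then populate."""
    key = profile_key(user_id)
    user = cache.get(key, now)
    if user is None:
        user = load_user(user_id)  # cache miss: one database read
        cache.set(key, user, ttl, now)
    return user


db_reads = []


def load_user(uid: int) -> dict:
    db_reads.append(uid)
    return {"id": uid, "username": f"user{uid}"}


cache = TTLCache()
get_profile(cache, load_user, 7, now=0)    # miss: reads the database
get_profile(cache, load_user, 7, now=10)   # hit: served from cache
get_profile(cache, load_user, 7, now=400)  # expired after 300 s: reads again
print(db_reads)  # [7, 7]
```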


Search Architecture

Search Features

| Feature | Implementation |
|---|---|
| User search | Prefix matching on username |
| Hashtag search | Full-text on hashtags |
| Location search | Geospatial queries |
| Trending | Aggregation + time decay |
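Two of these rows reduce to small, well-known techniques, sketched here in memory. In production they would typically live in a search engine such as Elasticsearch. Prefix matching uses binary search (`bisect`) over a sorted username list, and trending uses exponentially decayed counts. `HALF_LIFE_S` (six hours) is a hypothetical tuning value, and the sample usernames and hashtags are made up.

```python
import bisect
from collections import defaultdict


def prefix_search(sorted_usernames, prefix, limit=5):
    """Usernames starting with prefix, via binary search on a sorted list."""
    i = bisect.bisect_left(sorted_usernames, prefix)
    matches = []
    while i < len(sorted_usernames) and len(matches) < limit:
        name = sorted_usernames[i]
        if not name.startswith(prefix):
            break  # sorted order: no later name can match
        matches.append(name)
        i += 1
    return matches


HALF_LIFE_S = 6 * 3600  # hypothetical: a hashtag use loses half its weight every 6 hours


def trending(events, now, top_n=3):
    """Rank hashtags by use counts weighted with exponential time decay."""
    scores = defaultdict(float)
    for tag, ts in events:
        scores[tag] += 0.5 ** ((now - ts) / HALF_LIFE_S)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


names = sorted(["alice", "alicia", "bob", "albert", "carol"])
print(prefix_search(names, "ali"))  # ['alice', 'alicia']

now = 24 * 3600
events = [("#sunset", 0)] * 3 + [("#launch", now)] * 2
print(trending(events, now))        # ['#launch', '#sunset']
```

Time decay is what lets two fresh uses outrank three day-old ones: after four half-lives, each old use counts for only 1/16.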

Scaling Challenges & Solutions

| Challenge | Solution |
|---|---|
| Photo upload bottleneck | Direct upload to S3 via presigned URLs |
| Feed computation | Hybrid push/pull, heavy caching |
| Celebrity accounts | Don't fan out; pull on read |
| Image resizing | Async processing with queues |
| Hot storage costs | Tier to cold storage after 90 days |
| Search performance | Elasticsearch for full-text search |
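With S3, the "tier after 90 days" row is usually a lifecycle rule rather than application code. The sketch below builds a dictionary shaped like an S3 lifecycle configuration (the structure that, for example, boto3's `put_bucket_lifecycle_configuration` accepts as `LifecycleConfiguration`). The `photos/` prefix and the rule ID format are hypothetical. `GLACIER_IR` (Glacier Instant Retrieval) is one possible target class that keeps millisecond reads for rarely viewed photos.

```python
import json


def cold_tier_rule(prefix: str = "photos/", days: int = 90,
                   storage_class: str = "GLACIER_IR") -> dict:
    """One lifecycle rule that moves objects under `prefix` to a colder class after `days`."""
    return {
        "ID": f"tier-{prefix.strip('/')}-after-{days}d",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": days, "StorageClass": storage_class}],
    }


lifecycle = {"Rules": [cold_tier_rule()]}
print(json.dumps(lifecycle, indent=2))
```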

Key Takeaways

  1. Keep uploads off the API servers: Use presigned URLs for direct S3 upload
  2. Balance push vs pull: Hybrid model handles both small and large accounts
  3. Cache aggressively: Feed, profile, and post caches dramatically reduce DB load
  4. Async everything: Image processing, notifications, and analytics all go through queues
  5. Tier storage: Not all data needs to be hot; move old content to cold storage

Interview tip: When designing a social platform, always consider the "write amplification" problem. A single post might need to appear in thousands of feeds. Address this with fan-out control and caching.


Follow-Up Questions to Consider

  1. How would you handle video uploads (much larger files)?
  2. How would you implement the Explore page (algorithmic discovery)?
  3. How would you prevent spam and fake accounts?
  4. How would you design direct messaging?
  5. How would you handle real-time notifications?

Real Instagram trivia: Instagram's backend is built on Python with the Django web framework, and React Native has been used for parts of its mobile app. Feed ranking is powered by ML models that consider engagement likelihood, not just recency.