Designing Instagram: A Deep Dive into Photo Sharing at Scale
An analysis of how to design a photo-sharing platform like Instagram, covering storage, CDN delivery, feed ranking, and scaling to billions of photos.
Why Study Instagram?
Instagram is commonly reported to serve about 2 billion monthly active users and 500 million daily active users, with over 95 million photos and videos uploaded each day. Understanding how it's built reveals how to design massive-scale social platforms.
Approach: We'll design a simplified version. In interviews, focus on identifying the key scaling challenges and proposing reasonable solutions rather than reproducing all of Instagram.
Requirements Analysis
Functional Requirements
- User Management: Register, login, profile management
- Photo Upload: Upload photos with filters and captions
- Feed: See photos from followed users (ordered by recency or ranked algorithmically)
- Social Graph: Follow/unfollow users
- Engagement: Like, comment, share
- Search: Find users and hashtags
Non-Functional Requirements
| Requirement | Target |
|---|---|
| Scale | ~95M photo/video uploads per day (about 1,100 per second on average, more at peak) |
| Storage | Petabytes of photo storage |
| Latency | Feed loads in < 500ms |
| Availability | 99.9% uptime |
| Durability | Photos never lost |
High-Level Architecture
Photo Upload Flow
Step-by-Step
Key Design Decisions
| Decision | Choice | Rationale |
|---|---|---|
| Upload method | Direct to S3 | Reduces server load |
| Image storage | S3 + CloudFront | Durable, CDN-backed |
| Image processing | Async workers | Doesn't block upload |
| Metadata | PostgreSQL | Relational, ACID |
Why direct upload to S3? If every photo passes through the API server, that server becomes a bandwidth bottleneck for uploads. With presigned URLs, clients upload directly to storage, and the API only coordinates: it issues the URL and records the metadata.
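The presigned-URL idea can be sketched without any cloud SDK. The following is a minimal illustration, not S3's actual signing scheme (S3 uses AWS Signature Version 4): the API server signs the object key and an expiry time with a secret, and the storage side checks that signature before accepting the upload. The host `storage.example.com`, `SECRET_KEY`, and all function names are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"demo-secret"  # hypothetical key known to the API server and storage


def _sign(method: str, object_key: str, expires_at: int) -> str:
    """HMAC over the method, object key, and expiry, so none of them can be altered."""
    message = f"{method}\n{object_key}\n{expires_at}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def create_presigned_put_url(object_key: str, ttl_seconds: int = 300,
                             now: Optional[float] = None) -> str:
    """API server side: issue a short-lived upload URL for one specific object key."""
    now = time.time() if now is None else now
    expires_at = int(now) + ttl_seconds
    query = urlencode({
        "expires": expires_at,
        "signature": _sign("PUT", object_key, expires_at),
    })
    return f"https://storage.example.com/{object_key}?{query}"


def verify_presigned_put_url(url: str, now: Optional[float] = None) -> bool:
    """Storage side: accept only if the signature matches and the URL has not expired."""
    now = time.time() if now is None else now
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    object_key = parsed.path.lstrip("/")
    expected = _sign("PUT", object_key, expires_at)
    return hmac.compare_digest(signature, expected) and now < expires_at
```

The API server never touches the image bytes in this flow. It only decides who may write which key and for how long.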
Feed Design
The Fan-Out Problem
When you post a photo, it needs to appear in your followers' feeds. If you have 1 million followers, that's 1 million entries to write.
Push vs Pull Model
| Model | Description | Pros | Cons |
|---|---|---|---|
| Push (Fan-out on write) | Write to all followers' feeds on post | Fast reads | Expensive for popular users |
| Pull (Fan-out on read) | Compute feed when requested | Cheaper storage | Slow reads |
| Hybrid | Push for small accounts, pull for large | Balanced | Complex |
How Instagram Actually Does It
Instagram is generally described as using a hybrid approach (the follower threshold below is illustrative):
- Celebrity accounts (>10K followers): Don't fan out writes. Pull on read.
- Regular accounts: Fan out writes to followers' feeds.
- Cached feeds: Most users get served from cache, not computed fresh.
The feed retrieval checks the cache first. On a miss, it pulls posts from celebrity accounts and merges them with cached posts from regular follows. The combined posts are sorted by timestamp in descending order and paginated to the requested limit.
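The cache-miss path described above can be sketched as a small in-memory model. This is an illustration under stated assumptions, not Instagram's implementation: the class `HybridFeed`, its dictionaries, and the `Post` fields are hypothetical stand-ins for the feed cache and the posts table, and the final per-user result cache (with its expiry) is left out.

```python
from collections import defaultdict
from dataclasses import dataclass

CELEBRITY_THRESHOLD = 10_000  # follower cutoff used in the text


@dataclass(frozen=True)
class Post:
    post_id: int
    author_id: int
    created_at: int  # Unix timestamp


class HybridFeed:
    def __init__(self, threshold: int = CELEBRITY_THRESHOLD):
        self.threshold = threshold
        self.followers = defaultdict(set)    # author_id -> ids of their followers
        self.following = defaultdict(set)    # user_id -> ids of authors they follow
        self.feed_cache = defaultdict(list)  # user_id -> posts pushed at write time
        self.timelines = defaultdict(list)   # author_id -> that author's own posts

    def follow(self, follower_id: int, author_id: int) -> None:
        self.followers[author_id].add(follower_id)
        self.following[follower_id].add(author_id)

    def is_celebrity(self, author_id: int) -> bool:
        return len(self.followers[author_id]) > self.threshold

    def publish(self, post: Post) -> None:
        """Always record the post; fan out on write only for regular accounts."""
        self.timelines[post.author_id].append(post)
        if self.is_celebrity(post.author_id):
            return  # followers will pull this post at read time
        for follower_id in self.followers[post.author_id]:
            self.feed_cache[follower_id].append(post)

    def get_feed(self, user_id: int, limit: int = 20) -> list:
        """Merge pushed posts with celebrity posts pulled on read, newest first."""
        # Keyed by post_id so a post is not shown twice if its author
        # crossed the celebrity threshold after it was pushed.
        merged = {post.post_id: post for post in self.feed_cache[user_id]}
        for author_id in self.following[user_id]:
            if self.is_celebrity(author_id):
                for post in self.timelines[author_id]:
                    merged[post.post_id] = post
        ranked = sorted(merged.values(), key=lambda p: p.created_at, reverse=True)
        return ranked[:limit]
```

A post from a regular account costs one write per follower, while a celebrity post costs a single write and a little extra work on each follower's read.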
Database Schema
User Table
CREATE TABLE users (
    id BIGINT PRIMARY KEY,
    username VARCHAR(30) UNIQUE NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    password_hash VARCHAR(255) NOT NULL,
    profile_photo_url VARCHAR(500),
    bio TEXT,
    follower_count INT DEFAULT 0,   -- denormalized counter, avoids COUNT(*) on follows
    following_count INT DEFAULT 0,  -- denormalized counter
    created_at TIMESTAMP DEFAULT NOW()
);
-- The UNIQUE constraints on username and email already create indexes,
-- so a separate index on username would be redundant.
CREATE INDEX idx_follower_count ON users(follower_count);
Post Table
CREATE TABLE posts (
    id BIGINT PRIMARY KEY,
    user_id BIGINT NOT NULL REFERENCES users(id),
    image_url VARCHAR(500) NOT NULL,
    caption TEXT,
    like_count INT DEFAULT 0,     -- denormalized counter
    comment_count INT DEFAULT 0,  -- denormalized counter
    created_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX idx_created_at ON posts(created_at DESC);
-- The composite index also serves lookups by user_id alone,
-- so a separate single-column index on user_id is unnecessary.
CREATE INDEX idx_user_created ON posts(user_id, created_at DESC);
Follow Table
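CREATE TABLE follows (
    follower_id BIGINT NOT NULL REFERENCES users(id),
    following_id BIGINT NOT NULL REFERENCES users(id),
    created_at TIMESTAMP DEFAULT NOW(),
    -- The primary key answers "whom does this user follow?"
    PRIMARY KEY (follower_id, following_id)
);
-- Answers "who follows this user?", the lookup needed for fan-out on write
CREATE INDEX idx_following ON follows(following_id);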
Storage Architecture
Photo Storage at Scale
Storage Estimates
| Content | Avg size | Daily uploads | Storage per year |
|---|---|---|---|
| Photo (compressed) | 200 KB | 50M | ~3.65 PB |
| Thumbnail | 10 KB | 50M | ~183 TB |
| Video (avg) | 3 MB | 5M | ~5.5 PB |
| Metadata | 1 KB | 55M | ~20 TB |
Total daily storage growth: ~25.6 TB/day (photos alone account for ~10 TB/day)
After 1 year: ~9.3 PB (without considering deduplication and compression)
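Each estimate is simply average size × daily count, scaled by 365 for the yearly figure. A quick sketch of that arithmetic, using the sizes and daily counts from the table and decimal units (1 TB = 10^12 bytes); the names `DAILY_CONTENT` and `estimate_storage` are illustrative:

```python
KB, MB, TB, PB = 10**3, 10**6, 10**12, 10**15

# content type -> (average size in bytes, items per day), taken from the table
DAILY_CONTENT = {
    "photo": (200 * KB, 50_000_000),
    "thumbnail": (10 * KB, 50_000_000),
    "video": (3 * MB, 5_000_000),
    "metadata": (1 * KB, 55_000_000),
}


def estimate_storage(content=DAILY_CONTENT, days=365):
    """Return (bytes per day by type, total bytes per day, total bytes over `days`)."""
    per_day = {name: size * count for name, (size, count) in content.items()}
    total_per_day = sum(per_day.values())
    return per_day, total_per_day, total_per_day * days


if __name__ == "__main__":
    per_day, total_per_day, total_per_year = estimate_storage()
    for name, size in per_day.items():
        print(f"{name:<10} {size / TB:8.3f} TB/day  {size * 365 / PB:6.3f} PB/year")
    print(f"total      {total_per_day / TB:8.3f} TB/day  {total_per_year / PB:6.3f} PB/year")
```

Video dominates the total despite a tenth of the upload count, which is why video usually gets its own storage and transcoding pipeline.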
Caching Strategy
Cache Hierarchy
Cache Keys
Cache keys use consistent naming conventions for different data types. Feed cache stores post IDs for a user's personalized feed, profile cache stores user objects, post cache stores individual posts, and timeline cache stores celebrity post IDs for pull-based retrieval.
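One way to keep key naming consistent is to build every key through a small helper per data type. The sketch below assumes a `type:id` key format and a plain dictionary standing in for the cache; both are illustrative choices, not formats given in the text.

```python
from typing import Any, Callable, Dict


def feed_key(user_id: int) -> str:
    """Post IDs for a user's personalized feed."""
    return f"feed:{user_id}"


def profile_key(user_id: int) -> str:
    """Cached user object."""
    return f"profile:{user_id}"


def post_key(post_id: int) -> str:
    """Cached individual post."""
    return f"post:{post_id}"


def timeline_key(user_id: int) -> str:
    """Post IDs authored by a celebrity account, pulled at read time."""
    return f"timeline:{user_id}"


def get_or_load(cache: Dict[str, Any], key: str, loader: Callable[[], Any]) -> Any:
    """Cache-aside read: return the cached value, or load it once and store it."""
    if key in cache:
        return cache[key]
    value = loader()
    cache[key] = value
    return value
```

Centralizing key construction means a format change happens in one place, and a typo in a key cannot silently create a second, never-invalidated copy of the data.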
Search Architecture
Search Features
| Feature | Implementation |
|---|---|
| User search | Prefix matching on username |
| Hashtag search | Full-text on hashtags |
| Location search | Geospatial queries |
| Trending | Aggregation + time decay |
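The "aggregation + time decay" row can be made concrete with exponential decay: each hashtag use contributes a weight that halves every fixed interval, and the weights are summed per tag. This is one common formulation rather than a method named in the text; the one-hour half-life and the function name `trending` are assumptions.

```python
import math
from collections import defaultdict
from typing import Iterable, List, Tuple


def trending(events: Iterable[Tuple[str, float]], now: float,
             half_life_seconds: float = 3600.0, top_n: int = 10) -> List[Tuple[str, float]]:
    """Rank hashtags by decayed usage.

    `events` is an iterable of (hashtag, unix_timestamp) pairs. A use that is
    one half-life old counts half as much as a use happening right now.
    """
    decay_rate = math.log(2) / half_life_seconds
    scores = defaultdict(float)
    for hashtag, timestamp in events:
        age = max(0.0, now - timestamp)  # treat future-dated events as "now"
        scores[hashtag] += math.exp(-decay_rate * age)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]
```

With a one-hour half-life, three uses of a tag from two hours ago score 0.75 in total, so two uses of another tag right now (score 2.0) outrank them.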
Scaling Challenges & Solutions
| Challenge | Solution |
|---|---|
| Photo upload bottleneck | Direct upload to S3 via presigned URLs |
| Feed computation | Hybrid push/pull, heavy caching |
| Celebrity accounts | Don't fan out; pull on read |
| Image resizing | Async processing with queues |
| Hot storage costs | Tier to cold storage after 90 days |
| Search performance | Elasticsearch for full-text search |
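The "async processing with queues" row can be sketched with an in-process queue and worker threads. In production the queue would be a separate message broker and the workers separate processes; here `queue.Queue` stands in for both, and the rendition names, widths, and job fields are hypothetical. The resizing itself is replaced by a stub that only returns the object keys a worker would write.

```python
import queue
import threading
from typing import Dict, List

RENDITION_WIDTHS = {"thumbnail": 150, "feed": 1080}  # hypothetical target widths


def plan_renditions(job: dict) -> Dict[str, str]:
    """Stand-in for real resizing: name the output object for each rendition."""
    return {
        name: f"{job['object_key']}.{name}_{width}w"
        for name, width in RENDITION_WIDTHS.items()
    }


def worker(jobs: "queue.Queue", results: dict) -> None:
    """Consume jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        try:
            if job is None:
                return
            results[job["post_id"]] = plan_renditions(job)
        finally:
            jobs.task_done()  # runs for real jobs and for the sentinel


def run_pipeline(uploads: List[dict], num_workers: int = 2) -> dict:
    """Enqueue uploads and process them off the request path."""
    jobs: "queue.Queue" = queue.Queue()
    results: dict = {}
    threads = [threading.Thread(target=worker, args=(jobs, results))
               for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for upload in uploads:
        jobs.put(upload)       # the upload request can return right after this
    for _ in threads:
        jobs.put(None)         # one shutdown sentinel per worker
    jobs.join()
    for thread in threads:
        thread.join()
    return results
```

The upload handler only enqueues a job, so its latency does not depend on how long resizing takes, and the worker count can be scaled independently of the API tier.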
Key Takeaways
- Keep uploads off the API servers: Use presigned URLs for direct S3 upload
- Balance push vs pull: Hybrid model handles both small and large accounts
- Cache aggressively: Feed, profile, and post caches dramatically reduce DB load
- Async everything: Move image processing, notifications, and analytics onto queues
- Tier storage: Not all data needs to be hot; move old content to cold storage
Interview tip: When designing a social platform, always consider the "write amplification" problem. A single post might need to appear in thousands of feeds. Address this with fan-out control and caching.
Follow-Up Questions to Consider
- How would you handle video uploads (much larger files)?
- How would you implement the Explore page (algorithmic discovery)?
- How would you prevent spam and fake accounts?
- How would you design direct messaging?
- How would you handle real-time notifications?
Real Instagram trivia: Instagram's backend has been written in Python from the start, built on the Django web framework, and it runs one of the largest Django deployments. The mobile apps are primarily native, with React Native used for some screens. Feed ranking is powered by ML models that consider engagement likelihood, not just recency.