Performance Optimization: Profiling, Caching, and Latency Reduction
Learn techniques to optimize system performance including caching strategies, database optimization, CDN usage, and profiling tools.
Measuring What Matters
Before optimizing, measure. You can't improve what you don't measure.
Key Metrics
| Metric | Definition | Target |
|---|---|---|
| Latency | Time to serve a single request | P50 < 100 ms |
| Throughput | Requests served per second | Meet peak demand |
| Error rate | Percentage of requests that fail | < 0.1% |
| Availability | Uptime percentage | > 99.9% |
P99 matters more than P50: A fast P50 with a slow P99 means about 1 request in 100 is at least that slow, so some users have a terrible experience even though the median looks healthy. Monitor both.
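To make the P50 versus P99 distinction concrete, here is a minimal sketch that computes both from a list of latency samples. The `percentile` helper uses the nearest-rank method, and the sample data is invented for illustration; a real system would read these values from its metrics store.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical latencies in ms: 98 fast requests and 2 slow ones.
latencies_ms = [40] * 98 + [900] * 2

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
print(f"P50 = {p50} ms, P99 = {p99} ms")
```

With this made-up data the median stays at 40 ms while the P99 lands on one of the 900 ms requests, which is exactly the situation a P50-only dashboard would hide.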
Latency Breakdown
Where Time Goes
Typical Latencies
| Operation | Latency | Notes |
|---|---|---|
| L1 cache reference | 1 ns | CPU cache |
| L2 cache reference | 4 ns | CPU cache |
| Main memory access | 100 ns | RAM |
| SSD read | 100 μs | NVMe SSD |
| HDD seek | 10 ms | Disk |
| Network: Same datacenter | 1 ms | LAN |
| Network: Cross-continent | 100 ms | Internet |
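The figures in the table above can be turned into a rough latency budget. The sketch below is only back-of-the-envelope arithmetic: the dictionary copies the approximate table values, and the example request (20 sequential same-datacenter calls plus 5 SSD reads) is a hypothetical workload, not a measurement.

```python
# Approximate latencies from the table above, in nanoseconds.
LATENCY_NS = {
    "l1_cache": 1,
    "l2_cache": 4,
    "main_memory": 100,
    "ssd_read": 100_000,
    "hdd_seek": 10_000_000,
    "same_dc_round_trip": 1_000_000,
    "cross_continent_round_trip": 100_000_000,
}

def total_ms(steps):
    """Sum a sequence of (operation, count) pairs and return milliseconds."""
    total_ns = sum(LATENCY_NS[op] * count for op, count in steps)
    return total_ns / 1_000_000

# Hypothetical request: 20 sequential same-datacenter calls and 5 SSD reads.
request = [("same_dc_round_trip", 20), ("ssd_read", 5)]
print(f"estimated time: {total_ms(request)} ms")
```

Estimates like this show where the time can go before any profiler is attached: here the sequential network calls dominate, so batching them would matter far more than tuning memory access.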
Caching Strategies
The Cache Hierarchy
Cache Patterns
| Pattern | Write | Read | Consistency | Use Case |
|---|---|---|---|---|
| Cache-Aside | App writes DB, then invalidates cache | App checks cache, fills it from DB on miss | Eventual | General purpose |
| Read-Through | DB only | Cache loads from DB on miss | Eventual | Simplified application code |
| Write-Through | DB + Cache | From cache | Strong | Critical data |
| Write-Behind | Cache first, DB asynchronously | From cache | Eventual | Write-heavy workloads |
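As a rough illustration of two rows in the table, the sketch below implements cache-aside and write-through over an in-memory dictionary. The `Database`, `CacheAside`, and `WriteThrough` classes are hypothetical stand-ins written for this example; a real deployment would typically put something like Redis or Memcached in front of an actual database.

```python
class Database:
    """Stand-in for a real database; counts reads so cache hits are visible."""

    def __init__(self):
        self.rows = {}
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.rows.get(key)

    def put(self, key, value):
        self.rows[key] = value


class CacheAside:
    """The application manages the cache: fill on miss, invalidate on write."""

    def __init__(self, db):
        self.db = db
        self.cache = {}

    def read(self, key):
        if key in self.cache:          # cache hit: no database read
            return self.cache[key]
        value = self.db.get(key)       # cache miss: load from the database
        if value is not None:
            self.cache[key] = value
        return value

    def write(self, key, value):
        self.db.put(key, value)
        self.cache.pop(key, None)      # drop the stale entry; next read refills it


class WriteThrough(CacheAside):
    """Every write updates the database and the cache together."""

    def write(self, key, value):
        self.db.put(key, value)
        self.cache[key] = value


db = Database()
store = CacheAside(db)
store.write("user:1", "Ada")
store.read("user:1")   # miss: reads the database and fills the cache
store.read("user:1")   # hit: served from the cache
print(f"database reads: {db.reads}")
```

The only difference between the two classes is what `write` does to the cache, which is the trade-off the table summarizes: invalidating is simple but costs one miss after each write, while writing through keeps the cache warm at the price of doing both updates on every write.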
Cache Invalidation
Cache invalidation is hard: As the saying attributed to Phil Karlton goes, there are only two hard things in computer science: cache invalidation and naming things. Choose an invalidation strategy based on your consistency requirements.
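Two common invalidation strategies are time-based expiry (TTL) and explicit invalidation on write. The sketch below shows both in one small class. `TTLCache` and its `clock` parameter are names invented for this example; the injectable clock is there only to make expiry easy to test.

```python
import time

class TTLCache:
    """Entries expire after ttl_seconds, or earlier if explicitly invalidated."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def set(self, key, value):
        self._entries[key] = (value, self.clock() + self.ttl)

    def get(self, key, default=None):
        entry = self._entries.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self.clock() >= expires_at:   # expired: treat as a miss and evict
            del self._entries[key]
            return default
        return value

    def invalidate(self, key):
        """Explicit invalidation, e.g. right after the source data changes."""
        self._entries.pop(key, None)


cache = TTLCache(ttl_seconds=30)
cache.set("price:42", 19.99)
print(cache.get("price:42"))
cache.invalidate("price:42")
print(cache.get("price:42"))
```

A TTL bounds how stale a value can get without any coordination, while explicit invalidation gives fresher reads but requires every write path to remember to call it.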
Database Optimization
Indexing Strategies
Query Optimization
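```sql
-- Bad: SELECT * fetches every column, including ones the caller never uses
SELECT * FROM orders WHERE user_id = 123;

-- Good: select only the needed columns, filter early, and bound the result
SELECT id, total, status, created_at
FROM orders
WHERE user_id = 123
  AND status = 'completed'
ORDER BY created_at DESC  -- without ORDER BY, LIMIT returns an arbitrary 10 rows
LIMIT 10;
```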
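Selecting fewer columns helps, but the filter itself still needs an index to avoid scanning the whole table. The sketch below uses Python's built-in `sqlite3` module to compare the query plan before and after adding a composite index. The table definition and the index name `idx_orders_user_status` are assumptions made for this example, and the plan wording differs between SQLite versions and between database engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, total REAL, "
    "status TEXT, created_at TEXT)"
)

QUERY = (
    "SELECT id, total, status, created_at FROM orders "
    "WHERE user_id = ? AND status = ? LIMIT 10"
)

def query_plan(connection):
    """Return SQLite's plan description for QUERY as one string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, (123, "completed"))
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text

plan_before = query_plan(conn)
# Composite index covering both columns used in the WHERE clause.
conn.execute("CREATE INDEX idx_orders_user_status ON orders (user_id, status)")
plan_after = query_plan(conn)

print("before:", plan_before)
print("after: ", plan_after)
```

Before the index exists SQLite reports a full table scan; afterwards the plan should name the new index. Checking the plan this way is a cheap habit to adopt before and after any indexing change.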
Denormalization Trade-offs
| Normalized | Denormalized |
|---|---|
| Write efficiency | Read efficiency |
| No data duplication | Duplicated data |
| Complex joins | Simpler queries |
| Consistency enforced by the schema | Application must keep copies in sync |
Network Optimization
Connection Pooling
HTTP/2 and HTTP/3 Benefits
| Feature | HTTP/1.1 | HTTP/2 | HTTP/3 |
|---|---|---|---|
| Multiplexing | ❌ | ✅ | ✅ |
| Header compression | ❌ | ✅ | ✅ |
| Parallel requests | Multiple connections | Single connection | Single connection |
| QUIC (UDP) | ❌ | ❌ | ✅ |
Compression
Code-Level Optimization
Algorithm Complexity
The inefficient version uses nested loops, which gives O(n²) complexity. The optimized version sorts the data first (O(n log n)) and then makes a single O(n) pass, so the sort dominates and the overall complexity is O(n log n).
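The original task is not shown here, so the sketch below picks duplicate detection as a stand-in problem that has exactly this shape: a nested-loop version and a sort-then-scan version. Both function names are invented for this example.

```python
def has_duplicates_quadratic(items):
    """Compare every pair of elements: O(n^2) comparisons."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False

def has_duplicates_sorted(items):
    """Sort (O(n log n)), then compare neighbours in one pass (O(n))."""
    ordered = sorted(items)
    for previous, current in zip(ordered, ordered[1:]):
        if previous == current:   # equal values are adjacent after sorting
            return True
    return False

data = [7, 3, 9, 3, 1]
print(has_duplicates_quadratic(data), has_duplicates_sorted(data))
```

For this particular problem a hash set would bring the average cost down to O(n); the sorted version is shown because it mirrors the approach described above.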
Avoiding N+1 Queries
The N+1 query problem occurs when code fetches n users with one query and then issues a separate query for each user's posts. The improved version uses a single JOIN query to fetch all the data at once, reducing database round trips from n+1 to 1.
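A minimal sketch of both versions, using an in-memory SQLite database so it runs as is. The `users` and `posts` schemas, the sample rows, and the function names are all assumptions made for illustration; with an ORM the same fix usually goes by the name eager loading.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
    INSERT INTO users VALUES (1, 'ada'), (2, 'grace');
    INSERT INTO posts VALUES (1, 1, 'first'), (2, 1, 'second'), (3, 2, 'hello');
""")

def posts_n_plus_one(connection):
    """One query for the users, then one more query per user."""
    queries = 0
    result = {}
    users = connection.execute("SELECT id, name FROM users ORDER BY id").fetchall()
    queries += 1
    for user_id, name in users:
        rows = connection.execute(
            "SELECT title FROM posts WHERE user_id = ? ORDER BY id", (user_id,)
        ).fetchall()
        queries += 1
        result[name] = [title for (title,) in rows]
    return result, queries

def posts_single_join(connection):
    """A single LEFT JOIN returns every user together with their posts."""
    result = defaultdict(list)
    rows = connection.execute(
        "SELECT u.name, p.title FROM users u "
        "LEFT JOIN posts p ON p.user_id = u.id ORDER BY u.id, p.id"
    )
    for name, title in rows:
        posts = result[name]       # creates an entry even for users with no posts
        if title is not None:
            posts.append(title)
    return dict(result), 1

slow_result, slow_queries = posts_n_plus_one(conn)
fast_result, fast_queries = posts_single_join(conn)
print(slow_queries, fast_queries)
```

Both functions return the same mapping of user name to post titles; only the number of round trips differs, and that gap grows linearly with the number of users.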
Async I/O
The blocking approach waits for each HTTP request sequentially. The async approach uses aiohttp with asyncio.gather to fetch all URLs concurrently, reducing the total wait time from the sum of all latencies to roughly the latency of the slowest single request.
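The sketch below shows the same contrast using only the standard library. To keep it runnable without network access, the aiohttp call is replaced by a hypothetical `fetch` coroutine that simulates latency with `asyncio.sleep`; the URLs and delays are made up. In real code the body of `fetch` would be an aiohttp request.

```python
import asyncio
import time

# Hypothetical URLs mapped to simulated latencies in seconds.
SIMULATED_LATENCY = {
    "https://example.com/a": 0.05,
    "https://example.com/b": 0.10,
    "https://example.com/c": 0.15,
}

async def fetch(url):
    """Stand-in for an HTTP request: waits without blocking the event loop."""
    await asyncio.sleep(SIMULATED_LATENCY[url])
    return f"response from {url}"

async def fetch_sequential(urls):
    """Await each request before starting the next one."""
    return [await fetch(url) for url in urls]

async def fetch_concurrent(urls):
    """Start all requests at once and wait for all of them to finish."""
    return await asyncio.gather(*(fetch(url) for url in urls))

def timed(coro):
    """Run a coroutine to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start

urls = list(SIMULATED_LATENCY)
sequential_result, sequential_s = timed(fetch_sequential(urls))
concurrent_result, concurrent_s = timed(fetch_concurrent(urls))
print(f"sequential: {sequential_s:.2f}s, concurrent: {concurrent_s:.2f}s")
```

`asyncio.gather` returns results in the order the coroutines were passed in, so the concurrent version can be swapped in without changing how the responses are consumed.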
Monitoring and Profiling
Application Performance Monitoring (APM)
Popular Tools
| Category | Tools |
|---|---|
| APM | New Relic, Datadog, AWS X-Ray |
| Profiling | Pyroscope, async-profiler, Chrome DevTools |
| Logging | ELK Stack, Loki, CloudWatch |
| Metrics | Prometheus + Grafana |
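The tools above cover production systems, but the basic profiling workflow can be tried locally first. As a small sketch, the example below uses Python's built-in `cProfile` and `pstats` modules (not one of the tools in the table) to find where a hypothetical request handler spends its time. `handle_request` and `slow_sum_of_squares` are invented for this example.

```python
import cProfile
import io
import pstats

def slow_sum_of_squares(n):
    """Deliberately naive work so that it shows up in the profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def handle_request():
    """Hypothetical request handler that calls the hot function."""
    return slow_sum_of_squares(200_000)

profiler = cProfile.Profile()
profiler.enable()
result = handle_request()
profiler.disable()

# Render the five most expensive entries, sorted by cumulative time.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

Sorting by cumulative time puts the functions whose call trees cost the most at the top of the report, which is usually the right place to start looking for a bottleneck.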
What to Remember for Interviews
- Measure first: Optimize based on data, not assumptions
- Cache aggressively: Serving a result from memory is usually cheaper than recomputing or re-fetching it
- Database tuning: Index wisely, avoid N+1, consider denormalization
- Network efficiency: Use HTTP/2+, compress, keep connections alive
- P99 latency: Watch the tail, not just the median; a fast P50 can hide requests that are terrible for some users
Practice: Profile your own web app. What's the P99 latency? Where are the bottlenecks? What's the cache hit rate? Start measuring before optimizing.