Scalability & Performance

Performance Optimization: Profiling, Caching, and Latency Reduction

Learn techniques to optimize system performance including caching strategies, database optimization, CDN usage, and profiling tools.

Tags: performance, optimization, latency, caching, profiling, database tuning

Measuring What Matters

Before optimizing, measure. You can't improve what you don't measure.

Key Metrics

| Metric | Definition | Target |
|---|---|---|
| Latency | Time for a single request | P50 < 100 ms |
| Throughput | Requests per second | Meet peak demand |
| Error rate | Failed requests (%) | < 0.1% |
| Availability | Uptime percentage | > 99.9% |

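P99 matters more than P50: P50 is the median, while P99 is the latency that 99% of requests stay under. A fast P50 with a slow P99 means about 1 in 100 requests gives its user a terrible experience. Monitor both.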
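To make the gap between P50 and P99 concrete, here is a minimal sketch that computes both from a list of latency samples using the nearest-rank method. The `percentile` helper and the sample latencies are made up for illustration; production systems usually get these numbers from a metrics library or histogram rather than from raw lists.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies in milliseconds: 98 fast requests and 2 slow ones.
latencies_ms = [20] * 98 + [900] * 2

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
print(f"P50 = {p50} ms, P99 = {p99} ms")
```

With these samples the median looks healthy while the tail does not, which is exactly the situation an average or a P50-only dashboard would hide.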


Latency Breakdown

Where Time Goes

Typical Latencies

| Operation | Latency (approx.) | Notes |
|---|---|---|
| L1 cache reference | 1 ns | CPU cache |
| L2 cache reference | 4 ns | CPU cache |
| Main memory access | 100 ns | RAM |
| SSD read | 100 μs | NVMe SSD |
| HDD seek | 10 ms | Disk |
| Network: same datacenter | 1 ms | LAN |
| Network: cross-continent | 100 ms | Internet |

Caching Strategies

The Cache Hierarchy

Cache Patterns

| Pattern | Write | Read | Consistency | Use Case |
|---|---|---|---|---|
| Cache-Aside | DB only | Cache on miss | Eventual | General |
| Read-Through | DB only | Cache on miss | Eventual | Simplified code |
| Write-Through | DB + Cache | From cache | Strong | Critical data |
| Write-Behind | Cache only | From cache | Eventual | High write |
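As an illustration of the first row, the following is a minimal cache-aside sketch. The `CacheAside` class is hypothetical, and plain dictionaries stand in for both the database and the cache (in practice these might be a SQL database and Redis or Memcached). Invalidating on write is one common choice, assumed here; updating the cached entry instead is also possible.

```python
class CacheAside:
    """Cache-aside: reads check the cache first and load from the DB on a miss;
    writes go to the DB only and invalidate the cached entry."""

    def __init__(self, db):
        self.db = db        # stand-in for a real database (a dict here)
        self.cache = {}     # stand-in for an external cache
        self.db_reads = 0   # counts how often reads fall through to the DB

    def get(self, key):
        if key in self.cache:
            return self.cache[key]          # cache hit
        self.db_reads += 1                  # cache miss: go to the DB
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value         # populate the cache for next time
        return value

    def put(self, key, value):
        self.db[key] = value                # write to the DB only
        self.cache.pop(key, None)           # invalidate; next read repopulates


store = CacheAside({"user:1": "Ada"})
first = store.get("user:1")    # miss, loads from the DB
second = store.get("user:1")   # hit, served from the cache
store.put("user:1", "Grace")   # write invalidates the cached entry
third = store.get("user:1")    # miss again, sees the new value
```

Between a write by another process and the next invalidation, readers can see stale data, which is why the table lists this pattern as eventually consistent.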

Cache Invalidation

⚠️ Cache invalidation is hard: as the saying attributed to Phil Karlton goes, there are only two hard things in computer science: cache invalidation and naming things. Choose an invalidation strategy based on your consistency requirements.
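Two widely used invalidation strategies are time-based expiry (TTL) and explicit invalidation on write. The sketch below shows both in one small class. `TTLCache`, its methods, and the injected `clock` parameter are illustrative assumptions, not the API of any particular caching library; the fake clock exists only so the example runs without sleeping.

```python
import time


class TTLCache:
    """In-memory cache where each entry expires ttl_seconds after being set."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so expiry can be demonstrated quickly
        self._entries = {}      # key -> (value, expires_at)

    def set(self, key, value):
        self._entries[key] = (value, self.clock() + self.ttl)

    def get(self, key, default=None):
        entry = self._entries.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._entries[key]      # lazily evict the expired entry
            return default
        return value

    def invalidate(self, key):
        """Explicit invalidation, e.g. right after the underlying row changes."""
        self._entries.pop(key, None)


now = [0.0]                                   # fake clock, in seconds
cache = TTLCache(ttl_seconds=30, clock=lambda: now[0])

cache.set("price:42", 999)
fresh = cache.get("price:42")                 # still within the TTL
now[0] = 31.0
expired = cache.get("price:42")               # TTL elapsed, entry is gone

cache.set("price:42", 1099)
cache.invalidate("price:42")                  # explicit invalidation
after_invalidate = cache.get("price:42")
```

A short TTL bounds how stale a value can get at the cost of more misses; explicit invalidation keeps data fresher but requires every write path to remember to call it.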


Database Optimization

Indexing Strategies

Query Optimization

sql
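-- Bad: SELECT * fetches every column, including ones the caller never uses
SELECT * FROM orders WHERE user_id = 123;

-- Good: select only the needed columns, filter early, and bound the result
SELECT id, total, status, created_at
FROM orders
WHERE user_id = 123
  AND status = 'completed'
ORDER BY created_at DESC  -- without ORDER BY, which 10 rows LIMIT returns is unspecified
LIMIT 10;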
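Whether a query like the one above is fast depends largely on whether the database can use an index for the `user_id` filter. The following sketch uses Python's built-in `sqlite3` module and SQLite's `EXPLAIN QUERY PLAN` to compare the plan before and after adding an index. The table layout and the index name `idx_orders_user` are assumptions made for this example, and other databases expose plans differently (for instance via `EXPLAIN`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, total REAL, "
    "status TEXT, created_at TEXT)"
)

QUERY = "SELECT id, total, status, created_at FROM orders WHERE user_id = ?"


def query_plan(conn, sql, params):
    """Return SQLite's plan for a statement as one string.
    The human-readable detail is the last column of each plan row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


plan_before = query_plan(conn, QUERY, (123,))   # full table scan expected

# Hypothetical index on the filtered column.
conn.execute("CREATE INDEX idx_orders_user ON orders (user_id)")
plan_after = query_plan(conn, QUERY, (123,))    # index search expected

print("before:", plan_before)
print("after: ", plan_after)
```

The exact plan wording varies between SQLite versions, so the useful signal is whether the index name appears in the plan at all.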

Denormalization Trade-offs

| Normalized | Denormalized |
|---|---|
| Write efficiency | Read efficiency |
| No data duplication | Duplicated data |
| Complex joins | Simpler queries |
| Consistency guaranteed | Consistency burden |

Network Optimization

Connection Pooling

HTTP/2 and HTTP/3 Benefits

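| Feature | HTTP/1.1 | HTTP/2 | HTTP/3 |
|---|---|---|---|
| Multiplexing | No | Yes | Yes |
| Header compression | No | Yes (HPACK) | Yes (QPACK) |
| Parallel requests | Multiple connections | Single connection | Single connection |
| QUIC (UDP) | No | No | Yes |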

Compression


Code-Level Optimization

Algorithm Complexity

An inefficient solution often uses nested loops that compare every pair of elements, giving O(n²) complexity. An optimized version sorts the data first (O(n log n)) and then iterates once (O(n)), so the overall complexity drops to O(n log n).
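A small sketch of this contrast, assuming the task is detecting whether a list contains a duplicate (the task and both function names are chosen for illustration, since the text does not say which problem it has in mind):

```python
def has_duplicate_quadratic(items):
    """O(n^2): compare every pair of elements with nested loops."""
    n = len(items)
    for i in range(n):
        for j in range(i + 1, n):
            if items[i] == items[j]:
                return True
    return False


def has_duplicate_sorted(items):
    """O(n log n): sort first, then scan adjacent elements once."""
    ordered = sorted(items)
    return any(a == b for a, b in zip(ordered, ordered[1:]))


data = [7, 3, 9, 3, 1]
unique = [7, 3, 9, 1]

print(has_duplicate_quadratic(data), has_duplicate_sorted(data))
print(has_duplicate_quadratic(unique), has_duplicate_sorted(unique))
```

For this particular task a hash set would bring the average cost down to O(n); the sorted version is shown because it matches the sort-then-scan approach described above.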

Avoiding N+1 Queries

The N+1 query problem occurs when code fetches a list of n users with one query and then issues a separate query for each user's posts. The fix is to fetch everything with a single JOIN query, reducing database round trips from n+1 to 1.
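The difference can be sketched with Python's built-in `sqlite3` module. The `users` and `posts` tables, their columns, and both function names are assumptions for this example; each function returns its result together with the number of queries it issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
    INSERT INTO users VALUES (1, 'ada'), (2, 'grace');
    INSERT INTO posts VALUES (1, 1, 'first'), (2, 1, 'second'), (3, 2, 'hello');
""")


def posts_n_plus_one(conn):
    """One query for the users, then one more query per user."""
    queries = 1
    users = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
    result = {}
    for user_id, name in users:
        rows = conn.execute(
            "SELECT title FROM posts WHERE user_id = ? ORDER BY id", (user_id,)
        ).fetchall()
        queries += 1
        result[name] = [title for (title,) in rows]
    return result, queries


def posts_single_join(conn):
    """A single JOIN returns every (user, post) pair in one round trip."""
    rows = conn.execute("""
        SELECT u.name, p.title
        FROM users u
        LEFT JOIN posts p ON p.user_id = u.id
        ORDER BY u.id, p.id
    """).fetchall()
    result = {}
    for name, title in rows:
        result.setdefault(name, [])
        if title is not None:          # users without posts yield a NULL title
            result[name].append(title)
    return result, 1


slow_result, slow_queries = posts_n_plus_one(conn)
fast_result, fast_queries = posts_single_join(conn)
print(slow_queries, fast_queries)
```

ORMs frequently produce the first pattern implicitly through lazy loading, so counting queries per request is a practical way to spot it.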

Async I/O

A blocking approach waits for each HTTP request to finish before starting the next. An async approach, for example aiohttp with asyncio.gather, fetches all URLs concurrently, reducing total wait time from the sum of all latencies to roughly the latency of the slowest single request.
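The timing effect can be sketched using only the standard library. To stay self-contained, this example replaces real aiohttp requests with a hypothetical `fake_fetch` coroutine that simulates network latency through `asyncio.sleep`; the URLs and the 50 ms delay are made up. With aiohttp, the body of `fake_fetch` would instead perform a request on a shared client session.

```python
import asyncio
import time


async def fake_fetch(url, delay):
    """Stand-in for an HTTP request: waits `delay` seconds, then 'responds'."""
    await asyncio.sleep(delay)
    return f"response from {url}"


async def fetch_sequential(urls, delay):
    """Await each request before starting the next: total is about n * delay."""
    return [await fake_fetch(url, delay) for url in urls]


async def fetch_concurrent(urls, delay):
    """Start all requests at once: total is about the slowest single request."""
    return await asyncio.gather(*(fake_fetch(url, delay) for url in urls))


def timed(coro):
    """Run a coroutine to completion and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


urls = [f"https://example.com/{i}" for i in range(5)]
seq_result, seq_time = timed(fetch_sequential(urls, 0.05))
con_result, con_time = timed(fetch_concurrent(urls, 0.05))
print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

`asyncio.gather` returns results in the same order as its arguments, so callers can still match each response to its URL.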


Monitoring and Profiling

Application Performance Monitoring (APM)

| Category | Tools |
|---|---|
| APM | New Relic, Datadog, AWS X-Ray |
| Profiling | Pyroscope, async-profiler, Chrome DevTools |
| Logging | ELK Stack, Loki, CloudWatch |
| Metrics | Prometheus + Grafana |

What to Remember for Interviews

  1. Measure first: Optimize based on data, not assumptions
  2. Cache aggressively: Serving a result from memory is usually cheaper than recomputing or re-querying it
  3. Database tuning: Index wisely, avoid N+1, consider denormalization
  4. Network efficiency: Use HTTP/2+, compress, keep connections alive
  5. P99 latency: Track the tail, not just the median; a slow P99 means 1 in 100 requests is slow even when P50 looks fine

Practice: Profile your own web app. What's the P99 latency? Where are the bottlenecks? What's the cache hit rate? Start measuring before optimizing.