Performance Optimization: Profiling, Caching, and Latency Reduction

Learn techniques to optimize system performance including caching strategies, database optimization, CDN usage, and profiling tools.


Measuring What Matters

Before optimizing, measure. You can't improve what you don't measure.

Key Metrics

Metric	Definition	Target
Latency	Time for a single request	P50 < 100ms
Throughput	Requests per second	Meet peak demand
Error rate	Failed requests %	< 0.1%
Availability	Uptime percentage	> 99.9%

✅

P99 matters more than P50: Fast P50 but slow P99 means some users have a terrible experience. Monitor both!
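To make the gap between P50 and P99 concrete, here is a minimal sketch that computes both with the nearest-rank method. The `percentile` helper and the sample latencies are hypothetical, chosen only to illustrate the point; production systems usually get percentiles from their metrics backend.

```python
import math


def percentile(samples, pct):
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Nearest rank: the smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical data: 98 fast requests and 2 slow ones, in milliseconds.
latencies_ms = [40] * 98 + [2000] * 2

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
print(f"P50 = {p50} ms, P99 = {p99} ms")
```

With this data the median looks healthy, while the P99 exposes the two requests that took two seconds.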

Latency Breakdown

Where Time Goes

Typical Latencies

Operation	Latency	Notes
L1 cache reference	1 ns	CPU cache
L2 cache reference	4 ns	CPU cache
Main memory access	100 ns	RAM
SSD read	100 μs	NVMe SSD
HDD seek	10 ms	Disk
Network: Same datacenter	1 ms	LAN
Network: Cross-continent	100 ms	Internet

Caching Strategies

The Cache Hierarchy

Cache Patterns

Pattern	Write	Read	Consistency	Use Case
Cache-Aside	DB only	Cache first; app loads from DB on miss	Eventual	General
Read-Through	DB only	Cache first; cache loads from DB on miss	Eventual	Simplified code
Write-Through	DB + Cache	From cache	Strong	Critical data
Write-Behind	Cache first; DB updated asynchronously	From cache	Eventual	High write volume
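The Cache-Aside row can be sketched as follows. This is an illustration under stated assumptions, not a production client: plain dicts stand in for both the database and the cache, and the `CacheAsideStore` class and its `db_reads` counter are hypothetical names added for the example.

```python
class CacheAsideStore:
    """Cache-aside: the application manages the cache next to the database."""

    def __init__(self, database):
        self.database = database      # stand-in for a real database (a dict here)
        self.cache = {}               # stand-in for Redis or Memcached
        self.db_reads = 0             # counts trips to the "database"

    def get(self, key):
        if key in self.cache:         # cache hit: no database access
            return self.cache[key]
        self.db_reads += 1            # cache miss: load from the database
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value   # populate the cache for later reads
        return value

    def put(self, key, value):
        self.database[key] = value    # the write goes to the database only
        self.cache.pop(key, None)     # invalidate so the next read reloads


store = CacheAsideStore({"user:1": "Ada"})
store.get("user:1")            # miss: loads from the database
store.get("user:1")            # hit: served from the cache
store.put("user:1", "Grace")   # write, then invalidate the cached entry
```

Invalidating on write, rather than updating the cache in place, is one common choice here; it keeps the write path simple at the cost of one extra miss on the next read.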

Cache Invalidation

⚠️

Cache invalidation is hard: As the saying attributed to Phil Karlton goes, there are only two hard things in computer science: cache invalidation and naming things. Choose an invalidation strategy based on your consistency requirements.
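One of the simplest invalidation strategies is time-based expiry (TTL), which suits data that can tolerate being stale for a bounded period. The sketch below is a minimal in-process version; the `TTLCache` class and its injectable `clock` parameter are assumptions made for illustration, not part of any particular library.

```python
import time


class TTLCache:
    """Time-based invalidation: entries expire ttl_seconds after being stored."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock            # injectable so expiry can be tested
        self._entries = {}            # key -> (value, expires_at)

    def set(self, key, value):
        self._entries[key] = (value, self.clock() + self.ttl_seconds)

    def get(self, key, default=None):
        entry = self._entries.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._entries[key]    # stale: drop it and report a miss
            return default
        return value
```

A shorter TTL narrows the staleness window but lowers the hit rate; stricter consistency requirements usually call for explicit invalidation on write instead.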

Database Optimization

Indexing Strategies

Query Optimization

sql

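-- Bad: SELECT * fetches every column, including ones the caller never uses
SELECT * FROM orders WHERE user_id = 123;

-- Good: select only the needed columns, filter early, and bound the result
SELECT id, total, status, created_at
FROM orders
WHERE user_id = 123
  AND status = 'completed'
ORDER BY created_at DESC  -- without ORDER BY, LIMIT returns an arbitrary 10 rows
LIMIT 10;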
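Whether a query like this is fast depends largely on indexing. The sketch below uses SQLite through Python's `sqlite3` module to compare the query plan before and after adding a composite index. The table layout, the index name `idx_orders_user_status_created`, and the choice of SQLite are assumptions for illustration; other databases expose plans through their own `EXPLAIN` variants, and the exact plan text varies by version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY, user_id INTEGER, total REAL,"
    " status TEXT, created_at TEXT)"
)

QUERY = (
    "SELECT id, total, status, created_at FROM orders "
    "WHERE user_id = ? AND status = ? "
    "ORDER BY created_at DESC LIMIT 10"
)


def query_plan(connection):
    """Return SQLite's plan for QUERY as one string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, (123, "completed"))
    return " | ".join(row[3] for row in rows)   # column 3 holds the plan text


plan_before = query_plan(conn)   # no usable index yet, so a table scan

# Hypothetical index name; the columns match the WHERE and ORDER BY clauses.
conn.execute(
    "CREATE INDEX idx_orders_user_status_created "
    "ON orders (user_id, status, created_at)"
)
plan_after = query_plan(conn)    # the planner can now search via the index

print(plan_before)
print(plan_after)
```

Putting the equality-filtered columns first and the sort column last lets the index serve both the filter and the ordering.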

Denormalization Trade-offs

Normalized	Denormalized
Write efficiency	Read efficiency
No data duplication	Duplicated data
Complex joins	Simpler queries
Consistency guaranteed	Consistency burden

Network Optimization

Connection Pooling

HTTP/2 and HTTP/3 Benefits

Feature	HTTP/1.1	HTTP/2	HTTP/3
Multiplexing	❌	✅	✅
Header compression	❌	✅	✅
Parallel requests	Multiple connections	Single connection	Single connection
QUIC (UDP)	❌	❌	✅

Compression

Code-Level Optimization

Algorithm Complexity

A typical inefficiency is a nested loop that compares every pair of elements, which is O(n²). Sorting the data first (O(n log n)) and then iterating once (O(n)) improves the overall complexity to O(n log n).
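As one hypothetical instance of this pattern, consider checking a list for duplicates. The task and both function names are assumptions picked for illustration; the same sort-then-scan idea applies to many pairwise comparisons.

```python
def has_duplicates_quadratic(items):
    """Compare every pair of elements: O(n^2) comparisons."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_sorted(items):
    """Sort (O(n log n)), then compare neighbours in one pass (O(n))."""
    ordered = sorted(items)
    return any(a == b for a, b in zip(ordered, ordered[1:]))
```

For hashable items, a set would detect duplicates in O(n) on average; the sorted version is shown because it matches the approach described above.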

Avoiding N+1 Queries

The N+1 query problem occurs when code fetches a list of users and then makes a separate query for each user's posts. A single JOIN query fetches all the data at once, reducing database round trips from n+1 to 1.
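The difference can be sketched with an in-memory SQLite database. The `users` and `posts` schema, the sample rows, and the function names are assumptions for illustration; with an ORM, the equivalent fix is usually an eager-loading option rather than hand-written SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
    INSERT INTO users VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO posts VALUES (1, 1, 'Notes'), (2, 1, 'Engines'), (3, 2, 'Compilers');
""")


def posts_by_user_n_plus_one(connection):
    """One query for users, then one more per user: n + 1 round trips."""
    result, queries = {}, 1
    users = connection.execute("SELECT id, name FROM users ORDER BY id").fetchall()
    for user_id, name in users:
        rows = connection.execute(
            "SELECT title FROM posts WHERE user_id = ? ORDER BY id", (user_id,)
        ).fetchall()
        queries += 1
        result[name] = [title for (title,) in rows]
    return result, queries


def posts_by_user_join(connection):
    """A single LEFT JOIN returns every user with their posts: 1 round trip."""
    result = {}
    rows = connection.execute(
        "SELECT u.name, p.title FROM users u "
        "LEFT JOIN posts p ON p.user_id = u.id "
        "ORDER BY u.id, p.id"
    )
    for name, title in rows:
        result.setdefault(name, [])
        if title is not None:       # users without posts keep an empty list
            result[name].append(title)
    return result, 1
```

Both functions return the same mapping; only the number of queries differs, and that gap grows linearly with the number of users.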

Async I/O

The blocking approach waits for each HTTP request sequentially, so total time is the sum of all latencies. The async approach, for example aiohttp with asyncio.gather, fetches all URLs concurrently, reducing total wait time to roughly that of the slowest single request.
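The timing difference can be demonstrated without a network. The sketch below deliberately replaces real aiohttp calls with `asyncio.sleep` so that it runs anywhere; `fake_fetch`, the URLs, and the delays are hypothetical stand-ins, and a real client would await an HTTP request in their place.

```python
import asyncio
import time


async def fake_fetch(url, delay):
    """Stand-in for an HTTP GET; sleeps instead of touching the network."""
    await asyncio.sleep(delay)
    return f"response from {url}"


async def fetch_sequential(jobs):
    # Each await finishes before the next starts: total is about the sum of delays.
    return [await fake_fetch(url, delay) for url, delay in jobs]


async def fetch_concurrent(jobs):
    # gather runs all coroutines concurrently: total is about the longest delay.
    return await asyncio.gather(*(fake_fetch(url, delay) for url, delay in jobs))


def timed(coro_factory, jobs):
    """Run coro_factory(jobs) to completion and return (results, seconds)."""
    start = time.perf_counter()
    results = asyncio.run(coro_factory(jobs))
    return results, time.perf_counter() - start


# Hypothetical URLs with simulated latencies in seconds.
JOBS = [
    ("https://example.com/a", 0.05),
    ("https://example.com/b", 0.10),
    ("https://example.com/c", 0.15),
]
```

Calling `timed(fetch_sequential, JOBS)` and `timed(fetch_concurrent, JOBS)` should show the concurrent run finishing in about the time of the slowest job. `asyncio.gather` returns results in input order, so the two result lists match.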

Monitoring and Profiling

Application Performance Monitoring (APM)

Popular Tools

Category	Tools
APM	New Relic, Datadog, AWS X-Ray
Profiling	Pyroscope, async-profiler, Chrome DevTools
Logging	ELK Stack, Loki, CloudWatch
Metrics	Prometheus + Grafana

What to Remember for Interviews

Measure first: Optimize based on data, not assumptions
Cache aggressively: Serving a result from memory usually costs less than recomputing or re-fetching it
Database tuning: Index wisely, avoid N+1, consider denormalization
Network efficiency: Use HTTP/2+, compress, keep connections alive
P99 latency: Track tail latency, not just the median; a fast P50 can hide slow requests that real users hit

✅

Practice: Profile your own web app. What's the P99 latency? Where are the bottlenecks? What's the cache hit rate? Start measuring before optimizing.
