AI & Systems · 7 min read

From FastAPI to Microservices: Handling 10K Concurrent Requests

Our journey building production FastAPI microservices—reducing API response time by 40% and achieving predictable failure modes.


Anshuman Parmar

August 2025


Introduction

FastAPI has become my go-to framework for building Python backends. Its async-first design, automatic OpenAPI docs, and type safety make it perfect for high-performance APIs.

This article shares how we built FastAPI microservices handling 10K+ concurrent requests, reduced response times by 40%, and designed for predictable failure modes.

Why FastAPI?

Before FastAPI, we used Flask. The migration was driven by:

| Aspect | Flask | FastAPI |
| --- | --- | --- |
| Async support | Bolted on | Native |
| Type checking | Optional | Built-in |
| API docs | Manual | Automatic |
| Performance | ~1,000 RPS | ~3,000 RPS |
| Validation | External | Pydantic |

The performance difference alone justified the migration.

Architecture: From Monolith to Microservices

Before: The Monolith

text
┌────────────────────────────────────────┐
│             Flask Monolith             │
│  ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐   │
│  │ Auth │ │ User │ │ Task │ │ Data │   │
│  └──────┘ └──────┘ └──────┘ └──────┘   │
└────────────────────────────────────────┘

Problems:

  • Single point of failure
  • Can't scale components independently
  • Deployments affect everything

After: Microservices

text
              ┌─────────────┐
              │ API Gateway │
              └──────┬──────┘
                     │
   ┌─────────┬───────┴──────┬─────────┐
   ▼         ▼              ▼         ▼
┌─────┐  ┌──────┐       ┌──────┐  ┌──────┐
│Auth │  │ User │       │ Task │  │ Data │
│ API │  │ API  │       │ API  │  │ API  │
└─────┘  └──────┘       └──────┘  └──────┘

Each service:

  • Scales independently
  • Has its own database
  • Can be deployed separately
  • Fails in isolation

Building High-Performance FastAPI Services

Async All The Way

The key to FastAPI performance is embracing async:

python
from fastapi import Depends, FastAPI
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.orm import Session

app = FastAPI()

# Bad: blocking database call holds a worker thread for the whole query
@app.get("/users/{user_id}")
def get_user(user_id: int, db: Session = Depends(get_db)):
    return db.query(User).filter(User.id == user_id).first()

# Good: async database call yields the event loop while waiting on I/O
@app.get("/users/{user_id}")
async def get_user(user_id: int, db: AsyncSession = Depends(get_async_db)):
    result = await db.execute(select(User).where(User.id == user_id))
    return result.scalar_one_or_none()
Connection Pooling

Database connections are expensive. Pool them:

python
from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine
from sqlalchemy.orm import sessionmaker

engine = create_async_engine(
    DATABASE_URL,
    pool_size=20,       # persistent connections kept open
    max_overflow=30,    # extra connections allowed under burst load
    pool_timeout=30,    # seconds to wait for a free connection
    pool_recycle=1800,  # recycle connections after 30 minutes
)

AsyncSessionLocal = sessionmaker(
    engine,
    class_=AsyncSession,
    expire_on_commit=False,
)

Response Caching

Not everything needs to hit the database:

python
from redis import asyncio as aioredis
from fastapi_cache import FastAPICache
from fastapi_cache.backends.redis import RedisBackend
from fastapi_cache.decorator import cache

@app.on_event("startup")
async def startup():
    redis = aioredis.from_url("redis://localhost")
    FastAPICache.init(RedisBackend(redis), prefix="api-cache")

@app.get("/products/{product_id}")
@cache(expire=300)  # cache for 5 minutes
async def get_product(product_id: int):
    # First call hits the database; repeat calls within the TTL
    # are served straight from Redis
    return await fetch_product(product_id)

Handling 10K Concurrent Requests

Load Testing Results

Using Locust for load testing:

python
# locustfile.py
from locust import HttpUser, task, between

class APIUser(HttpUser):
    wait_time = between(0.1, 0.5)

    @task(3)
    def get_tasks(self):
        self.client.get("/api/v1/tasks")

    @task(1)
    def create_task(self):
        self.client.post("/api/v1/tasks", json={
            "title": "Test task",
            "priority": "high",
        })

Results at 10K concurrent users:

| Metric | Before Optimization | After Optimization |
| --- | --- | --- |
| RPS | 2,500 | 4,200 |
| P50 Latency | 180ms | 95ms |
| P95 Latency | 850ms | 280ms |
| P99 Latency | 2.1s | 520ms |
| Error Rate | 2.3% | 0.1% |

Key Optimizations

  1. Async database driver (asyncpg instead of psycopg2)
  2. Connection pooling (20 base, 30 overflow)
  3. Redis caching for read-heavy endpoints
  4. Pagination for list endpoints
  5. Query optimization (proper indexes, eager loading)
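Optimization 4 deserves a concrete note: unbounded list endpoints were a major source of our P99 tail. A small sketch of the clamping we apply to client-supplied paging values (`page_params` and the query fragment in the comment are illustrative, not our exact helpers):

```python
def page_params(limit: int, offset: int, max_limit: int = 100) -> tuple[int, int]:
    # Clamp client-supplied paging values so a single request can't
    # pull an unbounded result set from the database.
    limit = max(1, min(limit, max_limit))
    offset = max(0, offset)
    return limit, offset

# In an endpoint, the clamped values would feed the query, e.g.:
#   select(Task).order_by(Task.id).limit(limit).offset(offset)
```

Capping `limit` server-side means a misbehaving client degrades its own page size, not the database.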

Predictable Failure Modes

Systems will fail. The goal is predictable, graceful failure.

Structured Error Responses

python
from fastapi import HTTPException, Request
from fastapi.responses import JSONResponse
from pydantic import BaseModel

class ErrorResponse(BaseModel):
    error_code: str
    message: str
    details: dict | None = None

@app.exception_handler(HTTPException)
async def http_exception_handler(request: Request, exc: HTTPException):
    return JSONResponse(
        status_code=exc.status_code,
        content=ErrorResponse(
            error_code=f"ERR_{exc.status_code}",
            message=exc.detail,
        ).dict()
    )

Circuit Breakers

python
import httpx
from circuitbreaker import circuit

# After 5 consecutive failures the circuit opens and calls fail fast
# for 30 seconds before a recovery attempt is allowed through
@circuit(failure_threshold=5, recovery_timeout=30)
async def call_external_service(data: dict):
    async with httpx.AsyncClient() as client:
        response = await client.post(EXTERNAL_URL, json=data)
        response.raise_for_status()
        return response.json()

Health Checks

python
@app.get("/health")
async def health_check():
    checks = {
        "database": await check_database(),
        "redis": await check_redis(),
        "external_api": await check_external_api(),
    }

    status = "healthy" if all(checks.values()) else "degraded"
    return {"status": status, "checks": checks}

Graceful Degradation

python
@app.get("/recommendations/{user_id}")
async def get_recommendations(user_id: int):
    try:
        # Try personalized recommendations
        return await ml_service.get_personalized(user_id)
    except ServiceUnavailable:
        # Fall back to popular items
        return await get_popular_items()
    except Exception:
        # Ultimate fallback
        return {"recommendations": [], "fallback": True}

Observability

Structured Logging

python
import time
import uuid

import structlog
from fastapi import Request

logger = structlog.get_logger()

@app.middleware("http")
async def logging_middleware(request: Request, call_next):
    request_id = str(uuid.uuid4())

    with structlog.contextvars.bound_contextvars(
        request_id=request_id,
        path=request.url.path,
        method=request.method,
    ):
        logger.info("request_started")

        start = time.perf_counter()
        response = await call_next(request)
        duration = time.perf_counter() - start

        logger.info(
            "request_completed",
            status_code=response.status_code,
            duration_ms=round(duration * 1000, 2),
        )

        return response

Metrics

python
from prometheus_fastapi_instrumentator import Instrumentator

Instrumentator().instrument(app).expose(app)

This gives you automatic metrics for:

  • Request count by endpoint
  • Request latency histograms
  • Response status codes
  • In-flight requests

Results

After the migration and optimizations:

  • 40% reduction in average response time
  • 10K+ concurrent requests handled reliably
  • 99.5% deployment success rate with CI/CD
  • Zero-downtime deployments with rolling updates
  • Predictable failure modes with circuit breakers

Key Takeaways

  1. Go async: FastAPI's async support is its superpower—use it everywhere
  2. Pool connections: Database connections are expensive; pool aggressively
  3. Cache strategically: Redis caching can eliminate most database load
  4. Design for failure: Circuit breakers and graceful degradation are essential
  5. Observe everything: You can't optimize what you can't measure

FastAPI makes building high-performance Python APIs accessible. The key is understanding async patterns and designing for scale from the start.


Questions about FastAPI or microservices? Connect with me on LinkedIn or GitHub.


WRITTEN BY

Anshuman Parmar

Senior Full Stack Developer specializing in AI systems, browser automation, and scalable web applications. Building production-grade solutions that deliver measurable business impact.
