AI & Systems · 7 min read

From FastAPI to Microservices: Handling 10K Concurrent Requests

Our journey building production FastAPI microservices—reducing API response time by 40% and achieving predictable failure modes.


Anshuman Parmar

August 2025


Introduction

FastAPI has become my go-to framework for building Python backends. Its async-first design, automatic OpenAPI docs, and type safety make it perfect for high-performance APIs.

This article shares how we built FastAPI microservices handling 10K+ concurrent requests, reduced response times by 40%, and designed for predictable failure modes.

Why FastAPI?

Before FastAPI, we used Flask. The migration was driven by:

| Aspect | Flask | FastAPI |
| --- | --- | --- |
| Async support | Bolted on | Native |
| Type checking | Optional | Built-in |
| API docs | Manual | Automatic |
| Performance | ~1,000 RPS | ~3,000 RPS |
| Validation | External | Pydantic |

The performance difference alone justified the migration.

Architecture: From Monolith to Microservices

Before: The Monolith

text
┌────────────────────────────────────────┐
│             Flask Monolith             │
│  ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐   │
│  │ Auth │ │ User │ │ Task │ │ Data │   │
│  └──────┘ └──────┘ └──────┘ └──────┘   │
└────────────────────────────────────────┘

Problems:

  • Single point of failure
  • Can't scale components independently
  • Deployments affect everything

After: Microservices

text
              ┌─────────────┐
              │ API Gateway │
              └──────┬──────┘
                     │
   ┌─────────┬───────┴──────┬─────────┐
   ▼         ▼              ▼         ▼
┌─────┐  ┌──────┐       ┌──────┐  ┌──────┐
│Auth │  │ User │       │ Task │  │ Data │
│ API │  │ API  │       │ API  │  │ API  │
└─────┘  └──────┘       └──────┘  └──────┘

Each service:

  • Scales independently
  • Has its own database
  • Can be deployed separately
  • Fails in isolation

Building High-Performance FastAPI Services

Async All The Way

The key to FastAPI performance is embracing async:

python
from fastapi import Depends, FastAPI
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.orm import Session

app = FastAPI()

# Bad: blocking database call holds a worker thread for the whole query
@app.get("/users/{user_id}")
def get_user(user_id: int, db: Session = Depends(get_db)):
    return db.query(User).filter(User.id == user_id).first()

# Good: async database call yields the event loop while waiting on I/O
@app.get("/users/{user_id}")
async def get_user(user_id: int, db: AsyncSession = Depends(get_async_db)):
    result = await db.execute(select(User).where(User.id == user_id))
    return result.scalar_one_or_none()
Connection Pooling

Database connections are expensive. Pool them:

python
from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine
from sqlalchemy.orm import sessionmaker

engine = create_async_engine(
    DATABASE_URL,
    pool_size=20,       # persistent connections kept open
    max_overflow=30,    # extra connections allowed under burst load
    pool_timeout=30,    # seconds to wait for a free connection
    pool_recycle=1800,  # recycle connections after 30 minutes
)

AsyncSessionLocal = sessionmaker(
    engine,
    class_=AsyncSession,
    expire_on_commit=False,
)

Response Caching

Not everything needs to hit the database:

python
from redis import asyncio as aioredis
from fastapi_cache import FastAPICache
from fastapi_cache.backends.redis import RedisBackend
from fastapi_cache.decorator import cache

@app.on_event("startup")
async def startup():
    redis = aioredis.from_url("redis://localhost")
    FastAPICache.init(RedisBackend(redis), prefix="api-cache")

@app.get("/products/{product_id}")
@cache(expire=300)  # cache for 5 minutes
async def get_product(product_id: int):
    # First call hits the database; repeat calls within the TTL
    # are served straight from Redis
    return await fetch_product(product_id)

Handling 10K Concurrent Requests

Load Testing Results

Using Locust for load testing:

python
# locustfile.py
from locust import HttpUser, task, between

class APIUser(HttpUser):
    wait_time = between(0.1, 0.5)

    @task(3)
    def get_tasks(self):
        self.client.get("/api/v1/tasks")

    @task(1)
    def create_task(self):
        self.client.post("/api/v1/tasks", json={
            "title": "Test task",
            "priority": "high",
        })

Results at 10K concurrent users:

| Metric | Before Optimization | After Optimization |
| --- | --- | --- |
| RPS | 2,500 | 4,200 |
| P50 Latency | 180ms | 95ms |
| P95 Latency | 850ms | 280ms |
| P99 Latency | 2.1s | 520ms |
| Error Rate | 2.3% | 0.1% |

Key Optimizations

  1. Async database driver (asyncpg instead of psycopg2)
  2. Connection pooling (20 base, 30 overflow)
  3. Redis caching for read-heavy endpoints
  4. Pagination for list endpoints
  5. Query optimization (proper indexes, eager loading)
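Optimization 4 deserves a concrete note: unbounded list endpoints were a major source of our P99 tail. A small sketch of the clamping we apply to client-supplied paging values (`page_params` and the query fragment in the comment are illustrative, not our exact helpers):

```python
def page_params(limit: int, offset: int, max_limit: int = 100) -> tuple[int, int]:
    # Clamp client-supplied paging values so a single request can't
    # pull an unbounded result set from the database.
    limit = max(1, min(limit, max_limit))
    offset = max(0, offset)
    return limit, offset

# In an endpoint, the clamped values would feed the query, e.g.:
#   select(Task).order_by(Task.id).limit(limit).offset(offset)
```

Capping `limit` server-side means a misbehaving client degrades its own page size, not the database.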

Predictable Failure Modes

Systems will fail. The goal is predictable, graceful failure.

Structured Error Responses

python
from fastapi import HTTPException, Request
from fastapi.responses import JSONResponse
from pydantic import BaseModel

class ErrorResponse(BaseModel):
    error_code: str
    message: str
    details: dict | None = None

@app.exception_handler(HTTPException)
async def http_exception_handler(request: Request, exc: HTTPException):
    return JSONResponse(
        status_code=exc.status_code,
        content=ErrorResponse(
            error_code=f"ERR_{exc.status_code}",
            message=exc.detail,
        ).dict()
    )

Circuit Breakers

python
import httpx
from circuitbreaker import circuit

# After 5 consecutive failures the circuit opens and calls fail fast
# for 30 seconds before a recovery attempt is allowed through
@circuit(failure_threshold=5, recovery_timeout=30)
async def call_external_service(data: dict):
    async with httpx.AsyncClient() as client:
        response = await client.post(EXTERNAL_URL, json=data)
        response.raise_for_status()
        return response.json()

Health Checks

python
@app.get("/health")
async def health_check():
    checks = {
        "database": await check_database(),
        "redis": await check_redis(),
        "external_api": await check_external_api(),
    }

    status = "healthy" if all(checks.values()) else "degraded"
    return {"status": status, "checks": checks}

Graceful Degradation

python
@app.get("/recommendations/{user_id}")
async def get_recommendations(user_id: int):
    try:
        # Try personalized recommendations
        return await ml_service.get_personalized(user_id)
    except ServiceUnavailable:
        # Fall back to popular items
        return await get_popular_items()
    except Exception:
        # Ultimate fallback
        return {"recommendations": [], "fallback": True}

Observability

Structured Logging

python
import time
import uuid

import structlog
from fastapi import Request

logger = structlog.get_logger()

@app.middleware("http")
async def logging_middleware(request: Request, call_next):
    request_id = str(uuid.uuid4())

    with structlog.contextvars.bound_contextvars(
        request_id=request_id,
        path=request.url.path,
        method=request.method,
    ):
        logger.info("request_started")

        start = time.perf_counter()
        response = await call_next(request)
        duration = time.perf_counter() - start

        logger.info(
            "request_completed",
            status_code=response.status_code,
            duration_ms=round(duration * 1000, 2),
        )

        return response

Metrics

python
from prometheus_fastapi_instrumentator import Instrumentator

Instrumentator().instrument(app).expose(app)

This gives you automatic metrics for:

  • Request count by endpoint
  • Request latency histograms
  • Response status codes
  • In-flight requests

Results

After the migration and optimizations:

  • 40% reduction in average response time
  • 10K+ concurrent requests handled reliably
  • 99.5% deployment success rate with CI/CD
  • Zero-downtime deployments with rolling updates
  • Predictable failure modes with circuit breakers

Key Takeaways

  1. Go async: FastAPI's async support is its superpower—use it everywhere
  2. Pool connections: Database connections are expensive; pool aggressively
  3. Cache strategically: Redis caching can eliminate most database load
  4. Design for failure: Circuit breakers and graceful degradation are essential
  5. Observe everything: You can't optimize what you can't measure

FastAPI makes building high-performance Python APIs accessible. The key is understanding async patterns and designing for scale from the start.


Questions about FastAPI or microservices? Connect with me on LinkedIn or GitHub.


WRITTEN BY

Anshuman Parmar

Senior Full Stack Developer specializing in AI systems, browser automation, and scalable web applications. Building production-grade solutions that deliver measurable business impact.
