Async HTTP Clients & Servers: High-Performance Patterns in Python
Architecting production async HTTP services in Python takes more than swapping requests for httpx or aiohttp. The libraries are easy; the operational discipline is not. Every byte you send and receive crosses a single-threaded event loop, so a client that opens a connection per request, a server that buffers a multi-gigabyte body, or a fan-out that spawns ten thousand uncapped coroutines will degrade the entire process, not just one request. This guide is the operational playbook: how to reuse one client session across the whole process, how to attach a timeout to every request rather than the outermost call, how to bound concurrency so you saturate bandwidth instead of file descriptors, how to stream large bodies without buffering, how to build a minimal async server, and how to drain everything cleanly on shutdown. The two companion references — reusing aiohttp ClientSession across requests and streaming large responses with httpx — drill into the two patterns that cause the most production incidents.
Both client libraries are third-party; install whichever you use (for example, pip install httpx aiohttp) before running any example code.
Architectural Principles
- One client object, whole process. A ClientSession/AsyncClient owns the connection pool, DNS cache, and TLS session cache. Create it once at startup and share it; constructing one per request throws all of that away and leaks connectors. This is the single highest-leverage decision and the subject of reusing aiohttp ClientSession across requests.
- Every request carries its own deadline. A timeout on the outermost coroutine does not protect against a nested call that hangs on a half-open socket. Attach an explicit timeout to each request and let it propagate, as covered under timeouts and deadlines.
- Concurrency is a resource you cap, not a number you maximize. Unbounded asyncio.gather() over a URL list creates a coroutine per URL, all racing for the pool. Gate them with an asyncio.Semaphore so in-flight work matches what the pool and the upstream can absorb.
- Large bodies stream; they do not buffer. response.read() or .json() pulls the whole body into RAM. For anything unbounded, iterate the body in chunks, the technique in streaming large responses with httpx.
- Failure is retried with backoff, and shutdown is explicit. Transient 5xx and connection errors get bounded, jittered retries via retry and backoff strategies; on exit, the session is closed so the pool drains instead of emitting Unclosed warnings.
Event Loop Integration & I/O Multiplexing
The asyncio event loop delegates socket readiness to OS-level selectors (epoll on Linux, kqueue on macOS/BSD). When a coroutine awaits an HTTP operation, it yields control back to the loop, which registers the underlying socket for read/write readiness. The loop resumes the coroutine only when the selector reports that the socket is ready. HTTP work fits this model exactly, which is why a single thread can hold thousands of in-flight requests; see the Network I/O & Protocol Handling overview for the full execution model.
That same model is why blocking is catastrophic. A synchronous requests.get(), time.sleep(), or multi-megabyte json.loads() inside an async def does not yield: it freezes the loop, and every other connection stalls until it returns. CPU-bound work (large JSON parsing, decompression, hashing) belongs off the loop thread via asyncio.to_thread(), or in a process pool when the work holds the GIL for its whole duration, and any library that is not natively async belongs behind an executor.
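As a minimal sketch of moving blocking work off the loop, the snippet below hashes a payload in a worker thread. The function names are hypothetical; hashing is chosen because hashlib releases the GIL on large inputs, so the loop thread can keep running while the digest is computed.

```python
import asyncio
import hashlib


def sha256_hex(payload: bytes) -> str:
    """Plain synchronous function; would stall the loop if called directly."""
    return hashlib.sha256(payload).hexdigest()


async def digest_off_loop(payload: bytes) -> str:
    # Run the blocking call in the default thread pool and suspend this
    # coroutine until it finishes, leaving the loop free for other tasks.
    return await asyncio.to_thread(sha256_hex, payload)
```

A handler would call `await digest_off_loop(body)` instead of `sha256_hex(body)`. asyncio.to_thread() requires Python 3.9 or later; on older versions, loop.run_in_executor() plays the same role.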
Task Scheduling Overhead vs Raw Throughput
Async I/O eliminates thread context-switching, but coroutine scheduling adds microsecond-level latency per await. For high-frequency services, excessive await chains or an unbounded gather() can saturate the loop's ready queue: thousands of coroutines wake, each does a few microseconds of work, and the scheduler thrashes. Async is an I/O multiplexer, not a CPU parallelizer — the wins come from overlapping waits, not from running Python faster.
Diagnostic Hook: Loop Latency Tracing
Enable debug mode and sample loop.time() to measure how long the loop goes between iterations; a large gap is a blocking call hiding inside an async def.
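One way to sample loop.time() is a background task that sleeps for a fixed interval and records how late it wakes up. The sketch below is illustrative only: monitor_loop_lag and demo_blocking_call are hypothetical names, and the deliberate time.sleep() stands in for a blocking call buried in application code.

```python
import asyncio
import time


async def monitor_loop_lag(samples: list, interval: float = 0.05) -> None:
    """Record how late each wake-up is relative to the requested interval."""
    loop = asyncio.get_running_loop()
    while True:
        start = loop.time()
        await asyncio.sleep(interval)
        # Anything beyond `interval` is time the loop could not schedule us.
        samples.append(loop.time() - start - interval)


async def demo_blocking_call() -> float:
    """Show that a time.sleep() hidden in a coroutine appears as loop lag."""
    samples: list = []
    monitor = asyncio.create_task(monitor_loop_lag(samples, interval=0.01))
    await asyncio.sleep(0.03)  # healthy period: lag stays near zero
    time.sleep(0.2)            # deliberate blocking call that freezes the loop
    await asyncio.sleep(0.03)  # give the monitor a chance to observe the gap
    monitor.cancel()
    try:
        await monitor
    except asyncio.CancelledError:
        pass
    return max(samples)
```

In a real service the samples would feed a metric or a log line rather than a list. Running with asyncio.run(main(), debug=True) complements this: debug mode logs any callback that runs longer than loop.slow_callback_duration.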
Pattern Catalogue
Shared ClientSession / AsyncClient
The pool, DNS cache, and TLS session cache live on the client object. Build one per process and pass it to every caller. The anti-pattern — async with aiohttp.ClientSession() as s: inside the request handler — opens and tears down a connector on every call, so no connection is ever reused and you pay a full handshake each time.
When to use: always, for any client that issues more than one request. The lifecycle and connector tuning are detailed in reusing aiohttp ClientSession across requests.
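A rough sketch of the build-once, share-everywhere lifecycle follows. AppState, lifespan, get_status, and client_factory are hypothetical names; the client is assumed to expose httpx.AsyncClient's get() and aclose() methods (with aiohttp the close call would be session.close()).

```python
import asyncio
from contextlib import asynccontextmanager


class AppState:
    """Hypothetical process-wide holder for the single shared client."""

    def __init__(self):
        self.client = None


@asynccontextmanager
async def lifespan(state, client_factory):
    # client_factory is an assumption, e.g. `lambda: httpx.AsyncClient()`.
    state.client = client_factory()  # built once, at startup
    try:
        yield state
    finally:
        await state.client.aclose()  # closes pooled connections on shutdown
        state.client = None


async def get_status(state, url):
    # Every caller borrows the same client; nobody constructs their own.
    resp = await state.client.get(url)
    return resp.status_code
```

The point of the holder is that request handlers receive the state object and never see a constructor, which makes the session-per-request anti-pattern hard to write by accident.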
Per-Request Timeouts
A timeout wrapped around gather() only bounds the aggregate; a single stalled socket inside it can still consume the whole budget. Attach a timeout to each request. With httpx, pass timeout= per call or set a default on the client; with aiohttp, use aiohttp.ClientTimeout or wrap the call in asyncio.timeout().
When to use: every outbound request. Distinguish connect, read, and pool timeouts — a generous read timeout with a tight connect timeout fails fast on dead hosts while tolerating slow-but-alive ones. See timeouts and deadlines for deadline propagation.
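The difference between an aggregate and a per-request deadline can be sketched with the standard library alone. Here fetch is a hypothetical coroutine function standing in for a real client call, and asyncio.wait_for() is used instead of asyncio.timeout() only because the latter requires Python 3.11.

```python
import asyncio


async def with_deadline(coro, seconds: float):
    """Bound one awaitable; return (ok, value_or_exception)."""
    try:
        return True, await asyncio.wait_for(coro, timeout=seconds)
    except asyncio.TimeoutError as exc:
        return False, exc


async def fetch_each_with_deadline(fetch, urls, per_request_s: float):
    # Each request gets its own budget, so one stalled socket cannot
    # consume the time allotted to the others.
    return await asyncio.gather(
        *(with_deadline(fetch(u), per_request_s) for u in urls)
    )
```

With a real client the library's own timeout settings usually sit underneath this: for instance httpx.Timeout(connect=2.0, read=30.0, write=10.0, pool=5.0) expresses the tight-connect, generous-read split, while the wrapper above adds a total per-request ceiling.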
Bounded Concurrency with a Semaphore
Fan-out without a cap is the most common throughput bug: asyncio.gather(*(fetch(u) for u in urls)) over 10,000 URLs creates 10,000 coroutines that all contend for a pool of 100 connections, inflating latency and risking FD exhaustion. Gate the fan-out with a semaphore sized to the pool, so the number of in-flight requests matches what the connector can actually serve.
When to use: any fan-out over more than a handful of targets. Size the semaphore to the pool limit, not to the URL count. The primitive itself is covered in synchronization primitives.
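A minimal sketch of the semaphore gate, assuming fetch is any coroutine function that takes a URL (a hypothetical stand-in for a call on the shared client) and limit matches the pool size:

```python
import asyncio


async def bounded_gather(fetch, urls, limit: int):
    """Run fetch(url) for every URL with at most `limit` calls in flight."""
    gate = asyncio.Semaphore(limit)

    async def guarded(url):
        async with gate:  # parks here once `limit` fetches are running
            return await fetch(url)

    # return_exceptions=True keeps one failure from cancelling the batch;
    # results come back in the same order as `urls`.
    return await asyncio.gather(
        *(guarded(u) for u in urls), return_exceptions=True
    )
```

This still creates one lightweight task per URL, but only limit of them touch the network at any moment; the rest wait on the semaphore rather than on the connection pool. For very long URL lists, a fixed set of worker tasks pulling from an asyncio.Queue avoids creating the idle tasks at all.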
Streaming Responses
For a download whose size you do not control, await resp.read() buffers the entire body before you see a byte. Stream it instead: open the response and iterate the body in fixed chunks, keeping memory flat regardless of total size.
When to use: any body that can be large or unbounded — file downloads, log tails, server-sent events. The read-timeout-per-chunk and backpressure details are in streaming large responses with httpx.
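The chunked-iteration idea might look like the following sketch. The client argument is assumed to be an httpx.AsyncClient (the code relies only on its stream(), raise_for_status(), and aiter_bytes() interface); download_to_file and the 64 KiB default chunk size are illustrative choices, not values from the source.

```python
import asyncio


async def download_to_file(client, url: str, dest: str,
                           chunk_size: int = 65536) -> int:
    """Stream a response body to disk; returns the number of bytes written.

    `client` is assumed to be an httpx.AsyncClient or an object exposing
    the same `stream()` / `aiter_bytes()` interface.
    """
    written = 0
    async with client.stream("GET", url) as resp:
        resp.raise_for_status()
        with open(dest, "wb") as fh:
            async for chunk in resp.aiter_bytes(chunk_size=chunk_size):
                # File writes block, so hand each one to a worker thread.
                await asyncio.to_thread(fh.write, chunk)
                written += len(chunk)
    return written
```

Only one chunk is held in memory at a time, and the response is closed when the async with block exits, which returns the connection to the pool.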
A Minimal Async Server
You do not need a framework to serve HTTP asynchronously. aiohttp ships a server; the pattern that matters is streaming the response body and registering cleanup so the loop drains on shutdown.
When to use: lightweight services, internal endpoints, or when you want full control over the response lifecycle. For routing-heavy APIs, an ASGI framework on top of the same loop is more ergonomic.
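The text points at aiohttp's bundled server; the sketch below deliberately swaps in the standard library's asyncio.start_server so that it runs without extra dependencies. It is not a real HTTP parser (it ignores the request entirely) and the host, port, and body are placeholders. What it illustrates is the shape of the pattern: write the body in pieces with backpressure, and close both the connection and the listener explicitly.

```python
import asyncio


async def handle(reader: asyncio.StreamReader,
                 writer: asyncio.StreamWriter) -> None:
    """Answer any request with a small chunked body, written piece by piece."""
    try:
        # Read and discard the request head (request line plus headers).
        await reader.readuntil(b"\r\n\r\n")
        writer.write(
            b"HTTP/1.1 200 OK\r\n"
            b"Content-Type: text/plain\r\n"
            b"Transfer-Encoding: chunked\r\n"
            b"Connection: close\r\n\r\n"
        )
        for i in range(3):
            part = f"line {i}\n".encode()
            # One HTTP chunk: hex length, CRLF, payload, CRLF.
            writer.write(f"{len(part):x}\r\n".encode() + part + b"\r\n")
            await writer.drain()  # backpressure: wait if the peer is slow
        writer.write(b"0\r\n\r\n")  # terminating zero-length chunk
        await writer.drain()
    finally:
        writer.close()
        await writer.wait_closed()


async def serve(host: str = "127.0.0.1", port: int = 8080) -> None:
    server = await asyncio.start_server(handle, host, port)
    async with server:  # closes the listening socket on exit
        await server.serve_forever()
```

With aiohttp the same roles are played by web.StreamResponse for the incremental body and the application's cleanup hooks for shutdown.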
Resource Boundaries
Every async HTTP client is bounded by three limits that must agree with each other and with the OS: the connector's limit/limit_per_host, your concurrency gate, and the kernel's file-descriptor ceiling (ulimit -n). If the semaphore allows 500 in-flight requests but the pool caps at 100, the extra 400 coroutines park on the connector waiter — controlled backpressure, but invisible unless you measure pool wait time. If the pool exceeds the FD limit, you get Too many open files under load instead of a clean queue.
The right relationship is: semaphore ≤ pool limit ≤ a safe fraction of ulimit -n. Size the pool to sustained concurrency, not peak, and let the semaphore absorb bursts. The full sizing methodology — idle timeouts, per-host caps, DNS-cache TTL, and recycling — lives in connection pooling and keep-alive. Keep-alive idle timeouts on the client must be shorter than the server's, or the pool will hand a freshly-closed socket to a live request and surface intermittent connection-reset errors.
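The inequality above can be checked at startup. The helper below is a hypothetical sketch: limit_violations and its parameters are invented for illustration, and the default fd_fraction of 0.5 is an assumed reading of "a safe fraction", not a value from the source. The resource module is Unix-only.

```python
import resource


def soft_fd_limit() -> int:
    """Current soft limit on open file descriptors (what `ulimit -n` prints)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


def limit_violations(semaphore_cap: int, pool_limit: int, fd_limit: int,
                     fd_fraction: float = 0.5) -> list:
    """List every way the three limits disagree; empty means consistent."""
    problems = []
    if semaphore_cap > pool_limit:
        problems.append(
            f"semaphore {semaphore_cap} > pool {pool_limit}: "
            f"{semaphore_cap - pool_limit} requests will queue on the pool"
        )
    fd_budget = int(fd_limit * fd_fraction)  # assumed safety margin
    if pool_limit > fd_budget:
        problems.append(f"pool {pool_limit} > FD budget {fd_budget}")
    return problems
```

Calling limit_violations(500, 100, soft_fd_limit()) reproduces the scenario described above, where 400 requests park on the connector, and logging the result at startup makes the mismatch visible before load does.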
Integrated Production Example: Concurrent Fetcher
This fetcher combines every pattern above: one shared AsyncClient, a semaphore gate, a per-request timeout, jittered retries on transient failures, and clean shutdown. It returns successes and failures separately so a single bad URL never sinks the batch.
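A condensed sketch of such a fetcher is shown below; it should be read as one possible shape, not as the original implementation. Assumptions: client is a shared httpx.AsyncClient created and closed by the caller (for example with async with httpx.AsyncClient() as client:), TransientError and fetch_all are hypothetical names, the defaults are arbitrary, and the caller passes the library's transport exception in retry_on, e.g. (httpx.TransportError,).

```python
import asyncio
import random


class TransientError(Exception):
    """Raised for responses worth retrying (5xx in this sketch)."""


async def fetch_all(client, urls, *, max_inflight=20, timeout_s=10.0,
                    retries=3, base_delay=0.2, retry_on=(OSError,)):
    """Fetch every URL; returns (successes, failures, peak_inflight)."""
    gate = asyncio.Semaphore(max_inflight)
    retryable = (TransientError, asyncio.TimeoutError, *retry_on)
    successes, failures = {}, {}
    inflight = peak_inflight = 0

    async def attempt_once(url):
        nonlocal inflight, peak_inflight
        async with gate:  # a slot is held only while the request is running
            inflight += 1
            peak_inflight = max(peak_inflight, inflight)
            try:
                # Per-request deadline covering the whole call.
                resp = await asyncio.wait_for(client.get(url), timeout_s)
            finally:
                inflight -= 1
        if resp.status_code >= 500:
            raise TransientError(f"HTTP {resp.status_code}")
        return resp

    async def one(url):
        for attempt in range(retries + 1):
            try:
                successes[url] = await attempt_once(url)
                return
            except retryable as exc:
                if attempt == retries:
                    failures[url] = exc
                    return
                # Full jitter: sleep a random slice of the backoff window.
                await asyncio.sleep(
                    random.uniform(0, base_delay * 2 ** attempt)
                )
            except Exception as exc:  # non-retryable: record, do not retry
                failures[url] = exc
                return

    await asyncio.gather(*(one(u) for u in urls))
    return successes, failures, peak_inflight
```

The backoff sleep happens outside the semaphore on purpose, so a URL that is waiting to retry does not occupy a slot that a healthy request could use.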
Diagnostic Hook: Track peak_inflight against your semaphore cap. If it pins at the cap for sustained periods, the gate — not the upstream — is your bottleneck, and you should raise both the semaphore and the pool together. If it sits well below the cap while latency climbs, the upstream is slow and adding concurrency will not help. Also watch retry counts: a rising retry rate is an early warning of upstream degradation, well before raw error rates spike.
Diagnostic callout: what to watch in production
- Connection reuse ratio — new connections opened ÷ requests issued. Healthy keep-alive keeps this near zero; a ratio near 1.0 means session reuse is broken (a session-per-request bug).
- Pool wait time — time a request spends parked on the connector before getting a connection. Climbing wait time with in-use connections pinned at the limit means the pool is undersized, not the upstream slow.
- ss -tnp socket states: a growing pile of TIME-WAIT indicates connections are being closed and reopened (no reuse); CLOSE-WAIT indicates bodies are not being fully consumed/closed.
- PYTHONASYNCIODEBUG=1 surfaces unawaited coroutines and slow callbacks that block the loop.
Failure Modes
| Failure mode | Root cause | Detection | Fix |
|---|---|---|---|
| No connection reuse, high latency | New session/connector per request | Reuse ratio ≈ 1.0; TIME-WAIT pile-up in ss -tnp | Share one session for the process (guide) |
| Too many open files under load | Pool/semaphore exceeds ulimit -n | OSError: [Errno 24]; FD count near limit | Cap pool ≤ safe fraction of ulimit -n; raise limit |
| OOM on large download | read()/.json() buffers whole body | RSS scales with response size | Stream in chunks (guide) |
| Requests hang indefinitely | Timeout only on outer call, not per request | Coroutines stuck in recv; p99 → ∞ | Per-request timeout / asyncio.timeout() |
| Intermittent connection reset | Client keep-alive idle > server idle | Sporadic ConnectionResetError on reused sockets | Set client idle timeout below server's |
| Unclosed client session warning | Session not closed on shutdown | Warning at exit; leaked connectors | Close session in finally/lifespan |
| Loop freezes under load | Sync call or heavy parse in async def | Loop-lag monitor flags multi-ms gaps | Move blocking work to asyncio.to_thread() |
Frequently Asked Questions
When should I choose httpx over aiohttp for async HTTP clients?
httpx is preferred for HTTP/2 support (opt-in via http2=True and the httpx[http2] extra), strict standards compliance, and sync/async API parity that makes testing easier. aiohttp excels server-side: it ships its own server, has first-class WebSocket support, and a mature plugin ecosystem. For a pure client, either works; pick httpx if you need HTTP/2, aiohttp if you also run the server in the same stack.
How do I prevent file descriptor exhaustion under high async concurrency?
Cap three limits so they agree: an asyncio.Semaphore on in-flight requests, the connector pool limit (limit/max_connections), and the OS ulimit -n. The relationship should be semaphore ≤ pool ≤ a safe fraction of the FD limit. Monitor open-FD count and pool wait time; when wait time climbs with connections pinned at the limit, raise the pool and semaphore together.
Why is a per-request timeout better than wrapping gather() in one timeout?
A timeout around gather() bounds the aggregate, but a single request stalled on a half-open socket can consume that entire budget while the others finish instantly. A per-request deadline (via the client's timeout= or asyncio.timeout()) fails the slow request on its own and lets the rest proceed, which is the behavior you almost always want. See timeouts and deadlines.
How do I stream a large response without loading it into memory?
Use the streaming API instead of read()/.json(). With httpx, open async with client.stream("GET", url) as resp: and iterate resp.aiter_bytes(); with aiohttp, iterate resp.content.iter_chunked(n), since iterating resp.content directly yields lines, not fixed-size chunks. Memory stays flat at roughly one chunk regardless of total size. The chunk-timeout and backpressure details are in streaming large responses with httpx.
What is the right way to shut down an async HTTP client?
Close the session/client explicitly — await session.close() (aiohttp) or await client.aclose() (httpx) — inside a finally block or your framework's lifespan/shutdown hook. This drains the pool and closes sockets; relying on garbage collection produces Unclosed client session warnings and leaked connectors. Combine with graceful shutdown that lets in-flight requests finish before closing.
Related
- Network I/O & Protocol Handling — up to the overview for the full async networking mental model.
- Reusing aiohttp ClientSession Across Requests — fix the most common throughput bug by sharing one session.
- Streaming Large Responses with httpx — keep memory flat on unbounded downloads.
- Connection Pooling & Keep-Alive — size pools, idle timeouts, and per-host caps.
- Retry and Backoff Strategies — bounded, jittered retries for transient HTTP failures.
- Synchronization Primitives — the Semaphore behind bounded concurrency.