Sizing Async Connection Pools for Throughput
You set limit_per_host=100 because it felt safe, and throughput still plateaus at 200 requests/second while half those connections sit idle. Or you kept the library defaults and watched p99 latency explode the moment traffic doubled. Pool size is not a guess: it is a number you derive from your target throughput and your measured per-request latency. This guide walks through measuring latency, applying Little's Law to compute the concurrency you actually need, setting aiohttp.TCPConnector / httpx.Limits to match, gating callers with a Semaphore so excess load waits at one observable point instead of piling onto the connector, and load-testing to confirm pool wait time drops to zero.
Prerequisites
- Python 3.11+ (for asyncio.timeout() and asyncio.TaskGroup).
- aiohttp or httpx: pip install aiohttp (or pip install httpx).
- Familiarity with the connector and keep-alive model from Connection Pooling & Keep-Alive and the broader Network I/O & Protocol Handling execution model.
Step 1 — Measure Per-Request Latency
You cannot size a pool without knowing how long one request takes against the real upstream, warm pool included. Measure with a single connection so you isolate service time from queueing.
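A minimal sketch of the measurement, assuming aiohttp and an upstream URL you supply. measure_latency and measure_upstream are illustrative names, not library APIs. The timing helper accepts any zero-argument coroutine function, so you can point it at an httpx call instead.

```python
import asyncio
import statistics
import time
from collections.abc import Awaitable, Callable


async def measure_latency(
    send: Callable[[], Awaitable[object]],
    samples: int = 200,
    warmup: int = 10,
) -> dict[str, float]:
    """Time `samples` sequential calls to `send` after `warmup` untimed calls."""
    for _ in range(warmup):
        await send()  # warm the pool: DNS, TCP/TLS handshake, keep-alive socket
    timings: list[float] = []
    for _ in range(samples):
        start = time.perf_counter()
        await send()  # one call at a time, so no queueing is mixed in
        timings.append(time.perf_counter() - start)
    cuts = statistics.quantiles(timings, n=100)  # cuts[94] is the 95th percentile
    return {
        "mean_s": statistics.fmean(timings),
        "p50_s": statistics.median(timings),
        "p95_s": cuts[94],
    }


async def measure_upstream(url: str) -> dict[str, float]:
    # Imported here so the timing helper above has no third-party dependency.
    import aiohttp

    # limit=1: a single connection isolates service time from queueing.
    connector = aiohttp.TCPConnector(limit=1)
    async with aiohttp.ClientSession(connector=connector) as session:

        async def send() -> None:
            async with session.get(url) as resp:
                await resp.read()  # include body transfer in the timing

        return await measure_latency(send)
```

Run it with asyncio.run(measure_upstream(url)) against the real upstream, not a mock, and carry the mean (or p50 when the distribution is tight) into Step 2.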
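Verify: you get a stable p50 (e.g. 40 ms). Little's Law is stated in terms of mean latency; p50 is a fair stand-in when the distribution is tight, but use the measured mean if the tail is long. Keep p95 in mind for headroom.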
Step 2 — Compute Required Concurrency With Little's Law
Little's Law states that the average number of in-flight requests equals throughput times mean latency: concurrency = throughput × latency. Plug in your target rate and measured latency to get the number of connections needed to sustain that rate.
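The same arithmetic as a small helper. required_concurrency and pool_size are hypothetical names, and the 30% default headroom mirrors the headroom suggested in this step rather than any library setting.

```python
import math


def required_concurrency(target_rps: float, mean_latency_s: float) -> float:
    """Little's Law: average requests in flight = throughput x mean latency."""
    return target_rps * mean_latency_s


def pool_size(target_rps: float, mean_latency_s: float, headroom: float = 0.30) -> int:
    """Connections to provision: Little's Law result plus headroom, rounded up."""
    raw = required_concurrency(target_rps, mean_latency_s) * (1 + headroom)
    # round() first so float noise such as 26.000000000000004 does not ceil to 27
    return math.ceil(round(raw, 6))
```

pool_size(500, 0.040) returns 26: 20 requests in flight on average, plus 30%.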
Verify: 500 req/s × 0.040 s = 20, so on average 20 requests are in flight. Add ~30% headroom (20 × 1.3 = 26) so a latency spike does not instantly serialize traffic.
Step 3 — Set Connector Limits to Match
Apply the computed number to the connector. In aiohttp.TCPConnector, limit_per_host caps connections to a single upstream and the global limit covers all hosts combined. httpx has no per-host cap: max_connections is the global limit, and max_keepalive_connections controls how many idle sockets are retained.
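A sketch of both clients configured from the computed figure. global_limit and the builder functions are illustrative helpers; TCPConnector(limit=..., limit_per_host=...) and httpx.Limits(max_connections=..., max_keepalive_connections=...) are the real parameters. The client libraries are imported inside the builders so the sizing helper runs with neither installed.

```python
from collections.abc import Mapping


def global_limit(per_host_needs: Mapping[str, int]) -> int:
    """Smallest global cap that lets every upstream reach its per-host need at once."""
    return sum(per_host_needs.values())


async def build_aiohttp_session(per_host_needs: Mapping[str, int]):
    import aiohttp  # imported lazily: only needed if you use aiohttp

    connector = aiohttp.TCPConnector(
        limit=global_limit(per_host_needs),  # all hosts combined
        # One value applies to every host, so size it for the hottest upstream.
        limit_per_host=max(per_host_needs.values()),
    )
    return aiohttp.ClientSession(connector=connector)


def build_httpx_client(concurrency: int):
    import httpx  # imported lazily: only needed if you use httpx

    # httpx has no per-host cap. max_connections is global; keeping
    # max_keepalive_connections at the same value stops sockets opened during
    # a burst from being closed and re-dialed on the next one.
    limits = httpx.Limits(
        max_connections=concurrency,
        max_keepalive_connections=concurrency,
    )
    return httpx.AsyncClient(limits=limits)
```

With a hypothetical hot upstream needing 26 connections and a second one needing 8, global_limit({"api.example.com": 26, "auth.example.com": 8}) gives 34, while limit_per_host stays at 26.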
Verify: limit_per_host matches your headroom figure (26), and the global limit is at least the sum of per-host needs across destinations.
Step 4 — Guard With a Semaphore
The connector caps connections, but unbounded callers still pile onto the waiter queue. An asyncio.Semaphore sized to the same concurrency makes callers wait at a single, observable choke point instead.
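A minimal gate, assuming every outbound call goes through it. GatedClient is a hypothetical wrapper; request stands for whatever zero-argument coroutine function performs your aiohttp or httpx call, and the peak counter exists only so you can observe that the gate holds.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


class GatedClient:
    """Admit at most `limit` requests at once; the rest wait on the semaphore."""

    def __init__(self, limit: int) -> None:
        self._gate = asyncio.Semaphore(limit)
        self.in_flight = 0
        self.peak = 0  # highest concurrency observed, for verification only

    async def run(self, request: Callable[[], Awaitable[T]]) -> T:
        async with self._gate:  # excess callers queue here, not on the connector
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
            try:
                return await request()
            finally:
                self.in_flight -= 1
```

Size limit to the same figure as limit_per_host (26 in the running example) so the connector never becomes the place where callers queue.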
Verify: the semaphore value equals (or slightly trails) limit_per_host, so the gate, not the connector, is where excess demand queues. See synchronization primitives for the semantics.
Step 5 — Load-Test and Watch Pool Wait Time
Drive the client at and above your target rate while recording the time each request spends acquiring a slot. Pool wait time is the metric that tells you whether the size is right.
Verify: at the target rate, average wait is near 0 ms and measured rps reaches the target. If wait climbs while rps is below target, the pool is undersized — go back to Step 3 and raise the limits together.
Step 6 — Tune Keep-Alive Idle Timeout
A correctly sized pool still throws errors if idle sockets outlive the server's keep-alive window. Set the client idle timeout below the server's so you never reuse a reaped connection.
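A sketch of deriving the client idle timeout from the server's, assuming you know the server value (nginx's keepalive_timeout defaults to 75 s, for example). The 0.8 margin and the helper names are assumptions; keepalive_timeout on aiohttp.TCPConnector and keepalive_expiry on httpx.Limits are the real knobs.

```python
def client_idle_timeout(server_idle_s: float, margin: float = 0.8) -> float:
    """Return a client idle timeout that expires before the server's does."""
    if not 0 < margin < 1:
        raise ValueError("margin must be strictly between 0 and 1")
    return server_idle_s * margin


async def build_aiohttp_session(server_idle_s: float, per_host: int):
    import aiohttp  # imported lazily: only needed if you use aiohttp

    connector = aiohttp.TCPConnector(
        limit_per_host=per_host,
        # Drop idle sockets before the server reaps them, so none is reused dead.
        keepalive_timeout=client_idle_timeout(server_idle_s),
    )
    return aiohttp.ClientSession(connector=connector)


def build_httpx_limits(server_idle_s: float, concurrency: int):
    import httpx  # imported lazily: only needed if you use httpx

    return httpx.Limits(
        max_connections=concurrency,
        max_keepalive_connections=concurrency,
        keepalive_expiry=client_idle_timeout(server_idle_s),
    )
```

Against a 75 s server window this yields 60 s on the client.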
Verify: under steady load the keep-alive reuse ratio exceeds 0.9 and ServerDisconnectedError counts stay near zero, especially during low-traffic dips. Pair acquisition with explicit timeouts and deadlines so a stalled dial cannot block indefinitely.
Verification
A correctly sized pool exhibits all three signals together:
- Pool wait time → ~0 ms at the target throughput. Any sustained acquisition wait while throughput is still below the target rate means the pool is too small.
- Throughput plateaus at the limit, not below it. Increasing offered load past the target should keep rps flat at the target (graceful saturation), not collapse it.
- No ServerDisconnectedError storms. A handful from genuine network events is fine; a correlated burst during quiet periods means the keep-alive timeout exceeds the server's.
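The three signals can be folded into one check on a load-test run. This is a sketch: LoadRun and diagnose are hypothetical, and the thresholds (1 ms of wait, 5% rps tolerance, 5 quiet-period disconnects) are placeholder assumptions to tune for your service.

```python
from dataclasses import dataclass


@dataclass
class LoadRun:
    target_rps: float
    achieved_rps: float     # measured at the target offered load
    overload_rps: float     # measured with offered load pushed past the target
    avg_wait_ms: float      # mean pool/slot acquisition wait at the target
    quiet_disconnects: int  # ServerDisconnectedError count during low-traffic dips


def diagnose(run: LoadRun, wait_ok_ms: float = 1.0, tolerance: float = 0.05,
             burst: int = 5) -> list[str]:
    """Return the failed signals; an empty list means the pool looks right-sized."""
    floor = run.target_rps * (1 - tolerance)
    problems: list[str] = []
    if run.avg_wait_ms > wait_ok_ms:
        if run.achieved_rps < floor:
            problems.append("pool too small: raise connector limit and semaphore together")
        else:
            problems.append("pool wait above ~0 ms at target: running with no headroom")
    if run.overload_rps < floor:
        problems.append("throughput collapses past the target instead of plateauing")
    if run.quiet_disconnects >= burst:
        problems.append("client keep-alive timeout likely exceeds the server's")
    return problems
```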
Pitfalls & Edge Cases
- Over-sizing wastes file descriptors and upstream connections. A pool far larger than throughput × latency holds idle sockets that consume FDs on both ends (watch ulimit -n) and may push the server past its own connection cap. Bigger is not safer.
- Under-sizing serializes traffic. A pool smaller than required concurrency queues excess requests on the connector waiter, inflating tail latency even though the upstream is healthy. The tell is high pool wait time with connections pinned at the limit.
- limit_per_host vs global limit confusion. A generous global limit does nothing if limit_per_host is the real bottleneck for a single hot upstream, and vice versa. Size both; the effective cap for one host is the smaller of the two.
- Keep-alive longer than the server's idle timeout causes resets. If keepalive_timeout exceeds the server's, you will reuse sockets it has already closed and see ServerDisconnectedError or ConnectionResetError. Always set the client value below the server's.
- DNS TTL outliving reality. A ttl_dns_cache longer than the record's real TTL keeps the pool dialing a retired IP after an upstream failover. Match the cache TTL to the DNS record, or rebuild the connector on repeated connect failures.
Frequently Asked Questions
How do I calculate the right connection pool size?
Apply Little's Law: required concurrency equals target throughput times measured per-request latency. For 500 req/s at 40 ms latency you need about 20 connections in flight; add roughly 30 percent headroom and set limit_per_host to about 26.
What is the difference between limit and limit_per_host?
limit_per_host caps simultaneous connections to a single upstream, while the global limit caps connections across all hosts combined. For one hot host the effective cap is the smaller of the two, so both must be sized correctly.
How do I know my pool is too small?
Under load you will see high pool wait time with connections pinned at the limit and throughput stuck below target. Raise the connector limit and the Semaphore together, then re-run the load test until wait time returns to near zero.
Related
- Connection Pooling & Keep-Alive — up to the overview covering the connector model, recycling, and failure modes.
- Network I/O & Protocol Handling — the parent overview for async networking architecture.
- Synchronization primitives — Semaphore semantics for the acquisition gate.