
Sizing Async Connection Pools for Throughput

You set limit_per_host=100 because it felt safe, and throughput still plateaus at 200 requests/second while half those connections sit idle. Or you capped the pool at a small value like 10 and watched p99 latency explode the moment traffic doubled. Pool size is not a guess: it is a number you derive from your target throughput and your measured per-request latency. This guide walks through measuring latency, applying Little's Law to compute the concurrency you actually need, setting aiohttp.TCPConnector / httpx.Limits to match, gating callers with a Semaphore so excess load waits instead of stalling, and load-testing to confirm pool wait time drops to zero.

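Prerequisites

The examples assume Python 3.11 or later (they use asyncio.timeout and asyncio.TaskGroup) with aiohttp and httpx installed.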

Step 1 — Measure Per-Request Latency

You cannot size a pool without knowing how long one request takes against the real upstream, warm pool included. Measure with a single connection so you isolate service time from queueing.

import asyncio
import statistics
import aiohttp

async def measure(url: str, n: int = 100) -> float:
    connector = aiohttp.TCPConnector(limit=1)  # serialize: pure service time
    timings: list[float] = []
    async with aiohttp.ClientSession(connector=connector) as session:
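        loop = asyncio.get_running_loop()
        # Warm-up: pay for DNS, TCP and TLS once so setup cost is not counted as service time.
        async with session.get(url) as resp:
            await resp.read()
        for _ in range(n):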
            t0 = loop.time()
            async with session.get(url) as resp:
                await resp.read()
            timings.append(loop.time() - t0)
    p50 = statistics.median(timings)
    p95 = statistics.quantiles(timings, n=20)[18]
    print(f"p50={p50*1000:.1f}ms  p95={p95*1000:.1f}ms")
    return p50

# asyncio.run(measure("https://example.com/"))

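Verify: you get a stable p50 (e.g. 40 ms). Little's Law is defined on the mean latency, so p50 is a stand-in that works when the distribution is roughly symmetric; if the tail is heavy, use the mean instead. Keep p95 in mind for headroom.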
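Because Little's Law is stated in terms of the mean, it can help to report the mean next to the percentiles and see how far apart they are. The sketch below is one way to do that; the helper name summarize and the sample timings are illustrative assumptions, not measurements.

```python
import statistics

def summarize(timings: list[float]) -> dict[str, float]:
    """Return mean, p50 and p95 (all in seconds) for a list of request timings."""
    if len(timings) < 2:
        raise ValueError("need at least two samples")
    return {
        "mean": statistics.fmean(timings),
        "p50": statistics.median(timings),
        # quantiles(n=20) yields 19 cut points; index 18 is the 95th percentile
        "p95": statistics.quantiles(timings, n=20)[18],
    }

# Made-up samples (seconds): mostly 40 ms plus one slow outlier.
sample = [0.040] * 9 + [0.400]
stats = summarize(sample)
print({name: round(value * 1000, 1) for name, value in stats.items()})
```

With a skewed sample like this one, the mean sits well above p50, which is the case where sizing from p50 alone would under-provision the pool.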

Step 2 — Compute Required Concurrency With Little's Law

Little's Law states that the average number of in-flight requests equals throughput times latency: concurrency = throughput × latency. Invert it to find the connections you need to sustain a target rate.

def required_concurrency(target_rps: float, latency_s: float) -> int:
    """Little's Law: L = lambda * W.  Connections needed in flight."""
    return round(target_rps * latency_s)

# Target 500 req/s against a 40 ms upstream:
need = required_concurrency(500, 0.040)   # -> 20
headroom = round(need * 1.3)              # -> 26, absorbs latency jitter
print(f"need={need}  with headroom={headroom}")

Verify: the math is intuitive. 500 req/s at 40 ms means 20 requests are in flight on average. Add ~30% headroom so a latency spike does not instantly serialize traffic.
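Running the same law in the other direction shows the throughput ceiling that a given pool size imposes, which is often the quickest way to explain a plateau. This is a minimal sketch; max_throughput is a hypothetical helper name, and the 40 ms figure is the example latency used above.

```python
def max_throughput(pool_size: int, latency_s: float) -> float:
    """Little's Law rearranged (lambda = L / W): upper bound on req/s for a pool."""
    if latency_s <= 0:
        raise ValueError("latency_s must be positive")
    return pool_size / latency_s

for size in (10, 20, 26):
    ceiling = max_throughput(size, 0.040)
    print(f"pool={size:>3}  ceiling={ceiling:.0f} req/s")
```

A pool of 10 against a 40 ms upstream cannot exceed roughly 250 req/s no matter how many callers are waiting, which is why an undersized pool shows up as a flat throughput line.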

Step 3 — Set Connector Limits to Match

Apply the computed number to the connector. limit_per_host caps a single upstream; the global limit covers all hosts combined. In httpx, max_connections is the global cap and max_keepalive_connections controls how many idle sockets are retained.

import aiohttp
import httpx

# aiohttp: 26 per host, global headroom for ~3 hosts.
connector = aiohttp.TCPConnector(
    limit=80,
    limit_per_host=26,
    keepalive_timeout=30,
    ttl_dns_cache=300,
)

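# httpx: closest equivalent. max_connections is a global cap; httpx.Limits
# has no per-host setting, so use one client per upstream to bound a single host.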
limits = httpx.Limits(
    max_connections=80,
    max_keepalive_connections=26,
    keepalive_expiry=30.0,
)
client = httpx.AsyncClient(limits=limits)

Verify: limit_per_host matches your headroom figure (26), and the global limit is at least the sum of per-host needs across destinations.
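When one connector serves several upstreams, the two limits can be derived from per-host targets rather than picked by eye. The sketch below assumes a hypothetical plan_limits helper and made-up host names and rates; it reuses the rounding and 30% headroom from Step 2.

```python
def plan_limits(
    upstreams: dict[str, tuple[float, float]],
    headroom: float = 1.3,
) -> tuple[int, int]:
    """Return (limit, limit_per_host) for one shared connector.

    upstreams maps a host name to (target_rps, latency_s).
    """
    per_host = {
        host: round(round(rps * latency) * headroom)
        for host, (rps, latency) in upstreams.items()
    }
    # A connector has a single limit_per_host, so it must fit the busiest host.
    limit_per_host = max(per_host.values())
    # The global limit has to cover every host's need at the same time.
    limit = sum(per_host.values())
    return limit, limit_per_host

# Hypothetical upstreams: three hosts, each 500 req/s at 40 ms.
upstreams = {
    "a.internal.example": (500, 0.040),
    "b.internal.example": (500, 0.040),
    "c.internal.example": (500, 0.040),
}
limit, limit_per_host = plan_limits(upstreams)
print(f"limit={limit}  limit_per_host={limit_per_host}")
```

Three identical hosts need 78 connections in total, so the rounded-up global limit of 80 in the connector above leaves a small margin.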

Step 4 — Guard With a Semaphore

The connector caps connections, but unbounded callers still pile onto the waiter queue. An asyncio.Semaphore sized to the same concurrency makes callers wait at a single, observable choke point instead.

import asyncio
import aiohttp

class SizedClient:
    def __init__(self, session: aiohttp.ClientSession, concurrency: int) -> None:
        self._session = session
        self._gate = asyncio.Semaphore(concurrency)

    async def fetch(self, url: str) -> bytes:
        async with self._gate:                  # wait for a slot, don't stall the pool
            async with asyncio.timeout(10):
                async with self._session.get(url) as resp:
                    resp.raise_for_status()
                    return await resp.read()

Verify: the semaphore value equals (or slightly trails) limit_per_host, so the gate, not the connector, is where excess demand queues. See synchronization primitives for the semantics.
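To see the gate bounding concurrency without touching a real upstream, the sketch below replaces the HTTP call with asyncio.sleep and tracks the peak number of callers inside the gate. The demo function and its counters are illustrative only.

```python
import asyncio

async def demo(concurrency: int, callers: int) -> int:
    """Run `callers` fake requests through a gate; return the peak in flight."""
    gate = asyncio.Semaphore(concurrency)
    in_flight = 0
    peak = 0

    async def fake_request() -> None:
        nonlocal in_flight, peak
        async with gate:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for the upstream call
            in_flight -= 1

    await asyncio.gather(*(fake_request() for _ in range(callers)))
    return peak

peak = asyncio.run(demo(concurrency=26, callers=200))
print(f"peak in flight: {peak}")
```

However many callers arrive, the peak never exceeds the semaphore value; the remaining callers wait at the gate rather than on the connector's internal queue.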

Step 5 — Load-Test and Watch Pool Wait Time

Drive the client at and above your target rate while recording the time each request spends acquiring a slot. Pool wait time is the metric that tells you whether the size is right.

import asyncio
import time

async def load_test(client, url: str, total: int) -> None:
    waits: list[float] = []

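    async def one() -> None:
        t0 = time.perf_counter()
        # Acquire the gate once and hold it for the whole request. Calling
        # client.fetch() here would queue on the same gate a second time.
        async with client._gate:
            waits.append(time.perf_counter() - t0)
            async with client._session.get(url) as resp:
                await resp.read()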

    start = time.perf_counter()
    async with asyncio.TaskGroup() as tg:
        for _ in range(total):
            tg.create_task(one())
    elapsed = time.perf_counter() - start
    print(f"rps={total/elapsed:.0f}  avg_wait={sum(waits)/len(waits)*1000:.2f}ms")

Verify: at the target rate, average wait is near 0 ms and measured rps reaches the target. If wait climbs while rps is below target, the pool is undersized — go back to Step 3 and raise the limits together.
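Launching every task at once measures burst behaviour rather than a steady rate. To drive at a target rate, an open-loop driver can start one request per interval regardless of how many have completed. The following is a sketch under stated assumptions: paced_load and fake_request are hypothetical names, and the upstream is simulated with asyncio.sleep so the example runs without a network.

```python
import asyncio
import time
from collections.abc import Awaitable, Callable

async def paced_load(
    request: Callable[[], Awaitable[None]],
    gate: asyncio.Semaphore,
    target_rps: float,
    total: int,
) -> dict[str, float]:
    """Open-loop driver: start one request every 1/target_rps seconds."""
    if total <= 0 or target_rps <= 0:
        raise ValueError("total and target_rps must be positive")
    waits: list[float] = []
    interval = 1.0 / target_rps

    async def one() -> None:
        t0 = time.perf_counter()
        async with gate:  # time spent acquiring the gate is the pool wait
            waits.append(time.perf_counter() - t0)
            await request()

    tasks: list[asyncio.Task] = []
    start = time.perf_counter()
    for i in range(total):
        # Sleep until this request's scheduled start, independent of completions.
        delay = start + i * interval - time.perf_counter()
        if delay > 0:
            await asyncio.sleep(delay)
        tasks.append(asyncio.create_task(one()))
    await asyncio.gather(*tasks)
    elapsed = time.perf_counter() - start
    return {
        "sent": float(total),
        "rps": total / elapsed,
        "avg_wait_ms": sum(waits) / len(waits) * 1000,
    }

async def main() -> dict[str, float]:
    async def fake_request() -> None:
        await asyncio.sleep(0.02)  # stand-in for a 20 ms upstream

    gate = asyncio.Semaphore(10)
    return await paced_load(fake_request, gate, target_rps=200, total=50)

result = asyncio.run(main())
print(result)
```

Rerunning with a smaller semaphore, or a higher target_rps, should make avg_wait_ms climb, which is the undersized-pool signature described above.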

Step 6 — Tune Keep-Alive Idle Timeout

A correctly sized pool still throws errors if idle sockets outlive the server's keep-alive window. Set the client idle timeout below the server's so you never reuse a reaped connection.

import aiohttp

# Server (nginx) keepalive_timeout = 75s; stay comfortably under it.
connector = aiohttp.TCPConnector(
    limit_per_host=26,
    keepalive_timeout=60,   # < server's 75s
    ttl_dns_cache=300,
)

Verify: under steady load the keep-alive reuse ratio exceeds 0.9 and ServerDisconnectedError counts stay near zero, especially during low-traffic dips. Pair acquisition with explicit timeouts and deadlines so a stalled dial cannot block indefinitely.
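The reuse ratio is simply reused connections divided by all connection acquisitions. A small counter like the one below can hold that figure; ReuseStats is a hypothetical class, and the feed pattern is made up for illustration. aiohttp's client tracing signals (for example TraceConfig.on_connection_create_end and TraceConfig.on_connection_reuseconn) are one possible source of the events, but the wiring is left out here.

```python
from dataclasses import dataclass

@dataclass
class ReuseStats:
    """Counts requests served on an existing socket versus a fresh dial."""
    created: int = 0
    reused: int = 0

    def record(self, *, reused: bool) -> None:
        if reused:
            self.reused += 1
        else:
            self.created += 1

    @property
    def ratio(self) -> float:
        total = self.created + self.reused
        return self.reused / total if total else 0.0

stats = ReuseStats()
for i in range(100):
    # Illustrative pattern: the first 5 requests dial, the rest reuse.
    stats.record(reused=i >= 5)
print(f"reuse ratio = {stats.ratio:.2f}")
```

A ratio that drops during quiet periods suggests idle sockets are expiring between requests, which is the situation the keep-alive setting is meant to control.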

Verification

A correctly sized pool exhibits all three signals together:

  • Pool wait time → ~0 ms at the target throughput. Any sustained acquisition wait below target rate means the pool is too small.
  • Throughput plateaus at the limit, not below it. Increasing offered load past the target should keep rps flat at the target (graceful saturation), not collapse it.
  • No ServerDisconnectedError storms. A handful from genuine network events is fine; a correlated burst during quiet periods means the keep-alive timeout exceeds the server's.
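The signals above can be turned into a simple automated check on load-test output. This is only a sketch: diagnose is a hypothetical helper, and the thresholds (1 ms of wait, 5% throughput tolerance, one disconnect per minute) are assumptions to be tuned for your own service.

```python
def diagnose(
    target_rps: float,
    measured_rps: float,
    avg_wait_ms: float,
    disconnects_per_min: float,
    *,
    wait_budget_ms: float = 1.0,     # assumed threshold
    rps_tolerance: float = 0.05,     # assumed: within 5% counts as on target
    disconnect_budget: float = 1.0,  # assumed threshold
) -> list[str]:
    """Return a list of findings; an empty list means all signals look healthy."""
    findings: list[str] = []
    below_target = measured_rps < target_rps * (1 - rps_tolerance)
    if avg_wait_ms > wait_budget_ms and below_target:
        findings.append("pool undersized: callers wait for slots while rps is below target")
    if disconnects_per_min > disconnect_budget:
        findings.append("disconnect burst: compare keepalive_timeout with the server's")
    return findings

print(diagnose(500, 300, 12.0, 0.0))
print(diagnose(500, 498, 0.1, 0.0))
```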

Pitfalls & Edge Cases

  • Over-sizing wastes file descriptors and upstream connections. A pool far larger than throughput × latency holds idle sockets that consume FDs on both ends (watch ulimit -n) and may push the server past its own connection cap. Bigger is not safer.
  • Under-sizing serializes traffic. A pool smaller than required concurrency queues excess requests on the connector waiter, inflating tail latency even though the upstream is healthy. The tell is high pool wait time with connections pinned at the limit.
  • limit_per_host vs global limit confusion. A generous global limit does nothing if limit_per_host is the real bottleneck for a single hot upstream — and vice versa. Size both; the effective cap for one host is the smaller of the two.
  • Keep-alive longer than the server's idle timeout causes resets. If keepalive_timeout exceeds the server's, you will reuse sockets it has already closed and see ServerDisconnectedError or ConnectionResetError. Always set the client value below the server's.
  • DNS TTL outliving reality. ttl_dns_cache longer than the record's real TTL keeps the pool dialing a retired IP after an upstream failover. Match the cache TTL to the DNS record, or rebuild the connector on repeated connect failures.
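The interaction between the two limits is easy to check in code. The sketch below uses a hypothetical effective_host_cap helper and assumes aiohttp's convention that a value of 0 disables the corresponding limit.

```python
from typing import Optional

def effective_host_cap(limit: int, limit_per_host: int) -> Optional[int]:
    """Effective connection cap for a single host; None means unbounded.

    Assumes 0 means "no limit" for either setting, as in aiohttp.
    """
    caps = [cap for cap in (limit, limit_per_host) if cap > 0]
    return min(caps) if caps else None

for limit, per_host in [(80, 26), (20, 26), (100, 0), (0, 0)]:
    cap = effective_host_cap(limit, per_host)
    print(f"limit={limit:>3} limit_per_host={per_host:>2} -> cap={cap}")
```

The second case is the one that tends to surprise: a per-host limit of 26 has no effect when the global limit is only 20.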

Frequently Asked Questions

How do I calculate the right connection pool size?

Apply Little's Law: required concurrency equals target throughput times measured per-request latency. For 500 req/s at 40 ms latency you need about 20 connections in flight; add roughly 30 percent headroom and set limit_per_host to about 26.

What is the difference between limit and limit_per_host?

limit_per_host caps simultaneous connections to a single upstream, while the global limit caps connections across all hosts combined. For one hot host the effective cap is the smaller of the two, so both must be sized correctly.

How do I know my pool is too small?

Under load you will see high pool wait time with connections pinned at the limit and throughput stuck below target. Raise the connector limit and the Semaphore together, then re-run the load test until wait time returns to near zero.

Figure: Throughput vs Pool Size. Throughput (req/s) climbs roughly linearly with pool size (limit_per_host) up to the Little's Law point (concurrency = throughput × latency), then plateaus at the target rps (upstream limit). Sizes below that point are undersized and throughput-limited; sizes above it are oversized, holding idle connections and wasting FDs.