Connection Pooling & Keep-Alive for Async Clients

A connection pool is the highest-leverage tuning surface in a high-throughput async HTTP client. Open a fresh TCP connection per request and you pay a three-way handshake plus a TLS negotiation on every call, flood the kernel with TIME_WAIT sockets, and exhaust file descriptors long before you saturate bandwidth. Reuse a bounded set of keep-alive connections and the same hardware can sustain far more requests per second, often by an order of magnitude when connection setup dominates request time. This guide covers how async pools actually behave: the connector model behind aiohttp.TCPConnector and httpx.Limits, idle-timeout and DNS-cache tuning, per-host fan-out caps, what happens when a coroutine waits for a free slot, and how to drain and recycle connections without leaking them. The companion sizing async connection pools for throughput reference turns these principles into concrete numbers.

Architectural Principles

The pool is a bounded resource, not an optimization toggle. Every limit (limit, limit_per_host, max_connections, max_keepalive_connections) is a backpressure knob. Sizing them is a capacity-planning decision, not a default to accept.
Reuse is the default; opening a connection is the exception. A warm pool amortizes TCP and TLS setup across thousands of requests. Any code path that creates a session per request silently discards this and is almost always a bug.
Acquisition can block, and blocking is good. When the pool is full, a request should wait for a slot rather than open an uncapped connection. Controlled waiting is how the pool propagates backpressure upstream.
Keep-alive is a negotiation with the server, not a local setting. Your idle timeout must be shorter than the server's, or you will hand dead sockets to live requests.
Pools must be explicitly drained on shutdown. Connectors and sessions own kernel resources; relying on garbage collection to close them produces leaked sockets and Unclosed connection warnings.

How Pool Acquisition Interacts With the Event Loop

The connector lives inside the Network I/O & Protocol Handling execution model: it is a piece of state the event loop schedules around, not a separate thread. When a coroutine issues a request, the client asks the connector for a connection. If an idle keep-alive socket exists for that host, it is returned immediately and the request proceeds without yielding for setup. If none is free and the per-host or global cap has not been reached, the connector initiates a new TCP connect — itself an await point that registers the socket with the loop's selector (epoll/kqueue) and suspends the coroutine until the connect completes.

If the cap has been reached, the connector parks the coroutine on an internal waiter (an asyncio.Future or condition). The coroutine is suspended and the loop runs other ready tasks; when an in-flight request returns its connection to the pool, one waiter is woken. This is the critical scheduling property: pool exhaustion neither spins the CPU nor raises immediately. It serializes excess demand into a wait queue that the loop services in FIFO order. Understanding this is what separates "the pool is too small" from "the upstream is slow". Both manifest as climbing latency, but an undersized pool shows connections-in-use pinned at the limit and pool wait time rising while the upstream's own response time stays flat.

Because acquisition order is tied to the loop's ready queue, an unbounded asyncio.gather() over ten thousand URLs does not create ten thousand connections — it creates ten thousand coroutines, almost all of which immediately park on the connector waiter. That backlog is invisible until you measure pool wait time, which is why every pattern below pairs the connector limit with an explicit gate.
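The parking behavior can be reproduced without any network. The sketch below is a toy stand-in for a connector, not aiohttp's implementation: `ToyPool`, `request`, and `fan_out` are hypothetical names, and an `asyncio.Semaphore` plays the role of the connection cap. It illustrates why a large `gather()` produces many parked coroutines but never more than `limit` open connections.

```python
import asyncio


class ToyPool:
    """Stand-in for a connector: at most `limit` connections open at once."""

    def __init__(self, limit: int) -> None:
        self._slots = asyncio.Semaphore(limit)
        self.open_now = 0    # connections currently checked out
        self.peak_open = 0   # high-water mark of open_now

    async def request(self, service_time: float) -> None:
        async with self._slots:            # parks here when the pool is full
            self.open_now += 1
            self.peak_open = max(self.peak_open, self.open_now)
            try:
                await asyncio.sleep(service_time)   # simulated round trip
            finally:
                self.open_now -= 1


async def fan_out(n_requests: int, limit: int) -> int:
    """Launch n_requests coroutines at once and report peak open connections."""
    pool = ToyPool(limit)
    await asyncio.gather(*(pool.request(0.001) for _ in range(n_requests)))
    return pool.peak_open


if __name__ == "__main__":
    # 1,000 coroutines exist at once, but only 10 slots are ever held.
    print(asyncio.run(fan_out(1_000, limit=10)))   # prints 10
```

Raising `n_requests` only lengthens the wait queue; `peak_open` stays at `limit`.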

Pattern Catalogue

One Shared Pooled Session Per Process

The foundational pattern: create exactly one session at startup, share it across every coroutine, and close it at shutdown. This is what reusing aiohttp ClientSession across requests is about — a per-request session throws away the entire pool and warm-connection benefit. Use it whenever the process talks to a stable set of hosts, which is nearly always.

import asyncio
import aiohttp

class HttpService:
    """One connector + session for the whole process lifetime."""

    def __init__(self) -> None:
        self._session: aiohttp.ClientSession | None = None

    async def start(self) -> None:
        connector = aiohttp.TCPConnector(
            limit=100,            # global cap on simultaneous connections
            limit_per_host=20,    # per-host fan-out cap
            ttl_dns_cache=300,    # cache DNS resolutions for 5 min
            keepalive_timeout=30, # drop idle sockets after 30s
        )
        timeout = aiohttp.ClientTimeout(total=10, connect=2)
        self._session = aiohttp.ClientSession(
            connector=connector, timeout=timeout
        )

    async def get_json(self, url: str) -> dict:
        assert self._session is not None
        async with self._session.get(url) as resp:
            resp.raise_for_status()
            return await resp.json()

    async def close(self) -> None:
        if self._session is not None:
            await self._session.close()
            self._session = None

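Diagnostic Hook: log a warning if HttpService is instantiated more than once per process, since duplicate sessions mean duplicate pools and silent FD inflation. To confirm the pool is being reused rather than rebuilt, periodically sample the connector's idle-connection table, session.connector._conns. It is a private aiohttp attribute keyed by connection key (roughly one entry per host with idle sockets), so treat it as a debugging aid that may change between releases rather than a stable metric source.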
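One way to wire up the duplicate-instance warning is a small counting base class. This is a minimal sketch under stated assumptions: `InstanceCounted` and `DemoService` are hypothetical names, and a real service would inherit from the base class and call `super().__init__()` before building its session.

```python
import logging

logger = logging.getLogger("http_service")


class InstanceCounted:
    """Base class that warns when a subclass is constructed more than once."""

    _instances = 0

    def __init__(self) -> None:
        cls = type(self)
        cls._instances += 1   # the count is stored on each subclass separately
        if cls._instances > 1:
            logger.warning(
                "%s constructed %d times; each instance owns its own pool",
                cls.__name__, cls._instances,
            )


class DemoService(InstanceCounted):
    """Hypothetical service; a real one would also create its session."""
```

The counter never decrements, so a service that is deliberately closed and rebuilt (for example in tests) will also trigger the warning.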

Per-Host Connection Caps

A single global limit lets one slow host monopolize the entire pool, starving every other destination. limit_per_host (aiohttp) or routing distinct hosts to distinct pools bounds each upstream independently. Use this whenever a process fans out to multiple backends with different latency profiles.

import aiohttp

# Global limit 200, but no single host may use more than 25 slots.
connector = aiohttp.TCPConnector(limit=200, limit_per_host=25)

# httpx expresses the global cap and keep-alive cap separately;
# per-host isolation is achieved by giving each host its own client.
import httpx

limits = httpx.Limits(
    max_connections=200,
    max_keepalive_connections=50,
    keepalive_expiry=30.0,
)
client = httpx.AsyncClient(limits=limits)

Diagnostic Hook: emit a per-host gauge of active connections. If one host sits at limit_per_host while total utilization is low, that host is your bottleneck — not the pool size.
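A per-host gauge can be kept at the application layer without reaching into connector internals. The sketch below is an assumption-laden illustration: `HostGauge`, `track`, and `snapshot` are hypothetical names, and it counts requests in flight per host, which is a proxy for active connections rather than a socket count.

```python
import asyncio
from collections import Counter
from collections.abc import AsyncIterator
from contextlib import asynccontextmanager
from urllib.parse import urlsplit


class HostGauge:
    """Counts in-flight requests per host for export as a gauge."""

    def __init__(self) -> None:
        self._active: Counter[str] = Counter()

    @asynccontextmanager
    async def track(self, url: str) -> AsyncIterator[None]:
        host = urlsplit(url).netloc
        self._active[host] += 1
        try:
            yield
        finally:
            self._active[host] -= 1
            if self._active[host] == 0:
                del self._active[host]   # keep the snapshot free of idle hosts

    def snapshot(self) -> dict[str, int]:
        """Current in-flight count per host."""
        return dict(self._active)
```

Wrap each request in `async with gauge.track(url):` and compare each host's value against `limit_per_host` when exporting.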

Keep-Alive Idle Timeout Tuning

Keep-alive reuse only pays off if the idle socket survives until the next request and the server has not already closed it. Set your idle timeout (keepalive_timeout / keepalive_expiry) below the server's idle timeout. Common server defaults are 5s (Apache httpd KeepAliveTimeout), 60s, or 75s (nginx keepalive_timeout); reusing a socket the server has reaped yields a ConnectionResetError or ServerDisconnectedError on the next write.

import aiohttp

# Server (nginx) closes idle conns at 75s. Stay comfortably under it
# so we never reuse a socket the server has already torn down.
connector = aiohttp.TCPConnector(keepalive_timeout=60)

# For bursty traffic with long gaps, a short idle timeout avoids
# stockpiling sockets the server will reap anyway.
bursty = aiohttp.TCPConnector(keepalive_timeout=5)

Diagnostic Hook: count ServerDisconnectedError/ConnectionResetError rates. A steady trickle correlated with low traffic periods means your keep-alive timeout exceeds the server's — lower it.
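The "stay below the server" rule can be written down as a tiny helper. This is a sketch, not a library API: `client_keepalive` and `shared_pool_keepalive` are hypothetical names, and the 20% safety margin is an assumed default rather than a recommendation from any server's documentation.

```python
def client_keepalive(server_idle_s: float, margin: float = 0.2) -> float:
    """Return a client idle timeout that sits `margin` below the server's."""
    if server_idle_s <= 0:
        raise ValueError("server_idle_s must be positive")
    if not 0 < margin < 1:
        raise ValueError("margin must be between 0 and 1")
    return server_idle_s * (1.0 - margin)


def shared_pool_keepalive(
    server_idle_by_host: dict[str, float], margin: float = 0.2
) -> float:
    """One connector-wide timeout serves every host, so the strictest server wins."""
    return client_keepalive(min(server_idle_by_host.values()), margin)
```

When upstreams differ widely (say 75s and 5s), a single shared value forces the short timeout on every host; separate connectors per host group avoid that trade-off.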

Bounded Acquisition With asyncio.Semaphore

The connector caps connections, but an asyncio.Semaphore from the synchronization primitives toolkit caps in-flight requests at the application layer, giving you a single explicit number to reason about and a place to attach metrics. Pair it with timeouts and deadlines so a stalled acquisition cannot block forever. Use it whenever callers might submit far more work than the pool can absorb.

import asyncio
import aiohttp

class GatedClient:
    def __init__(self, session: aiohttp.ClientSession, max_inflight: int) -> None:
        self._session = session
        self._gate = asyncio.Semaphore(max_inflight)

    async def fetch(self, url: str) -> bytes:
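        # Wait for a slot instead of piling onto the connector queue.
        # The deadline wraps the gate as well as the request, so a stalled
        # acquisition cannot block forever (asyncio.timeout: Python 3.11+).
        async with asyncio.timeout(10):
            async with self._gate:
                async with self._session.get(url) as resp:
                    resp.raise_for_status()
                    return await resp.read()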

Diagnostic Hook: export the semaphore's pending waiter count. Sustained waiters mean callers consistently outpace capacity — either raise the limit and pool together, or shed load upstream.
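asyncio.Semaphore does not expose its waiter count publicly, so one option is to count waiters in a thin wrapper. The following is a minimal sketch; `CountingGate`, `waiting`, and `inflight` are hypothetical names introduced for illustration.

```python
import asyncio


class CountingGate:
    """Semaphore wrapper that exposes how many callers are waiting for a slot."""

    def __init__(self, max_inflight: int) -> None:
        self._sem = asyncio.Semaphore(max_inflight)
        self.waiting = 0    # callers parked on the gate right now
        self.inflight = 0   # callers currently holding a slot

    async def __aenter__(self) -> "CountingGate":
        self.waiting += 1
        try:
            await self._sem.acquire()
        finally:
            self.waiting -= 1    # also runs if the waiter is cancelled
        self.inflight += 1
        return self

    async def __aexit__(self, *exc_info: object) -> None:
        self.inflight -= 1
        self._sem.release()
```

It is a drop-in replacement for the semaphore in `async with` position; export `waiting` and `inflight` as gauges on whatever interval your metrics system scrapes.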

Connection Recycling & Health Checks

Long-lived pools accumulate sockets that the network silently broke (NAT timeouts, load-balancer recycling, transient resets). Bounding connection lifetime forces periodic reconnection, and a cheap single retry covers the stale-socket race in between: the first attempt fails on the dead socket, and the retry runs on a different connection. Use recycling for connections to hosts behind aggressive load balancers or in environments with idle NAT reaping.

import asyncio
import aiohttp

async def fetch_with_recycle(
    session: aiohttp.ClientSession, url: str, retries: int = 1
) -> bytes:
    """Retry a GET once after a stale keep-alive socket fails.

    The failed connection is not returned to the pool, so the retry
    runs on another pooled connection or a freshly dialed one. Only
    retry idempotent requests this way."""
    for attempt in range(retries + 1):
        try:
            async with session.get(url) as resp:
                resp.raise_for_status()
                return await resp.read()
        # ServerDisconnectedError is a subclass of ClientConnectionError;
        # it is listed first to name the case being targeted.
        except (aiohttp.ServerDisconnectedError,
                aiohttp.ClientConnectionError):
            if attempt == retries:
                raise
            await asyncio.sleep(0)  # yield to the loop before retrying
    raise RuntimeError("unreachable")

Diagnostic Hook: track the ratio of stale-socket retries to total requests. A rising ratio signals your keep-alive timeout drifting past the server's, or an LB recycling connections under you.
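The retry ratio is easiest to track if the retry wrapper does the counting itself. The sketch below is library-agnostic and uses only the standard library: `RetryStats` and `call_with_recycle` are hypothetical names, and the default `stale_errors` tuple is an assumption you would extend with your client's disconnect exceptions.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


@dataclass
class RetryStats:
    requests: int = 0        # logical requests, not attempts
    stale_retries: int = 0   # attempts repeated after a disconnect

    @property
    def stale_ratio(self) -> float:
        return self.stale_retries / self.requests if self.requests else 0.0


async def call_with_recycle(
    op: Callable[[], Awaitable[T]],
    stats: RetryStats,
    retries: int = 1,
    stale_errors: tuple[type[BaseException], ...] = (ConnectionResetError,),
) -> T:
    """Run `op`, retrying on stale-socket errors and counting each retry."""
    stats.requests += 1
    for attempt in range(retries + 1):
        try:
            return await op()
        except stale_errors:
            if attempt == retries:
                raise
            stats.stale_retries += 1
            await asyncio.sleep(0)   # yield to the loop before retrying
    raise RuntimeError("unreachable")
```

Pass the request as a zero-argument coroutine function, for example `lambda: fetch(url)`, and export `stats.stale_ratio` alongside the other pool metrics.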

Resource Boundaries

Pool sizing is a queueing problem, and Little's Law gives the back-of-envelope answer: the average number of connections in use equals throughput times per-request latency. To sustain 500 requests/second against a host where each request takes 40 ms, you need 500 * 0.040 = 20 concurrent connections in flight on average. Provision limit_per_host slightly above that (say 25–30) to absorb latency jitter, and set the global limit to cover the sum across all hosts plus headroom.
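The arithmetic above is small enough to keep as a helper next to your configuration. This is a back-of-envelope sketch: the function names are hypothetical, and the 25% per-host and 10% global headroom defaults are assumptions chosen to match the 25-30 range mentioned above, not measured values.

```python
import math


def connections_in_use(rps: float, latency_s: float) -> float:
    """Little's Law: average concurrency = arrival rate times time in system."""
    return rps * latency_s


def per_host_limit(rps: float, latency_s: float, headroom: float = 0.25) -> int:
    """Average concurrency plus fractional headroom, rounded up."""
    needed = connections_in_use(rps, latency_s) * (1.0 + headroom)
    return math.ceil(round(needed, 9))   # round first to absorb float noise


def global_limit(per_host: dict[str, int], headroom: float = 0.1) -> int:
    """Sum of the per-host limits plus headroom for the shared cap."""
    total = sum(per_host.values()) * (1.0 + headroom)
    return math.ceil(round(total, 9))


if __name__ == "__main__":
    print(per_host_limit(500, 0.040))            # prints 25
    print(global_limit({"a": 25, "b": 25}))      # prints 55
```

Little's Law gives the average; feed it a high-percentile latency instead of the mean if you want the limit to cover slow periods.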

Three boundary rules follow:

Under-sizing serializes. If the pool is smaller than required concurrency, excess requests queue on the connector waiter and tail latency climbs even though the upstream is healthy. The symptom is high pool wait time with connections pinned at the limit.
Over-sizing wastes and endangers. Each connection is a file descriptor on both ends plus server-side memory. Pools far larger than throughput * latency hold idle sockets that consume FDs (watch ulimit -n) and may push the server past its own connection limits.
Backpressure must be explicit. When demand exceeds capacity, the pool should make callers wait (semaphore + bounded acquisition timeout) and ideally surface a fast failure (HTTP 503 / circuit breaker) rather than let unbounded coroutines accumulate. A pool with no backpressure converts an upstream slowdown into an unbounded memory leak of parked coroutines.

Integrated Production Example

A pooled client combining a shared session, a semaphore gate, per-acquisition timeouts, stale-socket recycling, and graceful drain on shutdown.

import asyncio
import logging
import time
import aiohttp

logger = logging.getLogger("pooled_client")


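class PooledClient:
    """Shared pooled client. Construct it from a coroutine so the session
    binds to the running loop; asyncio.timeout (used in fetch) needs
    Python 3.11+."""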
    def __init__(
        self,
        *,
        limit: int = 100,
        limit_per_host: int = 20,
        max_inflight: int = 100,
        keepalive_timeout: float = 30.0,
        request_timeout: float = 10.0,
    ) -> None:
        self._connector = aiohttp.TCPConnector(
            limit=limit,
            limit_per_host=limit_per_host,
            keepalive_timeout=keepalive_timeout,
            ttl_dns_cache=300,
        )
        self._session = aiohttp.ClientSession(
            connector=self._connector,
            timeout=aiohttp.ClientTimeout(total=request_timeout, connect=2),
        )
        self._gate = asyncio.Semaphore(max_inflight)
        self._wait_total = 0.0
        self._wait_count = 0

    async def fetch(self, url: str, retries: int = 1) -> bytes:
        t0 = time.perf_counter()
        async with self._gate:                       # bounded acquisition
            waited = time.perf_counter() - t0
            self._wait_total += waited
            self._wait_count += 1
            for attempt in range(retries + 1):
                try:
                    async with asyncio.timeout(self._session.timeout.total):
                        async with self._session.get(url) as resp:
                            resp.raise_for_status()
                            return await resp.read()
                except (aiohttp.ServerDisconnectedError,
                        aiohttp.ClientConnectionError) as exc:
                    if attempt == retries:
                        logger.error("connection failed for %s: %s", url, exc)
                        raise
                    await asyncio.sleep(0)            # recycle: dial fresh
                except TimeoutError:
                    logger.error("timeout fetching %s", url)
                    raise
        raise RuntimeError("unreachable")

    @property
    def avg_wait_ms(self) -> float:
        if not self._wait_count:
            return 0.0
        return (self._wait_total / self._wait_count) * 1000

    @property
    def in_use(self) -> int:
        # Connections currently checked out of the pool. _acquired is a
        # private aiohttp attribute, so re-check it when upgrading.
        return len(self._connector._acquired)

    async def aclose(self) -> None:
        # Drain: close the session, then wait for the underlying
        # transports to actually shut down before exit.
        await self._session.close()
        await asyncio.sleep(0.25)  # let SSL transports finish closing


async def main() -> None:
    client = PooledClient(limit_per_host=25, max_inflight=50)
    try:
        urls = ["https://example.com/"] * 200
        async with asyncio.TaskGroup() as tg:
            for u in urls:
                tg.create_task(client.fetch(u))
        logger.info("avg pool wait=%.1fms in_use=%d",
                    client.avg_wait_ms, client.in_use)
    finally:
        await client.aclose()


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    asyncio.run(main())

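Diagnostic Hook: the avg_wait_ms and in_use properties are your primary pool-health signals. Note that avg_wait_ms measures time spent waiting on the semaphore gate, which stands in for pool wait as long as max_inflight does not exceed the connector limits. Export both as metrics; a wait climbing above a few milliseconds while in_use sits at the limit is the definitive signature of an undersized pool.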

Diagnostic Hook — pool health metrics

Instrument three numbers and alert on them:

Pool wait time (time spent in acquisition): healthy is ~0 ms; sustained > 5–10 ms means the pool or semaphore is undersized for offered load.
Connections in use vs limit: a ratio pinned at 1.0 confirms saturation; chronically near 0.0 means the pool is over-provisioned and wasting file descriptors.
Keep-alive reuse ratio (reused connections ÷ total requests): healthy clients run > 0.9; a low ratio means connections are being created per request — check session reuse and that keep-alive timeout sits below the server's.
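The three checks can be expressed as a small classifier over a metrics sample. This is a sketch under assumptions: `PoolSample` and `diagnose` are hypothetical names, the 5 ms and 0.9 thresholds come from the list above, and the 0.05 "near zero" utilization cutoff is an assumed value. Feed it windowed averages rather than single readings, since the over-provisioning signal only means something when it is chronic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolSample:
    wait_ms: float    # mean acquisition wait over the sampling window
    in_use: int       # connections checked out
    limit: int        # configured connection cap
    reused: int       # requests served on an already-open connection
    requests: int     # total requests in the window


def diagnose(
    s: PoolSample,
    wait_alert_ms: float = 5.0,
    idle_ratio: float = 0.05,
    reuse_floor: float = 0.9,
) -> list[str]:
    """Return the health findings that apply to one sample."""
    findings: list[str] = []
    utilization = s.in_use / s.limit if s.limit else 0.0
    if s.wait_ms > wait_alert_ms:
        findings.append("acquisition wait high: pool or gate undersized")
    if utilization >= 1.0:
        findings.append("pool saturated")
    elif utilization <= idle_ratio:
        findings.append("pool over-provisioned")
    if s.requests and s.reused / s.requests < reuse_floor:
        findings.append("low keep-alive reuse")
    return findings
```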

Failure Modes

Failure mode	Root cause	Detection	Fix
Pool exhaustion deadlock	A pooled request internally awaits another pooled request while holding its slot; with the pool full, neither can proceed	Latency climbs to the acquisition timeout; `in_use` pinned at limit with no throughput	Never nest a pool-acquiring call inside a held slot; size the pool above the recursion depth, or use separate pools per tier
Stale keep-alive connection	Local keep-alive timeout exceeds the server's idle timeout, so a reaped socket is reused	`ConnectionResetError` / `ServerDisconnectedError` correlated with quiet periods	Set `keepalive_timeout` below the server's; add a single-retry recycle on disconnect errors
Connector / session leak	Session never `close()`d (created per request, or no shutdown drain)	`Unclosed client session` / `Unclosed connector` warnings; FD count grows monotonically (`ls /proc/PID/fd`)	One session per process via `async with` or an explicit `aclose()` in a `finally`/lifespan hook
DNS cache staleness	`ttl_dns_cache` longer than the record's real TTL, so the pool keeps dialing a retired IP	Connection failures after an upstream IP change while DNS already updated	Lower `ttl_dns_cache` to match the record TTL; force a connector rebuild on repeated connect failures

Frequently Asked Questions

Why does pool acquisition block instead of opening a new connection?

When the per-host or global cap is reached, the connector parks the requesting coroutine on an internal waiter and the event loop runs other tasks. A returned connection wakes one waiter in FIFO order. This intentional blocking is how the pool propagates backpressure rather than exhausting file descriptors with uncapped connections.

How should I set the keep-alive idle timeout?

Set your client idle timeout (keepalive_timeout in aiohttp, keepalive_expiry in httpx) below the server's idle timeout. If the server reaps a socket before you reuse it, the next request raises ConnectionResetError or ServerDisconnectedError. Common server defaults are 5s, 60s, or 75s, so a client value of 30s is safe against the 60s and 75s cases but too long for a server that closes idle connections at 5s; check the actual upstream setting before picking a number.

What size should a connection pool be?

Use Little's Law: average connections in use equals throughput times per-request latency. For 500 requests/second at 40 ms latency you need about 20 concurrent connections, so set limit_per_host to roughly 25-30 to absorb jitter and the global limit to cover all hosts plus headroom.

Why do I see ConnectionResetError only during quiet periods?

During low traffic, connections sit idle long enough for the server to close them while your client still considers them alive. Reusing such a socket raises ConnectionResetError or ServerDisconnectedError. Lower your keep-alive timeout below the server's and add a single-retry recycle that forces a fresh dial on disconnect errors.

How do I close a pool cleanly on shutdown?

Call session.close() (or the client's aclose()) explicitly in a finally block or framework lifespan hook, then briefly await so SSL transports finish shutting down. Relying on garbage collection produces Unclosed connection warnings and leaked file descriptors. One session per process closed once is the correct lifecycle.

Network I/O & Protocol Handling — up to the overview for the full async networking mental model.
Async HTTP Clients & Servers — where pooling fits into broader client and server architecture.
Reusing aiohttp ClientSession across requests — the per-process session pattern that backs every pool.
Sizing async connection pools for throughput — turn Little's Law into concrete limit values.
Synchronization primitives — the Semaphore that bounds acquisition.