Skip to content

Network I/O & Protocol Handling: Architecture & Async Concurrency in Python

Network I/O is the workload asyncio was built for. A single Python process can hold tens of thousands of open sockets, multiplex them over one OS-level readiness notifier, and keep the CPU busy parsing frames instead of parking threads on blocking recv() calls. The architectural payoff is the C10k story made routine: thread-per-connection servers collapse under context-switch overhead and per-thread stack memory somewhere past a few thousand connections, while an event-loop server scales with file-descriptor limits and parse cost, not thread count.

That payoff is conditional. The event loop is a single thread executing a cooperative scheduler, so one blocking syscall, one CPU-heavy parse, or one unbounded buffer freezes every connection at once. Designing async network systems is therefore mostly about three things: keeping the loop non-blocking, framing protocols correctly over a byte stream, and propagating backpressure so that a fast producer cannot exhaust memory when a slow consumer falls behind. This reference walks the full stack — transports and protocols, the StreamReader/StreamWriter layer, HTTP clients and servers, real-time streams, and the connection pools and database drivers underneath — with the operational failure modes and diagnostics that matter in production.

Key architectural considerations:

  • The transport/protocol/streams layering: where the kernel selector ends, where your framing logic begins, and how buffers flow between them.
  • Explicit trade-offs between throughput, tail latency, and memory under network-bound load, and how connection reuse amortizes handshake cost.
  • Diagnostic workflows for blocking syscalls, connection leaks, event-loop starvation, and pool saturation before they breach SLAs.

This overview sits alongside three other references you will reach for constantly: asyncio fundamentals & the event loop for the scheduler mechanics underneath everything here, concurrent execution & worker patterns for offloading the CPU-bound parsing that network protocols generate, and resilience, cancellation & error handling for the timeout and retry discipline every remote call needs.


The conceptual model: transports, protocols, and streams

asyncio's networking stack is layered, and confusion about the layers is the root of most production bugs. At the bottom, the event loop registers file descriptors with a platform selector — epoll on Linux, kqueue on macOS/BSD, IOCP on Windows — and yields until a readiness event fires. A transport wraps a single connection's FD and owns the low-level read/write buffering plus flow-control thresholds. A protocol is your callback object: the loop calls data_received(data) when bytes arrive and connection_lost(exc) when the peer hangs up. This transport/protocol pair is the high-performance, callback-driven core.

On top of that sits the streams layer — asyncio.open_connection() and asyncio.start_server() hand you a StreamReader/StreamWriter pair that turns the callback model into await reader.read() / writer.write() coroutines. Streams are easier to reason about and the right default for most code; transports/protocols win when you need zero-copy framing or to avoid the stream layer's extra buffer copies. Higher still, libraries like aiohttp, httpx, and websockets build full application protocols on transports, adding connection pools, TLS/ALPN negotiation, and protocol state machines.

The figure below shows the request path from your application code down to the wire and the buffers that sit at each boundary.

The async network I/O stack Application requests flow through a client session and connection pool, into transports managed by the event loop selector, and out to the network; readiness events flow back up. Async network I/O stack Application code await client.get(url) Client / session aiohttp / httpx / websockets Connection pool / keep-alive reuse, limits, eviction Transport + protocol read/write buffers, flow control Event-loop selector + socket epoll / kqueue / IOCP → network request bytes out readiness bytes in

The table below is the working vocabulary for the rest of this reference — the primitives you reach for, what they own, and when each is the right tool.

Primitive / API Layer What it owns Use when
loop.sock_recv / sock_sendall raw socket A non-blocking FD, manual buffering Custom binary protocols, SO_REUSEPORT sharding, zero-copy framing
asyncio.Transport / asyncio.Protocol transport/protocol Read/write buffers, pause_reading() flow control High-throughput gateways, callback-driven framing
StreamReader / StreamWriter streams Coroutine read/write, readuntil/readexactly Default for clients and servers; line- or length-framed protocols
aiohttp.ClientSession HTTP client Connection pool, cookies, keep-alive, TLS Long-lived service-to-service HTTP, WebSocket client
httpx.AsyncClient HTTP client HTTP/2, pool limits, granular timeouts Modern clients needing HTTP/2 or streaming downloads
websockets / aiohttp WS real-time Frame masking, ping/pong, close handshake Full-duplex channels, live telemetry, pub/sub fan-out
TCPConnector / Limits pool max_connections, keep-alive cap, DNS cache Tuning concurrency and reuse for a host or process
asyncpg / asyncmy pools DB driver Prepared statements, connection pool, binary protocol Async access to PostgreSQL/MySQL without blocking the loop

Transports, protocols & streams

Network protocols rarely deliver a complete message in a single recv(). TCP is a byte stream: partial reads, coalesced packets, and fragmented frames are guaranteed under load. Correct framing therefore needs an explicit state machine that tracks message boundaries, enforces backpressure, and bounds buffer growth. The streams layer gives you readexactly(n) and readuntil(separator) so length-prefixed and delimiter-framed protocols stay readable; the transport/protocol layer gives you data_received plus pause_reading()/resume_reading() when you need to feed a custom parser without copying through stream buffers.

The critical operational detail is flow control. When you call writer.write(), bytes go into the transport's send buffer; if the peer reads slowly, that buffer grows without bound unless you cooperate. The contract is await writer.drain() after writes — it suspends when the buffer exceeds the high-water mark and resumes when it drains below the low-water mark. Skipping drain() is the most common way to turn a slow consumer into an out-of-memory kill.

import asyncio


class FrameBuffer:
    """Length-prefixed framing with explicit backpressure via pause_reading()."""

    def __init__(self, transport: asyncio.Transport, max_queue: int = 1024):
        self._transport = transport
        self._queue: asyncio.Queue[bytes] = asyncio.Queue(maxsize=max_queue)

    async def ingest(self, frames) -> None:
        async for frame in frames:
            try:
                self._queue.put_nowait(frame)
            except asyncio.QueueFull:
                # Stop pulling bytes off the socket until the consumer catches up.
                self._transport.pause_reading()
                await self._queue.put(frame)
                self._transport.resume_reading()

    async def next_frame(self, timeout: float = 10.0) -> bytes:
        async with asyncio.timeout(timeout):
            return await self._queue.get()

For raw-socket cases — custom binary protocols, SO_REUSEPORT load sharding across processes — loop.sock_recv() and loop.sock_sendall() give the lowest latency at the cost of manual state management. Most teams should stay on streams unless a profiler proves the stream layer's buffer copies are the bottleneck.

Diagnostic Hook: Enable asyncio.get_running_loop().set_debug(True) and watch for "Executing took N seconds" warnings driven by slow_callback_duration; instrument data_received with a byte-count histogram to catch framing drift and buffer starvation. At the syscall level, strace -e trace=epoll_ctl,recvfrom,sendto -p <pid> reveals blocking calls and EAGAIN spin loops.


HTTP clients & servers

HTTP is the dominant async network workload, and almost every production HTTP bug traces to one mistake: creating a fresh client per request. A new aiohttp.ClientSession or httpx.AsyncClient builds a new connection pool, a new DNS cache, and a new TLS session cache each time, so every request pays DNS resolution, a TCP handshake, and a full TLS negotiation. Create one session at startup, share it across all requests, and close it on shutdown — keep-alive then reuses warm connections and your tail latency drops sharply. The details of session reuse, pool configuration, and HTTP/2 streaming live in async HTTP clients & servers.

aiohttp and httpx make different trade-offs. aiohttp is a mature client-and-server framework with a battle-tested ClientSession; httpx offers a requests-like API, first-class HTTP/2, and clean streaming downloads. Both expose pool limits and granular timeouts — and you should set every timeout explicitly, because the defaults are generous enough to let a stalled peer pin a connection for minutes.

Architecture Multiplexing Head-of-line risk Config complexity
HTTP/1.1 + keep-alive No (one request per connection at a time) High (per connection) Low
HTTP/2 Yes (stream-level over one TCP conn) Medium (TCP-layer HoL) High (flow control, HPACK)
HTTP/3 (QUIC) Yes (independent UDP streams) None Very high (lib/kernel support)

The snippet below is the canonical pattern: a bounded concurrent fetcher over a single shared session, with a semaphore capping in-flight requests, explicit per-request timeouts, and a TaskGroup so a failure cancels siblings cleanly. Pair it with the retry-and-backoff strategies reference to add jittered retries around fetch_one.

# pip install httpx
import asyncio
import httpx


async def fetch_one(
    client: httpx.AsyncClient,
    sem: asyncio.Semaphore,
    url: str,
) -> tuple[str, int, int]:
    """Fetch a single URL under the shared concurrency limit."""
    async with sem:                       # cap concurrent in-flight requests
        async with asyncio.timeout(15):   # per-request deadline, not just socket timeout
            resp = await client.get(url, follow_redirects=True)
            resp.raise_for_status()
            body = await resp.aread()
            return url, resp.status_code, len(body)


async def bounded_fetch(
    urls: list[str],
    concurrency: int = 20,
) -> list[tuple[str, int, int]]:
    limits = httpx.Limits(
        max_connections=concurrency,            # pool ceiling matches the semaphore
        max_keepalive_connections=concurrency,  # keep them all warm for reuse
        keepalive_expiry=30.0,
    )
    timeout = httpx.Timeout(connect=5.0, read=10.0, write=5.0, pool=5.0)
    sem = asyncio.Semaphore(concurrency)
    results: list[tuple[str, int, int]] = []
    errors: list[tuple[str, str]] = []

    # One session for the whole batch: shared pool, DNS cache, TLS session cache.
    async with httpx.AsyncClient(
        http2=True, limits=limits, timeout=timeout
    ) as client:
        async def worker(u: str) -> None:
            try:
                results.append(await fetch_one(client, sem, u))
            except (httpx.HTTPError, asyncio.TimeoutError) as exc:
                errors.append((u, type(exc).__name__))

        async with asyncio.TaskGroup() as tg:
            for u in urls:
                tg.create_task(worker(u))

    if errors:
        print(f"{len(errors)} of {len(urls)} requests failed: {errors[:5]}")
    return results


if __name__ == "__main__":
    sample = [f"https://example.com/?n={i}" for i in range(100)]
    rows = asyncio.run(bounded_fetch(sample, concurrency=20))
    print(f"fetched {len(rows)} responses")

The key invariant: the semaphore limit, the pool max_connections, and the max_keepalive_connections should agree. If the semaphore allows more concurrency than the pool can serve, excess tasks block on pool acquisition until they hit the pool timeout — a saturation failure that looks like a hung application rather than an obvious error.

Diagnostic Hook: Track httpx/aiohttp pool metrics — connections in use vs. idle vs. pending pool acquisitions. A rising pending-acquisition count with flat throughput means the pool is the bottleneck; either raise max_connections or lower the semaphore. Confirm HTTP/2 multiplexing with curl -v --http2 and verify the SETTINGS frame exchange and stream-ID allocation under load.


Real-time streams: WebSockets & server-sent events

Persistent, low-latency channels power live telemetry, collaborative editing, and pub/sub fan-out. A WebSocket upgrades an HTTP connection into a full-duplex framed channel, which brings obligations the request/response model never had: frame masking on the client side, ping/pong heartbeats to detect dead peers, an orderly close handshake, and — most importantly — per-connection backpressure on the send path. The lifecycle, heartbeat tuning, and slow-consumer handling are covered in depth in WebSocket & real-time streams.

The dominant failure mode at scale is the slow consumer during broadcast. If you fan a message out to thousands of clients with a bare await ws.send(msg) loop, one client on a congested link blocks the whole broadcast, and if you instead give each client an unbounded outbound queue, a slow client's queue grows until the process is OOM-killed. The fix is a bounded per-connection outbound queue: when it fills, you drop the slowest non-critical messages or disconnect the laggard rather than letting memory grow without limit.

import asyncio
import contextlib


async def ws_session(ws, ping_interval: float = 20.0) -> None:
    """Bounded outbound queue + heartbeat; drops a peer that cannot keep up."""
    outbound: asyncio.Queue[str] = asyncio.Queue(maxsize=256)

    async def pump() -> None:
        while True:
            msg = await outbound.get()
            await ws.send(msg)          # send applies its own flow control

    async def heartbeat() -> None:
        while True:
            await asyncio.sleep(ping_interval)
            await ws.ping()             # raises if the peer is gone

    try:
        async with asyncio.TaskGroup() as tg:
            tg.create_task(pump())
            tg.create_task(heartbeat())
    except* Exception as eg:
        # Either task failing (dead peer, send error) tears down the session.
        with contextlib.suppress(Exception):
            await ws.close()
        raise eg

Where proxies or corporate firewalls block the WebSocket upgrade, server-sent events (SSE) give a simpler unidirectional fallback with built-in reconnection semantics. SSE rides plain HTTP, so it traverses restrictive infrastructure that mangles the Upgrade header.

Diagnostic Hook: Track per-connection outbound queue depth and the count of dropped/disconnected slow consumers. Watch established-connection counts with ss -tan state established and correlate against application-level heartbeat latency; a divergence between OS-established sockets and application-tracked sessions reveals zombie connections leaking FDs.


Connection pooling & async database drivers

Every layer above rests on connection reuse. A new TCP connection costs DNS resolution, a three-way handshake, TLS negotiation, and TCP slow-start before the first useful byte moves — easily tens of milliseconds before any payload. Pooling amortizes that across requests, which is why the new connection pooling & keep-alive reference treats pool sizing as a first-class capacity decision rather than a tuning afterthought.

The same discipline governs databases. Synchronous drivers (psycopg2, PyMySQL) issue blocking syscalls that freeze the entire event loop, so async services need native async drivers — asyncpg for PostgreSQL, asyncmy for MySQL — each with its own connection pool. The new async database drivers reference covers pool acquisition, prepared-statement caching, and the cancellation hazards specific to in-flight queries. The non-negotiable rule: pool size is a hard concurrency ceiling. A pool of 20 connections means at most 20 concurrent queries; everything else queues on acquisition, and an acquisition without a timeout is an invisible stall.

The table maps the pooling primitives to use cases and their trade-offs.

Primitive Use case Trade-off
asyncio.Semaphore Cap concurrency around any resource Simple, but holds no connection state; pair with a real pool
aiohttp.TCPConnector(limit=…) Per-process and per-host HTTP connection caps Shared across a ClientSession; misuse links unrelated request flows
httpx.Limits HTTP pool ceiling + keep-alive cap Pending acquisitions block until pool timeout if undersized
asyncpg.create_pool(min_size, max_size) Bounded DB connection pool max_size caps query concurrency; saturation surfaces as acquire latency
Health-checked pool with drain Long-lived pools over flaky links Extra round-trips for liveness; lowest stale-connection risk

A bounded acquire pattern keeps saturation observable instead of silent. Always wrap pool acquisition in a deadline so a saturated pool fails fast rather than hanging a request indefinitely — combine this with the timeouts & deadlines reference for end-to-end deadline propagation.

# pip install asyncpg
import asyncio
import asyncpg


async def run_queries(dsn: str, queries: list[str], pool_size: int = 10) -> None:
    pool = await asyncpg.create_pool(dsn, min_size=2, max_size=pool_size)
    try:
        async def one(sql: str) -> None:
            # Bounded acquire: a saturated pool fails fast instead of hanging.
            async with asyncio.timeout(3.0):
                conn = await pool.acquire()
            try:
                async with asyncio.timeout(5.0):
                    await conn.fetch(sql)
            finally:
                await pool.release(conn)

        async with asyncio.TaskGroup() as tg:
            for q in queries:
                tg.create_task(one(q))
    finally:
        await pool.close()

Diagnostic Hook: Export pool gauges — pool.get_size(), idle count, and acquire-wait latency (the time between acquire() and getting a connection). Rising acquire latency with flat query latency is the textbook signal of an undersized pool. At the OS level, lsof -i -p <pid> and ss -s track TIME_WAIT accumulation, CLOSE_WAIT leaks (unclosed connections), and approach toward FD exhaustion (EMFILE).


Diagnostics & tuning workflow

Async network bottlenecks rarely show up as CPU spikes; they appear as event-loop lag, growing acquire queues, and socket-buffer exhaustion. Work the problem top-down with this numbered workflow:

  1. Confirm the loop is not blocked. Set loop.slow_callback_duration low (e.g. 0.05) and run with set_debug(True) in a canary. Any callback exceeding the threshold logs a warning — that is a blocking syscall or a CPU-heavy parse masquerading as a coroutine. Offload such work via CPU-bound task offloading.
  2. Find the saturated resource. Compare in-flight requests against pool max_connections and against the semaphore limit. If pending acquisitions climb while throughput is flat, the pool — not the network — is the ceiling.
  3. Check connection reuse. Count new connections per second vs. requests per second. A near-1:1 ratio means keep-alive is not working: you are probably creating a session per request or the keep-alive expiry is too short.
  4. Inspect FD and socket state. Use ss -s and lsof to watch TIME_WAIT/CLOSE_WAIT and total FDs against ulimit -n. Leaks here precede EMFILE outages.
  5. Profile what is left. Use py-spy dump / py-spy top (sampling, production-safe) to see where loop time goes, and reserve tracemalloc for canaries to find buffer leaks on long-lived connections.
  6. Then tune the runtime. Only after the above is clean, swap in uvloop for a typical 2–4x throughput gain on its libuv bindings — and re-benchmark, since it changes some syscall timing.
import asyncio
import logging
import time

logger = logging.getLogger("netio.diag")


async def timed(coro, name: str, threshold_ms: float = 50.0):
    """Wrap a network coroutine to log when it exceeds a latency budget."""
    start = time.perf_counter()
    try:
        return await coro
    finally:
        elapsed = (time.perf_counter() - start) * 1000
        if elapsed > threshold_ms:
            logger.warning("op %s took %.1fms (budget %.0fms)", name, elapsed, threshold_ms)


def loop_health() -> None:
    """Snapshot live tasks; long-pending I/O tasks point at saturation or stalls."""
    for task in asyncio.all_tasks():
        if not task.done():
            logger.info("pending: %s", task.get_name())
Tool Use case Overhead Production safe
loop.slow_callback_duration Detect blocking in the loop thread Low Yes
Pool / acquire-latency gauges Find resource saturation Low Yes
py-spy / austin CPU and async profiling Medium Sampling mode only
tracemalloc Socket/buffer leak hunting High Canary only
strace / ss / lsof Syscall and FD-level inspection Medium Short bursts

Common pitfalls

Anti-pattern Impact Mitigation
New ClientSession/AsyncClient per request Pays DNS + TCP + TLS every call; no keep-alive; tail latency balloons Create one session at startup, share it, close on shutdown
Blocking I/O inside async def (requests, time.sleep, sync DB driver) Freezes the loop; every connection stalls at once Use async clients/drivers; offload unavoidable blocking with asyncio.to_thread()
writer.write() without await writer.drain() Send buffer grows unbounded against a slow peer → OOM Always await drain() after writes; honor flow control
Unbounded outbound/broadcast queues One slow consumer exhausts process memory Bound the queue; drop or disconnect laggards
Pool acquire with no timeout Saturation looks like a hang, not an error Wrap acquire() in asyncio.timeout(); size pools to load
Semaphore limit > pool max_connections Excess tasks block on pool acquisition until pool timeout Keep the semaphore, pool, and keep-alive caps in agreement
Naive retry loops without jitter Thundering-herd retries overwhelm a recovering service Use exponential backoff with jitter and a global deadline
Mishandled partial recv()/send() Frame corruption, state-machine desync in custom protocols Use readexactly/readuntil; track framing in an explicit state machine

Frequently asked questions

Why does creating an HTTP client per request hurt so much?

Each new aiohttp.ClientSession or httpx.AsyncClient builds its own connection pool, DNS cache, and TLS session cache. Per request you then pay DNS resolution, a TCP three-way handshake, and a full TLS negotiation — tens of milliseconds before the first byte of payload moves, plus you never reuse a keep-alive connection. Create one session at startup, share it across all requests, and close it on shutdown.

When should I drop to raw sockets instead of streams?

Only for custom binary protocols, SO_REUSEPORT load sharding across processes, or when a profiler proves the stream layer's buffer copies are your bottleneck. loop.sock_recv()/sock_sendall() give the lowest latency at the cost of manual buffering and state management. Otherwise prefer StreamReader/StreamWriter for safety and readexactly/readuntil framing helpers.

How do I apply backpressure on a WebSocket broadcast?

Give each connection a bounded outbound asyncio.Queue. When the queue fills, either drop low-priority messages or disconnect the slow consumer — never let it grow unbounded, which OOM-kills the process. On the raw stream side, always await writer.drain() so the transport's high/low water marks suspend a fast producer when a peer reads slowly.

Why do my requests hang instead of erroring under load?

Almost always pool saturation. If the concurrency limit (semaphore) exceeds the connection pool's max_connections, excess tasks block on pool acquisition. Without a timeout on acquire(), that block is indefinite and looks like a hang. Size the pool to expected concurrency, keep the semaphore and pool caps in agreement, and wrap acquisition in asyncio.timeout().

Which async database driver should I use, and why not psycopg2?

Use a native async driver — asyncpg for PostgreSQL, asyncmy for MySQL. Synchronous drivers like psycopg2 or PyMySQL issue blocking syscalls that freeze the entire event loop, stalling every other connection in the process. Remember that the pool's max_size is a hard ceiling on concurrent queries; everything else queues on acquisition.

What does uvloop actually buy me, and is it safe?

uvloop replaces the default event loop with one built on libuv C bindings and typically delivers a 2–4x throughput improvement for network-bound workloads. It is production-safe but changes some syscall and exception-propagation timing, so benchmark your workload and run a canary before rolling it out fleet-wide. Make it the last tuning step, after you have removed blocking calls and right-sized pools.