Coroutine Design Patterns: Production-Ready Asyncio Architectures¶

Q: When should I use asyncio.TaskGroup over asyncio.gather()?

Use TaskGroup (Python 3.11+) for owned work where all tasks share a lifecycle and must be cancelled together on the first failure, with errors surfacing as an ExceptionGroup. Use gather() for independent best-effort batches where partial success is acceptable and return_exceptions=True lets you collect and triage every result. Reach for TaskGroup first and drop to gather() only when you specifically want non-cancelling, order-preserving collection.

Q: How do I bound concurrency when fanning out over a large list?

Wrap each coroutine's critical section in async with semaphore where semaphore = asyncio.Semaphore(N), then schedule them all inside a TaskGroup. The semaphore admits at most N coroutines into the downstream call at once, so 50,000 scheduled tasks still produce only N concurrent connections. Size N to your connection pool, not above it.

Q: When is as_completed() the right choice over gather()?

Use as_completed() when partial results are actionable and you want to handle each as soon as it finishes rather than waiting on the slowest coroutine, such as progressive UIs, first-response-wins, or early-exit search. Results arrive in completion order, so carry a correlating key. If you need results in input order or all-or-nothing semantics, use gather() or a TaskGroup.

Q: How do bounded queues prevent memory blowup in pipelines?

An asyncio.Queue(maxsize=N) makes await queue.put() suspend the producer once N items are buffered, so the producer can never outrun the consumers by more than N items. This is backpressure: ingestion throttles to processing capacity automatically, capping memory at N items instead of growing with the input. Pair it with sentinels and task_done() so consumers terminate cleanly.

Q: Why are my coroutines showing 'was never awaited' warnings?

A coroutine object was created but never scheduled via await or asyncio.create_task(), so it leaks until the GC finalizes it and emits the RuntimeWarning. Enable PYTHONASYNCIODEBUG=1, audit call sites where an async def is invoked without awaiting, and convert the warning to an error in CI.

Transitioning from basic async/await syntax to production-grade coroutine architectures requires deliberate pattern selection, strict resource boundaries, and deterministic lifecycle control. Once more than a handful of coroutines run concurrently, the questions stop being about syntax and start being about composition: how many run at once, who owns their cancellation, how failures propagate, and how data flows between stages without unbounded buffering. This guide is a working catalogue of the coroutine design patterns that answer those questions — fan-out/fan-in, streaming completion, producer/consumer pipelines, async-generator chains, and Semaphore-bounded concurrency — tuned for mid-to-senior engineers building high-throughput, low-latency Python systems.

Architectural principles that hold across every pattern below:

Pattern selection is a resource-budget decision. Each pattern has a distinct memory and event-loop-saturation profile; gather() materializes every coroutine at once, as_completed() yields incrementally, a Queue caps in-flight items. Choose by the budget you can spend, not by familiarity.
Concurrency must be bounded at the coroutine boundary. Unbounded create_task() or gather() over an attacker- or upstream-controlled list is the most common path to connection exhaustion and OOM. Gate it with asyncio.Semaphore or a fixed worker count.
Ownership and cancellation are inseparable. Every spawned task needs an owner that awaits it and propagates its failure. Structured concurrency via asyncio.TaskGroup makes that ownership explicit and the default.
No blocking on the loop thread. A single synchronous call inside a coroutine stalls every other coroutine sharing the loop. CPU and blocking I/O belong in an executor.
Failures propagate as data. Decide deliberately whether one task's failure should cancel its siblings (TaskGroup) or be collected and triaged (gather(return_exceptions=True)).

Event Loop Integration & Coroutine Execution Model¶

Coroutines are inherently lazy; they do not execute until explicitly scheduled via asyncio.create_task() or awaited. This lazy evaluation model enables cooperative multitasking, but it also imposes a contract: every coroutine must yield control back to the scheduler at predictable intervals. Failing to insert explicit await points starves the loop, causing latency spikes and degrading throughput across all concurrent workloads. Each pattern in this catalogue is, at bottom, a different discipline for how and when coroutines yield to the single scheduler thread.

When designing coroutine entry points, align your execution model with the loop's scheduling guarantees described in Asyncio Fundamentals & Event Loop Architecture. The scheduler is cooperative and single-threaded: it runs ready callbacks to completion until each hits an await that actually suspends. CPU-bound operations must never execute on the main event loop thread. Instead, isolate them using loop.run_in_executor() or asyncio.to_thread() to offload computation to a thread or process pool, preserving the non-blocking contract of the async runtime. How those scheduled tasks transition through PENDING → RUNNING → DONE/CANCELLED is covered in Task Scheduling & Lifecycle; the patterns here assume you can reason about those states.

The synchronization tools the bounded patterns rely on — Semaphore, Event, Lock, bounded Queue — are detailed in Synchronization Primitives. They are not OS primitives; they are cooperative objects that suspend a coroutine on the same loop, which is precisely why they impose backpressure without spawning threads.

Diagnostic Hook¶

Enable loop.set_debug(True) (or PYTHONASYNCIODEBUG=1) during staging and monitor len(asyncio.all_tasks()) in production metrics. Alert when a single callback runs longer than ~100 ms — debug mode logs these as slow-callback warnings — since that is the signature of synchronous work blocking the loop and starving every pattern below.

import asyncio
from concurrent.futures import ProcessPoolExecutor


def cpu_bound(n: int) -> int:
    return sum(i * i for i in range(n))


async def offloaded(executor: ProcessPoolExecutor, n: int) -> int:
    loop = asyncio.get_running_loop()
    # CPU work runs off-loop; the coroutine only awaits the result.
    return await loop.run_in_executor(executor, cpu_bound, n)


async def scheduler_demo() -> None:
    with ProcessPoolExecutor(max_workers=4) as executor:
        tasks = [asyncio.create_task(offloaded(executor, 5_000_000)) for _ in range(5)]
        results = await asyncio.gather(*tasks)
    print(f"Completed {len(results)} CPU-bound tasks without loop starvation")

Pattern Catalogue¶

The patterns below are ordered from least to most structured. Earlier patterns trade ownership for simplicity; later ones trade simplicity for guaranteed teardown. Real systems mix them — a TaskGroup whose members each acquire a Semaphore, draining a bounded Queue.

gather() fan-out / fan-in¶

When to use: a fixed, known set of independent coroutines whose results you need all at once, in input order. asyncio.gather() schedules every coroutine immediately and returns a list aligned to the argument order.

Trade-off: there is no concurrency limit — passing 10,000 coroutines schedules 10,000 tasks at once. By default a single exception cancels the remaining tasks and propagates; with return_exceptions=True, every result (success or exception) is returned and you triage afterward. gather() does not give you structured cancellation of its own parent task, which is why TaskGroup is now preferred for owned work.

import asyncio


async def fetch(symbol: str) -> float:
    await asyncio.sleep(0.1)  # stand-in for an I/O call
    if symbol == "BAD":
        raise ValueError(f"no quote for {symbol}")
    return len(symbol) * 1.5


async def fan_out(symbols: list[str]) -> None:
    results = await asyncio.gather(
        *(fetch(s) for s in symbols),
        return_exceptions=True,  # one bad symbol must not drop the rest
    )
    ok = {s: r for s, r in zip(symbols, results) if not isinstance(r, Exception)}
    bad = {s: r for s, r in zip(symbols, results) if isinstance(r, Exception)}
    print(f"ok={ok}  failures={ {k: str(v) for k, v in bad.items()} }")


asyncio.run(fan_out(["AAPL", "BAD", "MSFT"]))

as_completed() streaming results¶

When to use: you want to process each result the instant it is ready rather than blocking on the slowest coroutine. Ideal for latency-sensitive fan-out where partial results are actionable — first-response-wins, progressive rendering, early-exit search.

Trade-off: results arrive in completion order, not input order, so you must carry a correlating key. as_completed() still schedules all coroutines up front, so it does not bound concurrency on its own — combine it with a Semaphore for that.

import asyncio


async def probe(host: str, delay: float) -> tuple[str, float]:
    await asyncio.sleep(delay)
    return host, delay


async def stream_results() -> None:
    coros = [probe("a", 0.3), probe("b", 0.1), probe("c", 0.2)]
    # Yields awaitables in the order they finish, not the order passed in.
    for finished in asyncio.as_completed(coros):
        host, latency = await finished
        print(f"first to answer: {host} ({latency:.1f}s)")  # b, then c, then a

Producer/consumer via asyncio.Queue¶

When to use: producers and consumers run at different rates, or you need to decouple ingestion from processing. A bounded asyncio.Queue(maxsize=N) is the canonical backpressure mechanism: when full, await queue.put() suspends the producer until a consumer drains an item, throttling ingestion to processing capacity without dropping data. This pattern is explored in depth under Async Queue Management.

Trade-off: you own shutdown. Sentinels or queue.join() with task_done() are required to terminate consumers cleanly; forgetting them leaves consumers awaiting get() forever.

import asyncio


async def producer(q: asyncio.Queue[int | None], n: int, consumers: int) -> None:
    for i in range(n):
        await q.put(i)  # suspends here when q is full -> backpressure
    for _ in range(consumers):
        await q.put(None)  # one sentinel per consumer


async def consumer(name: str, q: asyncio.Queue[int | None]) -> None:
    while (item := await q.get()) is not None:
        await asyncio.sleep(0.05)  # processing work
        q.task_done()
    q.task_done()


async def run() -> None:
    q: asyncio.Queue[int | None] = asyncio.Queue(maxsize=10)
    async with asyncio.TaskGroup() as tg:
        tg.create_task(producer(q, 50, consumers=3))
        for i in range(3):
            tg.create_task(consumer(f"c{i}", q))
    print("queue drained")

Async-generator pipeline¶

When to use: streaming transformation of an unbounded or large source where you must not materialize the whole dataset. Async generators (async def with yield) chain naturally — each stage async fors over the previous one, applying lazy, demand-driven transformation.

Trade-off: a generator is single-consumer and back-pressured by the consumer's pace, which is usually what you want, but it does not parallelize across stages by itself. Always close generators (via async with aclosing(...) or an explicit aclose()) so their finally blocks release resources.

import asyncio
from contextlib import aclosing
from typing import AsyncIterator


async def source(n: int) -> AsyncIterator[int]:
    for i in range(n):
        await asyncio.sleep(0.01)
        yield i


async def transform(src: AsyncIterator[int]) -> AsyncIterator[int]:
    async for x in src:
        yield x * x  # lazy, one item at a time


async def pipeline() -> None:
    async with aclosing(transform(source(5))) as stream:
        async for value in stream:
            print(value)  # 0, 1, 4, 9, 16 — never more than one item in flight

Semaphore-bounded fan-out¶

When to use: a large or upstream-controlled set of I/O coroutines that must run concurrently but capped — the standard fix for "I called gather() over 50,000 URLs and exhausted the connection pool." An asyncio.Semaphore(N) admits at most N coroutines into the critical section at once.

Trade-off: the cap is global to that semaphore; per-host or per-tenant fairness needs multiple semaphores. The acquire/release adds negligible overhead, but holding the slot across the wrong scope (e.g., across a retry sleep) silently shrinks effective concurrency.

import asyncio


async def bounded_fetch(sem: asyncio.Semaphore, url: str) -> str:
    async with sem:  # at most N concurrent inside this block
        await asyncio.sleep(0.1)  # stand-in for an HTTP GET
        return f"{url}:200"


async def fan_out_bounded(urls: list[str], limit: int = 20) -> list[str]:
    sem = asyncio.Semaphore(limit)
    async with asyncio.TaskGroup() as tg:
        tasks = [tg.create_task(bounded_fetch(sem, u)) for u in urls]
    return [t.result() for t in tasks]


asyncio.run(fan_out_bounded([f"https://api/{i}" for i in range(1000)], limit=20))

TaskGroup structured concurrency¶

When to use: any set of tasks with a shared lifecycle and a clear owner — which is most owned concurrency. asyncio.TaskGroup (Python 3.11+) guarantees that when the async with block exits, every child has completed, been cancelled, or its failure has propagated as an ExceptionGroup. No orphaned tasks, no forgotten awaits. This is the default recommendation; for a step-by-step treatment see Structured Concurrency with asyncio.TaskGroup.

Trade-off: the first failure cancels all siblings — exactly right for "all-or-nothing" work, wrong when you want independent best-effort batches (use gather(return_exceptions=True) there). Handling its ExceptionGroup requires except*, covered under Exception Groups & TaskGroups.

import asyncio


async def step(name: str, delay: float) -> str:
    await asyncio.sleep(delay)
    return f"{name} done"


async def structured() -> None:
    async with asyncio.TaskGroup() as tg:
        t1 = tg.create_task(step("db", 0.2))
        t2 = tg.create_task(step("cache", 0.1))
    # Reached only if BOTH succeeded; otherwise an ExceptionGroup is raised.
    print(t1.result(), "|", t2.result())


asyncio.run(structured())

Resource Boundaries: Pooling, Limits & Backpressure¶

Unbounded concurrency is the primary cause of connection exhaustion, memory spikes, and downstream service overload in async systems. Production architectures enforce limits at three layers, and the pattern catalogue maps directly onto them: a Semaphore caps concurrent in-flight operations, a bounded Queue(maxsize=N) caps buffered work and applies backpressure, and a connection pool caps real sockets. Size them together — a semaphore limit above your pool size just queues coroutines inside the pool's own waiter list, hiding the true bottleneck.

Pool sizing must align with the thread-pool and file-descriptor ceilings discussed in Event Loop Configuration, plus OS-level socket buffers. The guiding rule: the narrowest limit should be the one you chose, not the one the OS imposes on you under load.

Mechanism	Bounds what	Use case	Trade-off
`asyncio.Semaphore(N)`	Concurrent in-flight coroutines	Cap parallel calls to an external API/DB	Global to the semaphore; per-tenant fairness needs more of them
`asyncio.Queue(maxsize=N)`	Buffered, not-yet-processed items	Producer/consumer, stream ingestion	Producer blocks when full; you own shutdown sentinels
Connection pool (`aiohttp.ClientSession`, `asyncpg.Pool`)	Real sockets / sessions	Reuse TCP/TLS handshakes	Higher memory baseline; requires explicit lifecycle teardown
`TaskGroup`	Task ownership & lifetime	Owned, all-or-nothing concurrency	First failure cancels siblings; not for best-effort batches

Diagnostic Hook¶

Instrument the semaphore's available permits (sem._value in a pinch, or wrap acquire/release with your own gauge) and queue depth via q.qsize(). Export both as metrics and alert when semaphore wait time exceeds your p95 latency or when qsize() sits at maxsize for sustained windows — that is backpressure firing, telling you consumers cannot keep up with producers.

Integrated Production Example¶

The following service composes the catalogue into one realistic shape: a bounded Queue ingests work, a TaskGroup owns a fixed pool of consumer coroutines, each consumer acquires a Semaphore before touching a rate-limited downstream, and per-item failures are recorded rather than aborting the run. It demonstrates backpressure, bounded concurrency, structured ownership, and graceful drain together.

import asyncio
import logging
import random
from dataclasses import dataclass, field

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("pipeline")


@dataclass
class Stats:
    processed: int = 0
    failed: int = 0
    errors: list[str] = field(default_factory=list)


async def call_downstream(sem: asyncio.Semaphore, item: int) -> int:
    async with sem:  # cap concurrent downstream calls regardless of consumer count
        await asyncio.sleep(random.uniform(0.02, 0.08))
        if item % 17 == 0:
            raise RuntimeError(f"downstream rejected item {item}")
        return item * 2


async def producer(q: asyncio.Queue[int | None], total: int, consumers: int) -> None:
    for i in range(total):
        await q.put(i)  # backpressure: blocks when the queue is full
    for _ in range(consumers):
        await q.put(None)  # poison pill per consumer for clean shutdown


async def consumer(
    name: str,
    q: asyncio.Queue[int | None],
    sem: asyncio.Semaphore,
    stats: Stats,
) -> None:
    while (item := await q.get()) is not None:
        try:
            result = await call_downstream(sem, item)
            stats.processed += 1
            logger.debug("%s -> %s", name, result)
        except Exception as exc:  # isolate per-item failure; keep the worker alive
            stats.failed += 1
            stats.errors.append(str(exc))
        finally:
            q.task_done()
    q.task_done()  # account for the sentinel


async def run_service(total: int = 200, consumers: int = 4, max_inflight: int = 10) -> Stats:
    q: asyncio.Queue[int | None] = asyncio.Queue(maxsize=32)
    sem = asyncio.Semaphore(max_inflight)
    stats = Stats()
    try:
        async with asyncio.timeout(30):  # whole-run deadline
            async with asyncio.TaskGroup() as tg:
                tg.create_task(producer(q, total, consumers), name="producer")
                for i in range(consumers):
                    tg.create_task(consumer(f"c{i}", q, sem, stats), name=f"consumer-{i}")
    except TimeoutError:
        logger.error("run exceeded deadline; consumers cancelled by TaskGroup")
        raise
    logger.info("processed=%d failed=%d", stats.processed, stats.failed)
    return stats


if __name__ == "__main__":
    asyncio.run(run_service())

Diagnostic Hook. Export stats.processed, stats.failed, q.qsize(), and the count of in-flight downstream calls (max_inflight - sem._value) as gauges. In a healthy run qsize() oscillates below maxsize; if it pins at maxsize, consumers are the bottleneck — add consumers or raise max_inflight only if the downstream can absorb it. A rising failed rate with flat latency points at the downstream, not your loop.

Diagnostic Hook: what to watch in production

Slow-callback warnings under PYTHONASYNCIODEBUG=1 — any callback over ~100 ms is synchronous work blocking the loop; offload it.
len(asyncio.all_tasks()) trend — a monotonic climb means tasks are created faster than they finish, or are never awaited; cross-check with the unawaited-coroutine guide.
Queue depth at maxsize — backpressure is active; producers are being throttled to consumer capacity.
Semaphore wait time — if it exceeds p95 request latency, the downstream limit, not your code, is the constraint.
RuntimeWarning: coroutine ... was never awaited in logs/CI — a missing await or create_task; treat as an error in tests.

Failure Modes¶

Failure mode	Root cause	Detection	Fix
Connection-pool / FD exhaustion	`gather()` or `create_task()` over an unbounded list	`Too many open files`; pool-timeout errors; FD count climbing	Gate with `Semaphore(N)` (bounded fan-out) sized to pool capacity
Event-loop starvation, p99 spikes	Blocking I/O or CPU run on the loop thread	Slow-callback warnings in debug mode; latency cliff under load	Offload via `asyncio.to_thread()` / `run_in_executor()`
Orphaned tasks / silent failures	Bare `create_task()` with no owner; result never awaited	`Task exception was never retrieved`; `all_tasks()` count drifting up	Own tasks in a `TaskGroup`; see debugging unawaited coroutines
Unbounded memory growth	`gather()` materializing huge result lists; unclosed async generators	RSS climbs with input size; lingering generator frames in `gc`	Stream with `as_completed()`/generators; `aclose()` every generator
Consumers hang forever	Missing sentinels / `task_done()` in producer/consumer	`queue.join()` never returns; workers stuck in `get()`	Send one sentinel per consumer; pair every `get()` with `task_done()`
One failure aborts the whole batch	`gather()` without `return_exceptions=True` when partial success is acceptable	A single error cancels siblings unexpectedly	Use `return_exceptions=True` and triage, or switch to a `TaskGroup` if all-or-nothing is intended

Frequently Asked Questions¶

When should I use asyncio.TaskGroup over asyncio.gather()?

Use TaskGroup (Python 3.11+) for owned work where all tasks share a lifecycle and must be cancelled together on the first failure, with errors surfacing as an ExceptionGroup. Use gather() for independent best-effort batches where partial success is acceptable and return_exceptions=True lets you collect and triage every result. As a rule, reach for TaskGroup first and drop to gather() only when you specifically want non-cancelling, order-preserving collection.

How do I bound concurrency when fanning out over a large list?

Wrap each coroutine's critical section in async with semaphore where semaphore = asyncio.Semaphore(N), then schedule them all inside a TaskGroup. The semaphore admits at most N coroutines into the downstream call at once, so 50,000 scheduled tasks still produce only N concurrent connections. Size N to your connection pool, not above it.

When is as_completed() the right choice over gather()?

Use as_completed() when partial results are actionable and you want to handle each as soon as it finishes rather than waiting on the slowest coroutine — progressive UIs, first-response-wins, or early-exit search. Results arrive in completion order, so carry a correlating key. If you need results in input order or all-or-nothing semantics, use gather() or a TaskGroup.

How do bounded queues prevent memory blowup in pipelines?

A asyncio.Queue(maxsize=N) makes await queue.put() suspend the producer once N items are buffered, so the producer can never outrun the consumers by more than N items. This is backpressure: ingestion throttles to processing capacity automatically, capping memory at N items instead of growing with the input. Pair it with sentinels and task_done() so consumers terminate cleanly.

Why are my coroutines showing 'was never awaited' warnings?

A coroutine object was created but never scheduled via await or asyncio.create_task(), so it leaks until the GC finalizes it and emits the RuntimeWarning. Enable PYTHONASYNCIODEBUG=1, audit call sites where an async def is invoked without awaiting, and convert the warning to an error in CI. The full workflow is in the debugging unawaited coroutines guide.

Asyncio Fundamentals & Event Loop Architecture — up to the overview for the scheduler model these patterns build on.
Structured Concurrency with asyncio.TaskGroup — step-by-step adoption of nursery-style task ownership.
Debugging Unawaited Coroutines in Large Codebases — find and fix the orphaned-task failure mode above.
Task Scheduling & Lifecycle — how scheduled tasks move through their states.
Synchronization Primitives — the Semaphore, Lock, Event, and Queue these patterns depend on.
Async Queue Management — deeper treatment of producer/consumer and queue backpressure.

Coroutine Design Patterns: Production-Ready Asyncio Architectures¶

Event Loop Integration & Coroutine Execution Model¶

Diagnostic Hook¶

Pattern Catalogue¶

gather() fan-out / fan-in¶

as_completed() streaming results¶

Producer/consumer via asyncio.Queue¶

Async-generator pipeline¶

Semaphore-bounded fan-out¶

TaskGroup structured concurrency¶

Resource Boundaries: Pooling, Limits & Backpressure¶

Diagnostic Hook¶

Integrated Production Example¶

Failure Modes¶

Frequently Asked Questions¶

Related¶