Asyncio Fundamentals & Event Loop Architecture
Python's asyncio framework implements a single-threaded, cooperative concurrency model designed for high-throughput I/O-bound workloads. Unlike preemptive threading, where the OS scheduler interrupts execution at arbitrary bytecode boundaries, asyncio relies on explicit suspension points (await) to yield control back to a central event loop. This architecture minimizes context-switching overhead and eliminates most thread synchronization costs, making it the default substrate for network services, microservice gateways, real-time data pipelines, and any system where the dominant cost is waiting on sockets, files, or downstream services rather than CPU.
The trade-off is that the cooperative model is unforgiving: a single coroutine that fails to yield — because it runs a CPU-heavy loop, calls a blocking C extension, or invokes synchronous I/O — stalls every other task sharing that loop. There is no preemption to rescue you. Operating asyncio in production therefore means reasoning precisely about where control transfers, which awaitable owns which state, and how cancellation and exceptions propagate through a tree of concurrently running work. This reference deconstructs the event loop mechanics, the awaitable state machines (coroutines, tasks, futures), scheduling and concurrency control, resource management, and the diagnostic workflows required to deploy resilient async systems.
The material is organized so you can read top-to-bottom for the full mental model, or jump to a primitive. Each major section carries a runnable example and a Diagnostic Hook describing the metric, debug flag, or signal to watch in production.
The Conceptual Model: Loop, Awaitables, and Scheduling
At its foundation, asyncio operates on the Reactor pattern. The event loop continuously polls registered file descriptors using a platform-specific selector (epoll on Linux, kqueue on macOS/BSD); on Windows the default loop is instead a proactor built on IOCP completion ports. When an I/O operation becomes ready, the loop schedules the associated callback or coroutine step for execution. The loop runs one iteration ("tick") at a time, and within a tick it runs the callbacks that were ready at the start of that tick to completion before polling I/O again, so a callback that never returns control freezes every task on that loop.
The loop maintains three structures that fully describe its scheduling behavior:
- Ready queue (`_ready`): a FIFO deque of callbacks and resumed coroutine steps immediately eligible to run. Everything that "happens next" passes through here.
- Timer heap (`_scheduled`): a min-heap of time-based callbacks registered via `call_later` / `call_at`. The loop computes its selector poll timeout from the soonest due timer, so timers never busy-wait.
- Selector / I/O multiplexer: the bridge to the OS. File descriptors are registered for read/write readiness; when one becomes ready, its callbacks are pushed onto the ready queue.
A single loop tick is therefore a cycle of three phases: run the callbacks currently in the ready queue, move any timers whose deadline has passed onto the ready queue, and poll the selector for I/O readiness, enqueueing the readiness callbacks. The selector blocks only when nothing is ready to run, and then for no longer than the time until the next timer; otherwise it polls with a zero timeout. The diagram below shows how these pieces feed one another and how a coroutine step flows from "scheduled" to "running."
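The ordering these structures impose can be observed directly. The following is a minimal sketch, not a description of loop internals: it queues two `call_soon` callbacks and one `call_later` timer, then suspends. The labels and the 10 ms / 50 ms delays are arbitrary values chosen for illustration.

```python
import asyncio


async def main() -> list[str]:
    loop = asyncio.get_running_loop()
    order: list[str] = []

    # Timer heap: becomes due 10 ms from now.
    loop.call_later(0.01, order.append, "timer")
    # Ready queue: FIFO, so "first" runs before "second".
    loop.call_soon(order.append, "first")
    loop.call_soon(order.append, "second")

    # Suspend main() long enough for the timer to fire.
    await asyncio.sleep(0.05)
    order.append("main resumed")
    return order


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The ready-queue callbacks run in insertion order on the next tick, the timer follows once its deadline passes, and `main()` resumes last because its own sleep timer is due later.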
Layered above the loop is the awaitable hierarchy. A coroutine is the unit of suspendable work; a Task drives a coroutine to completion on the loop; a Future is the low-level result placeholder that Tasks subclass and that bridges callback-based code into await. The aggregation APIs (gather, wait, as_completed) and the synchronization primitives (Lock, Semaphore, Event) all sit on top of these three. The following reference table fixes the vocabulary used throughout this document.
| Primitive / API | What it is | Awaitable? | Schedules on loop? | Primary use |
|---|---|---|---|---|
| Event loop | The scheduler running ready callbacks, timers, and I/O each tick | No | n/a | Owns all execution; one per thread |
| Coroutine | `async def` object; lazy, runs only when driven | Yes | No (inert until wrapped) | Define suspendable logic |
| `Task` | Wraps a coroutine and schedules it immediately | Yes | Yes | Run coroutines concurrently |
| `Future` | Low-level result placeholder resolved via `set_result` | Yes | No (resolved externally) | Bridge callbacks; base class of `Task` |
| `asyncio.gather()` | Run awaitables concurrently, collect ordered results | Yes | Wraps coroutines in Tasks | Fan-out / fan-in batch |
| `asyncio.wait()` | Run awaitables, return `(done, pending)` sets | Yes (returns sets) | Operates on Tasks | Low-level grouping, timeouts |
| `asyncio.as_completed()` | Yield awaitables in completion order | Iterator of awaitables | Wraps in Tasks | Streaming earliest results |
| `asyncio.TaskGroup` | Structured concurrency scope (3.11+) | Async context manager | Spawns child Tasks | Default for grouped concurrency |
| `asyncio.Semaphore` | Counting gate limiting concurrent holders | `async with` | No | Bound concurrency / rate |
| `asyncio.Lock` | Mutual-exclusion gate, one holder | `async with` | No | Protect critical sections |
The dedicated guides — event loop configuration, coroutine design patterns, task scheduling and lifecycle, async context managers and iterators, future objects and callbacks, and synchronization primitives — drill into each row. The sections that follow define the three core awaitables in turn, then move to a production bootstrap, concurrency control, and diagnostics.
The Event Loop Core & Execution Model
The loop is the only thing that actually runs. asyncio.run(main()) creates a fresh loop, runs the main() coroutine to completion, cancels any leftover tasks, shuts down async generators and the default executor, and closes the loop. It is the recommended entry point and has been available since Python 3.7. Underneath, run_until_complete schedules a future and ticks the loop until that future is done; the choice between them is covered in when to use asyncio.run vs loop.run_until_complete.
Loop behavior is governed by a handful of knobs that materially affect tail latency: the selector class, the exception handler (the catch-all for errors in callbacks and discarded tasks), debug mode, and slow_callback_duration (the threshold above which the loop logs a callback as too long). Misconfiguration here directly degrades stability — see event loop configuration for the production checklist and how to properly configure asyncio event loops for production for the full walkthrough.
Production-Grade Event Loop Bootstrap
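A bootstrap along these lines wires the knobs described above into one place. This is a minimal sketch under stated assumptions: the environment variable `APP_ASYNCIO_DEBUG`, the logger name, and the helper names `configure_loop` and `handle_loop_exception` are hypothetical and chosen only for illustration.

```python
import asyncio
import logging
import os

logger = logging.getLogger("app.loop")  # hypothetical logger name


def handle_loop_exception(loop: asyncio.AbstractEventLoop, context: dict) -> None:
    """Catch-all for errors in callbacks and never-retrieved task exceptions."""
    exc = context.get("exception")
    logger.error("unhandled loop error: %s", context.get("message"), exc_info=exc)


def configure_loop(loop: asyncio.AbstractEventLoop) -> None:
    loop.set_exception_handler(handle_loop_exception)
    # APP_ASYNCIO_DEBUG is an assumed flag name for this sketch.
    if os.environ.get("APP_ASYNCIO_DEBUG") == "1":
        loop.set_debug(True)
        loop.slow_callback_duration = 0.05  # seconds; the default is 0.1


async def main() -> str:
    configure_loop(asyncio.get_running_loop())
    await asyncio.sleep(0)  # placeholder for the real service body
    return "stopped"


if __name__ == "__main__":
    asyncio.run(main())
```

Configuring the loop from inside `main()` keeps `asyncio.run()` as the single entry point while still applying the settings before any application task starts.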
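Diagnostic Hook: Measure scheduling lag, the gap between when a timer or asyncio.sleep() was due and when it actually ran, to detect long ticks caused by GC pauses or a blocking callback. Comparing loop.time() with time.monotonic() does not reveal this, because the default loop's clock is time.monotonic(). slow_callback_duration takes effect only in debug mode; keep it at 50 ms or lower in staging so the loop logs offenders by name. In production, enable loop.set_debug(True) conditionally via an env flag, because debug mode adds measurable overhead (roughly 10–15% by this guide's estimate).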
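One way to observe long ticks is to sleep for a fixed interval and record how late each wake-up arrives. The sketch below is illustrative only: `measure_loop_lag` is a hypothetical helper, and the 100 ms `time.sleep()` deliberately blocks the loop to make the lag visible.

```python
import asyncio
import time


async def measure_loop_lag(interval: float, samples: int) -> list[float]:
    """Sleep for `interval` repeatedly and record how late each wake-up was."""
    loop = asyncio.get_running_loop()
    lags: list[float] = []
    for _ in range(samples):
        start = loop.time()
        await asyncio.sleep(interval)
        lags.append(max(0.0, loop.time() - start - interval))
    return lags


async def main() -> float:
    monitor = asyncio.create_task(measure_loop_lag(interval=0.01, samples=5))
    await asyncio.sleep(0)   # let the monitor begin its first sleep
    time.sleep(0.1)          # deliberately block the loop for 100 ms
    return max(await monitor)


if __name__ == "__main__":
    print(f"worst lag: {asyncio.run(main()) * 1000:.0f} ms")
```

In a real service the lag samples would feed a gauge or histogram rather than a return value.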
Coroutines: The Unit of Suspendable Work
A coroutine is the object produced by calling an async def function. It is lazy: calling the function runs none of its body — it returns a coroutine object whose code advances only when something drives it (an await, a Task, or asyncio.run). Internally a coroutine is a resumable state machine; each await on a not-yet-ready awaitable suspends the coroutine and hands a value back up to the loop, which resumes it later from exactly that point.
Lifecycle / state. A coroutine moves through created (object exists, never started) → running (executing a step between awaits) → suspended (paused at an await), alternating between running and suspended until it reaches closed (returned or raised). These are the states reported by inspect.getcoroutinestate(). It owns no scheduling of its own: a bare coroutine never reaches the loop until it is wrapped in a Task or awaited from one that is.
Common misuse. The signature failure is the un-awaited coroutine: writing fetch(url) instead of await fetch(url) (or asyncio.create_task(fetch(url))) creates the object, discards it, and emits RuntimeWarning: coroutine 'fetch' was never awaited, so the work silently never runs. A second trap is treating a coroutine like a thread: a body that runs a tight CPU loop without any await monopolizes the tick. Run with PYTHONASYNCIODEBUG=1 and a coroutine-aware linter; the techniques in debugging unawaited coroutines in large codebases make these findable at scale, and broader composition idioms live in coroutine design patterns.
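The laziness described above is easy to confirm with `inspect.getcoroutinestate()`. In this sketch `fetch` is a stand-in for a real I/O coroutine and the URL is a placeholder.

```python
import asyncio
import inspect


async def fetch(url: str) -> str:
    await asyncio.sleep(0)  # stand-in for real I/O
    return f"body of {url}"


async def main() -> tuple[str, str, str]:
    coro = fetch("https://example.com")       # nothing has run yet
    before = inspect.getcoroutinestate(coro)  # CORO_CREATED
    result = await coro                       # now the body executes
    after = inspect.getcoroutinestate(coro)   # CORO_CLOSED
    return before, result, after


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Dropping the `await coro` line would leave the object in `CORO_CREATED` and trigger the "never awaited" warning when it is garbage-collected.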
Tasks: Driving Coroutines Concurrently
A Task is a Future subclass that wraps a coroutine and schedules the first step on the loop at creation time via asyncio.create_task(). Where a coroutine is inert, a Task is live — it runs concurrently with whatever created it, advancing whenever the loop gives it a turn. Tasks are how you get more than one thing happening at once.
Lifecycle / state. PENDING (scheduled, possibly suspended at an await) → done, where "done" resolves to one of: a result, an exception, or CANCELLED. Querying state uses task.done(), task.cancelled(), task.result() (re-raises the stored exception), and task.exception() (returns the stored exception, or None). Cancellation is delivered by throwing CancelledError into the coroutine at its current suspension point; the coroutine may run cleanup before the exception propagates, but it should re-raise rather than swallow it, because since 3.11 TaskGroup and asyncio.timeout() rely on the cancellation propagating.
Common misuse. The most damaging anti-pattern is the fire-and-forget Task held by no reference: asyncio.create_task(worker()) whose return value is dropped. The loop keeps only a weak reference, so the Task can be garbage-collected mid-flight, producing Task was destroyed but it is pending! and losing exceptions. Always retain the reference (or use a TaskGroup). The second misuse is ignoring exceptions: a Task that raises but is never awaited logs its error only at GC time. Prefer asyncio.TaskGroup, which awaits all children and re-raises failures as an ExceptionGroup. The distinction between create_task and the legacy ensure_future is covered in understanding asyncio.create_task vs asyncio.ensure_future.
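When a TaskGroup does not fit, a common way to retain references is a module-level set that each Task removes itself from on completion. The names `background_tasks`, `spawn`, and `worker` below are hypothetical; this is a sketch of the pattern, not a complete supervisor.

```python
import asyncio

background_tasks: set[asyncio.Task] = set()


def spawn(coro) -> asyncio.Task:
    task = asyncio.create_task(coro)
    background_tasks.add(task)                        # strong reference
    task.add_done_callback(background_tasks.discard)  # drop it when finished
    return task


async def worker(n: int) -> int:
    await asyncio.sleep(0.01)
    return n * 2


async def main() -> list[int]:
    tasks = [spawn(worker(n)) for n in range(3)]
    results = [await t for t in tasks]
    await asyncio.sleep(0)  # give pending done-callbacks a tick to run
    return results


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The set keeps each Task alive while it runs, and the done-callback prevents the set from growing without bound.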
Futures: The Low-Level Result Placeholder
A Future represents a result that does not exist yet. Unlike a Task it has no coroutine to drive — it sits PENDING until external code calls future.set_result(value) or future.set_exception(exc), at which point any coroutine awaiting it is rescheduled. Task is itself a Future, which is why await some_task works. You rarely construct a raw Future in application code, but it is the indispensable adapter when wrapping a callback-driven library (a database driver, a C extension, a legacy transport) so it can be await-ed.
Lifecycle / state. PENDING → (result set | exception set | cancelled). A Future is single-shot: setting its result twice raises InvalidStateError, so resolution code must guard with if not future.done(). It is also loop-bound — a Future created on one loop cannot be awaited on another.
Common misuse. Resolving a Future from a non-loop thread without loop.call_soon_threadsafe causes data races and lost wakeups; the bridge pattern and run_in_executor integration are detailed in future objects and callbacks. A second trap is leaking a never-resolved Future, which leaves its awaiter suspended forever — always pair Future creation with a guaranteed resolver or a timeout.
Manual Future vs. Task Lifecycle
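The contrast between the two lifecycles can be sketched as follows. A Future is resolved by external code (here a worker thread, marshalled through `loop.call_soon_threadsafe`), while a Task drives its own coroutine and is cancelled and then awaited so the cancellation settles. The helper name `resolve_from_thread` and the one-second timeout are assumptions made for this example.

```python
import asyncio
import contextlib
import threading


def resolve_from_thread(loop, future: asyncio.Future, value: str) -> None:
    """Runs in a worker thread; marshals the result back onto the loop."""
    def _set() -> None:
        if not future.done():          # a Future is single-shot
            future.set_result(value)
    loop.call_soon_threadsafe(_set)


async def main() -> tuple[str, bool]:
    loop = asyncio.get_running_loop()

    # Future: stays PENDING until external code resolves it.
    future = loop.create_future()
    threading.Thread(target=resolve_from_thread, args=(loop, future, "ready")).start()
    value = await asyncio.wait_for(future, timeout=1.0)  # bound the wait

    # Task: drives a coroutine; cancel it, then await so cancellation settles.
    task = asyncio.create_task(asyncio.sleep(60))
    await asyncio.sleep(0)             # let the task reach its first await
    task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await task
    return value, task.cancelled()


if __name__ == "__main__":
    print(asyncio.run(main()))
```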
Diagnostic Hook: Use asyncio.all_tasks() to enumerate live workloads and task.get_stack() on a suspended task to see exactly which await it is parked on. Confirm cancellation actually settles by awaiting the cancelled task inside contextlib.suppress(asyncio.CancelledError) — a Task that never gets awaited after cancel becomes a zombie and may emit a destroy warning.
A Production Async Service: Wiring the Primitives Together
The three awaitables, a TaskGroup, a Semaphore, and structured timeouts combine into a realistic worker that fans out bounded concurrent work, propagates failures cleanly, and shuts down deterministically. This is the shape most production services converge on.
Diagnostic Hook: Track metrics.max_inflight against the semaphore ceiling — if peak in-flight is pinned at the limit for sustained periods, requests are queueing on the gate and you are either under-provisioned on concurrency or the downstream is the bottleneck (watch p50/p95 in durations rising together). Export inflight, success/fail counts, and the duration percentiles to Prometheus; a divergence between succeeded + failed and the input size flags work lost to silent cancellation.
Concurrency Control & Resource Boundaries
The scheduler is cooperative and has no notion of fairness beyond FIFO ordering, so you own concurrency limits. Unbounded task creation is the most common production failure: ten thousand simultaneous create_task calls open ten thousand sockets, exhaust the connection pool, and bury the downstream service under a thundering herd. The fix is always a bound — a Semaphore, a TaskGroup over a chunked workload, or a queue with a fixed worker count. The table below maps each control primitive to its use case and the trade-off you accept by choosing it.
| Primitive | Use case | Trade-off |
|---|---|---|
| `asyncio.Semaphore(n)` | Cap concurrent holders of a resource (sockets, downstream calls) | Simple and composable, but does not order waiters by priority; a long holder blocks the queue |
| `asyncio.Lock` | Protect a critical section so one coroutine mutates shared state at a time | Serializes access; over-scoping it collapses concurrency to one |
| `asyncio.Event` | Broadcast a one-to-many "go" / "ready" signal | No payload and no auto-reset; you manage set/clear lifecycle |
| `asyncio.gather()` | Fan-out a fixed batch, collect ordered results | Returns only when all finish (or one raises, unless `return_exceptions=True`); unbounded if you pass thousands |
| `asyncio.as_completed()` | Stream results as soon as each finishes | Earliest-first latency, but results arrive unordered and need per-item error handling |
| `asyncio.TaskGroup` | Default grouped concurrency with cancel-on-failure | Structured and leak-free, but a single failure cancels all siblings, which is wrong when you want partial success |
| `asyncio.Queue(maxsize)` | Producer/consumer with backpressure across a worker pool | Bounded queue applies backpressure; sizing the pool and queue is its own tuning problem |
Choosing between Lock, Semaphore, and Event is rarely obvious; choosing asyncio Lock vs Semaphore vs Event walks the decision in depth, and the synchronization primitives overview covers the full set including Condition and BoundedSemaphore. The gather / wait / as_completed aggregation choices and their cancellation semantics are detailed under task scheduling and lifecycle.
Bounded Concurrency with a Semaphore
Note the composition: a TaskGroup creates all 100 Tasks at once, but the shared Semaphore(10) ensures at most ten are inside guarded_call's critical region simultaneously. Spawning is cheap; the gate enforces the real resource limit. For resource objects that must be deterministically released — pooled connections, files, locks acquired across await — wrap them in async with so __aexit__ runs even under cancellation; the protocol and its pitfalls are covered in async context managers and iterators and the best practices for async context managers in Python guide.
For CPU-bound work, no async primitive helps — the loop is single-threaded. Offload to a process pool via loop.run_in_executor() or hand blocking SDK calls to a thread with asyncio.to_thread(); the worker-pool topologies for this live in the concurrent execution and worker patterns reference. The bulk of real async work, though, is network I/O — sockets, HTTP clients, and protocol handlers, covered in network I/O and protocol handling. When the work is fundamentally about timeouts, cancellation, and retries rather than scheduling, the resilience, cancellation, and error handling reference is the companion to this one.
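For the blocking-call case, the effect of `asyncio.to_thread()` can be sketched with a heartbeat task that keeps ticking while a synchronous function sleeps in a worker thread. `blocking_sdk_call` and `heartbeat` are hypothetical names; CPU-bound work would use a process pool with `loop.run_in_executor()` instead.

```python
import asyncio
import time


def blocking_sdk_call(seconds: float) -> str:
    time.sleep(seconds)            # blocks the worker thread, not the loop
    return "done"


async def heartbeat(ticks: list[float], interval: float) -> None:
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(interval)


async def main() -> tuple[str, int]:
    ticks: list[float] = []
    beat = asyncio.create_task(heartbeat(ticks, interval=0.01))
    result = await asyncio.to_thread(blocking_sdk_call, 0.1)
    beat.cancel()
    try:
        await beat
    except asyncio.CancelledError:
        pass
    return result, len(ticks)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Calling `blocking_sdk_call(0.1)` directly inside `main()` would instead freeze the heartbeat for the full 100 ms.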
Async Resource Management & Iteration Protocols
Deterministic cleanup is non-negotiable in async architectures. Leaked sockets, unclosed file descriptors, and orphaned database connections degrade stability over time and surface as slow resource exhaustion that is painful to trace. Python's async with relies on the __aenter__ / __aexit__ protocol to guarantee teardown even during cancellation or exception propagation — the key property being that __aexit__ is invoked when a CancelledError tears through the async with body, which is what makes it safe under timeout.
Async generators (async def with yield) introduce extra suspension boundaries: the generator can be paused at every yield as well as at every await. A generator suspended mid-iteration holds its frame, and any large buffers it references, until it is exhausted or explicitly closed via aclose(). asyncio.run calls loop.shutdown_asyncgens() on exit to close stragglers, but a generator abandoned mid-run keeps its resources until it is garbage-collected or the loop shuts down, so close it deterministically. The interaction between generator frames and low-level resolution is unpacked in future objects and callbacks.
Async Connection Pool with Graceful Drain
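A pool with a graceful drain can be sketched with an `asyncio.Queue` of idle connections and an async context manager for borrowing. Everything here is illustrative: `FakeConnection`, `Pool`, and `use` are hypothetical, and a real pool would also handle broken connections and lazy creation.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeConnection:
    """Hypothetical stand-in for a real driver connection."""
    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        await asyncio.sleep(0)
        self.closed = True


class Pool:
    def __init__(self, size: int) -> None:
        self._size = size
        self._idle: asyncio.Queue = asyncio.Queue()
        self.connections = [FakeConnection() for _ in range(size)]
        for conn in self.connections:
            self._idle.put_nowait(conn)

    @asynccontextmanager
    async def acquire(self):
        conn = await self._idle.get()
        try:
            yield conn
        finally:
            self._idle.put_nowait(conn)   # returned even under cancellation

    async def drain(self, timeout: float = 5.0) -> None:
        """Wait for every connection to be returned, then close each one."""
        async def _collect_and_close() -> None:
            for _ in range(self._size):
                conn = await self._idle.get()
                await conn.close()
        await asyncio.wait_for(_collect_and_close(), timeout)


async def use(pool: Pool, n: int) -> int:
    async with pool.acquire():
        await asyncio.sleep(0.01)         # stand-in for a query
        return n


async def main() -> tuple[list[int], bool]:
    pool = Pool(size=2)
    results = await asyncio.gather(*(use(pool, n) for n in range(5)))
    await pool.drain(timeout=1.0)
    return results, all(c.closed for c in pool.connections)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Because `drain` takes connections from the idle queue, it naturally waits for in-flight borrowers to finish, and the timeout bounds how long shutdown can hang.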
Diagnostic Hook: Run with PYTHONASYNCIODEBUG=1 and enable ResourceWarning display (for example with -W default, or python -X dev, which turns on both) to surface unclosed transports and sockets, and call gc.get_referrers() on a suspected leak to find the coroutine frame still holding it. Wrap concurrent cleanup in an asyncio.TaskGroup so a teardown that itself raises does not silently swallow sibling errors.
Production Diagnostics & Performance Tuning
Asyncio performance degradation almost always traces to one of three causes: a synchronous blocking call on an async path, excessive per-task allocation churn, or selector inefficiency. Diagnose them with a repeatable workflow rather than ad-hoc print statements.
1. Reproduce under load and capture a baseline. Drive realistic traffic and record p50/p95/p99 latency plus throughput. A tail that balloons while the median stays flat points to intermittent loop stalls, not steady-state cost.
2. Detect blocking calls. Enable `loop.set_debug(True)` and set `slow_callback_duration` to 50 ms; the loop then logs any callback that hogged a tick, named by function. Confirm with a sampling profiler such as `py-spy dump` or `austin`, which capture async frames without taking the GIL.
3. Quantify task and memory churn. Count live tasks with `len(asyncio.all_tasks())` over time; a monotonic climb means tasks are created faster than they finish. Each Task carries a coroutine frame and scheduler metadata, so high-churn paths should reuse pools and chunk `gather` batches rather than spawn unbounded.
4. Tune the selector / loop. For high fd counts, swap in `uvloop` (built on libuv), which typically cuts latency and lifts throughput 2–4x as a drop-in replacement. Measure against step 1's baseline before and after the change; never tune by feel.
5. Wire the metrics permanently. Promote the ad-hoc measurements (slow-callback count, live-task gauge, p95 latency) into exported metrics so the next regression is caught by an alert, not a customer.
Slow-Callback Profiler Hook
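A boundary profiler can be as small as a decorator that times a coroutine and warns past a threshold. The sketch below is an assumption about what such a hook might look like: `profile_boundary`, `slow_counts`, and `handle_request` are hypothetical names. Note that it measures wall-clock time including time spent suspended on I/O, so it flags slow boundaries rather than proving the loop was blocked.

```python
import asyncio
import functools
import logging
import time

logger = logging.getLogger("app.profiler")  # hypothetical logger name
slow_counts: dict[str, int] = {}


def profile_boundary(threshold: float):
    """Warn when a boundary coroutine's wall-clock time exceeds `threshold` seconds."""
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return await func(*args, **kwargs)
            finally:
                elapsed = time.perf_counter() - start
                if elapsed > threshold:
                    name = func.__name__
                    slow_counts[name] = slow_counts.get(name, 0) + 1
                    logger.warning("slow boundary %s: %.1f ms", name, elapsed * 1000)
        return wrapper
    return decorator


@profile_boundary(threshold=0.05)
async def handle_request(delay: float) -> str:
    await asyncio.sleep(delay)  # stand-in for the real handler body
    return "ok"


if __name__ == "__main__":
    asyncio.run(handle_request(0.1))
    print(slow_counts)
```

The per-function counts in `slow_counts` are the kind of value that would be exported as a counter metric.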
Diagnostic Hook: Instrument boundary functions, not every await in a hot path — wrapping inner awaits distorts the very latency you are measuring. Correlate the profiler's slow-boundary warnings with the loop's own slow_callback_duration log lines; when both fire on the same function, you have located a true blocking offender. Route these counts to Prometheus/Grafana for latency-spike root-cause analysis.
Common Pitfalls in Production Systems
| Anti-Pattern | Impact | Mitigation |
|---|---|---|
| Blocking the loop with synchronous I/O or CPU-heavy work | Scheduler starvation, cascading timeouts across every task | Offload via `loop.run_in_executor()` / `asyncio.to_thread()`, or use async-native libraries |
| Fire-and-forget Task with no retained reference | `Task was destroyed but it is pending!`, lost exceptions | Keep the reference or spawn inside an `asyncio.TaskGroup` |
| Ignoring `CancelledError` in cleanup paths | Zombie tasks, leaked resources, incomplete transactions | Catch, run deterministic cleanup, then re-raise; never swallow |
| Unbounded concurrency (thousands of `create_task` / `gather`) | Memory exhaustion, pool saturation, downstream thundering herd | Gate with `asyncio.Semaphore`, chunk batches, or use a bounded queue |
| Resolving a `Future` from another thread directly | Data races, lost wakeups | Marshal back with `loop.call_soon_threadsafe` |
| Mixing `loop.run_until_complete()` with `asyncio.run()` | `RuntimeError: Event loop is closed`, loop-reuse conflicts | Use `asyncio.run()` as the single entry point; reserve legacy calls for embedding |
| Failing to `await` a coroutine | `RuntimeWarning: coroutine was never awaited`, silent no-op | Enable `PYTHONASYNCIODEBUG=1`, lint with a coroutine-aware checker |
Frequently Asked Questions
How does asyncio achieve concurrency without threads?
Through cooperative multitasking on a single thread. The event loop runs one coroutine step at a time and switches to another ready task only when the current one suspends at an await — typically while waiting on I/O or a timer. Because switches happen at known points and on one thread, there is no preemption, no GIL contention between tasks, and almost no lock overhead, which is why the model excels at I/O-bound concurrency.
What is the difference between a coroutine, a Task, and a Future?
A coroutine is the inert object an async def call returns — it runs nothing until driven. A Task wraps a coroutine and schedules it on the loop immediately, so it runs concurrently and tracks its own lifecycle. A Future is a low-level result placeholder resolved by external code via set_result; Task is a subclass of Future, which is why you can await a Task.
When should I use asyncio.run() vs loop.run_until_complete()?
Use asyncio.run() as the single production entry point: it creates a fresh loop, runs the coroutine, cancels stragglers, shuts down async generators, and closes the loop. Reserve run_until_complete() for REPLs, legacy code, or embedding asyncio into an existing synchronous framework where you manage the loop yourself.
How do I prevent event loop starvation in high-throughput systems?
Keep CPU-bound and blocking work off the loop by offloading it to a ProcessPoolExecutor via run_in_executor() or to a thread via asyncio.to_thread(). Bound concurrency with a Semaphore so you never overwhelm downstreams, audit third-party libraries for hidden synchronous calls, and watch slow_callback_duration warnings and loop scheduling lag (how late timers and sleeps wake up) to catch stalls early.
Why do I get 'Task was destroyed but it is pending!'?
A Task was garbage-collected before it finished because nothing held a reference to it. The loop keeps only a weak reference, so a fire-and-forget Task can vanish mid-flight, losing its result and any exception. Retain the Task object, await it, or — best — create it inside an asyncio.TaskGroup, which awaits all children before exiting.
What is the real performance impact of uvloop over the default loop?
uvloop, built on libuv, typically lowers latency and raises throughput by 2–4x for I/O-heavy workloads by replacing the pure-Python selector loop with an optimized C implementation. It is a drop-in policy swap, but you should benchmark against your own baseline since gains depend on connection counts and how much time is genuinely spent in the loop versus your handlers.
Related
- Event loop configuration: selector choice, exception routing, and slow-callback tuning for production loops.
- Coroutine design patterns: composition idioms and how to find un-awaited coroutines at scale.
- Task scheduling and lifecycle: `create_task`, the aggregation APIs, and cancellation semantics in depth.
- Synchronization primitives: choosing and using `Lock`, `Semaphore`, `Event`, and `Condition`.
- Future objects and callbacks: bridging callback-driven and threaded code into `await`.
- Resilience, cancellation, and error handling: timeouts, deadlines, retries, and exception groups that pair with this scheduling model.