Event Loop Configuration
A production-grade asyncio deployment requires deliberate configuration of the underlying event loop. The defaults that ship with CPython optimise for first-run ergonomics, not for throughput, observability, or fault isolation. A service that runs the loop with default settings will silently swallow exceptions in detached tasks, leave debug instrumentation off when you need it and on when you cannot afford it, block the reactor on a synchronous call nobody noticed, and drop in-flight requests when Kubernetes sends SIGTERM. This reference is the narrow set of decisions that turn a working script into a hardened daemon: how to choose an entrypoint, how to select and install a faster loop backend, how to install a hard error boundary, how to tune debug and slow-callback detection, how to size the executor that absorbs blocking work, and how to wire signals into a deterministic shutdown.
The scope here is configuration of one loop instance for one process. The decision of which API actually drives that loop — asyncio.run() versus a manually managed loop.run_until_complete() — and the step-by-step hardening checklist both have their own dedicated guides, linked throughout. Everything below assumes a Python 3.11+ runtime so that asyncio.Runner, asyncio.TaskGroup, and asyncio.timeout() are available.
Architectural principles
- Configure before the loop spins, validate after it runs. Loop backend, debug flag, and policy must be set before the first iteration; an exception handler and signal handlers must be attached from inside the running loop. Ordering errors fail silently: the runtime falls back to defaults rather than raising.
- Every detached task needs an error boundary. A task with no awaiter and no exception handler logs its traceback only when garbage-collected, often long after the failure. A loop-level set_exception_handler is the single hook that guarantees uncaught task exceptions reach your logging pipeline.
- Debug instrumentation is a runtime cost, not a constant. Debug mode and a low slow_callback_duration are diagnostic tools with measurable per-tick overhead. They belong behind an environment flag, not baked into the production image.
- The loop runs on one thread; blocking work must leave it. Any synchronous call (requests, sqlite3, bcrypt, a vendor SDK) stalls every coroutine until it returns. A bounded executor is the pressure-relief valve, and its size is a tuning parameter, not a constant.
- Shutdown is part of the contract. An orchestrator sends SIGTERM and waits a fixed grace period before SIGKILL. A correct service intercepts the signal, stops accepting work, cancels in-flight tasks with a deadline, drains async generators, and closes the loop, all inside that window.
How configuration integrates with the loop scheduler
Configuration is not a layer on top of the loop; each knob mutates a specific stage of the loop's core iteration. That iteration polls the selector (or IOCP, or the libuv backend) for I/O that became ready, moves due timers onto the ready queue, and runs the callbacks queued in loop._ready, then repeats. Choosing a backend swaps the multiplexer that the poll stage drives: the pure-Python SelectorEventLoop wraps epoll/kqueue through the selectors module, while uvloop replaces the whole loop implementation with one built on libuv. The exception handler is invoked whenever a callback raises, and whenever a failed task or future is finalised without anything having retrieved its exception, so it sits directly in the path that drives every Task. In debug mode, slow_callback_duration is checked against loop.time() deltas around each callback execution, which is why it measures synchronous stalls rather than total task latency. The default executor is the bridge from this single thread into a thread pool: run_in_executor submits work and returns a future that is completed back on the loop thread when the pool thread finishes.
For the full picture of how the selector, timers, and executors compose into one loop iteration, start from the overview at Asyncio Fundamentals & Event Loop Architecture. Because configuration determines how fast and how cleanly tasks move through the ready queue, it is tightly coupled to Task Scheduling & Lifecycle, which covers the state transitions a Task undergoes once the loop is running.
Pattern catalogue
The configuration surface decomposes into a handful of patterns. Each is independently useful; the integrated example at the end composes them into one bootstrap.
The asyncio.run() entrypoint
For any standalone process — a CLI, a worker, a microservice main — asyncio.run() is the correct entrypoint. It creates a fresh loop, runs the coroutine, cancels leftover tasks, drains async generators, and closes the loop, all in one call. On 3.11+ it is a thin wrapper over asyncio.Runner, which exposes loop_factory so you can inject a backend without touching the deprecated policy system.
Use asyncio.Runner directly only when you need to run several top-level coroutines on one configured loop (the classic REPL/test-harness case). The full decision — and the legacy cases where manual loop control is still correct — is laid out in when to use asyncio.run vs loop.run_until_complete.
Installing uvloop as the backend
uvloop replaces the stock asyncio loop with one built on libuv; its maintainers' benchmarks report roughly 2–4x higher network throughput, mainly from reduced syscall and Python-level dispatch overhead. The forward-compatible installation on 3.11+ is loop_factory, which avoids the policy API that is deprecated as of 3.14 and slated for removal in 3.16.
Guard the uvloop import with try/except: uvloop does not support Windows and may fail to build on minimal images, so the stock loop must remain a working fallback rather than a crash.
A loop-level exception handler
Detached tasks that raise without an awaiter only surface their traceback when the task object is finalised, which can be long after the failure. A loop exception handler receives that report, along with any exception raised by a plain callback, serialises the context, and forwards it to your logging pipeline. Install it from inside the running loop.
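A minimal sketch of such a handler follows. The logger name, the seen list, and failing_callback are illustrative only; the demo triggers the handler with a raising call_soon callback because that path is deterministic, whereas a failed task is reported when it is finalised:

```python
import asyncio
import logging

log = logging.getLogger("loop.errors")
seen: list[dict] = []  # demo-only sink so the behaviour is observable


def on_loop_error(loop: asyncio.AbstractEventLoop, context: dict) -> None:
    """Forward an unhandled loop error to logging, then to the default handler."""
    seen.append(context)
    log.error(
        "unhandled loop error: %s",
        context.get("message"),
        exc_info=context.get("exception"),
    )
    # Keep CPython's built-in report alongside the structured one.
    loop.default_exception_handler(context)


def failing_callback() -> None:
    raise RuntimeError("boom")


async def main() -> None:
    loop = asyncio.get_running_loop()
    loop.set_exception_handler(on_loop_error)  # attach from inside the loop
    loop.call_soon(failing_callback)  # nothing consumes this error
    await asyncio.sleep(0.01)  # yield so the callback can run
```

For per-task reporting that does not wait for finalisation, a done callback attached with task.add_done_callback is a common complement to the loop-level hook.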
Calling loop.default_exception_handler(context) at the end of a custom handler preserves CPython's built-in diagnostics; dropping it means losing context fields the handler did not explicitly copy. This boundary is the configuration counterpart to the patterns in Coroutine Design Patterns, where the goal is to never let a task fail unobserved.
Debug mode and slow-callback detection
loop.set_debug(True) enables coroutine creation-site tracking, resource-leak warnings for unclosed transports, and logging of any callback that exceeds slow_callback_duration. It adds roughly 10–30% per-tick overhead, so gate it behind an environment flag and only lower the slow-callback threshold when actively hunting stalls.
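The gating described above might look like the sketch below. The flag name APP_ASYNCIO_DEBUG and the 0.05 s threshold are assumptions chosen for illustration; the stdlib already honours PYTHONASYNCIODEBUG=1 on its own:

```python
import asyncio
import os

# Hypothetical flag name; any environment switch works the same way.
DEBUG_FLAG = "APP_ASYNCIO_DEBUG"


def configure_debug(loop: asyncio.AbstractEventLoop) -> bool:
    """Enable debug mode and a tighter stall threshold only when the flag is set."""
    enabled = os.environ.get(DEBUG_FLAG) == "1"
    loop.set_debug(enabled)
    if enabled:
        # The default threshold is 0.1 s; lower it only while hunting stalls.
        loop.slow_callback_duration = 0.05
    return enabled
```

Call it on the loop before the first await. asyncio.Runner(debug=True) is an alternative way to switch debug mode on at construction time.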
A logged Executing <Handle ...> took 0.120 seconds line is the loop telling you exactly which callback blocked it — the cheapest stall detector asyncio offers.
Signals and graceful shutdown
loop.add_signal_handler runs a callback in the loop thread when a signal arrives — unlike signal.signal, which fires on an arbitrary frame and is unsafe to mix with async state. The handler should set an Event (or cancel a sentinel) rather than do the teardown inline, so cancellation happens in coroutine context.
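A sketch of that pattern, assuming a Unix platform and the main thread (add_signal_handler is unavailable elsewhere). The function name wait_for_shutdown is hypothetical:

```python
import asyncio
import signal


async def wait_for_shutdown() -> str:
    """Block until SIGTERM or SIGINT arrives, then report which one."""
    loop = asyncio.get_running_loop()
    stop = asyncio.Event()
    received: list[str] = []

    def request_stop(name: str) -> None:
        # Runs in the loop thread: record the signal and wake the waiter.
        # Teardown itself happens later, in coroutine context.
        received.append(name)
        stop.set()

    signals = (signal.SIGTERM, signal.SIGINT)
    for sig in signals:
        loop.add_signal_handler(sig, request_stop, sig.name)
    try:
        await stop.wait()
    finally:
        for sig in signals:
            loop.remove_signal_handler(sig)
    return received[0]
```

The caller awaits wait_for_shutdown() alongside its workers and starts cancelling them once it returns.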
Signal handling and cancellation are two halves of one mechanism; the deeper treatment of cancel-safe teardown lives under Cancellation Patterns.
Resource boundaries
The single hard limit a configured loop must respect is the executor that absorbs blocking work. The loop runs on one thread, so every synchronous call must be offloaded through run_in_executor, and the pool behind it has finite capacity. Over-provisioning past 2 × CPU_COUNT for CPU-adjacent work invites GIL thrashing and context-switch overhead; for I/O-bound blocking calls, min(32, (os.cpu_count() or 1) * 4) is a safe starting heuristic. The pool's work queue is unbounded, so backpressure must come from the caller: bound concurrency with a Semaphore, and scope the submitting tasks in a TaskGroup, so the queue cannot grow without limit.
The semaphore is the boundary: without it, a burst of callers can enqueue tens of thousands of jobs faster than the pool drains them, and memory grows until the OOM killer intervenes.
Integrated production bootstrap
The following composes every pattern above into one reusable bootstrap: backend selection, debug gating, error boundary, signal-driven shutdown, executor sizing, and deterministic close — driven by asyncio.Runner so the configured loop is created before the first iteration.
Diagnostic Hook: On startup, log type(loop).__module__ (expect uvloop in production), loop.get_debug() (expect False), and loop.slow_callback_duration. In production, emit the executor's _work_queue.qsize() and len(_threads) as gauges; a queue depth that climbs while threads stay pinned at max_workers is the precursor to memory blow-up. During a deploy, time the gap between SIGTERM receipt and the final clean shutdown complete line; if it approaches SHUTDOWN_GRACE, your tasks are not yielding on cancel.

Diagnostic Hook (debug session): Set PYTHONASYNCIODEBUG=1 and drop slow_callback_duration to 0.02. The loop will log Executing <Handle ...> took N seconds for every blocking callback and emit coroutine ... was never awaited and unclosed-transport warnings. Profile the offending callback with py-spy dump --pid <pid> to see the synchronous frame stalling the reactor.
Failure modes
| Failure mode | Root cause | Detection | Fix |
|---|---|---|---|
| Detached task error vanishes | No awaiter and no loop exception handler; traceback only logs at GC | Errors appear minutes late or never; Task exception was never retrieved or Task was destroyed but it is pending warnings | Install loop.set_exception_handler; retain task handles or use TaskGroup |
| Latency spikes across all coroutines | Synchronous call blocking the single loop thread | slow_callback_duration logs Executing <Handle> took N s; py-spy shows a sync frame | Offload via run_in_executor; bound with a Semaphore |
| Config silently ignored | Backend/debug/policy set after the loop was created | Logged type(loop) is _UnixSelectorEventLoop, not uvloop; debug stays default | Use Runner(loop_factory=...); call set_debug before any await |
| Pod killed on deploy with dropped requests | No SIGTERM handler; abrupt exit mid-request | Connection resets at deploy time; no clean shutdown log line | add_signal_handler → cancel + gather inside the grace period |
| OOM under burst load | Unbounded executor work queue or unbounded task creation | _work_queue.qsize() climbs while _threads is pinned; RSS grows monotonically | Bound submissions with a Semaphore; scope task creation in a TaskGroup |
| RuntimeError: Event loop is closed at exit | Tasks or executor submissions outlive loop.close() | Traceback during teardown; ResourceWarning for unclosed transports | await loop.shutdown_asyncgens() and await loop.shutdown_default_executor() before close (handled by Runner) |
Frequently Asked Questions
How do I configure a custom event loop in Python 3.12+ without the deprecated policy API?
Pass loop_factory to asyncio.run(main(), loop_factory=...) or use asyncio.Runner(loop_factory=...). For uvloop specifically, loop_factory=uvloop.new_event_loop. The policy system (asyncio.set_event_loop_policy()) is deprecated as of 3.14 and slated for removal in 3.16, so reserve it for libraries that still support older runtimes.
What is the actual performance impact of loop.set_debug(True)?
Roughly 10–30% added latency per loop iteration from coroutine creation-site tracking, slow-callback timing, and resource-leak detection, plus higher memory from retained reference chains. Keep it behind an environment flag and enable it only when diagnosing a stall, leak, or race.
When should I swap to uvloop versus tuning the default selector loop?
Use uvloop for I/O-bound network services where epoll/kqueue dispatch dominates — it typically yields 2–4x throughput. Stay on the default loop on Windows (no native uvloop), on minimal images where it will not build, or when a dependency relies on selector-loop internals. Always keep the selector loop as a fallback in the factory.
Why does my exception handler never fire for failing tasks?
Either it is attached too late, the task is awaited, or a reference to the failed task is still held. The handler fires only for exceptions with no consumer, and for a task that report is issued when the task object is finalised; if you await or gather a task, the exception propagates to that awaiter instead. Attach the handler from inside the running loop before creating detached tasks, and confirm with a deliberately raising fire-and-forget task.
How do I prevent dropped requests when Kubernetes sends SIGTERM?
Register loop.add_signal_handler(SIGTERM, ...) to stop accepting work, cancel in-flight tasks, and await asyncio.gather(..., return_exceptions=True) inside an asyncio.timeout() shorter than the pod's terminationGracePeriodSeconds. Then let Runner/asyncio.run drain async generators and close the loop.
Related
- Asyncio Fundamentals & Event Loop Architecture — up to the overview for the full loop-iteration mental model this configuration acts on.
- How to properly configure asyncio event loops for production — the step-by-step hardening checklist with verification commands.
- When to use asyncio.run vs loop.run_until_complete — choosing the entrypoint that drives the configured loop.
- Task Scheduling & Lifecycle — how tasks move through the ready queue once the loop is configured and running.
- Cancellation Patterns — cancel-safe teardown, the other half of graceful shutdown.