Timeouts & Deadlines for Async Operations
A timeout is the most misunderstood reliability primitive in asyncio because it does not "stop" anything — it schedules a cancellation. When you write async with asyncio.timeout(5), the loop arranges for a callback to fire on its timer heap five seconds from now, and that callback calls Task.cancel() on the running task. Everything that makes cancellation hard — cleanup that must still run, CancelledError that must propagate, blocking calls that ignore it — applies in full to timeouts. Get this wrong and a timeout becomes a source of leaked tasks, half-written state, and retry storms rather than a bound on latency. This guide covers the two modern APIs (asyncio.timeout() and asyncio.wait_for()), absolute deadlines with asyncio.timeout_at(), propagating a single deadline budget across sequential calls, nesting inner and outer bounds, and shielding the finalizers that must survive the cut. The companion reference on choosing asyncio.timeout vs wait_for drills into the API decision itself.
Architectural Principles
- A timeout is scheduled cancellation, not a kill switch. It enqueues a call_later callback on the loop's timer heap that cancels the task. Nothing is forcibly stopped; the task receives a CancelledError at its next suspension point and must unwind cooperatively.
- A deadline is absolute; a timeout is relative. "Cancel in 5 seconds" (timeout) drifts every time you re-derive it across steps. "Cancel at loop time T" (timeout_at, where T is a value of the loop's monotonic clock, loop.time()) is the reliable way to spread one budget over several sequential awaits.
- The tightest enclosing deadline wins. Nest timeouts freely, but understand that the inner bound can only ever shorten the effective deadline, never extend the outer one. An inner timeout firing first masks the outer budget.
- Cleanup is not exempt from the deadline. Code in a finally or __aexit__ runs during cancellation and can itself be cancelled by an outer timeout. Critical finalizers need asyncio.shield() or their own fresh deadline.
- A blocking call cannot be timed out. Cancellation is delivered at await points only. A synchronous requests.get() or a CPU loop with no suspension will run to completion regardless of any enclosing asyncio.timeout().
How a Timeout Interacts With the Loop Scheduler
Timeouts live entirely inside the Resilience, Cancellation & Error Handling execution model and ride on the same timer machinery as asyncio.sleep(). When you enter asyncio.timeout(d), the context manager captures the current task and calls loop.call_at(loop.time() + d, ...) to register a cancellation callback on the loop's timer heap, a min-heap keyed by deadline from which the event loop pops every due timer on each iteration. This is the same machinery covered under event loop configuration: the loop computes how long to block in select() from the nearest timer, wakes when it expires, and runs the due callbacks.
When the deadline arrives, the scheduled callback calls task.cancel(). That does not interrupt the task mid-statement — it sets a flag and arranges for a CancelledError to be raised inside the coroutine at its next suspension point. If the task is parked on an await, it resumes by raising CancelledError from that await. If the task is busy in synchronous code, the exception is deferred until control returns to the loop, which is precisely why a blocking call defeats the timeout: the loop never gets a chance to deliver it.
The asyncio.timeout() context manager then catches that CancelledError on the way out, confirms the deadline (not an external cancel) caused it, and converts it into a TimeoutError. asyncio.wait_for() does the same for a single awaitable but additionally awaits the cancelled inner task before re-raising, guaranteeing the inner coroutine has finished unwinding before you see the error. This conversion-and-await dance is the entire substance of the two APIs; everything below is about applying it correctly.
Pattern Catalogue
Wrap a Block With async with asyncio.timeout(d)
The default modern choice. asyncio.timeout() bounds an arbitrary block of awaits — possibly several calls, branches, and loops — with a single relative deadline, and converts the resulting cancellation into TimeoutError. Use it whenever the unit you want to bound is "this whole operation," not one specific call.
Diagnostic Hook: log the elapsed time alongside the TimeoutError. If the block consistently times out at exactly the limit, the bound is too tight for the real p99; if the TimeoutError arrives well short of the limit, a tighter inner timeout fired, not yours. An external cancel surfaces as CancelledError rather than TimeoutError.
Per-Call Bound With wait_for
asyncio.wait_for(aw, timeout) wraps exactly one awaitable, cancels it on expiry, awaits it to completion, then raises TimeoutError. Reach for it when you have a single coroutine or future to bound and you want the guarantee that the inner task has fully unwound before control returns. See choosing asyncio.timeout vs wait_for for the full trade-off.
Diagnostic Hook: count TimeoutError per downstream. A per-call timeout rate that rises with no change in that dependency's measured latency usually means the bound is below the dependency's real p99, not that the dependency degraded.
Shared Deadline Budget With timeout_at
When one request must complete all of its steps within a single overall budget, convert the budget to an absolute deadline once and pass that deadline, not a shrinking relative number, to every step via asyncio.timeout_at(). Each step is bounded by the same instant on the loop's monotonic clock (loop.time(), not wall-clock time), so time spent in early steps correctly reduces what later steps get.
Diagnostic Hook: record deadline - loop.time() (remaining budget) before each major step. A step that routinely sees near-zero remaining budget is the one starving the rest of the pipeline.
Nested Timeouts (Inner vs Outer)
Timeouts compose: an outer bound on the whole operation plus tighter inner bounds on individual risky calls. The effective deadline at any point is the minimum of all enclosing deadlines. The inner timeout protects one call; the outer caps total time even if many cheap calls add up.
When the inner TimeoutError is caught inside the outer block, the loop continues — but if the outer deadline expires, the cancellation propagates out and the for loop stops. Distinguish them: an inner timeout is a per-item failure; an outer one is a whole-operation failure.
Diagnostic Hook: tag each TimeoutError with which scope produced it (catch inner ones locally, let outer ones surface). Mixing the two in one counter hides whether you are losing individual items or blowing the global budget.
Shielding a Critical Finalizer From the Timeout
Cleanup runs during cancellation, so an outer deadline can cancel your finally block mid-flush — committing a transaction, releasing a lease, sending a final ack. Wrap the finalizer in asyncio.shield() (and give it its own fresh deadline) so the cut cannot abort it. This is the timeout-specific face of the broader cancellation patterns toolkit.
Diagnostic Hook: emit a counter for "cleanup ran after timeout." If shielded cleanups themselves start timing out, your finalizer budget is too small or the resource is wedged — alert separately from the main operation's timeout rate.
Resource Boundaries
Choosing timeout values is a capacity decision, not a guess. Three rules bound the choice:
- Set the timeout above your real p99, not your median. A bound at the median cancels and retries roughly the slowest half of healthy requests, which adds retry traffic to a dependency precisely when it is already slow. Measure the dependency's latency distribution and place the timeout above p99 with margin (often p99 × 1.5).
- The timeout and the retry budget are one calculation. If you retry on timeout, the total time a caller waits is roughly timeout × attempts plus backoff. A 2 s timeout with 3 attempts is a 6 s+ user-facing latency. Coordinate the per-attempt timeout with the retry and backoff strategies so the product stays within the caller's deadline; ideally use one timeout_at deadline as the hard ceiling and let retries consume the remaining budget.
- Inner bounds must sum below the outer bound, with slack. If an outer 5 s deadline wraps five steps each bounded at 1 s, there is zero slack for scheduling and cleanup, so the outer can fire mid-cleanup. Leave the outer deadline meaningfully larger than the sum of inner ones, or drive every step from one shared timeout_at.
Integrated Production Example
A multi-step request handler that derives one absolute deadline, spreads it across sequential downstream calls with timeout_at, applies a tighter per-call bound where one dependency is known-flaky, shields the audit write so it survives the cut, and exports the metrics you need to tune the bounds.
Diagnostic Hook: the audit["steps"] map plus the outcome field are your tuning dataset. Aggregate per-step latency to find which dependency consumes the budget, and compare the deadline_exceeded rate against each downstream's measured p99 — if they diverge, your bound is mis-sized rather than the dependency being slow.
Diagnostic Hook — timeout health metrics
Track these three numbers per timeout site and alert on them:
- Timeout rate (TimeoutError ÷ total calls): a healthy bound sits well under 1%. A rate climbing without a corresponding latency shift means the bound is too tight, not that the dependency degraded.
- p99 latency vs the timeout value: keep the timeout comfortably above measured p99. When p99 creeps up toward the bound, you are about to start cancelling healthy requests; raise the bound or fix the latency before the timeout rate spikes.
- Shielded-cleanup completion rate: cleanups that run after a timeout should virtually always finish. A falling completion rate means finalizers are being starved of budget and state is being left inconsistent.
Failure Modes
| Failure mode | Root cause | Detection | Fix |
|---|---|---|---|
| Timeout too tight triggers a retry storm | Bound set near median latency, so the slow half of healthy requests is cancelled and retried, amplifying load on an already-slow dependency | Timeout rate and retry rate spike together while dependency latency only mildly rose; dependency saturates | Set the timeout above measured p99 (e.g. p99 × 1.5); coordinate with the retry budget so total attempts stay within the caller deadline |
| Swallowed CancelledError from a timeout | A bare except, an except BaseException, or an except CancelledError that does not re-raise inside the bounded block catches the timeout's injected cancellation and continues (since Python 3.8 a plain except Exception does not catch CancelledError) | The operation never honours the deadline; TimeoutError is never raised; latency exceeds the bound | Never catch CancelledError/BaseException to suppress it inside a timeout; let it propagate so the context manager can convert it |
| Blocking/sync call ignores the timeout | A synchronous call (requests.get, a tight CPU loop, time.sleep) runs with no await, so cancellation can't be delivered | The task overruns the deadline and any TimeoutError arrives late or not at all; the loop appears "stuck"; slow-callback warnings in debug mode (loop.slow_callback_duration) | Move blocking work off the loop via asyncio.to_thread() / an executor so it sits behind a real await point that cancellation can reach |
| Inner timeout masks the outer deadline | An inner bound shorter than the remaining outer budget fires first and is caught, so the outer deadline never governs and per-item failures hide a blown global budget | Outer TimeoutError almost never seen; total latency creeps up via many caught inner timeouts | Drive sequential steps from one shared timeout_at; size inner bounds against remaining budget (min(inner, remaining())); tag which scope fired |
Frequently Asked Questions
Is an asyncio timeout the same as forcibly stopping an operation?
No. A timeout schedules a cancellation on the loop's timer heap. When the deadline fires it calls task.cancel(), which raises CancelledError at the task's next await point. Nothing is forcibly interrupted, so the operation still unwinds cooperatively and any cleanup in finally or __aexit__ still runs.
Why does my asyncio.timeout not fire around a blocking call?
Cancellation is delivered only at await points. A synchronous call like requests.get or a tight CPU loop never yields to the loop, so the scheduled cancellation cannot be injected and the task overruns the deadline with no TimeoutError. Move blocking work off the loop with asyncio.to_thread or an executor so it sits behind a real await.
How do I apply one deadline across several sequential async calls?
Convert the budget to an absolute instant once with loop.time() + budget, then wrap the whole sequence in asyncio.timeout_at(deadline). Every step is bounded by the same instant on the loop's monotonic clock, so time spent in early steps correctly reduces what later steps get, unlike re-deriving a fresh relative timeout per call.
How do I stop a timeout from cancelling my cleanup code?
Cleanup runs during cancellation, so an outer deadline can abort a finally block mid-flush. Wrap the critical finalizer in asyncio.shield() so the in-flight cancellation cannot abort it, and give the finalizer its own bounded timeout (via wait_for) so it cannot hang forever.
What is the relationship between timeout values and retries?
They are a single calculation. With retries, total wait is roughly the per-attempt timeout times the number of attempts, plus backoff, so a 2-second timeout with three attempts is at least six seconds of user-facing latency. Set the per-attempt timeout above the dependency's p99 and use one shared deadline as a hard ceiling that retries consume.
Related
- Resilience, Cancellation & Error Handling — up to the overview for the full reliability mental model.
- Cancellation Patterns — the cancellation mechanics every timeout depends on, including shielding cleanup.
- Retry & Backoff Strategies — pair per-attempt timeouts with a retry budget so total latency stays bounded.
- Choosing asyncio.timeout vs wait_for — the focused decision and migration guide for the two APIs.
- Event Loop Configuration — the timer heap and scheduler context that makes a timeout fire.