Preventing CancelledError Leaks in Cleanup
Your service hangs on shutdown, or you see "Task was destroyed but it is pending" in the logs, or a connection pool slowly exhausts itself even though every request "completes." The common cause is a cleanup block that eats CancelledError: a try/except around resource teardown catches the cancellation along with everything else and never lets it propagate. The task then never reaches the CANCELLED state. Either it returns as if it had finished normally, abandoning whatever it held mid-operation, or it keeps running as a zombie that holds its sockets, files, and locks open and refuses to drain when you cancel it. This guide reproduces that leak, shows how to detect it with an asyncio.all_tasks() audit, and walks through the narrow-catch / re-raise / bounded-shield fix, ending with a check that the task actually ends cancelled.
Prerequisites
- Python 3.11+. The examples use asyncio.timeout(), asyncio.TaskGroup, and Task.cancelling(). The narrow-catch rule applies to 3.8+ (where CancelledError became a subclass of BaseException rather than Exception).
- Familiarity with the cancellation model. This is a detail page under Cancellation patterns; read that for why CancelledError is control flow and must propagate. The broader failure-handling context is Resilience, Cancellation & Error Handling.
- A way to inspect tasks. All diagnostics use asyncio.all_tasks() and task.cancelled(), both stdlib.
1. Reproduce the leak
The fastest way to understand the bug is to build one. A broad except Exception cannot catch CancelledError in 3.8+, but an except CancelledError that returns instead of re-raising will swallow it, and so will an except BaseException that does not re-raise. The most common form is the first: cleanup that catches the cancel and then returns normally.
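A minimal sketch of such a reproduction follows. The names leaky_worker and run_and_cancel are hypothetical, and the sleep stands in for real work; the only point is that the handler returns instead of re-raising.

```python
import asyncio


async def leaky_worker() -> None:
    """Hypothetical worker whose cleanup swallows the cancellation."""
    try:
        while True:
            await asyncio.sleep(0.1)  # stand-in for real work
    except asyncio.CancelledError:
        # BUG: the handler "cleans up" and returns, so the cancel is lost.
        return


async def run_and_cancel(worker_fn) -> asyncio.Task:
    """Harness: start worker_fn, cancel it once it is running, then drain it."""
    task = asyncio.create_task(worker_fn())
    await asyncio.sleep(0)  # let the worker reach its first await
    task.cancel()
    await asyncio.gather(task, return_exceptions=True)
    return task


async def main() -> bool:
    task = await run_and_cancel(leaky_worker)
    print("task.cancelled() ->", task.cancelled())
    return task.cancelled()


if __name__ == "__main__":
    asyncio.run(main())
```

The harness yields once with sleep(0) before cancelling so that the cancel lands inside the try block rather than before the coroutine has started.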
Verify: the program prints task.cancelled() -> False. The return turned a cancellation into a normal completion. In a real worker pool this task is now indistinguishable from one that finished its job, but any resource its loop body held was abandoned mid-cycle.
2. Detect it with an all_tasks audit
In production you rarely have the toy reproduction; you have a hang. Add a pre-close audit that lists tasks still alive after you have cancelled and drained them. A task that ignored its cancel shows up here.
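One way such an audit could look is sketched below. The helper name cancel_and_audit, the drain_timeout parameter, and the message formats are assumptions for illustration. The drain uses asyncio.wait() with a timeout so the audit itself cannot hang on a task that keeps running.

```python
import asyncio


async def leaky_worker() -> None:
    """Hypothetical worker that swallows its cancellation."""
    try:
        await asyncio.sleep(3600)
    except asyncio.CancelledError:
        return  # BUG: cancel converted into normal completion


async def cancel_and_audit(tasks: list[asyncio.Task],
                           drain_timeout: float = 1.0) -> list[str]:
    """Cancel tasks, drain with a bound, and report any that did not end cancelled."""
    for t in tasks:
        t.cancel()
    # wait() returns after drain_timeout even if some tasks never finish.
    await asyncio.wait(tasks, timeout=drain_timeout)

    findings = []
    for t in tasks:
        if t.done() and not t.cancelled():
            findings.append(f"LEAK: {t.get_name()} ended without cancelling")

    # all_tasks() only returns unfinished tasks; skip the task running this audit.
    current = asyncio.current_task()
    for t in asyncio.all_tasks():
        if t is not current:
            findings.append(f"ALIVE: {t.get_name()} is still pending")

    for line in findings:
        print(line)
    return findings


async def main() -> list[str]:
    task = asyncio.create_task(leaky_worker(), name="leaky-1")
    await asyncio.sleep(0)  # let the worker start
    return await cancel_and_audit([task])


if __name__ == "__main__":
    asyncio.run(main())
```

Note that a cancelled task which raised some other exception during cleanup is also reported here, since it too ended without reaching the CANCELLED state.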
Verify: run this against the leaky worker from step 1 and it reports LEAK: ... ended without cancelling. The discriminator is t.cancelled() returning False: a task you cancelled that reports cancelled() == False swallowed its CancelledError. Sampling len(asyncio.all_tasks()) on a timer gives the same signal live: a count that does not fall to baseline after a drain points to a swallowed cancel.
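The live sampling idea can be sketched as follows. Both sample_pending and stubborn_worker are hypothetical names, and the stop event exists only so the demo can end; a real zombie would have no such escape hatch.

```python
import asyncio


async def stubborn_worker(stop: asyncio.Event) -> None:
    """Hypothetical zombie: swallows the cancel and keeps looping."""
    while not stop.is_set():
        try:
            await asyncio.sleep(0.01)
        except asyncio.CancelledError:
            pass  # BUG: cancel ignored, loop continues


async def sample_pending(interval: float, samples: int) -> list[int]:
    """Record the number of unfinished tasks once per interval."""
    counts = []
    for _ in range(samples):
        # all_tasks() counts only unfinished tasks, including the caller's own task.
        counts.append(len(asyncio.all_tasks()))
        await asyncio.sleep(interval)
    return counts


async def main() -> tuple[int, list[int]]:
    baseline = len(asyncio.all_tasks())  # only the main task at this point
    stop = asyncio.Event()
    task = asyncio.create_task(stubborn_worker(stop))
    await asyncio.sleep(0)
    task.cancel()  # the worker swallows this
    counts = await sample_pending(0.01, 3)
    stop.set()     # demo-only exit so the program can finish
    await task
    return baseline, counts


if __name__ == "__main__":
    print(asyncio.run(main()))
```

In a service, the same counter would typically be exported as a gauge rather than collected into a list.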
3. Fix: catch narrowly and re-raise, or use try/finally
The fix is to make cleanup transparent to cancellation. The safest form is try/finally with no except — finally runs your cleanup and lets CancelledError keep propagating automatically. If you must catch it (to log or roll back), catch CancelledError specifically and raise at the end.
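Both variants are sketched below under simple assumptions: FakeConn is a hypothetical stand-in for a socket or database connection, and the log list stands in for real logging or rollback work.

```python
import asyncio


class FakeConn:
    """Hypothetical resource; stands in for a socket or DB connection."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


async def worker_finally(conn: FakeConn) -> None:
    """Preferred: no except clause, so the cancel cannot be swallowed."""
    try:
        while True:
            await asyncio.sleep(0.1)  # stand-in for real work
    finally:
        conn.close()  # runs on cancel; CancelledError keeps propagating


async def worker_narrow(conn: FakeConn, log: list[str]) -> None:
    """Use when the cancel path needs its own handling."""
    try:
        while True:
            await asyncio.sleep(0.1)
    except asyncio.CancelledError:
        log.append("cancelled: rolling back")
        conn.close()
        raise  # re-raise so the task ends in the CANCELLED state


async def cancel_and_check(coro) -> bool:
    """Start coro as a task, cancel it, drain it, and report its final state."""
    task = asyncio.create_task(coro)
    await asyncio.sleep(0)  # let the worker reach its first await
    task.cancel()
    await asyncio.gather(task, return_exceptions=True)
    return task.cancelled()


if __name__ == "__main__":
    conn = FakeConn()
    print(asyncio.run(cancel_and_check(worker_finally(conn))), conn.closed)
```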
Verify: wrap either worker in the step-1 harness; t.cancelled() now returns True. The try/finally variant is preferred for plain resource release because it structurally cannot forget the re-raise. Reserve the explicit except CancelledError for when cleanup itself differs on the cancel path.
4. Protect must-run cleanup with shield and a bounded timeout
Some cleanup must complete even though the task is being cancelled — a commit, a final flush. An unprotected await in finally can be re-cancelled and silently skipped. Wrap it in asyncio.shield() and bound it with asyncio.timeout() so it neither gets skipped nor hangs forever.
Verify: cancel this worker and confirm flush completes (the buffer reaches storage) while the worker still ends cancelled — because the finally re-raises the original CancelledError after the shielded flush. The bounded timeout(1.0) guarantees that even a stuck flush gives up rather than hanging the drain. The deadline mechanics here are detailed in Timeouts and deadlines.
5. Verify the task reaches CANCELLED
Close the loop with an assertion-grade check so the fix is regression-proof. Cancel, drain, and assert both the state and that resources were released.
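A check of this shape could look like the sketch below. FakeConn and the success message are hypothetical; in a real test suite the body of check_shutdown would sit inside a test function for each long-lived worker.

```python
import asyncio


class FakeConn:
    """Hypothetical resource used to observe whether cleanup ran."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


async def worker(conn: FakeConn) -> None:
    try:
        while True:
            await asyncio.sleep(0.1)  # stand-in for real work
    finally:
        conn.close()  # cleanup that stays transparent to cancellation


async def check_shutdown() -> str:
    conn = FakeConn()
    t = asyncio.create_task(worker(conn))
    await asyncio.sleep(0)  # let the worker start
    t.cancel()
    await asyncio.gather(t, return_exceptions=True)

    assert t.done(), "task never finished"
    assert t.cancelled(), "cancel was swallowed"
    assert conn.closed, "cleanup did not run"

    msg = "OK: task ended CANCELLED and resources were released"
    print(msg)
    return msg


if __name__ == "__main__":
    asyncio.run(check_shutdown())
```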
Verify: the program prints the success line and all three assertions hold. t.cancelled() == True is the authoritative signal that the cancellation propagated; conn.closed == True confirms cleanup still ran. Bake this assertion pattern into your shutdown tests for any long-lived worker.
Verification
After applying the fix, the system should satisfy all of the following:
- State is CANCELLED: for every task you cancel, task.cancelled() returns True after draining, never False with a normal result.
- No survivors after drain: [t for t in asyncio.all_tasks() if not t.done()] is empty (minus the runner task) once shutdown completes; the pending count falls to baseline on a timer.
- No finalization warnings: the log no longer emits "Task was destroyed but it is pending". Run with PYTHONASYNCIODEBUG=1 to surface any remaining ones with a creation traceback.
- Resources released: connection-pool / open-file gauges return to idle after shutdown rather than leaking one handle per cancelled task.
Pitfalls & edge cases
- except Exception does NOT catch CancelledError in 3.8+, but except BaseException does. If you must use except BaseException (rare), add an explicit if isinstance(exc, asyncio.CancelledError): raise before any generic handling, or you will swallow the cancel.
- Cleanup that awaits can be re-cancelled. Under a deadline-driven shutdown, the await in your finally may receive a second CancelledError. Use asyncio.shield() + a bounded timeout() for must-run cleanup, and keep finalizers shallow so the cancellation count stays interpretable.
- uncancel() does not undo a swallowed cancel. Task.uncancel() only decrements the cancellation counter for legitimate handoffs (like asyncio.timeout()); it is not a way to "absorb" a cancel you do not want. Calling it without a matching cancel() request that you are deliberately handling corrupts the accounting and can suppress future cancellations.
- A bare return inside except CancelledError is the silent killer. It compiles, runs, and looks like cleanup, but it converts cancellation into completion. Lint for except asyncio.CancelledError blocks that lack a trailing raise.
- TaskGroup cleanup counts too. A child whose cleanup swallows CancelledError defeats the group's fail-fast cancellation of siblings; the group then waits on a task that will never end. The same narrow-catch / re-raise rule applies inside TaskGroup children.
Frequently Asked Questions
Why does my asyncio task hang on shutdown after I added cleanup code?
Your cleanup almost certainly swallows CancelledError — typically an except block that returns instead of re-raising, or an except BaseException. The task then never reaches the CANCELLED state, so the loop still considers it alive and the drain never completes. Use try/finally or catch CancelledError narrowly and re-raise it.
Does except Exception catch CancelledError in modern Python?
No. Since Python 3.8, CancelledError derives from BaseException, so except Exception does not catch it. However, except BaseException does catch it, as does an explicit except asyncio.CancelledError. If you swallow it in either of those, the cancellation leaks.
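A quick way to confirm this on your own interpreter is sketched below; tolerant_worker is a hypothetical name, and the example assumes Python 3.8 or later.

```python
import asyncio


async def tolerant_worker() -> str:
    try:
        await asyncio.sleep(3600)
    except Exception:
        # Not reached on cancel in 3.8+: CancelledError is not an Exception.
        return "swallowed"
    return "finished"


async def main() -> bool:
    task = asyncio.create_task(tolerant_worker())
    await asyncio.sleep(0)  # let the worker reach its await
    task.cancel()
    await asyncio.gather(task, return_exceptions=True)
    return task.cancelled()


if __name__ == "__main__":
    print(issubclass(asyncio.CancelledError, Exception))
    print(asyncio.run(main()))
```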
How do I detect a task that swallowed its cancellation?
After cancelling and draining with asyncio.gather(*tasks, return_exceptions=True), check each task: one you cancelled that reports task.cancelled() == False swallowed the cancel. Live, sample len([t for t in asyncio.all_tasks() if not t.done()]) on a timer — a count that does not fall to baseline after a drain is the signature.
Related
- Cancellation patterns — up to the overview for the full catalogue of safe cancel, shield, and drain patterns.
- Exception groups and TaskGroups — how swallowed cancellations inside a child break a group's structured teardown.
- Resilience, Cancellation & Error Handling — the overview tying cancellation, timeouts, and error propagation together.