
Preventing CancelledError Leaks in Cleanup

Your service hangs on shutdown, or you see Task was destroyed but it is pending in the logs, or a connection pool slowly exhausts itself even though every request "completes." The common cause is a cleanup block that eats CancelledError: a try/except around resource teardown catches the cancellation along with everything else and never lets it propagate. The task then never reaches the CANCELLED state. Either it keeps running, a zombie that holds its sockets, files, and locks open and refuses to drain when you cancel it, or it finishes as if it had completed normally and hides the interrupted work from whoever cancelled it. This guide reproduces that leak, shows how to detect it with an asyncio.all_tasks() audit, and walks through the narrow-catch / re-raise / bounded-shield fix, ending with a check that the task actually ends cancelled.

Prerequisites

  • Python 3.11+. The examples use asyncio.timeout(), asyncio.TaskGroup, and Task.cancelling(). The narrow-catch rule applies to 3.8+ (where CancelledError became a BaseException).
  • Familiarity with the cancellation model. This is a detail page under Cancellation patterns; read that for why CancelledError is control flow and must propagate. The broader failure-handling context is Resilience, Cancellation & Error Handling.
  • A way to inspect tasks. All diagnostics use asyncio.all_tasks() and task.cancelled(), both stdlib.

1. Reproduce the leak

The fastest way to understand the bug is to build one. A broad except Exception cannot catch CancelledError in 3.8+, but an except CancelledError that returns instead of re-raising will, and so will an except BaseException. Here is the first and most common form: cleanup that catches the cancel and swallows it.

import asyncio


async def leaky_worker() -> None:
    try:
        while True:
            await asyncio.sleep(0.1)  # cancelled here
    except asyncio.CancelledError:
        # BUG: caught to "clean up" but never re-raised.
        print("cleaning up...")  # cleanup runs
        return                   # <-- swallows the cancellation


async def main() -> None:
    t = asyncio.create_task(leaky_worker())
    await asyncio.sleep(0.2)
    t.cancel()
    await t  # completes "normally" — no CancelledError propagates
    print("task.cancelled() ->", t.cancelled())  # False!


asyncio.run(main())

Verify: the program prints task.cancelled() -> False. The return turned a cancellation into a normal completion. In a real worker pool this task is now indistinguishable from one that finished its job, but any resource its loop body held was abandoned mid-cycle.

2. Detect it with an all_tasks audit

In production you rarely have the toy reproduction; you have a hang. Add a pre-close audit that lists tasks still alive after you have cancelled and drained them. A task that ignored its cancel shows up here.

import asyncio


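async def audit_after_drain(tasks: list[asyncio.Task]) -> None:
    # cancel() returns False for tasks that were already done; only tasks
    # that actually received a cancel request can have swallowed it.
    requested = [t for t in tasks if t.cancel()]
    await asyncio.gather(*tasks, return_exceptions=True)

    # Cancelled-on-request but finished with a normal result: a swallowed cancel.
    leaked = [t for t in requested if not t.cancelled() and t.exception() is None]
    if leaked:
        for t in leaked:
            print(f"LEAK: {t.get_name()} ended without cancelling: result={t.result()!r}")
    else:
        print("clean: every task reached CANCELLED")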

Verify: run this against the leaky worker from step 1 and it reports LEAK: ... ended without cancelling. The discriminator is the not t.cancelled() check: a task that was still running when you cancelled it, yet reports cancelled() == False afterwards, swallowed its CancelledError. Sampling len(asyncio.all_tasks()) on a timer gives the same signal live: a count that does not fall to baseline after a drain points to a swallowed cancel.
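The timer-based sampling described above might look like the following minimal sketch. The names stubborn and sample_pending are hypothetical, and the intervals are chosen only to keep the demo short; a real service would sample far less often and export the count to its metrics system.

```python
import asyncio


async def stubborn() -> None:
    # Hypothetical worker that swallows its cancel and keeps running.
    try:
        await asyncio.sleep(3600)
    except asyncio.CancelledError:
        pass  # BUG: cancellation swallowed
    await asyncio.sleep(0.3)  # still alive after the cancel


async def sample_pending(samples: int, interval: float) -> list[int]:
    """Count unfinished tasks (excluding the caller) at each tick."""
    me = asyncio.current_task()
    counts = []
    for _ in range(samples):
        # all_tasks() only returns tasks that are not done yet.
        counts.append(sum(1 for t in asyncio.all_tasks() if t is not me))
        await asyncio.sleep(interval)
    return counts


async def main() -> list[int]:
    t = asyncio.create_task(stubborn())
    await asyncio.sleep(0.05)  # let the worker start
    t.cancel()
    counts = await sample_pending(samples=3, interval=0.05)
    await t  # the worker eventually finishes "normally"
    return counts


counts = asyncio.run(main())
print(counts)  # [1, 1, 1]: the count never falls back to 0
```

With a worker that lets the cancel propagate, the count would drop to 0 by the second sample, because the task is done as soon as the cancellation is delivered.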

3. Fix: catch narrowly and re-raise, or use try/finally

The fix is to make cleanup transparent to cancellation. The safest form is try/finally with no except: finally runs your cleanup and lets CancelledError keep propagating automatically. If you must catch it (to log or roll back), catch CancelledError specifically and raise at the end.

import asyncio


async def fixed_worker_finally(conn) -> None:
    try:
        while True:
            await asyncio.sleep(0.1)
    finally:
        conn.close()  # runs on cancel; CancelledError keeps propagating


async def fixed_worker_explicit(conn) -> None:
    try:
        while True:
            await asyncio.sleep(0.1)
    except asyncio.CancelledError:
        conn.close()         # narrow catch, for logging/rollback
        print("rolled back")
        raise                # <-- mandatory: re-raise so task ends CANCELLED

Verify: wrap either worker in the step-1 harness; t.cancelled() now returns True. The try/finally variant is preferred for plain resource release because it structurally cannot forget the re-raise. Reserve the explicit except CancelledError for when cleanup itself differs on the cancel path.

4. Protect must-run cleanup with shield and a bounded timeout

Some cleanup must complete even though the task is being cancelled — a commit, a final flush. An unprotected await in finally can be re-cancelled and silently skipped. Wrap it in asyncio.shield() and bound it with asyncio.timeout() so it neither gets skipped nor hangs forever.

import asyncio
import contextlib


async def flush(buffer) -> None:
    await asyncio.sleep(0.05)  # must reach storage


async def worker_with_protected_flush(buffer) -> None:
    try:
        while True:
            await asyncio.sleep(0.1)
    finally:
        # Must-run, but bounded so it cannot stall shutdown.
        with contextlib.suppress(TimeoutError, asyncio.CancelledError):
            async with asyncio.timeout(1.0):
                await asyncio.shield(flush(buffer))

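Verify: cancel this worker and confirm flush completes (the buffer reaches storage) while the worker still ends cancelled. The suppress() block does not swallow the original cancellation: that exception is already in flight when finally starts, and Python re-raises it once the finally body finishes. The bounded timeout(1.0) guarantees that the worker stops waiting on a stuck flush rather than hanging the drain. Note that shield() only protects the inner flush from cancellation; after a timeout or a second cancel the flush keeps running in the background with nothing awaiting it, so keep a reference to it if it must be tracked. The deadline mechanics here are detailed in Timeouts and deadlines.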

5. Verify the task reaches CANCELLED

Close the loop with an assertion-grade check so the fix is regression-proof. Cancel, drain, and assert both the state and that resources were released.

import asyncio


class FakeConn:
    closed = False
    def close(self) -> None:
        self.closed = True


async def fixed_worker(conn: FakeConn) -> None:
    try:
        while True:
            await asyncio.sleep(0.1)
    finally:
        conn.close()


async def main() -> None:
    conn = FakeConn()
    t = asyncio.create_task(fixed_worker(conn))
    await asyncio.sleep(0.2)
    t.cancel()
    results = await asyncio.gather(t, return_exceptions=True)

    assert t.cancelled(), "task did not reach CANCELLED — cancel was swallowed"
    assert conn.closed, "resource not released during cleanup"
    assert isinstance(results[0], asyncio.CancelledError)
    print("verified: task cancelled and resources released")


asyncio.run(main())

Verify: the program prints the success line and all three assertions hold. t.cancelled() == True is the authoritative signal that the cancellation propagated; conn.closed == True confirms cleanup still ran. Bake this assertion pattern into your shutdown tests for any long-lived worker.

Verification

After applying the fix, the system should satisfy all of the following:

  • State is CANCELLED: for every task you cancel, task.cancelled() returns True after draining — never False with a normal result.
  • No survivors after drain: [t for t in asyncio.all_tasks() if not t.done()] is empty (minus the runner task) once shutdown completes, and a pending-task count sampled on a timer falls back to baseline.
  • No finalization warnings: the log no longer emits Task was destroyed but it is pending. Run with PYTHONASYNCIODEBUG=1 to surface any remaining ones with a creation traceback.
  • Resources released: connection-pool / open-file gauges return to idle after shutdown rather than leaking one handle per cancelled task.
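The "no survivors" condition can be turned into a one-shot check at the end of shutdown. The sketch below is an assumption about how such a check might be written; survivors and well_behaved are hypothetical names, and the caller's own task is excluded to account for the runner task.

```python
import asyncio


def survivors() -> list[asyncio.Task]:
    """Unfinished tasks on the running loop, excluding the caller's own task."""
    me = asyncio.current_task()
    return [t for t in asyncio.all_tasks() if t is not me]


async def well_behaved() -> None:
    try:
        await asyncio.sleep(3600)
    finally:
        pass  # synchronous cleanup would go here; the cancel keeps propagating


async def main() -> tuple[int, int]:
    tasks = [asyncio.create_task(well_behaved(), name=f"w{i}") for i in range(3)]
    await asyncio.sleep(0)  # let the workers start
    before = len(survivors())
    for t in tasks:
        t.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    after = len(survivors())
    return before, after


before, after = asyncio.run(main())
print(before, after)  # 3 0
```

In a shutdown test, asserting that survivors() is empty after the drain catches any worker that kept running past its cancel.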

Pitfalls & edge cases

  • except Exception does NOT catch CancelledError in 3.8+, but except BaseException does. If you must use except BaseException (rare), add an explicit if isinstance(exc, asyncio.CancelledError): raise before any generic handling, or you will swallow the cancel.
  • Cleanup that awaits can be re-cancelled. Under a deadline-driven shutdown, the await in your finally may receive a second CancelledError. Use asyncio.shield() + a bounded timeout() for must-run cleanup, and keep finalizers shallow so the cancellation count stays interpretable.
  • uncancel() does not undo a swallowed cancel. Task.uncancel() only decrements the cancellation-request counter that Task.cancelling() reports, and it exists for legitimate handoffs (like asyncio.timeout()); it is not a way to "absorb" a cancel you do not want. Calling it without a matching cancel() request of your own corrupts the accounting and can suppress future cancellations.
  • A bare return inside except CancelledError is the silent killer. It compiles, runs, and looks like cleanup, but it converts cancellation into completion. Lint for except asyncio.CancelledError blocks that lack a trailing raise.
  • TaskGroup cleanup counts too. A child whose cleanup swallows CancelledError defeats the group's fail-fast cancellation of siblings; the group then waits on a task that will never end. The same narrow-catch / re-raise rule applies inside TaskGroup children.

Frequently Asked Questions

Why does my asyncio task hang on shutdown after I added cleanup code?

Your cleanup almost certainly swallows CancelledError, typically through an except block that returns or carries on instead of re-raising, or through an except BaseException. The task then never reaches the CANCELLED state; if the handler keeps the task running, the loop still considers it alive and the drain never completes. Use try/finally or catch CancelledError narrowly and re-raise it.

Does except Exception catch CancelledError in modern Python?

No. Since Python 3.8, CancelledError derives from BaseException, so except Exception does not catch it. However, except BaseException does catch it, as does an explicit except asyncio.CancelledError. If you swallow it in either of those, the cancellation leaks.
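The class hierarchy behind this answer can be checked directly, and the same snippet shows one way to keep an except BaseException handler transparent to cancellation. guarded_worker is a hypothetical name used only for this sketch.

```python
import asyncio


async def guarded_worker() -> None:
    try:
        await asyncio.sleep(3600)
    except BaseException as exc:
        if isinstance(exc, asyncio.CancelledError):
            raise  # let the cancellation through untouched
        print(f"unexpected failure: {exc!r}")  # generic handling for the rest


async def main() -> bool:
    t = asyncio.create_task(guarded_worker())
    await asyncio.sleep(0)  # let the worker reach its await
    t.cancel()
    await asyncio.gather(t, return_exceptions=True)
    return t.cancelled()


is_exception = issubclass(asyncio.CancelledError, Exception)
is_base_exception = issubclass(asyncio.CancelledError, BaseException)
ended_cancelled = asyncio.run(main())
print(is_exception, is_base_exception, ended_cancelled)  # False True True
```

Without the isinstance guard, the handler would fall through to the generic branch and the task would finish normally instead of ending cancelled.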

How do I detect a task that swallowed its cancellation?

After cancelling and draining with asyncio.gather(*tasks, return_exceptions=True), check each task: one you cancelled that reports task.cancelled() == False swallowed the cancel. Live, sample len([t for t in asyncio.all_tasks() if not t.done()]) on a timer — a count that does not fall to baseline after a drain is the signature.

Figure: Leaked vs propagated cancellation path. Left (leaked, bug): cancel() arrives at an await, except: return swallows the cancel, and the result is a zombie task with cancelled() == False. Right (propagated, fix): cancel() arrives at an await, finally runs cleanup and the cancel is re-raised, so the task reaches CANCELLED and resources are released.