Migrating legacy threading code to asyncio without downtime¶
You have a live service built on threading and ThreadPoolExecutor, and you need it on asyncio — for connection density, structured cancellation, or to shed per-thread stack memory — without a maintenance window. A big-bang rewrite is not an option: the service takes traffic, the blocking calls are buried in vendor SDKs, and a single synchronous call left on the event loop will freeze every request at once. The safe path is incremental: audit what will block the loop, wrap it behind an async bridge, run both paths behind a flag, then drain and cut over. This guide walks that path with verifiable checkpoints at each phase.
Prerequisites¶
- Python 3.11+ for asyncio.TaskGroup, asyncio.timeout(), and asyncio.to_thread().
- pip install py-spy for live thread-state capture during the audit (the bridge and routing code is stdlib-only).
- Assumed knowledge: you understand the GIL and the cooperative loop contract from Threading vs Multiprocessing vs Asyncio, and the executor-bridging idea behind hybrid concurrency models. For the broader worker context, see Concurrent Execution & Worker Patterns.
Step 1: Audit the blocking calls that will starve the loop¶
Before any code moves, snapshot every thread's stack and flag the primitives that block: lock acquisitions, queue.get(), blocking socket reads, time.sleep. These are the calls that, run directly in a coroutine, freeze the whole loop.
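A stack snapshot of this kind can be sketched with the standard library alone. The version below is a heuristic under stated assumptions, not a complete detector: the name list `BLOCKING_NAMES` and the helper `snapshot_blocking_threads` are hypothetical, and `sys._current_frames()` is a CPython-specific call.

```python
import re
import sys
import threading
import traceback

# Assumption: a heuristic name list, not an exhaustive catalogue. Extend it for your codebase.
BLOCKING_NAMES = ("acquire", "get", "join", "recv", "sleep", "wait")
_BLOCKING_CALL = re.compile(r"\b(?:" + "|".join(BLOCKING_NAMES) + r")\(")


def snapshot_blocking_threads():
    """Return {thread name: [call sites]} for threads that look parked in a blocking call."""
    names = {t.ident: t.name for t in threading.enumerate()}
    me = threading.get_ident()
    flagged = {}
    for ident, frame in sys._current_frames().items():
        if ident == me:
            continue  # skip the thread that is taking the snapshot
        hits = []
        for fs in traceback.extract_stack(frame):
            # Python-level wrappers (Event.wait, Queue.get) appear as frame names.
            # C-level calls (time.sleep, socket reads) only appear in the caller's
            # source line, and only when the source file is available.
            if fs.name in BLOCKING_NAMES or _BLOCKING_CALL.search(fs.line or ""):
                hits.append(f"{fs.filename}:{fs.lineno} in {fs.name}")
        if hits:
            flagged[names.get(ident, f"thread-{ident}")] = hits
    return flagged
```

Calling `snapshot_blocking_threads()` from a debug endpoint or a signal handler and logging the result gives a per-thread list of suspect call sites. Expect false positives (any function that happens to be named `get`, for instance); the output is a list of places to inspect, not a verdict.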
Verify: run the audit under live load (or pair it with py-spy dump --threads against the production PID). Every flagged call site is something you must either replace with an async-native client or route through the bridge in Step 2. Record baseline P95/P99 latency, throughput, and RSS now; they are your regression gates for the cutover.
Step 2: Build the run_in_executor bridge¶
Wrap each blocking function in an async shim so coroutines can call it without blocking the loop. The shim runs the synchronous call on a dedicated ThreadPoolExecutor and awaits its result.
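A minimal sketch of such a shim follows. The name `run_legacy` matches the call used in the verification step; the pool size, the thread name prefix, and the `LEGACY_POOL` variable are illustrative assumptions to size against your own downstream limits.

```python
import asyncio
import functools
from concurrent.futures import ThreadPoolExecutor

# Assumption: 16 workers and this prefix are placeholders, not recommendations.
LEGACY_POOL = ThreadPoolExecutor(max_workers=16, thread_name_prefix="legacy-bridge")


async def run_legacy(func, /, *args, **kwargs):
    """Run a blocking callable on the bridge pool and await its result."""
    loop = asyncio.get_running_loop()
    # run_in_executor forwards positional arguments only, so bind kwargs up front.
    call = functools.partial(func, *args, **kwargs)
    return await loop.run_in_executor(LEGACY_POOL, call)
```

A named, bounded pool keeps bridge threads easy to spot in `threading.enumerate()` and in py-spy output, and the bound acts as back-pressure: once all workers are busy, further calls queue inside the executor instead of spawning more threads.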
Verify: with the bridge in place, an async handler that calls await run_legacy(blocking_db_query, sql) keeps the loop responsive. Confirm this by checking that event-loop lag stays low while the blocking call runs (for example, with a background task that awaits a short asyncio.sleep() in a loop and records how late each wake-up is). Use asyncio.to_thread() directly for one-off blocking I/O; keep a named, bounded pool for hot paths so you can monitor its queue depth. Send CPU-bound legacy functions to a ProcessPoolExecutor instead: on threads they hold the GIL, saturate the bridge, and starve I/O, as detailed in CPU-bound task offloading.
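One way to measure loop lag is a background task that sleeps for a fixed interval and records how late it wakes. This is a sketch; the helper name `measure_loop_lag` and its parameters are hypothetical.

```python
import asyncio
import time


async def measure_loop_lag(samples, interval=0.01):
    """Return the worst observed wake-up delay, in seconds, over `samples` ticks."""
    worst = 0.0
    for _ in range(samples):
        start = time.perf_counter()
        await asyncio.sleep(interval)
        # Time beyond the requested interval is time the loop spent on other work.
        worst = max(worst, time.perf_counter() - start - interval)
    return worst
```

Start it with `asyncio.create_task(measure_loop_lag(...))` next to the handler under test. A blocking call made directly on the loop shows up as lag close to the call's full duration, while the same call routed through the bridge should leave lag near the timer's resolution.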
Step 3: Route dual-path behind a feature flag¶
Stand up the async handler beside the legacy one and route a configurable fraction of traffic to it. Both paths must share the exact request/response contract so you can compare them and roll back instantly by flipping the flag.
Verify: at 10% async weight, the async path's latency and error-rate distributions should track the legacy path within a few percent. A >10% divergence in timeout rate almost always means a blocking call slipped onto the loop unbridged — go back to Step 1's audit for that handler. The _inflight registry is what makes the drain in Step 4 lossless.
Step 4: Drain and cut over¶
Ramp the flag (10% to 50% to 100% over a day or two, validating SLOs at each step), then retire the thread pool by draining in-flight work before closing the loop's resources.
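The drain itself can be sketched as follows, assuming an in-flight task set like the `_inflight` registry from Step 3 and a bridge pool like the one from Step 2. The helper name `drain_and_close` and the 30-second grace period are hypothetical.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


async def drain_and_close(inflight, pool: ThreadPoolExecutor, *, grace=30.0):
    """Wait for in-flight tasks, cancel stragglers, then retire the bridge pool.

    Returns the number of tasks that had to be cancelled.
    """
    pending = set(inflight)  # copy: done-callbacks may mutate the registry
    if pending:
        _, pending = await asyncio.wait(pending, timeout=grace)
    for task in pending:
        task.cancel()
    if pending:
        # Let cancelled tasks unwind; collect exceptions instead of raising them.
        await asyncio.gather(*pending, return_exceptions=True)
    # shutdown(wait=True) blocks until queued work finishes, so keep it off the loop.
    await asyncio.to_thread(pool.shutdown, wait=True)
    return len(pending)
```

Stop accepting new requests before calling this, otherwise the registry keeps refilling. If the service runs under `asyncio.run()`, the runner finalizes async generators and shuts down the default executor when the main coroutine returns; a manually managed loop needs `loop.shutdown_asyncgens()` called explicitly.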
Verify: post-cutover, confirm zero dropped connections, RSS stable or lower than the Step 1 baseline (you have shed per-thread stacks), and no ResourceWarning traces. len(threading.enumerate()) should fall to the small fixed set of threads the loop itself needs. If thread count stays elevated, the bridge pool was never drained.
Verification¶
The migration is complete and safe when, at 100% async weight:
- P95/P99 latency and error rate match or beat the baseline recorded in Step 1.
- Resident memory is flat or reduced — the visible payoff of dropping OS-thread stacks for coroutine state.
- threading.enumerate() shows only the loop's own threads plus the (now idle, drainable) bridge pool; no ResourceWarning on shutdown.
- Flipping async_weight back to 0.0 still serves traffic correctly. Keep the legacy path until you have soaked at 100% for a full traffic cycle.
Pitfalls & edge cases¶
- A blocking call left on the loop. requests.get() or a sync driver inside a coroutine freezes every concurrent task at once. Anything not async-native must go through run_in_executor / to_thread.
- Shared mutable state across the thread/async boundary. A bridge thread mutating a dict that an async task is iterating raises RuntimeError: dictionary changed size during iteration. An asyncio.Lock only serializes coroutines and does not exclude threads, so guard state touched from both sides with a threading.Lock held briefly (never across an await), or pass immutable copies.
- asyncio.to_thread for CPU-heavy work. It uses the default thread pool; CPU work saturates it and starves I/O. Route compute to a ProcessPoolExecutor; see Threading vs Multiprocessing vs Asyncio.
- Undrained async generators on shutdown. Skipping loop.shutdown_asyncgens() leaves sockets and file descriptors open, leaking in long-running services.
- No connection draining at cutover. Killing workers mid-stream drops TCP connections and triggers client retry storms. Always drain in-flight tasks and handle SIGTERM gracefully.
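For the shared-state pitfall, one workable pattern is to copy under a `threading.Lock` and iterate the copy. The sketch below uses hypothetical names (`_stats`, `record`, `report`) and assumes the shared object is a plain dict of counters.

```python
import asyncio
import threading

_state_lock = threading.Lock()
_stats: dict[str, int] = {}


def record(key: str) -> None:
    """Called from bridge threads."""
    with _state_lock:
        _stats[key] = _stats.get(key, 0) + 1


async def report() -> int:
    """Called on the loop; sums the counters without racing the writers."""
    # Hold the lock only long enough to copy. Never await while holding it.
    with _state_lock:
        snapshot = dict(_stats)
    total = 0
    for count in snapshot.values():
        await asyncio.sleep(0)  # yielding is safe: we iterate a private copy
        total += count
    return total
```

The critical section is a single dict copy, so the loop is blocked only for microseconds even when bridge threads are writing heavily.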
Frequently Asked Questions¶
Can I migrate to asyncio without rewriting my entire codebase?
Yes. Wrap legacy blocking functions with asyncio.to_thread() or loop.run_in_executor() so they run on a thread pool while coroutines await them. This enables incremental migration that preserves existing business logic until each path can be rewritten to native async I/O.
How do I prevent event loop starvation during migration?
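Never run blocking I/O or CPU work directly on the loop. Offload it to a dedicated ThreadPoolExecutor or ProcessPoolExecutor, and run staging with asyncio debug mode enabled: the loop then logs any callback or task step that runs longer than loop.slow_callback_duration (0.1 s by default), which catches accidental blocking early.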
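A small demonstration of that detection, assuming debug mode is acceptable in the environment where it runs (it adds overhead, so it is usually limited to staging). The handler name is hypothetical and the blocking call is deliberate.

```python
import asyncio
import logging
import time


async def handler_with_hidden_block():
    """Hypothetical handler with a blocking call left on the loop."""
    time.sleep(0.25)  # deliberately blocks the loop for the demo


async def main():
    loop = asyncio.get_running_loop()
    loop.slow_callback_duration = 0.1  # seconds; this is also the default
    await asyncio.create_task(handler_with_hidden_block())


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    # debug=True turns on slow-callback logging via the "asyncio" logger.
    asyncio.run(main(), debug=True)
```

In debug mode the `asyncio` logger emits a warning that names the offending task and how long it held the loop, which points directly at the handler that needs bridging.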
What is the safest way to cut over traffic without dropping requests?
Use a dual-path router controlled by a dynamic feature flag. Send a small percentage of traffic to the async path, validate metrics, then ramp gradually. Drain in-flight requests and call executor.shutdown(wait=True) before retiring the thread pool so nothing is dropped.
How do I debug deadlocks that span threads and async tasks?
Tag logs with a unique request ID across both boundaries, enable PYTHONASYNCIODEBUG=1 in staging to surface unawaited coroutines and long callbacks, and cross-reference asyncio.all_tasks() with threading.enumerate() to find cross-boundary lock contention.
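The request-ID tagging and the task/thread cross-reference can be sketched together. This assumes the legacy step is reached through `asyncio.to_thread()`, which copies the caller's `contextvars` context into the worker thread; `loop.run_in_executor()` does not, so a context copy would have to be passed by hand there. All names below are hypothetical.

```python
import asyncio
import contextvars
import threading

request_id = contextvars.ContextVar("request_id", default="-")


def legacy_step() -> str:
    """Runs on a worker thread; the request ID arrives via the copied context."""
    return f"[{request_id.get()}] {threading.current_thread().name}"


def concurrency_snapshot() -> dict:
    """Names of live tasks and threads. Must be called from inside the running loop."""
    return {
        "tasks": sorted(t.get_name() for t in asyncio.all_tasks()),
        "threads": sorted(t.name for t in threading.enumerate()),
    }


async def handle(rid: str):
    request_id.set(rid)
    line = await asyncio.to_thread(legacy_step)
    return line, concurrency_snapshot()
```

Naming tasks at creation time, for example `asyncio.create_task(handle(rid), name=f"handler-{rid}")`, makes the snapshot readable: a task that never leaves the list while a bridge thread sits in a lock acquisition is the usual signature of a cross-boundary deadlock.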
Related¶
- Threading vs Multiprocessing vs Asyncio: the parent guide for choosing the target model before you migrate.
- Hybrid concurrency models: the loop-plus-executor shape this migration produces.
- Concurrent Execution & Worker Patterns: the parent overview for worker lifecycle and shutdown patterns.