Running Blocking SDK Calls with asyncio.to_thread¶
A payment provider, a cloud storage client, a legacy internal library — plenty of third-party SDKs ship only a synchronous API. Call one directly from a coroutine and its blocking network round-trip freezes your entire event loop: every other in-flight request stalls for the duration, throughput collapses, and your p99 latency tracks the slowest SDK call. The fix is not to rewrite the SDK; it is to push each blocking call onto a worker thread with asyncio.to_thread so the loop stays free to service everything else while the thread blocks harmlessly.
This guide wraps a synchronous SDK correctly: confirm it actually blocks, move it off the loop, bound the thread fan-out, add a timeout, and handle arguments and exceptions — with a clear-eyed look at the one thing to_thread cannot do, which is truly cancel a running thread.
Prerequisites¶
- Python 3.11+ for asyncio.timeout() and current to_thread semantics.
- The boundary rules from the parent Hybrid Concurrency Models guide, and the Concurrent Execution & Worker Patterns overview for where threads fit overall.
- A synchronous SDK you cannot make async. (If the SDK is CPU-bound, threads will not help; see Pitfalls & Edge Cases and CPU-bound task offloading.)
1. Confirm the SDK actually blocks¶
Do not guess. Turn on debug mode and let the loop tell you. loop.set_debug(True) logs a warning whenever a callback occupies the loop thread longer than loop.slow_callback_duration (default 0.1s) — a synchronous SDK call that does network I/O will trip it immediately.
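A minimal sketch of that check, assuming a hypothetical fetch_invoice function that stands in for the real SDK call and sleeps 0.4s to simulate the network round-trip. The handler runs as its own task so that its step executes after debug mode has been switched on.

```python
import asyncio
import logging
import time


def fetch_invoice(invoice_id: str) -> dict:
    """Hypothetical stand-in for a synchronous SDK call (blocks ~0.4s)."""
    time.sleep(0.4)  # simulates a blocking network round-trip
    return {"id": invoice_id, "status": "paid"}


async def handler() -> dict:
    # Called directly from a coroutine: this blocks the event loop thread.
    return fetch_invoice("inv-1")


async def main() -> dict:
    loop = asyncio.get_running_loop()
    loop.set_debug(True)
    loop.slow_callback_duration = 0.1  # the default, set explicitly for clarity
    # Run the handler as a separate task so its step is timed in debug mode.
    return await asyncio.create_task(handler())


if __name__ == "__main__":
    # Slow-callback warnings are emitted through the "asyncio" logger.
    logging.basicConfig(level=logging.WARNING)
    asyncio.run(main())
```

Passing debug=True to asyncio.run() or setting PYTHONASYNCIODEBUG=1 enables the same mode without touching the loop object.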
Verify: you see Executing <...> took 0.400 seconds on stderr. That warning is your proof the call blocks the loop and must be offloaded. If it does not appear, either the call is already non-blocking or it returns in under slow_callback_duration; lower the threshold and retest before concluding you do not need this guide.
2. Wrap the call with asyncio.to_thread¶
Move the call onto a worker thread. await asyncio.to_thread(fn, *args, **kwargs) submits fn to the loop's default ThreadPoolExecutor, suspends the coroutine, and lets the loop run other tasks until the thread returns.
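As a sketch, assuming the same hypothetical fetch_invoice stub in place of the real SDK, the wrapper and a two-call gather could look like this:

```python
import asyncio
import time


def fetch_invoice(invoice_id: str) -> dict:
    """Hypothetical stand-in for a synchronous SDK call (blocks ~0.4s)."""
    time.sleep(0.4)
    return {"id": invoice_id, "status": "paid"}


async def get_invoice(invoice_id: str) -> dict:
    # Runs fetch_invoice on a worker thread; the loop stays free meanwhile.
    return await asyncio.to_thread(fetch_invoice, invoice_id)


async def main() -> float:
    start = time.perf_counter()
    first, second = await asyncio.gather(
        get_invoice("inv-1"),
        get_invoice("inv-2"),
    )
    elapsed = time.perf_counter() - start
    print(f"{first['id']} and {second['id']} fetched in {elapsed:.2f}s")
    return elapsed


if __name__ == "__main__":
    asyncio.run(main())
```

Keeping the to_thread call inside one small async wrapper means every call site awaits get_invoice and never touches the blocking function directly.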
Verify: with loop.set_debug(True) the slow-callback warning is gone, and the two gather-ed calls finish in ~0.4s total rather than ~0.8s — proof the loop is no longer serialized behind the SDK.
3. Bound concurrency with a Semaphore¶
Threads are finite. to_thread uses the shared default pool, capped at min(32, os.cpu_count() + 4) workers; fan out beyond that and calls queue silently behind the cap, so latency climbs with no error. Gate the fan-out with an asyncio.Semaphore sized to a budget you actually want this SDK to consume.
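One way to sketch the gate, assuming a budget of 8 and a hypothetical fetch_invoice stub. The semaphore is created inside the running loop, and a plain counter records the peak number of calls in flight so the bound can be checked without reading private attributes.

```python
import asyncio
import time

SDK_BUDGET = 8  # assumed budget: threads this SDK may occupy at once


def fetch_invoice(invoice_id: str) -> dict:
    """Hypothetical stand-in for a synchronous SDK call."""
    time.sleep(0.05)
    return {"id": invoice_id}


async def run_burst(n: int) -> tuple[list[dict], int]:
    gate = asyncio.Semaphore(SDK_BUDGET)  # created inside the running loop
    in_flight = 0
    peak = 0

    async def get_invoice(invoice_id: str) -> dict:
        nonlocal in_flight, peak
        async with gate:  # excess callers wait here, not in the pool queue
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await asyncio.to_thread(fetch_invoice, invoice_id)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(get_invoice(f"inv-{i}") for i in range(n)))
    return list(results), peak


if __name__ == "__main__":
    results, peak = asyncio.run(run_burst(100))
    print(f"{len(results)} calls completed, peak in flight: {peak}")
```

Waiting on the semaphore happens in the coroutine, so queued requests stay visible to the loop (and cancellable) instead of sitting invisibly in the executor's work queue.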
Verify: instrument the gate. Log the acquired count (8 - _sdk_gate._value for a gate named _sdk_gate, bearing in mind that _value is a private attribute) or wrap acquisition with a counter. Under the 100-request burst it should plateau at 8, never higher, while overall throughput stays steady instead of degrading as the default pool saturates.
4. Add a timeout (and respect the cancellation caveat)¶
Wrap the call in asyncio.timeout() so a hung SDK call cannot keep the caller waiting forever. The critical caveat: to_thread cannot truly cancel a running thread. When the timeout fires, the awaiting coroutine raises TimeoutError and stops waiting, but the worker thread keeps executing the blocking call until it returns on its own. You free the caller, not the thread.
Because the thread is not reclaimed on timeout, repeated timeouts slowly drain the pool — another reason the step-3 semaphore matters. Where the SDK supports it, prefer a native SDK timeout passed as an argument (the call returns and frees the thread), and treat the asyncio.timeout() only as a backstop. The general rules for cooperative cancellation live in cancellation patterns.
Verify: force a slow call (e.g. time.sleep(5) in the SDK stub) and confirm the coroutine raises within timeout seconds. Then observe that the worker thread is still busy afterward (log on entry/exit of the SDK stub) — that lingering thread is the caveat made visible.
5. Pass arguments and handle exceptions¶
asyncio.to_thread forwards positional and keyword arguments straight through, and any exception the SDK raises propagates out of the await exactly as if you had called it directly — so normal try/except works. Avoid mutable shared arguments if the SDK is not thread-safe (see Pitfalls).
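A sketch of argument passing and exception handling, assuming a hypothetical sdk_charge function with a keyword-only idempotency_key parameter. The wrapper translates the SDK's ValueError into a RuntimeError and chains the original as the cause.

```python
import asyncio


def sdk_charge(account_id: str, amount: int, *, idempotency_key: str) -> dict:
    """Hypothetical synchronous SDK method."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    return {"account": account_id, "amount": amount, "key": idempotency_key}


async def charge(account_id: str, amount: int, idempotency_key: str) -> dict:
    try:
        # Positional and keyword arguments are forwarded to sdk_charge as given.
        return await asyncio.to_thread(
            sdk_charge, account_id, amount, idempotency_key=idempotency_key
        )
    except ValueError as exc:
        # Raised in the worker thread, caught here in the awaiting coroutine.
        raise RuntimeError(f"charge rejected for {account_id}") from exc


if __name__ == "__main__":
    print(asyncio.run(charge("acct-1", 5, "k1")))
```

Chaining with "from exc" keeps the SDK's original exception available as __cause__ for logging.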
Verify: call charge("acct-1", -5, "k1") and confirm the ValueError raised in the worker thread surfaces as your wrapped RuntimeError in the awaiting coroutine — proving exceptions cross the thread boundary intact.
Verification¶
The wrapping is correct when, under load: the loop logs no slow-callback warnings attributable to the SDK; concurrent SDK calls overlap rather than serialize (total time tracks the slowest call in a batch of N, not the sum); the in-flight count never exceeds the semaphore bound; timeouts free the caller within the deadline; and SDK exceptions propagate to ordinary try/except. If all five hold, the synchronous SDK is integrated without compromising loop responsiveness.
Pitfalls & Edge Cases¶
- The default thread pool is small and shared. to_thread uses min(32, os.cpu_count()+4) workers shared across every to_thread call site. A noisy SDK starves the rest. For a hot path, build a dedicated ThreadPoolExecutor and call it via loop.run_in_executor(my_pool, fn, ...).
- A timed-out call does not free its thread. asyncio.timeout() cancels the awaiting coroutine, not the thread; the worker runs to completion and holds its slot. Repeated timeouts drain the pool. Prefer the SDK's own timeout argument, and keep the semaphore tight.
- Thread-unsafe SDKs shared across calls. Many SDK client objects are not safe to use from multiple threads concurrently. If the docs are silent, assume not: use a per-thread client (threading.local), a client pool, or serialize calls through a Semaphore(1).
- CPU-heavy SDK calls gain nothing from threads. The GIL serializes Python bytecode, so a compute-bound SDK call on a thread still blocks parallelism. Offload those to a ProcessPoolExecutor instead; see CPU-bound task offloading.
- Losing contextvars is rare but possible. to_thread copies the current context into the worker, so tracing IDs survive; but if you switch to a manual run_in_executor, you must copy the context yourself with contextvars.copy_context().
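The dedicated-pool and context-copy points can be sketched together. The names sdk_pool, sdk_lookup, and request_id are hypothetical, and the pool size of 4 is an arbitrary assumption; the wrapper copies the current context by hand because run_in_executor does not do so itself.

```python
import asyncio
import contextvars
import functools
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tracing ID carried in a context variable.
request_id: contextvars.ContextVar[str] = contextvars.ContextVar(
    "request_id", default="-"
)

# Dedicated pool so this SDK cannot starve other to_thread call sites.
sdk_pool = ThreadPoolExecutor(max_workers=4, thread_name_prefix="sdk")


def sdk_lookup(key: str) -> str:
    """Hypothetical synchronous SDK call that reads the tracing ID."""
    return f"{request_id.get()}:{key}"


async def lookup(key: str) -> str:
    loop = asyncio.get_running_loop()
    ctx = contextvars.copy_context()  # manual copy of the caller's context
    call = functools.partial(ctx.run, sdk_lookup, key)
    return await loop.run_in_executor(sdk_pool, call)


async def main() -> str:
    request_id.set("req-42")
    return await lookup("user-7")


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Without the ctx.run wrapper, sdk_lookup would see the context variable's default value in the worker thread rather than the value set by the calling task.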
Frequently Asked Questions¶
Does asyncio.to_thread give true parallelism for CPU-bound SDK calls?
No. asyncio.to_thread runs the call on an OS thread, but the GIL serializes Python bytecode, so CPU-bound work gains no parallelism and still blocks other compute. It is the right tool only for blocking I/O. For CPU-heavy work, offload to a ProcessPoolExecutor via loop.run_in_executor instead.
Can asyncio.to_thread cancel a blocking SDK call on timeout?
No. When asyncio.timeout() fires, the awaiting coroutine raises TimeoutError and stops waiting, but the worker thread continues running the blocking call until it returns on its own and holds its pool slot until then. Use the SDK's native timeout argument to actually free the thread, and keep concurrency bounded with a Semaphore so repeated timeouts do not drain the pool.
How many concurrent to_thread calls can run at once?
asyncio.to_thread uses the event loop's default ThreadPoolExecutor, which caps at min(32, os.cpu_count()+4) workers shared across all call sites. Beyond that cap, calls queue silently rather than erroring, so latency rises with no exception. Bound fan-out with an asyncio.Semaphore and use a dedicated ThreadPoolExecutor via run_in_executor for hot paths.
Related¶
- Hybrid Concurrency Models: the parent guide covering the full async/thread boundary and its safe bridges.
- Concurrent Execution & Worker Patterns: the overview for choosing threads versus processes versus async.
- Cancellation Patterns — why a running thread cannot be force-cancelled and how cooperative cancellation works.