Running Blocking SDK Calls with asyncio.to_thread

A payment provider, a cloud storage client, a legacy internal library — plenty of third-party SDKs ship only a synchronous API. Call one directly from a coroutine and its blocking network round-trip freezes your entire event loop: every other in-flight request stalls for the duration, throughput collapses, and your p99 latency tracks the slowest SDK call. The fix is not to rewrite the SDK; it is to push each blocking call onto a worker thread with asyncio.to_thread so the loop stays free to service everything else while the thread blocks harmlessly.

This guide wraps a synchronous SDK correctly: confirm it actually blocks, move it off the loop, bound the thread fan-out, add a timeout, and handle arguments and exceptions — with a clear-eyed look at the one thing to_thread cannot do, which is truly cancel a running thread.

Prerequisites

Python 3.11+ for asyncio.timeout() (asyncio.to_thread itself has been available since 3.9).
The boundary rules from the parent Hybrid Concurrency Models guide, and the Concurrent Execution & Worker Patterns overview for where threads fit overall.
A synchronous SDK you cannot make async. (If the SDK is CPU-bound, threads will not help; see Pitfalls & Edge Cases below and CPU-bound task offloading.)

1. Confirm the SDK actually blocks

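Do not guess. Turn on debug mode and let the loop tell you. In debug mode (asyncio.run(main(), debug=True), loop.set_debug(True), or PYTHONASYNCIODEBUG=1 in the environment), the loop logs a warning whenever a single callback or task step holds the loop thread longer than loop.slow_callback_duration (default 0.1 s). A synchronous SDK call that does network I/O trips it immediately. The loop checks the debug flag before each step starts, so debug mode must already be on when the blocking step begins.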

import asyncio
import time

def sdk_get_account(account_id: str) -> dict:
    time.sleep(0.4)                      # stand-in for a blocking SDK round-trip
    return {"id": account_id, "balance": 1000}

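async def main() -> None:
    loop = asyncio.get_running_loop()
    loop.slow_callback_duration = 0.05   # be strict so blocking stands out
    # Called inline: WRONG on purpose, to produce the warning.
    sdk_get_account("acct-1")

# debug=True turns debug mode on before main() takes its first step. Calling
# loop.set_debug(True) inside main() would be too late to time this step.
asyncio.run(main(), debug=True)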

Verify: stderr shows a warning such as Executing <Task ...> took 0.401 seconds (the exact figure varies). That warning is your proof that the call blocks the loop and must be offloaded. If no warning appears even at this strict threshold, the call is not holding the loop long enough to matter and you do not need this guide.
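When debug mode is not an option (for example, on a production host), a heartbeat probe is a common alternative check: a background task sleeps in short intervals and records how late each wake-up is. The sketch below assumes a 10 ms tick and uses illustrative names (_heartbeat, max_loop_lag) that are not part of asyncio; an inline blocking call shows up as one very late tick, roughly the length of the call.

```python
import asyncio
import time

def sdk_get_account(account_id: str) -> dict:
    time.sleep(0.4)                      # stand-in for a blocking SDK round-trip
    return {"id": account_id, "balance": 1000}

async def _heartbeat(interval: float, lags: list[float]) -> None:
    # Records how late each tick wakes up; a blocked loop makes ticks late.
    loop = asyncio.get_running_loop()
    while True:
        due = loop.time() + interval
        await asyncio.sleep(interval)
        lags.append(loop.time() - due)

async def max_loop_lag(offload: bool) -> float:
    lags: list[float] = []
    probe = asyncio.create_task(_heartbeat(0.01, lags))
    await asyncio.sleep(0.05)            # let the probe start ticking
    if offload:
        await asyncio.to_thread(sdk_get_account, "acct-1")
    else:
        sdk_get_account("acct-1")        # inline call: blocks the loop
    await asyncio.sleep(0.05)            # give the probe a chance to record the lag
    probe.cancel()
    try:
        await probe
    except asyncio.CancelledError:
        pass
    return max(lags)

blocked = asyncio.run(max_loop_lag(offload=False))
offloaded = asyncio.run(max_loop_lag(offload=True))
print(f"inline: {blocked:.3f}s max lag, offloaded: {offloaded:.3f}s max lag")
```

The offloaded run is included only as a baseline; step 2 covers that wrapping properly.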

2. Wrap the call with asyncio.to_thread

Move the call onto a worker thread. await asyncio.to_thread(fn, *args, **kwargs) submits fn to the loop's default ThreadPoolExecutor, suspends the coroutine, and lets the loop run other tasks until the thread returns.

import asyncio
import time

def sdk_get_account(account_id: str) -> dict:
    time.sleep(0.4)
    return {"id": account_id, "balance": 1000}

async def get_account(account_id: str) -> dict:
    # Loop stays free for other coroutines while this thread blocks.
    return await asyncio.to_thread(sdk_get_account, account_id)

async def main() -> None:
    # Two calls now overlap instead of running back-to-back on the loop.
    a, b = await asyncio.gather(get_account("acct-1"), get_account("acct-2"))
    print(a, b)

asyncio.run(main())

Verify: with debug mode on, the slow-callback warning is gone, and the two gather-ed calls finish in about 0.4 s total rather than 0.8 s, which shows the loop is no longer serialized behind the SDK.
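The overlap claim can be checked directly by timing the gather. This is a minimal sketch reusing the 0.4 s stub; timed_pair is an illustrative name, and how much slack you allow above 0.4 s will depend on how loaded the host is.

```python
import asyncio
import time

def sdk_get_account(account_id: str) -> dict:
    time.sleep(0.4)                      # stand-in for a blocking SDK round-trip
    return {"id": account_id, "balance": 1000}

async def get_account(account_id: str) -> dict:
    return await asyncio.to_thread(sdk_get_account, account_id)

async def timed_pair() -> float:
    start = time.perf_counter()
    await asyncio.gather(get_account("acct-1"), get_account("acct-2"))
    return time.perf_counter() - start

elapsed = asyncio.run(timed_pair())
print(f"two calls took {elapsed:.2f}s")  # expect roughly 0.4s, not 0.8s
```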

3. Bound concurrency with a Semaphore

Threads are finite. to_thread uses the shared default pool, capped at min(32, os.cpu_count() + 4) workers (only 5 on a single-CPU host). Fan out beyond that and calls queue silently behind the cap, so latency climbs with no error. Gate the fan-out with an asyncio.Semaphore sized to the budget you actually want this SDK to consume, and no larger than the pool itself.

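import asyncio
import time

def sdk_get_account(account_id: str) -> dict:
    time.sleep(0.4)                      # same blocking stub as step 2
    return {"id": account_id, "balance": 1000}

# Never let this SDK occupy more than 8 threads at once.
_sdk_gate = asyncio.Semaphore(8)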

async def get_account(account_id: str) -> dict:
    async with _sdk_gate:
        return await asyncio.to_thread(sdk_get_account, account_id)

async def main() -> None:
    # 100 requests, but at most 8 blocking calls in flight at any moment.
    results = await asyncio.gather(*(get_account(f"acct-{i}") for i in range(100)))
    print(len(results))

asyncio.run(main())

Verify: instrument the gate with a counter that increments after acquisition and decrements on release (reading 8 - _sdk_gate._value also works, but _value is a private CPython attribute). Under the 100-request burst the count should plateau at 8, never higher, while overall throughput stays steady instead of degrading as the default pool saturates.

4. Add a timeout (and respect the cancellation caveat)

Wrap the call in asyncio.timeout() so a hung SDK call cannot keep the caller waiting forever. The critical caveat: to_thread cannot truly cancel a running thread. When the timeout fires, the awaiting coroutine raises TimeoutError and stops waiting, but the worker thread keeps executing the blocking call until it returns on its own. You free the caller, not the thread.

import asyncio

# Reuses sdk_get_account (step 2) and _sdk_gate (step 3).
async def get_account(account_id: str, timeout: float = 1.0) -> dict:
    async with _sdk_gate:
        try:
            async with asyncio.timeout(timeout):
                return await asyncio.to_thread(sdk_get_account, account_id)
        except TimeoutError:
            # The coroutine is freed; the worker THREAD still runs to completion
            # and holds its executor slot until then. Leaving this block also
            # releases the _sdk_gate permit while that thread is still busy.
            raise RuntimeError(f"SDK timed out for {account_id}") from None

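Because the thread is not reclaimed on timeout, repeated timeouts slowly drain the pool. The step-3 semaphore does not fully cover this as written: async with _sdk_gate returns its permit as soon as the coroutine exits on TimeoutError, while the timed-out thread is still busy, so a new caller can start yet another thread. To bound threads that are actually running, the permit has to be held until the worker returns. Where the SDK supports it, prefer a native SDK timeout passed as an argument (the call returns and frees the thread), and treat asyncio.timeout() only as a backstop. The general rules for cooperative cancellation live in cancellation patterns.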

Verify: force a slow call (e.g. time.sleep(5) in the SDK stub) and confirm the coroutine raises within timeout seconds. Then observe that the worker thread is still busy afterward (log on entry/exit of the SDK stub) — that lingering thread is the caveat made visible.

5. Pass arguments and handle exceptions

asyncio.to_thread forwards positional and keyword arguments straight through, and any exception the SDK raises propagates out of the await exactly as if you had called it directly — so normal try/except works. Avoid mutable shared arguments if the SDK is not thread-safe (see Pitfalls).

import asyncio

# Reuses _sdk_gate from step 3.

def sdk_charge(account_id: str, *, amount: int, idempotency_key: str) -> dict:
    if amount <= 0:
        raise ValueError("amount must be positive")
    return {"account": account_id, "charged": amount, "key": idempotency_key}

async def charge(account_id: str, amount: int, key: str) -> dict:
    async with _sdk_gate:
        try:
            return await asyncio.to_thread(
                sdk_charge, account_id, amount=amount, idempotency_key=key
            )
        except ValueError as exc:               # SDK exception crosses the boundary
            raise RuntimeError(f"bad charge for {account_id}: {exc}") from exc

Verify: call charge("acct-1", -5, "k1") and confirm the ValueError raised in the worker thread surfaces as your wrapped RuntimeError in the awaiting coroutine — proving exceptions cross the thread boundary intact.
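For a client object that is not thread-safe, one of the patterns listed under Pitfalls is a per-thread client via threading.local. A minimal sketch, assuming a hypothetical FakeSDKClient that refuses to be used from any thread other than the one that created it:

```python
import asyncio
import threading

class FakeSDKClient:
    """Hypothetical stand-in for an SDK client object that is not thread-safe."""

    def __init__(self) -> None:
        self.owner = threading.get_ident()   # the thread that created this client

    def get_account(self, account_id: str) -> dict:
        if threading.get_ident() != self.owner:
            raise RuntimeError("client used from a thread that does not own it")
        return {"id": account_id, "client": id(self)}

_local = threading.local()

def _client() -> FakeSDKClient:
    # Lazily build one client per worker thread; no two threads ever share one.
    client = getattr(_local, "client", None)
    if client is None:
        client = _local.client = FakeSDKClient()
    return client

def sdk_get_account(account_id: str) -> dict:
    # Runs inside the worker thread, so _client() returns that thread's client.
    return _client().get_account(account_id)

async def main() -> list[dict]:
    return await asyncio.gather(
        *(asyncio.to_thread(sdk_get_account, f"acct-{i}") for i in range(20))
    )

results = asyncio.run(main())
print(len({r["client"] for r in results}), "distinct clients for", len(results), "calls")
```

Each worker thread builds its client once and keeps it for the life of that thread, so the number of clients never exceeds the pool's worker count. Clients are not closed until their thread exits, so this fits clients that are cheap to keep open.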

Verification

The wrapping is correct when, under load: the loop logs no slow-callback warnings attributable to the SDK; concurrent SDK calls overlap rather than serialize (a batch no larger than the semaphore bound takes about as long as its slowest call, not the sum); the in-flight count never exceeds the semaphore bound; timeouts free the caller within the deadline; and SDK exceptions propagate to ordinary try/except. If all five hold, the synchronous SDK is integrated without compromising loop responsiveness.

Pitfalls & Edge Cases

The default thread pool is small and shared. to_thread uses min(32, os.cpu_count()+4) workers shared across every to_thread call site. A noisy SDK starves the rest. For a hot path, build a dedicated ThreadPoolExecutor and call it via loop.run_in_executor(my_pool, fn, ...).
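A timed-out call does not free its thread. asyncio.timeout() cancels the awaiting coroutine, not the thread; the worker runs to completion and holds its executor slot. Repeated timeouts drain the pool, and a semaphore entered with async with releases its permit as soon as the caller gives up, so it does not count those lingering threads. Prefer the SDK's own timeout argument, and release the permit only when the thread itself finishes.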
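Thread-unsafe SDKs shared across calls. Many SDK client objects are not safe to use from multiple threads concurrently. If the docs are silent, assume they are not: use a per-thread client (threading.local), a client pool, or serialize calls with a threading.Lock taken inside the worker function. An asyncio Semaphore(1) also serializes calls, but only until one times out: its permit is released while the timed-out thread may still be using the client.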
CPU-heavy SDK calls gain nothing from threads. The GIL lets only one thread run Python bytecode at a time, so a compute-bound SDK call on a worker thread runs no faster and competes with the event loop thread for the GIL, adding latency to every other task. Offload those to a ProcessPoolExecutor instead; see CPU-bound task offloading.
Context variables do not follow a manual run_in_executor. to_thread copies the current context into the worker, so tracing IDs survive; if you switch to loop.run_in_executor (for example, to use a dedicated pool), capture the context with contextvars.copy_context() and submit the call through ctx.run yourself.
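The dedicated-pool and context-copy advice above can be combined into a small to_thread-style helper. A minimal sketch, assuming a hypothetical request_id context variable, a helper named run_on_sdk_pool, and an 8-worker pool named _sdk_pool; it mirrors what to_thread does internally (copy the context, wrap the call in functools.partial(ctx.run, ...)), but submits to your own executor:

```python
import asyncio
import contextvars
import functools
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tracing variable, standing in for whatever your app propagates.
request_id = contextvars.ContextVar("request_id", default="-")

def sdk_get_account(account_id: str) -> dict:
    time.sleep(0.05)                                   # blocking SDK stand-in
    return {"id": account_id, "request_id": request_id.get()}

# Dedicated pool: this SDK's load cannot starve other to_thread call sites.
_sdk_pool = ThreadPoolExecutor(max_workers=8, thread_name_prefix="sdk")

async def run_on_sdk_pool(fn, /, *args, **kwargs):
    """to_thread-style helper bound to _sdk_pool (hypothetical name)."""
    loop = asyncio.get_running_loop()
    ctx = contextvars.copy_context()                   # run_in_executor will not copy it
    # run_in_executor accepts no kwargs, so bundle everything into a partial.
    call = functools.partial(ctx.run, fn, *args, **kwargs)
    return await loop.run_in_executor(_sdk_pool, call)

async def main() -> dict:
    request_id.set("req-42")
    return await run_on_sdk_pool(sdk_get_account, "acct-1")

try:
    result = asyncio.run(main())
finally:
    _sdk_pool.shutdown(wait=True)                      # the loop does not own this pool
print(result)
```

Shut the pool down yourself: asyncio.run() only shuts down the loop's default executor, not executors you create.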

Frequently Asked Questions

Does asyncio.to_thread give true parallelism for CPU-bound SDK calls?

No. asyncio.to_thread runs the call on an OS thread, but the GIL lets only one thread execute Python bytecode at a time, so CPU-bound work gains no parallelism and competes with the event loop thread for the GIL. It is the right tool for blocking I/O (and for C code that releases the GIL). For CPU-heavy Python work, offload to a ProcessPoolExecutor via loop.run_in_executor instead.

Can asyncio.to_thread cancel a blocking SDK call on timeout?

No. When asyncio.timeout() fires, the awaiting coroutine raises TimeoutError and stops waiting, but the worker thread continues running the blocking call until it returns on its own, holding its pool slot until then. Use the SDK's native timeout argument to actually free the thread, and bound concurrency with a Semaphore whose permit is released when the thread finishes (not when the caller times out), so repeated timeouts cannot pile up extra threads.

How many concurrent to_thread calls can run at once?

asyncio.to_thread uses the event loop's default ThreadPoolExecutor, which caps at min(32, os.cpu_count()+4) workers shared across all call sites. Beyond that cap, calls queue silently rather than erroring, so latency rises with no exception. Bound fan-out with an asyncio.Semaphore and use a dedicated ThreadPoolExecutor via run_in_executor for hot paths.

Hybrid Concurrency Models — up to the overview for the full async/thread boundary and its safe bridges.
Concurrent Execution & Worker Patterns — the parent overview for choosing threads versus processes versus async.
Cancellation Patterns — why a running thread cannot be force-cancelled and how cooperative cancellation works.