Avoiding Event Loop Blocking with asyncpg
You migrated to asyncpg precisely because it is non-blocking, yet p99 latency still spikes in bursts: most requests are fast, then a burst of them stalls for hundreds of milliseconds for no obvious reason. The query plans are fine and the database is not overloaded. The cause is almost always something else competing for the event loop or its connections. It may be a synchronous psycopg2 call left in a legacy module, or a CPU-heavy transform over a large result set; either one freezes the loop while asyncpg waits its turn. Or it may be a single connection shared across concurrent tasks, which serializes their queries. This guide is a repeatable workflow to find that hidden blocker and remove it.
Prerequisites
- Python 3.11+ (for asyncio.timeout(), asyncio.TaskGroup, and asyncio.to_thread()).
- asyncpg: pip install asyncpg.
- Familiarity with the driver and pool model from Async Database Drivers and the loop's I/O-multiplexing model in Network I/O & Protocol Handling.
Step 1 — Detect Loop Blocking
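Turn on debug mode and lower the slow-callback threshold so the loop tells you when a single callback ran too long. Each warning means that one callback held the loop past the threshold; that callback is usually one step of a task. The handle or task named in the warning identifies the coroutine that was running synchronous code at the time.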
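Here is a minimal sketch of this step, assuming Python 3.11+. Passing debug=True to asyncio.run() enables debug mode, and slow_callback_duration is lowered to 50 ms from inside the running loop. legacy_sync_query is a hypothetical stand-in for a forgotten synchronous call that simply sleeps. SlowCallbackCollector is an illustrative handler that keeps the asyncio logger's warnings in memory; in a real service you would leave them on your normal log pipeline.

```python
import asyncio
import logging
import time


class SlowCallbackCollector(logging.Handler):
    """Illustrative handler that keeps asyncio's warnings in memory."""

    def __init__(self) -> None:
        super().__init__(level=logging.WARNING)
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


collector = SlowCallbackCollector()
logging.getLogger("asyncio").addHandler(collector)


def legacy_sync_query() -> None:
    # Hypothetical stand-in for a leftover psycopg2 call: blocks the thread.
    time.sleep(0.2)


async def handler() -> None:
    # Plain synchronous call on the loop thread: the loop freezes for ~200 ms.
    legacy_sync_query()


async def main() -> None:
    loop = asyncio.get_running_loop()
    # Warn about any single callback (e.g. one task step) longer than 50 ms.
    loop.slow_callback_duration = 0.05
    # Run the handler as its own task so the warning names it, not main().
    await asyncio.create_task(handler(), name="handler")


asyncio.run(main(), debug=True)  # debug=True enables slow-callback logging

for message in collector.messages:
    print(message)
```

Running handler() as its own task makes the warning name that task rather than main(). That narrows the search to a single coroutine.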
Verify: Under load, any blocking call shows up as an Executing <...> took 0.123 seconds warning. The task named in that warning is the coroutine to inspect for a synchronous call. If you see no warnings and latency is still spiky, the stall may not come from a single long callback (GC pauses or contention with a busy background thread are possible causes). Proceed to the next steps anyway, because a too-small pool also masquerades as latency.
Step 2 — Set Up the Pool Correctly
A per-request connection pays a TCP and auth round-trip every time, which looks exactly like latency. Create one pool at startup with explicit sizing and a command_timeout so a stuck query cannot hang a connection forever.
Verify: Log pool.get_size() at startup; it should equal min_size immediately, proving connections are warmed, not opened lazily on the hot path.
Step 3 — Acquire Per Task, Never Share a Connection
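An asyncpg connection runs one operation at a time. If concurrent tasks share a connection, a query issued while another is still in flight fails with InterfaceError (cannot perform operation: another operation is in progress). Guarding the connection with a lock avoids the error, but it serializes all the work behind that one connection. Borrow a connection per operation and release it promptly.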
Verify: Remove the anti-pattern, then run your concurrency test. Intermittent InterfaceError (another operation is in progress) failures should disappear entirely.
Step 4 — Move CPU-Heavy Row Processing Off the Loop
asyncpg yields to the loop while it waits on the network, but the Python code that runs after fetch() returns does not. Converting 50k rows into objects, or running a heavy transform inline, blocks the loop just as a sync driver would. Push that work to a thread with asyncio.to_thread, or to a process pool for truly CPU-bound work (per CPU-bound task offloading).
Verify: Re-run Step 1's debug loop. The slow-callback warning for the task that called _transform inline should be gone; the CPU time now lives on a worker thread.
Step 5 — Bound Concurrency to the Pool Size
If you launch 1000 tasks against a 20-connection pool, 980 of them wait in acquire() and your acquire-wait p99 explodes. Gate callers with an asyncio.Semaphore sized to the pool. Excess load then queues at one observable choke point instead of piling onto the pool.
Verify: Time the acquire() (wrap it in asyncio.timeout() and record the wait). With the Semaphore matched to the pool, acquire wait should sit near zero even when you submit far more tasks than connections.
Verification
After applying all five steps, confirm the fix holds under load:
- The slow-callback warnings from Step 1 no longer appear during a sustained load test.
- p99 latency stabilizes — the bursty spikes flatten because no single coroutine monopolizes the loop.
- Pool acquire wait (p99) stays low; idle connections are non-zero between bursts, proving the pool is no longer the bottleneck.
- Concurrency tests run clean: no intermittent asyncpg InterfaceError (another operation is in progress).
Pitfalls & Edge Cases
- A sync psycopg2 call sneaking in. A "quick" synchronous query in an admin endpoint or a logging hook blocks the loop for everyone. Grep for non-async drivers in the codebase, and route any survivor through asyncio.to_thread.
- Sharing a connection via a captured variable. Storing an acquired connection on self or a module global and reusing it across tasks is the same bug as Step 3, just hidden. Acquire inside each operation, never above the fan-out.
- Giant result sets. A SELECT returning millions of rows materializes them all in memory, and the transform stalls the loop. Use a server-side cursor (conn.cursor() inside a transaction) to stream in batches, and offload the per-batch work.
- A transaction spanning a slow await. Doing unrelated slow I/O (an HTTP call, a sleep) inside async with conn.transaction() pins the connection and holds locks for the whole duration. Keep only the related writes inside the transaction.
- Pool max_size greater than the server max_connections. The pool happily tries to open more backends than the server allows, and you get FATAL: sorry, too many clients already (raised by asyncpg as TooManyConnectionsError). Count every instance's pool against the server budget, as covered in Async Database Drivers.
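The sketch below applies the cursor-based fix for giant result sets. EXPORT_SQL, BATCH_SIZE, and stream_region_totals are hypothetical. Cursor.fetch(n) returns at most n rows and an empty list once the cursor is exhausted, which ends the loop.

```python
from __future__ import annotations

import asyncio
from collections.abc import Sequence
from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    import asyncpg

# Hypothetical query and batch size, for illustration only.
EXPORT_SQL = "SELECT region, amount FROM orders WHERE created_at >= $1"
BATCH_SIZE = 5_000


def _summarize(batch: Sequence[Any]) -> dict[str, float]:
    # CPU work for one batch; runs on a worker thread.
    out: dict[str, float] = {}
    for row in batch:
        out[row["region"]] = out.get(row["region"], 0.0) + float(row["amount"])
    return out


async def stream_region_totals(pool: asyncpg.Pool, since: Any) -> dict[str, float]:
    totals: dict[str, float] = {}
    async with pool.acquire() as conn:
        # asyncpg server-side cursors only work inside a transaction.
        async with conn.transaction():
            cur = await conn.cursor(EXPORT_SQL, since)
            # At most BATCH_SIZE rows are held in memory at a time.
            while batch := await cur.fetch(BATCH_SIZE):
                partial = await asyncio.to_thread(_summarize, batch)
                for region, amount in partial.items():
                    totals[region] = totals.get(region, 0.0) + amount
    return totals
```

Awaiting the worker thread inside the transaction keeps the connection pinned while each batch is processed. That is the trade-off the transaction pitfall above warns about. For a read-only stream it is usually acceptable, but keep the per-batch work short.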
Frequently Asked Questions
Why does asyncpg still cause latency spikes if it is non-blocking?
asyncpg yields only during network waits. Latency spikes come from something else on the same loop: a synchronous psycopg2 or sqlite3 call left in another module, CPU-heavy processing of a large result set running inline, or a connection shared across tasks. Enable loop debug with a low slow_callback_duration to find the offending callback.
How do I detect that something is blocking my asyncio event loop?
Call loop.set_debug(True) (or pass debug=True to asyncio.run()) and set loop.slow_callback_duration to about 0.05 seconds. The loop then logs a warning whenever a single callback runs longer than that threshold. The warning names the handle or task that held the loop, which points you at the blocking coroutine.
How do I keep heavy row processing from blocking the loop with asyncpg?
Do the network fetch on the loop, then hand the rows to asyncio.to_thread for CPU-heavy transforms, or to a ProcessPoolExecutor for truly CPU-bound work. This keeps the parsing and aggregation off the loop thread so other coroutines continue to run.
Related
- Async Database Drivers: the parent overview for pools, transactions, and sizing across native and blocking drivers.
- Network I/O & Protocol Handling: the parent overview for the loop's I/O-multiplexing model that asyncpg relies on.
- Running Blocking SDK Calls with asyncio.to_thread: the pattern for any synchronous driver that must coexist with the loop.