Streaming Large Responses with httpx
The line response = await client.get(url); data = response.content is what takes down a worker when someone points it at a 4 GB export. By default, httpx reads the entire response body into memory before client.get() returns, so .content, .read(), .json(), and .text all operate on a fully buffered body. For a small JSON API that is fine; for a download whose size you do not control, RSS grows linearly with the body, and the process is one large response away from an OOM kill. The fix is to stream: open the response without reading the body, then iterate over it in fixed-size chunks so memory stays flat regardless of total size. This guide covers the buffering anti-pattern, the client.stream() + aiter_bytes() pattern, writing to disk or forwarding without buffering, applying a read timeout per chunk, and handling backpressure from a slow sink.
Prerequisites
- Python 3.11+ (for asyncio.timeout(), used in the chunk-timeout step).
- pip install httpx (httpx is third-party).
- Familiarity with the Async HTTP Clients & Servers patterns and the Network I/O & Protocol Handling execution model, which explains why a blocking write to a slow sink stalls the loop.
Step 1: See the buffering anti-pattern
A plain client.get() reads the whole body before it returns, and .content, .read(), .json(), and .text then hand back (or parse) that buffer. There is no size limit: a large or attacker-controlled Content-Length, or a chunked stream with no length at all, drives RSS straight up.
Verify the problem: run it against a large file while sampling RSS with ps -o rss= -p $PID (or tracking Python heap usage with tracemalloc). Memory climbs to roughly the body size before the write begins; run several downloads concurrently and the process OOMs.
Step 2: Use client.stream() + aiter_bytes()
client.stream() returns an async context manager (on httpx.AsyncClient) that yields a response whose body has not been read yet. Iterate resp.aiter_bytes(chunk_size=...) to pull the body one bounded chunk at a time. Memory held for the body is now on the order of chunk_size, not the body size.
Verify: sample RSS during the download; it stays flat near chunk_size plus fixed overhead, independent of total file size. Note that resp.status_code and resp.headers are available before you consume the body, so you can reject a bad response without downloading it.
Step 3: Write to disk or forward without buffering
The streamed chunks can go anywhere that accepts bytes incrementally — a file, another HTTP request, a queue, an S3 multipart upload. The rule is to never accumulate the chunks into one object. For forwarding, pass the byte iterator straight to the next request's content=.
Verify: RSS stays flat for both the read and the forward. If you accidentally do b"".join([c async for c in resp.aiter_bytes()]) anywhere, you are back to buffering — RSS will tell you immediately.
Step 4: Apply a read timeout per chunk
A streamed download can stall mid-body: the server stops sending but never closes the connection, and without a read timeout your async for waits forever while holding a connection. Set a read timeout (for example httpx.Timeout(5.0, read=10.0)) so each chunk read has a deadline, and wrap the whole loop in asyncio.timeout() if you also need an overall budget.
Verify: point it at a server that sends a few bytes and then stalls. With read set to 10 s, httpx.ReadTimeout fires within about 10 s instead of the download hanging indefinitely. See timeouts and deadlines for how the per-chunk and overall deadlines compose.
Step 5: Handle backpressure to a slow sink
If your sink (a slow disk, a downstream that throttles, a bounded queue) cannot keep up, you want the source to slow down too, not for chunks to pile up in memory. Because aiter_bytes() only reads the next chunk when the async for loop asks for it, awaiting a slow write naturally throttles the read: the kernel receive buffer fills, the advertised TCP window shrinks, and backpressure propagates to the server. Put the sink behind a bounded queue if you need to decouple read and write rates without losing that bound.
Verify: with a deliberately slow write_chunk (for example one that awaits asyncio.sleep(0.1)), RSS still stays flat; the read rate tracks the write rate instead of buffering ahead. Confirm with ss -tn that Recv-Q on the client socket grows while the sink lags, which is what shrinks the advertised receive window and carries backpressure to the server.
Verification
The single most important signal is flat memory regardless of response size. Concretely:
- RSS stays constant across a 1 MB and a 4 GB download. Sample with ps -o rss= -p $PID or take a tracemalloc snapshot before and after; the delta is on the order of chunk_size, not the body size.
- Stalled streams fail fast with httpx.ReadTimeout rather than hanging, once Step 4 is in place.
- Backpressure propagates: with a slow sink, throughput drops but memory does not climb, and the TCP receive window narrows.
- No leaked connections: every client.stream() context exits cleanly, so the connection returns to the pool (or closes) rather than leaking.
Pitfalls & Edge Cases
- You must consume or close the stream. Exiting a client.stream() block closes the response and releases the connection, even if you break out of the loop early. The leak risk comes from streaming outside a context manager, for example with client.send(request, stream=True), and never calling await resp.aclose(). Either drain the body or let the async with close it.
- Streamed responses do not have .content or .json() until read. Accessing them on an unread streaming response raises httpx.ResponseNotRead. If you need the parsed body and it is small, use a normal get() (or await resp.aread() inside the block); streaming is for bodies too large to hold.
- Stalled streams need a read timeout, not just an overall deadline. httpx.Timeout has separate connect, read, write, and pool values and no total-duration setting. The default (5 s for each) does bound an idle mid-stream gap, but timeout=None, a common choice for large downloads, removes that bound. Set read explicitly so each chunk read has its own deadline, and use asyncio.timeout() for the overall budget.
- Decompression interacts with chunk sizes. With Content-Encoding: gzip, aiter_bytes() yields decompressed bytes, so the decompressed size, not the wire size, drives both memory per chunk and total disk usage; a small compressed body can expand dramatically. Use aiter_raw() if you want the compressed bytes.
- Partial failures mid-stream leave a truncated sink. A connection drop after some chunks have been written leaves a partial file. Write to a temp path and rename on success, and compare the byte count against Content-Length (or verify a checksum, if the server publishes one) before treating the download as complete.
Frequently Asked Questions
Why does response.read() or .json() use so much memory on large downloads?
A non-streaming request such as client.get() reads the entire response body into memory before it returns, and .read(), .json(), and .text work on that buffer (.json() also allocates the parsed objects on top), so RSS scales linearly with the body size. For a download whose size you do not control, this can OOM the worker. Stream the body with client.stream() and aiter_bytes() so only one bounded chunk is in memory at a time.
How do I prevent a streamed httpx download from hanging mid-stream?
Set an explicit read timeout, for example httpx.Timeout(5.0, read=10.0) (httpx.Timeout needs either a default value or all four of connect, read, write, and pool), so each chunk read has a deadline, and optionally wrap the iteration in asyncio.timeout() for an overall budget. A stalled server then raises httpx.ReadTimeout instead of waiting forever while holding a connection.
Does aiter_bytes() give compressed or decompressed bytes?
aiter_bytes() yields decompressed bytes when the response is gzip- or br-encoded (br decoding requires httpx's brotli extra). The decompressed size therefore drives memory per chunk and total disk usage, and a small compressed body can expand dramatically. Use aiter_raw() if you need the raw compressed bytes off the wire.
Related
- Async HTTP Clients & Servers: the parent overview, with the full client/server pattern catalogue.
- Reusing aiohttp ClientSession Across Requests: the companion pattern for reusing connections across requests.
- Timeouts and Deadlines: how per-chunk read timeouts and overall deadlines compose.