
Streaming Large Responses with httpx

response = await client.get(url); data = response.content is the line that takes down a worker when someone points it at a 4 GB export. httpx — like every high-level client — buffers the entire response body into memory before .content, .read(), .json(), or .text return a single byte. For a small JSON API that is fine; for a download whose size you do not control, RSS scales linearly with the body and the process is one large response away from an OOM kill. The fix is to stream: open the response without reading the body, then iterate it in fixed-size chunks so memory stays flat regardless of total size. This guide shows the buffering anti-pattern, the client.stream() + aiter_bytes() pattern, writing to disk or forwarding without buffering, applying a read timeout per chunk, and handling backpressure to a slow sink.

Prerequisites

  • Python 3.11+ (for asyncio.timeout() used in the chunk-timeout step).
  • pip install httpx (httpx is third-party).
  • Familiarity with the Async HTTP Clients & Servers patterns and the Network I/O & Protocol Handling execution model, which explains why a blocking write to a slow sink stalls the loop.

Figure: Buffered whole body versus streamed constant memory. On the left, a buffered read (read() / .json()) accumulates the whole body in a RAM buffer between socket and sink, so memory grows in proportion to response size. On the right, a streamed read (aiter_bytes()) passes one bounded chunk at a time from socket to disk or a forward target, looping until EOF, so memory is constant.

Step 1: See the buffering anti-pattern

These calls all materialize the whole body before returning. There is no size limit — a large or attacker-controlled Content-Length (or a chunked stream with no length) drives RSS straight up.

# pip install httpx
import httpx

async def download_bad(url: str, path: str) -> None:
    async with httpx.AsyncClient() as client:
        resp = await client.get(url)        # body fully buffered HERE
        resp.raise_for_status()
        with open(path, "wb") as f:
            f.write(resp.content)           # already in RAM; too late

Verify the problem: run against a large file while sampling RSS (ps -o rss= -p $PID or tracemalloc). Memory climbs to roughly the body size before the write begins. Repeat concurrently and the process OOMs.
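The difference can be previewed without a network at all. The following is a minimal sketch using tracemalloc, with hypothetical helper names (peak_traced_bytes, buffered, chunked) and a 16 MiB stand-in for the body. Note that tracemalloc only counts allocations made through Python's allocator, so it approximates RSS rather than replacing a ps sample.

```python
import tracemalloc
from typing import Callable

CHUNK = 64 * 1024
TOTAL = 16 * 1024 * 1024  # stand-in for a "large" body (assumption for the demo)


def peak_traced_bytes(fn: Callable[[], object]) -> int:
    """Run fn() and return the peak number of bytes Python allocated meanwhile."""
    tracemalloc.start()
    try:
        tracemalloc.reset_peak()
        fn()
        _current, peak = tracemalloc.get_traced_memory()
        return peak
    finally:
        tracemalloc.stop()


def buffered() -> int:
    # Mimics resp.content: the whole body exists as one object.
    body = bytes(TOTAL)
    return len(body)


def chunked() -> int:
    # Mimics the streaming loop: only one or two chunks are alive at a time.
    seen = 0
    for _ in range(TOTAL // CHUNK):
        chunk = bytes(CHUNK)
        seen += len(chunk)
    return seen


if __name__ == "__main__":
    print("buffered peak:", peak_traced_bytes(buffered))
    print("chunked peak: ", peak_traced_bytes(chunked))
```

The buffered peak tracks TOTAL, while the chunked peak stays within a small multiple of CHUNK no matter how large TOTAL is made.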

Step 2: Use client.stream() + aiter_bytes()

client.stream() returns a context manager that yields a response whose body has not been read. Iterate aiter_bytes(chunk_size=...) to pull the body one bounded chunk at a time. Memory is now chunk_size, not the body size.

import httpx

async def download(url: str, path: str) -> None:
    async with httpx.AsyncClient() as client:
        async with client.stream("GET", url) as resp:
            resp.raise_for_status()         # headers are available before the body
            with open(path, "wb") as f:
                async for chunk in resp.aiter_bytes(chunk_size=65536):
                    f.write(chunk)          # one 64 KiB chunk in flight

Verify: sample RSS during the download. It stays flat near chunk_size plus fixed overhead, independent of total file size. Note that resp.status_code and resp.headers are available before you consume the body, so you can reject a bad response without downloading it.
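Because the headers arrive before the body, a size cap can be enforced in two places: once against the declared Content-Length, and again against the running byte count in case the header is absent or wrong. The sketch below assumes hypothetical names (BodyTooLarge, declared_length, limit_bytes) and works on any async iterator of bytes, which is what resp.aiter_bytes() produces.

```python
from typing import AsyncIterator, Mapping, Optional


class BodyTooLarge(Exception):
    """Hypothetical error type for this sketch; not part of httpx."""


def declared_length(headers: Mapping[str, str]) -> Optional[int]:
    """Return Content-Length as an int, or None if absent or malformed.

    httpx.Headers is case-insensitive; a plain dict must use the lowercase key.
    """
    raw = headers.get("content-length")
    if raw is None:
        return None
    try:
        value = int(raw)
    except ValueError:
        return None
    return value if value >= 0 else None


async def limit_bytes(
    chunks: AsyncIterator[bytes], max_bytes: int
) -> AsyncIterator[bytes]:
    """Re-yield chunks, raising BodyTooLarge once the total exceeds max_bytes."""
    total = 0
    async for chunk in chunks:
        total += len(chunk)
        if total > max_bytes:
            raise BodyTooLarge(f"body exceeded {max_bytes} bytes")
        yield chunk
```

Inside the client.stream() block, one way to use it is to check declared_length(resp.headers) right after raise_for_status(), then iterate limit_bytes(resp.aiter_bytes(chunk_size=65536), max_bytes) instead of the bare iterator. The running count applies to the bytes actually yielded, so with aiter_bytes() it caps the decompressed size.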

Step 3: Write to disk or forward without buffering

The streamed chunks can go anywhere that accepts bytes incrementally — a file, another HTTP request, a queue, an S3 multipart upload. The rule is to never accumulate the chunks into one object. For forwarding, pass the byte iterator straight to the next request's content=.

import httpx

async def proxy(src_url: str, dst_url: str) -> None:
    async with httpx.AsyncClient() as client:
        async with client.stream("GET", src_url) as resp:
            resp.raise_for_status()
            # Forward the stream without buffering: httpx pulls chunks on demand.
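            upstream = await client.post(dst_url, content=resp.aiter_bytes(chunk_size=65536))
            upstream.raise_for_status()     # surface a rejected upload instead of ignoring it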

Verify: RSS stays flat for both the read and the forward. If you accidentally do b"".join([c async for c in resp.aiter_bytes()]) anywhere, you are back to buffering — RSS will tell you immediately.
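The same "never accumulate" rule still allows per-chunk side work. As an illustration, the sketch below (hashing_tee is a hypothetical name) computes a digest while chunks pass through unchanged, so a forward or a file write can be checksummed without holding the body.

```python
import hashlib
from typing import AsyncIterator


async def hashing_tee(chunks: AsyncIterator[bytes], hasher) -> AsyncIterator[bytes]:
    """Feed each chunk to hasher.update(), then re-yield it unchanged."""
    async for chunk in chunks:
        hasher.update(chunk)  # constant extra memory: the hash state only
        yield chunk
```

A possible use in the proxy above: create digest = hashlib.sha256(), pass content=hashing_tee(resp.aiter_bytes(chunk_size=65536), digest) to the POST, and read digest.hexdigest() after the request completes.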

Step 4: Apply a read timeout per chunk

A streamed download can stall mid-body: the server stops sending but never closes the connection. If timeouts are disabled (timeout=None), your async for then waits forever while holding a connection. Set a read timeout explicitly so each chunk read has a deadline, and wrap the whole loop in asyncio.timeout() if you also need an overall budget.

import asyncio
import httpx

async def download_with_timeouts(url: str, path: str) -> None:
    # read=10 bounds the wait for each chunk; connect=2 bounds the handshake.
    timeout = httpx.Timeout(connect=2.0, read=10.0, write=10.0, pool=5.0)
    async with httpx.AsyncClient(timeout=timeout) as client:
        async with client.stream("GET", url) as resp:
            resp.raise_for_status()
            with open(path, "wb") as f:
                async with asyncio.timeout(300):     # overall ceiling
                    async for chunk in resp.aiter_bytes(65536):
                        f.write(chunk)

Verify: point at a server that sends a few bytes then stalls — the read timeout fires within ~10 s with httpx.ReadTimeout instead of hanging indefinitely. See timeouts and deadlines for how the per-chunk and overall deadlines compose.
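To make the composition of the two deadlines concrete, here is a transport-agnostic sketch. It is not how httpx implements its read timeout; with_deadlines is a hypothetical helper that wraps any async iterator of bytes and applies whichever limit is tighter to each read.

```python
import asyncio
from typing import AsyncIterator


async def with_deadlines(
    chunks: AsyncIterator[bytes], per_chunk: float, overall: float
) -> AsyncIterator[bytes]:
    """Re-yield chunks, failing if one read exceeds per_chunk seconds
    or the whole iteration exceeds overall seconds."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + overall
    iterator = chunks.__aiter__()
    while True:
        remaining = deadline - loop.time()
        if remaining <= 0:
            raise asyncio.TimeoutError("overall budget exhausted")
        try:
            # Whichever limit is tighter bounds this single read.
            chunk = await asyncio.wait_for(
                iterator.__anext__(), timeout=min(per_chunk, remaining)
            )
        except StopAsyncIteration:
            return
        yield chunk
```

Time the consumer spends between reads (for example in a slow write) counts against the overall budget but not against the per-chunk limit, which mirrors how a read timeout and an outer asyncio.timeout() behave together.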

Step 5: Handle backpressure to a slow sink

If your sink (slow disk, a downstream that throttles, a bounded queue) cannot keep up, you want the source to slow down too, not for chunks to pile up in memory. Because aiter_bytes() only pulls the next chunk when you await it, awaiting a slow write naturally throttles the read — TCP backpressure then propagates to the server. Push the sink behind a bounded queue if you need to decouple read and write rates.

import asyncio
import httpx

async def stream_to_slow_sink(url: str, write_chunk) -> None:
    # write_chunk is an async callable that may be slow (e.g. await queue.put()).
    # httpx.Timeout needs a default (5.0 here) unless all four phases are set explicitly.
    async with httpx.AsyncClient(timeout=httpx.Timeout(5.0, read=15.0)) as client:
        async with client.stream("GET", url) as resp:
            resp.raise_for_status()
            async for chunk in resp.aiter_bytes(65536):
                # Awaiting the slow sink pauses the next read -> TCP backpressure.
                await write_chunk(chunk)

Verify: with a deliberately slow write_chunk (e.g. await asyncio.sleep(0.1)), RSS still stays flat, because the read rate tracks the write rate instead of buffering ahead. Confirm with ss -tn that Recv-Q on the client socket grows as the kernel buffer fills; that is what shrinks the advertised receive window and carries backpressure to the server.
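When read and write rates do need to be decoupled, a bounded asyncio.Queue keeps the amount of buffered data capped. The following is a sketch under stated assumptions: pump_through_queue and max_chunks are hypothetical names, and source errors are forwarded through the queue so the writer side sees them.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable

_EOF = object()  # sentinel marking the end of the stream


async def pump_through_queue(
    chunks: AsyncIterator[bytes],
    write_chunk: Callable[[bytes], Awaitable[None]],
    max_chunks: int = 8,
) -> None:
    """Read and write concurrently with at most max_chunks chunks queued."""
    queue: asyncio.Queue = asyncio.Queue(maxsize=max_chunks)

    async def reader() -> None:
        try:
            async for chunk in chunks:
                await queue.put(chunk)  # suspends while the queue is full
        except Exception as exc:
            await queue.put(exc)        # forward source errors to the writer
        else:
            await queue.put(_EOF)

    reader_task = asyncio.create_task(reader())
    try:
        while True:
            item = await queue.get()
            if item is _EOF:
                return
            if isinstance(item, Exception):
                raise item
            await write_chunk(item)
    finally:
        # If the sink failed, stop pulling from the source; no-op if finished.
        reader_task.cancel()
        await asyncio.gather(reader_task, return_exceptions=True)
```

Inside a client.stream() block this could be called as await pump_through_queue(resp.aiter_bytes(chunk_size=65536), write_chunk). Worst-case buffering is roughly (max_chunks + 2) chunks: the queue, one chunk the reader is waiting to enqueue, and one the writer is handling.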

Verification

The single most important signal is flat memory regardless of response size. Concretely:

  • RSS stays constant across a 1 MB and a 4 GB download — sample with ps -o rss= -p $PID or a tracemalloc snapshot before and after; the delta is roughly chunk_size, not the body size.
  • Stalled streams fail fast with httpx.ReadTimeout rather than hanging, once Step 4 is in place.
  • Backpressure propagates: with a slow sink, throughput drops but memory does not climb, and the TCP receive window narrows.
  • No leaked connections: every client.stream() context exits cleanly, so the connection returns to the pool (or closes) rather than leaking.

Pitfalls & Edge Cases

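  • You must consume or close the stream. Inside async with client.stream(...), leaving the block closes the response even if you break out of the loop early, so the connection is released rather than leaked. The leak risk is manual management: if you call client.send(request, stream=True) yourself, you must call await resp.aclose() when you are done. Accessing .content on a response whose body you never read raises ResponseNotRead.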
  • Streamed responses do not have .content/.json() until read. Accessing them on a streaming response raises httpx.ResponseNotRead. If you need the parsed body and it is small, use a normal get(); streaming is for bodies too large to hold.
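  • Timeouts on stalled streams need a read timeout, not just a total budget. httpx defaults to a 5-second timeout for each phase, but passing timeout=None removes it, and relying only on an outer asyncio.timeout() bounds an idle mid-stream gap by the whole budget rather than per chunk. Set read explicitly so each chunk read has its own deadline.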
  • Decompression interacts with chunk sizes. With Content-Encoding: gzip, aiter_bytes() yields decompressed bytes, so the decompressed size — not the wire size — drives both memory per chunk and total disk usage; a small compressed body can expand dramatically. Use aiter_raw() if you want the compressed bytes.
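  • Partial failures mid-stream leave a truncated sink. A connection drop after some chunks have been written leaves a partial file. Write to a temp path and rename on success, and verify the byte count against Content-Length, or a checksum against a digest the server publishes, before treating the download as complete. With Content-Encoding set, Content-Length counts compressed bytes, so compare it against the raw byte count rather than the decompressed bytes written.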
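The temp-path-and-rename advice from the last pitfall can be sketched as follows. write_atomically and expected_size are hypothetical names, and the helper accepts any async iterator of bytes. The file writes here are synchronous for brevity, so a slow disk would still block the event loop.

```python
import contextlib
import os
import tempfile
from typing import AsyncIterator, Optional


async def write_atomically(
    chunks: AsyncIterator[bytes], path: str, expected_size: Optional[int] = None
) -> int:
    """Stream chunks into a temp file next to path; rename only on success."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".part")
    written = 0
    try:
        with os.fdopen(fd, "wb") as f:
            async for chunk in chunks:
                f.write(chunk)
                written += len(chunk)
        if expected_size is not None and written != expected_size:
            raise OSError(f"expected {expected_size} bytes, got {written}")
        os.replace(tmp_path, path)  # atomic when both paths share a filesystem
    except BaseException:
        # Any failure, including cancellation, removes the partial file.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_path)
        raise
    return written
```

Readers of path therefore see either the previous file or the complete new one, never a truncated download. Pass expected_size only when the count you are comparing matches what the iterator yields (for example, raw bytes against Content-Length).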

Frequently Asked Questions

Why does response.read() or .json() use so much memory on large downloads?

Those accessors materialize the entire response body into memory before returning, so RSS scales linearly with the body size. For a download whose size you do not control, this can OOM the worker. Stream the body with client.stream() and aiter_bytes() so only one bounded chunk is in memory at a time.

How do I prevent a streamed httpx download from hanging mid-stream?

Set an explicit read timeout via httpx.Timeout(read=...) so each chunk read has a deadline, and optionally wrap the iteration in asyncio.timeout() for an overall budget. A stalled server then raises httpx.ReadTimeout instead of waiting forever while holding a connection.

Does aiter_bytes() give compressed or decompressed bytes?

aiter_bytes() yields decompressed bytes when the response is gzip or br encoded, so the decompressed size drives memory per chunk and total disk usage and a small compressed body can expand dramatically. Use aiter_raw() if you need the raw compressed bytes off the wire.
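To get a feel for how far the two sizes can diverge, the standard library is enough. This sketch uses an artificial, highly compressible payload (10 MiB of zero bytes), so the ratio it prints is an upper-end illustration rather than a typical figure.

```python
import gzip

raw = bytes(10 * 1024 * 1024)   # 10 MiB of zero bytes, highly compressible
wire = gzip.compress(raw)       # roughly what a gzip-encoded body would carry

ratio = len(raw) / len(wire)
print(f"wire={len(wire)} bytes, decompressed={len(raw)} bytes, ratio={ratio:.0f}x")
```

For a body like this, iterating the raw stream would see about len(wire) bytes in total, while iterating the decompressed stream would see len(raw) bytes, which is the number that matters for disk usage and for any size cap applied to the yielded chunks.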