Monday, August 14, 2023

Fetching JSON over streaming HTTP

Recently, as part of my work on Azure OpenAI code samples, I've been experimenting with different ways of streaming data from a server into a website. The best-known technique is WebSockets, but there are other approaches as well, like server-sent events and readable streams. A readable stream is the simplest of the options, and works well if your website only needs to stream a response from the server (i.e. it doesn't need bi-directional streaming).

HTTP streaming in Python

To stream an HTTP response, your backend needs to set the "Transfer-Encoding" header to "chunked". Most web frameworks document how to stream responses, such as Flask: Streaming and Quart: Streaming responses. In both Flask and Quart, the response must be a Python generator, so that the server can keep pulling the next piece of data from the generator until it's exhausted.
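
If you're curious what chunked encoding actually looks like on the wire, here's a minimal sketch using only the Python standard library (no web framework). The handler writes each chunk as a hex-length line followed by the data and a CRLF, ending with a zero-length chunk, and urllib transparently reassembles the chunks on the client side. The route and chunk contents are made up for illustration:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class ChunkedHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # chunked encoding is an HTTP/1.1 feature

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Transfer-Encoding", "chunked")
        self.end_headers()
        for piece in [b"first ", b"second ", b"third"]:
            # each chunk is framed as: <hex size>\r\n<data>\r\n
            self.wfile.write(b"%x\r\n%s\r\n" % (len(piece), piece))
        self.wfile.write(b"0\r\n\r\n")  # zero-length chunk ends the stream

    def log_message(self, *args):  # silence per-request logging
        pass

server = ThreadingHTTPServer(("127.0.0.1", 0), ChunkedHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# urllib decodes the chunked framing for us on the client side
with urllib.request.urlopen(f"http://127.0.0.1:{server.server_port}/") as resp:
    body = resp.read().decode()
server.shutdown()
print(body)  # first second third
```

Frameworks like Flask and Quart handle all of that framing for you; your generator just yields the data.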

This example from the Flask docs streams data from a CSV:

@app.route('/large.csv')
def generate_large_csv():
    def generate():
        for row in iter_all_rows():
            yield f"{','.join(row)}\n"
    return generate(), {"Content-Type": "text/csv"}

This example, adapted from the Quart docs, is an infinite stream of timestamps (note the while True loop, which keeps the generator producing values until the client disconnects):

@app.route('/')
async def stream_time():
    async def async_generator():
        while True:
            time = datetime.now().isoformat()
            yield time.encode()
            await asyncio.sleep(1)
    return async_generator(), 200

Consuming streams in JavaScript

The standard way to make HTTP requests in JavaScript is the fetch() function, and fortunately, that function can also be used to consume HTTP streams: the response's body property is a ReadableStream, which receives the data incrementally as chunks arrive from the server.

This example fetches a URL, treats the response body as a stream, and logs out the output until it's done streaming:

const response = await fetch(url);
const readableStream = response.body;
const reader = readableStream.getReader();
const decoder = new TextDecoder("utf-8");
while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    const text = decoder.decode(value, { stream: true });
    console.log("Received ", text);
}

Streaming JSON

You might think it'd be super straightforward to stream JSON: just generate a JSON string on the server, and then JSON.parse the received text on the client. But there's a gotcha: the client could receive multiple JSON objects in the same chunk, or only part of an object, and then an attempt to parse that text as JSON will fail.

The solution: JSON objects separated by newlines, a format known either as NDJSON (newline-delimited JSON) or JSON Lines.

This expression converts a Python dict to NDJSON, using the std lib json module:

json.dumps(some_dict) + "\n"
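
Here's a self-contained sketch of both the gotcha and the fix (the event dicts are made up for illustration): json.loads chokes on two concatenated objects, but parses each line of the NDJSON version happily:

```python
import json

events = [{"delta": "Hello"}, {"delta": " world"}]

# The gotcha: concatenating objects directly produces an unparseable string
concatenated = "".join(json.dumps(e) for e in events)
try:
    json.loads(concatenated)
except json.JSONDecodeError:
    print("concatenated JSON fails to parse")

# The fix: one object per line, parsed back by splitting on newlines
ndjson = "".join(json.dumps(e) + "\n" for e in events)
parsed = [json.loads(line) for line in ndjson.splitlines()]
print(parsed)  # [{'delta': 'Hello'}, {'delta': ' world'}]
```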

Here's how I actually used that, for one of the ChatGPT samples:

@bp.post("/chat")
def chat_handler():
    request_message = request.json["message"]

    def response_stream():
        response = openai.ChatCompletion.create(
            engine=os.getenv("AZURE_OPENAI_CHATGPT_DEPLOYMENT", "chatgpt"),
            messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": request_message},
            ],
            stream=True,
        )
        for event in response:
            yield json.dumps(event) + "\n"

    return Response(response_stream())
 

Consuming NDJSON streams in JavaScript

Once the server is outputting NDJSON, we can write JavaScript that splits the received text on newlines and attempts to parse each piece as JSON, buffering any partial object until the rest of it arrives.

const response = await fetch(url);
const readableStream = response.body;
const reader = readableStream.getReader();
const decoder = new TextDecoder("utf-8");
let runningText = "";
while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    const text = decoder.decode(value, { stream: true });
    const objects = text.split("\n");
    for (const obj of objects) {
        try {
            runningText += obj;
            let result = JSON.parse(runningText);
            console.log("Received", result);
            runningText = "";
        } catch (e) {
            // Not a complete JSON object yet, keep accumulating
        }
    }
}
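
The same buffering idea can be sketched in Python, which makes it easier to see why the running buffer is needed. This is my own illustration, not the npm package's code, and the chunk boundaries below are chosen deliberately so that one object is split in half:

```python
import json

def parse_ndjson_chunks(chunks):
    """Yield parsed objects from an iterable of NDJSON text chunks."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        lines = buffer.split("\n")
        buffer = lines.pop()  # the last piece may be an incomplete line
        for line in lines:
            if line.strip():
                yield json.loads(line)
    if buffer.strip():  # flush any trailing object with no final newline
        yield json.loads(buffer)

# The second object arrives split across two chunks
chunks = ['{"id": 1}\n{"id"', ': 2}\n', '{"id": 3}\n']
results = list(parse_ndjson_chunks(chunks))
print(results)  # [{'id': 1}, {'id': 2}, {'id': 3}]
```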

Since I need to use this same processing code in multiple Azure OpenAI samples, I packaged that into a tiny npm package called ndjson-readablestream.

Here's how you can use the package from JavaScript to make NDJSON parsing easier:

import readNDJSONStream from "ndjson-readablestream";

const response = await chatApi(request);
if (!response.body) {
    throw Error("No response body");
}
for await (const event of readNDJSONStream(response.body)) {
    console.log("Received", event);
}

For more examples of using the package, see this PR that uses it in a TypeScript component to render ChatGPT responses or usage in an HTML page, for a non-React ChatGPT sample.

I hope this helps other developers use NDJSON streams in their projects. Please let me know if you have suggestions for improving my approach!
