Saturday, September 16, 2023

Best practices for OpenAI Chat apps: Streaming UI

As part of my role on the Python advocacy team for Azure, I am now one of the maintainers of several ChatGPT samples, like my simple chat app and this popular chat + search app. In this series of blog posts, I'll share my learnings from writing chat-like applications. My experience is from apps with Python backends, but many of these practices apply cross-language.

Today I want to talk about the importance of streaming in the UI of a chat app, and how we can accomplish that. Streaming doesn't feel like a must-have at first, but users have become so accustomed to streaming interfaces like ChatGPT, Bing Chat, and GitHub Copilot that they expect it in similar experiences. In addition, streaming can reduce the "time to first answer", as long as your UI calls the streaming OpenAI API as well. Given that it can take several seconds for ChatGPT to fully respond, we welcome any approach that gets answers to users faster.

Animated GIF of GitHub CoPilot answering a question about bash

Streaming from the APIs

The openai package makes it easy to optionally stream responses from the API, by way of a stream argument:

chat_coroutine = openai.ChatCompletion.acreate(
    deployment_id=os.getenv("AZURE_OPENAI_CHATGPT_DEPLOYMENT", "chatgpt"),
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": request_message},
    ],
    stream=True,
)

When stream is true, the response type is an asynchronous generator, so we can use async for to process each of the ChatCompletion chunk objects:

async for event in await chat_coroutine:
    # The first chunk may contain only the role, with no content yet
    message_chunk = event.choices[0].delta.get("content", "")

Sending stream from backend to frontend

When we're making a web app, we need a way to send those objects as a stream from the backend to the browser. We can't use a standard HTTP response, since that sends everything at once and closes the connection. The most common approaches for streaming from backends are:

  • WebSockets: Bidirectional communication channel, client or server can push.
  • Server-sent events: An HTTP channel for server to push to client.
  • Readable streams: An HTTP response with a Transfer-Encoding header of "chunked", signifying the browser can process each chunk as it arrives.

All of these could potentially be used for a chat app, and I myself have experimented with both server-sent events and readable streams. Behind the scenes, the ChatGPT API actually uses server-sent events, so you'll find code in the openai package for parsing that protocol. However, I now prefer using readable streams for my backend-to-frontend communication. It's the simplest code setup on both the frontend and backend, and it supports the POST requests that our apps are already sending.

The key is to send the chunks from the backend using the NDJSON (jsonlines) format, and parse that format in the frontend. See my blog post on fetching JSON over streaming HTTP for Python and JavaScript example code.
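As a minimal sketch, an async NDJSON-streaming endpoint could look like this (this assumes Quart, and the route and helper names are illustrative, not the sample's exact code):

import json
import os

import openai
from quart import Blueprint, request

bp = Blueprint("chat", __name__)

@bp.post("/chat")
async def chat_handler():
    request_message = (await request.get_json())["message"]

    async def format_as_ndjson():
        chat_coroutine = openai.ChatCompletion.acreate(
            deployment_id=os.getenv("AZURE_OPENAI_CHATGPT_DEPLOYMENT", "chatgpt"),
            messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": request_message},
            ],
            stream=True,
        )
        async for event in await chat_coroutine:
            # One JSON object per line, so the frontend can parse incrementally
            yield json.dumps(event) + "\n"

    return format_as_ndjson()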

Achieving a word-by-word effect

With all of that implemented, we have a frontend that reveals the answer gradually:

Animated GIF of answer appearing gradually

Here's what's interesting: despite our frontend receiving chunks of just a few tokens at a time, it appears to reveal almost entire sentences at a time. Why does the frontend UI seem to stream much larger chunks than what it receives? That's likely caused by the browser batching up repaints, deciding that it can wait to display the latest update to the innerHTML of the answer element. Normally that's a great performance enhancement on the browser's side, but it's not ideal in this case.

My colleague Steve Steiner experimented with various ways to force the browser to repaint more frequently, and settled on a technique that uses window.setTimeout() with a delay of 33 milliseconds for each chunk. That does mean the browser takes more time overall to display a streamed response, but it still doesn't end up slower than reading speed. See his PR for implementation details.

Now the frontend displays the answer at the same level of granularity that it receives from the ChatCompletions API:

Animated GIF of answer appearing word by word

Streaming more of the process

Many of our sample apps are RAG apps that "chat on your data", by chaining together calls across vector databases (like Azure Cognitive Search), embedding APIs, and the Chat Completion API. That chain of calls will take longer to process than a single ChatCompletion call, of course, so users may end up waiting longer for their answers.

One suggestion from Steve Steiner is to stream more of the process. Instead of waiting until we have the final answer, we could stream the process of finding the answer, like:

  • Processing your question: "Can you suggest a pizza recipe that incorporates both mushroom and pineapples?"
  • Generated search query "pineapple mushroom pizza recipes"
  • Found three related results from our cookbooks: 1) Mushroom calzone 2) Pineapple ham pizza 3) Mushroom loaf
  • Generating answer to your question...
  • Sure! Here's a recipe for a mushroom pineapple pizza...

We haven't integrated that idea into any of our samples yet, but it's interesting to consider for anyone building chat apps, as a way to keep the user engaged while the backend does additional work.
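As a rough sketch, each step could be emitted as its own NDJSON event for the frontend to render as it arrives. The event shapes and helper functions below are hypothetical:

import json

async def answer_with_progress(question: str):
    yield json.dumps({"step": "processing", "question": question}) + "\n"

    # generate_search_query, search_documents, and generate_answer are
    # hypothetical helpers for each stage of the RAG chain
    query = await generate_search_query(question)
    yield json.dumps({"step": "search_query", "query": query}) + "\n"

    results = await search_documents(query)
    yield json.dumps({"step": "results", "titles": [r["title"] for r in results]}) + "\n"

    yield json.dumps({"step": "generating_answer"}) + "\n"
    async for chunk in generate_answer(question, results):
        yield json.dumps({"step": "answer", "content": chunk}) + "\n"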

Making it optional

I just spent all that time talking about streaming, but I want to leave you with one final recommendation: make streaming optional, especially if you are developing a project for others to deploy. There are some web hosts that may not support streaming as readily as others, so developers appreciate the option to turn streaming off. There are also some use cases where streaming may not make sense, and it should be easy for developers (or even users) to turn it off.
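For example, a handler could branch on a stream flag in the request JSON. This is a sketch, assuming a Quart handler like the one earlier in this post; the flag name is illustrative:

@bp.post("/chat")
async def chat_handler():
    request_json = await request.get_json()
    use_stream = request_json.get("stream", True)

    chat_coroutine = openai.ChatCompletion.acreate(
        deployment_id=os.getenv("AZURE_OPENAI_CHATGPT_DEPLOYMENT", "chatgpt"),
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": request_json["message"]},
        ],
        stream=use_stream,
    )
    if not use_stream:
        # Without streaming, return the whole response as one JSON object
        return await chat_coroutine

    async def format_as_ndjson():
        async for event in await chat_coroutine:
            yield json.dumps(event) + "\n"

    return format_as_ndjson()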

Wednesday, September 13, 2023

Best practices for OpenAI Chat apps: Concurrency

As part of my role on the Python advocacy team for Azure, I am now one of the maintainers of several ChatGPT samples, like my simple chat app and this popular chat + search app. In this series of blog posts, I'll share my learnings from writing chat-like applications. My experience is from apps with Python backends, but many of these practices apply cross-language.

My first tip is to use an asynchronous backend framework so that your app is capable of fulfilling concurrent requests from users.

The need for concurrency

Why? Let's imagine that we used a synchronous framework, like Flask. We deploy that to a server using gunicorn and several workers. One of those workers receives a POST request to the "/chat" endpoint. That chat endpoint in turn makes a request to the Azure ChatCompletions API. The request can take a while to complete - several seconds! During that time, the worker is tied up and cannot handle any more user requests. We could throw more CPUs, and thus workers and threads, at the problem, but that's a waste of server resources.

Without concurrency, requests must be handled serially:

Diagram of worker handling requests one after the other

The better approach when our app has long blocking I/O calls is to use an asynchronous framework. That way, when a request has gone out to a potentially slow-to-respond API, the Python program can pause that coroutine and handle a brand new request.

With concurrency, workers can handle new requests during I/O calls:

Diagram of worker handling second request while first request waits for API response
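The asyncio event loop is what makes that switching possible. Here's a toy illustration (not from the samples), where asyncio.sleep stands in for a slow API call:

import asyncio
import time

async def handle_request(name: str):
    print(f"{name} started at {time.strftime('%X')}")
    await asyncio.sleep(3)  # stands in for a slow ChatCompletion call
    print(f"{name} finished at {time.strftime('%X')}")

async def main():
    # Both "requests" finish about 3 seconds after the start, not 6,
    # because the event loop runs the second while the first awaits
    await asyncio.gather(handle_request("request 1"), handle_request("request 2"))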

Asynchronous Python backends

We use Quart, the asynchronous version of Flask, for the simple chat quickstart as well as the chat + search app. I've also ported the simple chat to FastAPI, the most popular asynchronous framework for Python.

Our handlers now all have async in front, signifying that calling them returns a Python coroutine instead of immediately running the function body:

async def chat_handler():
    request_message = (await request.get_json())["message"]

When we deploy those apps, we still use gunicorn, but with the uvicorn worker, which is designed for Python ASGI apps. The gunicorn configuration file sets that up like so:

import multiprocessing

num_cpus = multiprocessing.cpu_count()
workers = (num_cpus * 2) + 1
worker_class = "uvicorn.workers.UvicornWorker"
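With that configuration saved as gunicorn.conf.py (the conventional filename, which gunicorn reads automatically from the working directory), the launch command stays simple. Here, app:app assumes a module named app.py containing an app object:

python3 -m gunicorn app:app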

Asynchronous API calls

To really benefit from the port to an asynchronous framework, we need to make asynchronous calls to all of the APIs, so that a worker can handle a new request whenever an API call is being awaited.

Our API calls to the openai SDK now use await with the acreate variant:

chat_coroutine = openai.ChatCompletion.acreate(
    deployment_id=os.getenv("AZURE_OPENAI_CHATGPT_DEPLOYMENT", "chatgpt"),
    messages=[{"role": "system", "content": "You are a helpful assistant."},
              {"role": "user", "content": request_message}],
)

For the RAG sample, we also have calls to Azure services like Azure Cognitive Search. To make those asynchronous, we first import the async variant of the credential and client classes in the aio module:

from azure.identity.aio import DefaultAzureCredential
from azure.search.documents.aio import SearchClient

Then the API calls themselves just need an await; the method names stay the same:

r = await search_client.search(search_query)
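Putting those together, here's a minimal sketch of an async search call; the endpoint, index name, and parameters are illustrative, not the sample's exact values:

from azure.identity.aio import DefaultAzureCredential
from azure.search.documents.aio import SearchClient

async def search_docs(query: str):
    credential = DefaultAzureCredential()
    client = SearchClient(
        endpoint="https://<searchservice>.search.windows.net",  # illustrative
        index_name="docs-index",  # illustrative
        credential=credential,
    )
    # The async client returns an async iterator of results
    r = await client.search(query, top=3)
    results = [doc async for doc in r]
    await client.close()
    await credential.close()
    return results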

Monday, September 11, 2023

Mocking async openai package calls with pytest

As part of my role on the Python advocacy team for Azure, I am now one of the maintainers of several ChatGPT samples, like my simple chat app and the very popular chat + search app. Both of those samples use Quart, the asynchronous version of Flask, which enables them to use the asynchronous versions of the functions from the openai package.

Making async openai calls

A synchronous call to the streaming ChatCompletion API looks like:

response = openai.ChatCompletion.create(
    messages=[{"role": "system", "content": "You are a helpful assistant."},
              {"role": "user", "content": request_message}],
    stream=True,
)

An asynchronous call to that same API looks like:

response = await openai.ChatCompletion.acreate(
    messages=[{"role": "system", "content": "You are a helpful assistant."},
              {"role": "user", "content": request_message}],
    stream=True,
)

The difference is just the addition of await to wait for the results of the asynchronous function (and signal that the process can work on other tasks), along with the change in method name from create to acreate. That's a small difference in our app code, but it's a significant difference when it comes to mocking those calls, so it's worth pointing out.

Mocking a streaming call

In our tests of the apps, we don't want to actually make calls to the OpenAI servers, since that'd require authentication and would use up quota needlessly. Instead, we can mock the calls using the built-in pytest fixture monkeypatch with code that mimics the openai package's response.

Here's the fixture that I use to mock the asynchronous acreate call:

@pytest.fixture
def mock_openai_chatcompletion(monkeypatch):

    class AsyncChatCompletionIterator:
        def __init__(self, answer: str):
            self.answer_index = 0
            self.answer_deltas = answer.split(" ")

        def __aiter__(self):
            return self

        async def __anext__(self):
            if self.answer_index < len(self.answer_deltas):
                answer_chunk = self.answer_deltas[self.answer_index]
                self.answer_index += 1
                return openai.util.convert_to_openai_object(
                    {"choices": [{"delta": {"content": answer_chunk}}]})
            else:
                raise StopAsyncIteration

    async def mock_acreate(*args, **kwargs):
        return AsyncChatCompletionIterator("The capital of France is Paris.")

    monkeypatch.setattr(openai.ChatCompletion, "acreate", mock_acreate)

The final line of that fixture swaps the acreate method with my mock method that returns a class that acts like an asynchronous generator thanks to its __anext__ dunder method. That method returns a chunk of the answer each time it's called, until there are no chunks left.
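To see the fixture in action, here's a sketch of a test that consumes the mocked stream, assuming pytest-asyncio is installed for async test support. Note that the fixture splits the answer on spaces, so the test rejoins the chunks with spaces to compare:

import openai
import pytest

@pytest.mark.asyncio
async def test_chatcompletion_stream(mock_openai_chatcompletion):
    chat_coroutine = openai.ChatCompletion.acreate(
        messages=[{"role": "user", "content": "What is the capital of France?"}],
        stream=True,
    )
    chunks = []
    async for event in await chat_coroutine:
        chunks.append(event.choices[0].delta.content)
    # The mock split the answer on spaces, so rejoin to reconstruct it
    assert " ".join(chunks) == "The capital of France is Paris."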

Mocking non-streaming call

For the other repo, which supports both streaming and non-streaming responses, the mock acreate method must account for the non-streaming case by immediately returning the full answer.

    async def mock_acreate(*args, **kwargs):
        answer = "The capital of France is Paris."
        if "stream" in kwargs and kwargs["stream"] is True:
            return AsyncChatCompletionIterator(answer)
        else:
            return openai.util.convert_to_openai_object(
                {"choices": [{"message": {"content": answer}}]})

Mocking multiple answers

If necessary, it's possible to make the mock respond with different answers based on the last message passed in. We need that for the chat + search app, since we also use a ChatGPT call to generate keyword searches based on the user question.

Just change the answer based off the messages keyword arg:

    async def mock_acreate(*args, **kwargs):
        messages = kwargs["messages"]
        if messages[-1]["content"] == "Generate search query for: What is the capital of France?":
            answer = "capital of France"
        else:
            answer = "The capital of France is Paris."

Mocking other openai calls

We also make other calls through the openai package, like to create embeddings. That's a much simpler mock, since there's no streaming involved:

@pytest.fixture
def mock_openai_embedding(monkeypatch):
    async def mock_acreate(*args, **kwargs):
        return {"data": [{"embedding": [0.1, 0.2, 0.3]}]}

    monkeypatch.setattr(openai.Embedding, "acreate", mock_acreate)
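And here's a sketch of a test that uses that fixture (the engine name is illustrative):

@pytest.mark.asyncio
async def test_embedding(mock_openai_embedding):
    response = await openai.Embedding.acreate(
        engine="text-embedding-ada-002", input="What is the capital of France?")
    assert response["data"][0]["embedding"] == [0.1, 0.2, 0.3]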

More resources

For more context and example tests, view the full tests in the sample repos.

Monday, August 14, 2023

Fetching JSON over streaming HTTP

Recently, as part of my work on Azure OpenAI code samples, I've been experimenting with different ways of streaming data from a server into a website. The best-known technique is WebSockets, but there are also other approaches, like server-sent events and readable streams. A readable stream is the simplest of the options, and works well if your website only needs to stream a response from the server (i.e. it doesn't need bi-directional streaming).

HTTP streaming in Python

To stream an HTTP response, your backend needs to set the "Transfer-Encoding" header to "chunked". Most web frameworks provide documentation about streaming responses, such as Flask: Streaming and Quart: Streaming responses. In both Flask and Quart, the response must be a Python generator, so that the server can continually get the next data from the generator until it's exhausted.

This example from the Flask doc streams data from a CSV:

@app.route("/large.csv")
def generate_large_csv():
    def generate():
        for row in iter_all_rows():
            yield f"{','.join(row)}\n"
    return generate(), {"Content-Type": "text/csv"}

This example from the Quart docs is an infinite stream of timestamps:

@app.route("/time")
async def stream_time():
    async def async_generator():
        while True:
            time = datetime.now()
            yield time.isoformat().encode()
            await asyncio.sleep(1)
    return async_generator(), 200

Consuming streams in JavaScript

The standard way to consume HTTP requests in JavaScript is the fetch() function, and fortunately, that function can also be used to consume HTTP streams. When the browser sees that the data is chunked, it sets response.body to a ReadableStream.

This example fetches a URL, treats the response body as a stream, and logs out the output until it's done streaming:

const response = await fetch(url);
const readableStream = response.body;
const reader = readableStream.getReader();
while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    var text = new TextDecoder("utf-8").decode(value);
    console.log("Received ", text);
}

Streaming JSON

You might think it'd be super straightforward to stream JSON: just generate a JSON string on the server, and then JSON.parse the received text on the client. But there's a gotcha: the client could receive multiple JSON objects in the same chunk, and then an attempt to parse as JSON will fail.
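A quick illustration of that failure mode in Python:

import json

# Two JSON objects that arrived in the same chunk:
chunk = '{"answer": "Hello"}{"answer": "Hi"}'
json.loads(chunk)  # raises json.JSONDecodeError: Extra data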

The solution: JSON objects separated by new lines, known either as NDJSON or JSONlines.

This expression converts a Python dict to NDJSON, using the std lib json module:

json.dumps(some_dict) + "\n"

Here's how I actually used that, for one of the ChatGPT samples:

@bp.post("/chat")
def chat_handler():
    request_message = request.json["message"]

    def response_stream():
        response = openai.ChatCompletion.create(
            engine=os.getenv("AZURE_OPENAI_CHATGPT_DEPLOYMENT", "chatgpt"),
            messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": request_message},
            ],
            stream=True,
        )
        for event in response:
            yield json.dumps(event) + "\n"

    return Response(response_stream())

Consuming NDJSON streams in JavaScript

Once the server is outputting NDJSON, we can write parsing code in JavaScript that splits on newlines and attempts to parse each resulting string as a JSON object.

const response = await fetch(url);
const readableStream = response.body;
const reader = readableStream.getReader();
let runningText = "";
while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    var text = new TextDecoder("utf-8").decode(value);
    const objects = text.split("\n");
    for (const obj of objects) {
        try {
            runningText += obj;
            let result = JSON.parse(runningText);
            console.log("Received", result);
            runningText = "";
        } catch (e) {
            // Not a complete JSON object yet, keep accumulating
        }
    }
}

Since I need to use this same processing code in multiple Azure OpenAI samples, I packaged that into a tiny npm package called ndjson-readablestream.

Here's how you can use the package from JavaScript to make NDJSON parsing easier:

import readNDJSONStream from "ndjson-readablestream";

const response = await chatApi(request);
if (!response.body) {
    throw Error("No response body");
}
for await (const event of readNDJSONStream(response.body)) {
    console.log("Received", event);
}

For more examples of using the package, see this PR that uses it in a TypeScript component to render ChatGPT responses or usage in an HTML page, for a non-React ChatGPT sample.

I hope this helps other developers use NDJSON streams in their projects. Please let me know if you have suggestions for improving my approach!

Monday, August 7, 2023

Accessibility snapshot testing for Python web apps (Part 2)

In my previous post, I showed a technique that used axe-core along with pytest and Playwright to make sure that pages in your web apps have no accessibility violations. That's a great approach if it works for you, but realistically, most webpages have a non-zero number of accessibility violations, due to limited engineering resources or dependence on third-party libraries. Do we just give up on being able to test them? No! Instead, we use a different approach: snapshot testing.

Snapshot testing

Snapshot testing is a way to test that the output of a function matches a previously saved snapshot.

Here's an example using pytest-snapshot:

def emojify(s):
    return s.replace('love', '❤️').replace('python', '🐍')

def test_function_output_with_snapshot(snapshot):
    snapshot.assert_match(emojify('I love python'), 'snapshot.txt')

The first time we run the test, it will save the output to a file. We check the generated snapshots into source control. Then, the next time anyone runs that test, it will compare the output to the saved snapshot.

Snapshot testing + axe-core

First, a big kudos to Michael Wheeler from UMich and his talk on Automated Web Accessibility Testing for the idea of using snapshot testing with axe-core.

Here's the approach: We save snapshots of the axe-core violations and check them into source control. That way, our tests will let us know when new violations come up, and our snapshot files keep track of which parts of the codebase need accessibility improvements.

To make it as easy as possible, I made a pytest plugin that combines Playwright, axe-core, and snapshot testing.

python3 -m pip install pytest-axe-playwright-snapshot
python3 -m playwright install --with-deps

Here's an example test from a Flask app:

from flask import url_for
from playwright.sync_api import Page

def test_index(page: Page, axe_pytest_snapshot):
    page.goto(url_for("index", _external=True))
    axe_pytest_snapshot(page)

Running the snapshot tests

First run: We specify the --snapshot-update argument to tell the plugin to save the snapshots to file.

python3 -m pytest --snapshot-update

That saves a file like this one to a directory named after the test and browser engine, like snapshots/test_violations/chromium/snapshot.txt:

color-contrast (serious) : 2
empty-heading (minor) : 1
link-name (serious) : 1

Subsequent runs: The plugin compares the new snapshot to the saved snapshot, and asserts if they differ.

python3 -m pytest

Let's look through some example outputs next.

Test results

New accessibility issue 😱

If there are violations in the new snapshot that weren't in the old, the test will fail with a message like this:

E  AssertionError: New violations found: html-has-lang (serious)
E  That's bad news! 😱 Either fix the issue or run `pytest --snapshot-update` to update the snapshots.
E  html-has-lang - Ensures every HTML document has a lang attribute
E    URL:
E    Impact Level: serious
E    Tags: ['cat.language', 'wcag2a', 'wcag311', 'ACT']
E    Elements Affected:
E    1) Target: html
E       Snippet: <html>
E       Messages:
E       * The <html> element does not have a lang attribute

Fixed accessibility issue 🎉

If there are fewer violations in the new snapshot than the old one, the test will also fail, but with a happy message like this:

E  AssertionError: Old violations no longer found: html-has-lang (serious).
E  That's good news! 🎉 Run `pytest --snapshot-update` to update the snapshots.

CI/CD integration

Once you've got snapshot testing set up, it's a great idea to run it on every potential change to your codebase.

Here's an example of a failing GitHub action due to an accessibility violation, using this workflow file:

Screenshot of GitHub actions workflow that shows test failure due to accessibility violations

Fixing accessibility issues

What should you do if you realize you've introduced an accessibility violation, or if you are tasked with reducing existing violations? You can read the reports from pytest to get the gist of the accessibility violations, but it's often easier to use a browser extension that uses the same axe-core rules.

Also consider an IDE extension like the VS Code Axe Linter.

Don't rely on automation to find all issues

I think it's really important for web apps to measure their accessibility violations, so that they can avoid introducing accessibility regressions and eventually resolve existing violations. However, it's really important to note that these automated tools can only go so far. According to the axe-core docs, it finds about 57% of WCAG issues automatically. There can still be many issues with your site, like with tab order or keyboard access.

In addition to automation, please consider other ways to discover issues, such as paying for an external accessibility audit, engaging with your disabled users, and hiring engineers with disabilities.

Friday, July 21, 2023

Automated accessibility audits for Python web apps (Part 1)

We all know by now the importance of accessibility for webpages. But it's surprisingly easy to create inaccessible web experiences, and unknowingly deploy those to production. How do we check for accessibility issues? One approach is to install a browser extension like Accessibility Insights and run that on changed webpages. I love that extension, but I don't trust myself to remember to run it. So I've been working on tools for running accessibility tests on Python web apps, which I'll present at next week's North Bay Python.

In this post, I'm going to share a way to automatically verify that a Python web app has *zero* accessibility issues -- or at least, zero issues that can be caught by automated testing. One should always do additional manual tests (like keyboard tests) and work with disabled users to discover all issues.


Here's what we'll need:

  • Playwright: A tool for end-to-end testing in various browser engines. Similar to Selenium, if you're familiar with that.
  • Axe-core: An accessibility engine for automated Web UI testing, built with JavaScript. Used by many other tools, like the Accessibility Insights browser extension.
  • axe-playwright-python: A package that I developed to connect the two together, running axe-core on Playwright pages and returning the results in useful formats.

For this example, I'll also use Flask, Pytest, and pytest-flask to run a local server during testing. However, you could easily use other frameworks (like Django and unittest).

The test

Here's the full code for a test of the home page of my personal website (

from axe_playwright_python.sync_playwright import Axe
from flask import url_for
from playwright.sync_api import Page

def test_a11y(app, live_server, page: Page):
    page.goto(url_for("home_page", _external=True))
    results = Axe().run(page)
    assert results.violations_count == 0, results.generate_report()

Let's break that down:

  • def test_a11y(app, live_server, page: Page):

    The app and live_server fixtures take care of starting up the app at a local URL. The app fixture comes from my conftest.py and the live_server fixture comes from pytest-flask.

  • page.goto(url_for("home_page", _external=True))

    I use the Page fixture from Playwright to navigate to a route from my app.

  • results = Axe().run(page)

    Using the Axe object from my axe-playwright-python package, I run axe-core on the page.

  • assert results.violations_count == 0, results.generate_report()

    I assert that the violations count is zero, but I also provide a human-friendly report as the assertion message. That way, if any violations were found, I'll see the report in the pytest output.

For the full code, see the tests/ folder in the GitHub repository.

The output

When there are no violations found, the test passes! 🎉

When there are any violations found, the pytest output looks like this:

    def test_a11y(app, live_server, page: Page):
        axe = Axe()
        page.goto(url_for("home_page", _external=True))
        results = axe.run(page)
>       assert results.violations_count == 0, results.generate_report()
E       AssertionError: Found 1 accessibility violations:
E         Rule Violated:
E         image-alt - Ensures <img> elements have alternate text or a role of none or presentation
E             URL:
E             Impact Level: critical
E             Tags: ['cat.text-alternatives', 'wcag2a', 'wcag111', 'section508', 'section508.22.a', 'ACT']
E             Elements Affected:
E               1)      Target: img
E                       Snippet: <img src="bla.jpg">
E                       Messages:
E                       * Element does not have an alt attribute
E                       * aria-label attribute does not exist or is empty
E                       * aria-labelledby attribute does not exist, references elements that do not exist or references elements that are empty
E                       * Element has no title attribute
E                       * Element's default semantics were not overridden with role="none" or role="presentation"
E       assert 1 == 0

I can then read the report, look for the HTML matching the snippet, and make it accessible. In the case above, there's an img tag missing an alt attribute. Once I fix that, the test passes.

Checking more routes

To check additional routes, I can either add more tests or I can parameterize the current test like so:

@pytest.mark.parametrize("route", ["home_page", "projects", "talks", "interviews"])
def test_a11y(app, live_server, page: Page, route: str):
    axe = Axe()
    page.goto(url_for(route, _external=True))
    results = axe.run(page)
    assert results.violations_count == 0, results.generate_report()

For testing a route where user interaction causes a change in the page, I can use Playwright to interact with the page and then run Axe after the interaction. Here's an example of that from another app:

def test_quiz_submit(page: Page, snapshot, fake_quiz):
    page.goto(url_for("quizzes.quiz", quiz_id=fake_quiz.id, _external=True))  # quiz_id parameter name assumed
    page.get_by_label("Your name:").click()
    page.get_by_label("Your name:").fill("Pamela")
    page.get_by_label("Ada Lovelace").check()
    page.get_by_role("button", name="Submit your score!").click()
    expect(page.locator("#score")).to_contain_text("You scored 25% on the quiz.")
    results = Axe().run(page)
    assert results.violations_count == 0, results.generate_report()

Is perfection possible?

Fortunately, I was able to fix all of the accessibility violations for my very small personal website. However, many webpages are much bigger and more complicated, and it may not be possible to address all the violations. Is it possible to run tests like this in that situation? Yes, but we need to do something like snapshot testing: tracking the violations over time and ensuring that changes don't introduce additional violations. I'll show an approach for that in Part 2 of this blog post series. Stay tuned!

Tuesday, June 27, 2023

Tips for debugging Flask deployments to Azure App Service

There are many ways to deploy Flask web apps to App Service: Azure CLI, VS Code Azure Tools extension, Azure Developer CLI, or GitHub-based deployments. Unfortunately, sometimes a deploy fails, and it can be hard at first to understand what's wrong. Regardless of how you deploy Flask to App Service, you can follow these tips for debugging the deployment.

After you finish deploying, first visit the app URL to see if it loads. If it does, amazing! If it doesn't, here are steps you can take to figure out what went wrong.

Check the deployment logs

Select Deployment Center from the side navigation menu, then select Logs. You should see a timestamped list of recent deploys:

Check whether the status of the most recent deploy is "Success (Active)" or "Failed". If it succeeded, the deployment logs might still reveal issues; if it failed, the logs should certainly reveal the issue.

Click the commit ID to open the logs for the most recent deploy. First scroll down to see if any errors or warnings are reported at the end; if all went well, there won't be any.

Now scroll back up to find the timestamp with the label "Running oryx build". Oryx is the open source tool that builds apps for App Service, Functions, and other platforms, across all the supported MS languages. Click the Show logs link next to that label. That will pop open detailed logs at the bottom. Scroll down.

Here's what a successful Oryx build looks like for a Flask application:

Command: oryx build /tmp/zipdeploy/extracted -o /home/site/wwwroot --platform python --platform-version 3.10 -p virtualenv_name=antenv --log-file /tmp/build-debug.log  -i /tmp/8db773a0e30ccc6 --compress-destination-dir | tee /tmp/oryx-build.log
Operation performed by Microsoft Oryx,
You can report issues at

Oryx Version: 0.2.20230508.1, Commit: 7fe2bf39b357dd68572b438a85ca50b5ecfb4592, ReleaseTagName: 20230508.1

Build Operation ID: 164fee7dc4083f79
Repository Commit : 6e78c534-da03-414e-acc1-e396b92b1405
OS Type           : bullseye
Image Type        : githubactions

Detecting platforms...
Detected following platforms:
  python: 3.10.8
Version '3.10.8' of platform 'python' is not installed. Generating script to install it...

Using intermediate directory '/tmp/8db773a0e30ccc6'.

Copying files to the intermediate directory...
Done in 0 sec(s).

Source directory     : /tmp/8db773a0e30ccc6
Destination directory: /home/site/wwwroot

Downloading and extracting 'python' version '3.10.8' to '/tmp/oryx/platforms/python/3.10.8'...
Detected image debian flavor: bullseye.
Downloaded in 2 sec(s).
Verifying checksum...
Extracting contents...
performing sha512 checksum for: python...
Done in 18 sec(s).

image detector file exists, platform is python..
OS detector file exists, OS is bullseye..
Python Version: /tmp/oryx/platforms/python/3.10.8/bin/python3.10
Creating directory for command manifest file if it does not exist
Removing existing manifest file
Python Virtual Environment: antenv
Creating virtual environment...
Activating virtual environment...
Running pip install...
[18:13:30+0000] Collecting Flask==2.3.2
[18:13:30+0000]   Downloading Flask-2.3.2-py3-none-any.whl (96 kB)
[18:13:30+0000]      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 96.9/96.9 kB 4.8 MB/s eta 0:00:00
[18:13:31+0000] Collecting itsdangerous>=2.1.2
[18:13:31+0000]   Downloading itsdangerous-2.1.2-py3-none-any.whl (15 kB)
[18:13:31+0000] Collecting click>=8.1.3
[18:13:31+0000]   Downloading click-8.1.3-py3-none-any.whl (96 kB)
[18:13:31+0000]      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 96.6/96.6 kB 5.4 MB/s eta 0:00:00
[18:13:31+0000] Collecting Werkzeug>=2.3.3
[18:13:31+0000]   Downloading Werkzeug-2.3.6-py3-none-any.whl (242 kB)
[18:13:31+0000]      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 242.5/242.5 kB 8.7 MB/s eta 0:00:00
[18:13:31+0000] Collecting blinker>=1.6.2
[18:13:31+0000]   Downloading blinker-1.6.2-py3-none-any.whl (13 kB)
[18:13:31+0000] Collecting Jinja2>=3.1.2
[18:13:31+0000]   Downloading Jinja2-3.1.2-py3-none-any.whl (133 kB)
[18:13:31+0000]      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 133.1/133.1 kB 6.9 MB/s eta 0:00:00
[18:13:32+0000] Collecting MarkupSafe>=2.0
[18:13:32+0000]   Downloading MarkupSafe-2.1.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (25 kB)
[18:13:33+0000] Installing collected packages: MarkupSafe, itsdangerous, click, blinker, Werkzeug, Jinja2, Flask
[18:13:35+0000] Successfully installed Flask-2.3.2 Jinja2-3.1.2 MarkupSafe-2.1.3 Werkzeug-2.3.6 blinker-1.6.2 click-8.1.3 itsdangerous-2.1.2

[notice] A new release of pip available: 22.2.2 -> 23.1.2
[notice] To update, run: pip install --upgrade pip
Not a vso image, so not writing build commands
Preparing output...

Copying files to destination directory '/tmp/_preCompressedDestinationDir'...
Done in 3 sec(s).
Compressing content of directory '/tmp/_preCompressedDestinationDir'...
Copied the compressed output to '/home/site/wwwroot'

Removing existing manifest file
Creating a manifest file...
Manifest file created.
Copying .ostype to manifest output directory.

Done in 70 sec(s).

Look for these important steps in the Oryx build:

  • Detected following platforms: python: 3.10.8
    That should match your runtime in the App Service configuration.
  • Running pip install...
    That should install all the requirements in your requirements.txt - if it didn't find your requirements.txt, then you won't see the packages installed.

If you see all those steps in the Oryx build, then that's a good sign that the build went well, and you can move on to checking the App Service logs.

Check the log stream

Under the Monitoring section of the side nav, select Log stream. Scroll to the timestamp corresponding to your most recent deploy.

The logs should start with pulling Docker images:

Screenshot of Log stream in App Service

Here are the full logs for a Flask app successfully starting in an App Service container:

2023-06-27T20:00:33.556Z INFO  - 3.10_20230519.2.tuxprod Pulling from appsvc/python
2023-06-27T20:00:33.559Z INFO  -  Digest: sha256:d7f1824d43ab89f90ec317f32a801ecffd4321a3d4a710593658be9bd980cd22
2023-06-27T20:00:33.560Z INFO  -  Status: Image is up to date for
2023-06-27T20:00:33.563Z INFO  - Pull Image successful, Time taken: 0 Minutes and 0 Seconds
2023-06-27T20:00:34.710Z INFO  - Starting container for site
2023-06-27T20:00:34.711Z INFO  - docker run -d --expose=8000 --name flask-server-core-7icehkhjdeox2-appservice_5_edde42ea -e WEBSITE_CORS_ALLOWED_ORIGINS=, -e WEBSITE_CORS_SUPPORT_CREDENTIALS=False -e WEBSITE_SITE_NAME=flask-server-core-7icehkhjdeox2-appservice -e WEBSITE_AUTH_ENABLED=False -e WEBSITE_ROLE_INSTANCE_ID=0 -e -e WEBSITE_INSTANCE_ID=a822bcb6dd314caab4bd83084cc7a3991e4965ec4f97b7ce99c0ca46861dc419 -e HTTP_LOGGING_ENABLED=1 -e WEBSITE_USE_DIAGNOSTIC_SERVER=False appsvc/python:3.10_20230519.2.tuxprod  
2023-06-27T20:00:37.357175818Z    _____                               
2023-06-27T20:00:37.357230418Z   /  _  \ __________ _________   ____  
2023-06-27T20:00:37.357235518Z  /  /_\  \\___   /  |  \_  __ \_/ __ \ 
2023-06-27T20:00:37.357239618Z /    |    \/    /|  |  /|  | \/\  ___/ 
2023-06-27T20:00:37.357243418Z \____|__  /_____ \____/ |__|    \___  >
2023-06-27T20:00:37.357247318Z         \/      \/                  \/ 
2023-06-27T20:00:37.357251218Z A P P   S E R V I C E   O N   L I N U X
2023-06-27T20:00:37.357258418Z Documentation:
2023-06-27T20:00:37.357261918Z Python 3.10.11
2023-06-27T20:00:37.357282418Z Note: Any data outside '/home' is not persisted
2023-06-27T20:00:41.641875105Z Starting OpenBSD Secure Shell server: sshd.
2023-06-27T20:00:41.799900179Z App Command Line not configured, will attempt auto-detect
2023-06-27T20:00:42.761658829Z Starting periodic command scheduler: cron.
2023-06-27T20:00:42.761688529Z Launching oryx with: create-script -appPath /home/site/wwwroot -output /opt/startup/ -virtualEnvName antenv -defaultApp /opt/defaultsite
2023-06-27T20:00:42.876778283Z Found build manifest file at '/home/site/wwwroot/oryx-manifest.toml'. Deserializing it...
2023-06-27T20:00:42.887163588Z Build Operation ID: 820645c3a1e60b5e
2023-06-27T20:00:42.890123289Z Oryx Version: 0.2.20230512.3, Commit: a81ce1fa16b6e03d37f79d3ba5e99cf09b28e4ef, ReleaseTagName: 20230512.3
2023-06-27T20:00:42.897199993Z Output is compressed. Extracting it...
2023-06-27T20:00:42.964545124Z Extracting '/home/site/wwwroot/output.tar.gz' to directory '/tmp/8db774903eba755'...
2023-06-27T20:00:46.203967540Z App path is set to '/tmp/8db774903eba755'
2023-06-27T20:00:46.728397586Z Detected an app based on Flask
2023-06-27T20:00:46.730162987Z Generating `gunicorn` command for 'app:app'
2023-06-27T20:00:46.770331805Z Writing output script to '/opt/startup/'
2023-06-27T20:00:47.050828437Z Using packages from virtual environment antenv located at /tmp/8db774903eba755/antenv.
2023-06-27T20:00:47.052387737Z Updated PYTHONPATH to '/opt/startup/app_logs:/tmp/8db774903eba755/antenv/lib/python3.10/site-packages'
2023-06-27T20:00:50.406265801Z [2023-06-27 20:00:50 +0000] [67] [INFO] Starting gunicorn 20.1.0
2023-06-27T20:00:50.434991028Z [2023-06-27 20:00:50 +0000] [67] [INFO] Listening at: (67)
2023-06-27T20:00:50.441222333Z [2023-06-27 20:00:50 +0000] [67] [INFO] Using worker: sync
2023-06-27T20:00:50.473174263Z [2023-06-27 20:00:50 +0000] [70] [INFO] Booting worker with pid: 70
2023-06-27T20:00:53.772011632Z - - [27/Jun/2023:20:00:53 +0000] "GET /robots933456.txt HTTP/1.1" 404 91 "-" "HealthCheck/1.0"
2023-06-27T20:00:55.268900825Z - - [27/Jun/2023:20:00:55 +0000] "GET /robots933456.txt HTTP/1.1" 404 91 "-" "HealthCheck/1.0"

2023-06-27T20:01:47.691011982Z - - [27/Jun/2023:20:01:47 +0000] "GET /hello HTTP/1.1" 200 183 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/ Safari/537.36 Edg/113.0.1774.57"

A few notable logs:

  • 2023-06-27T18:22:17.803525340Z Detected an app based on Flask
    This log indicates that Oryx auto-detected a Flask app (by inspecting the requirements.txt file).
  • 2023-06-27T18:22:17.803557841Z Generating `gunicorn` command for 'app:app'
    This indicates that Oryx detected an app.py file and assumes it has an app object inside it.
  • 2023-06-27T18:22:42.540158812Z [2023-06-27 18:22:42 +0000] [67] [INFO] Starting gunicorn 20.1.0
    That's the start of the gunicorn server serving the Flask app. After it starts, the logs should show HTTP requests.

If you aren't seeing the full logs, it's possible that your deploy happened too long ago and the portal has deleted some logs. In that case, open the Log stream and do another deploy, and you should see the full logs.

Alternatively, you can download the full logs from the Kudu interface. Select Advanced Tools from the side nav:

Screenshot of Azure Portal side nav with Advanced Tools selected

When the Kudu website loads, find the Current Docker Logs link and select Download as zip next to it:

Screenshot of website list of links

In the downloaded zip file, find the filename that starts with the most recent date and ends with "_default_docker.log":

Screenshot of extracted zip folder with files

Open that file to see the full logs, with the most recent logs at the bottom.