Friday, March 3, 2023

Deploying a containerized FastAPI app to Azure Container Apps

Based on my older post about deploying a containerized Flask app to Azure Container Apps, here's a similar process for deploying a FastAPI app instead.

First, some Docker jargon:

  • A Docker image is a multi-layered package of exactly the environment your app needs to run, such as a Linux OS with Python 3.11 and FastAPI installed. You can also think of an image as a snapshot or a template.
  • A Docker container is an instance of an image, which could run locally on your machine or in the cloud.
  • A registry is a place to host images. There are cloud-hosted registries like Docker Hub and Azure Container Registry. You can pull images down from those registries, or push images up to them.

These are the high-level steps:

  1. Build an image of the FastAPI application locally and confirm it works when containerized.
  2. Push the image to the Azure Container Registry.
  3. Create an Azure Container App for that image.

Build image of FastAPI app

I start from a very simple FastAPI app inside a main.py file:

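import random

import fastapi

app = fastapi.FastAPI()

@app.get("/generate_name")
async def generate_name(starts_with: str | None = None):
    names = ["Minnie", "Margaret", "Myrtle", "Noa", "Nadia"]
    if starts_with:
        # Match case-insensitively, so "M" and "m" find the same names
        names = [n for n in names if n.lower().startswith(starts_with.lower())]
    if not names:
        # random.choice() raises IndexError on an empty list, which would be a 500
        raise fastapi.HTTPException(status_code=404, detail="No matching names")
    random_name = random.choice(names)
    return {"name": random_name}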
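To check the filtering behavior without starting a server, the list comprehension can be pulled out into a plain function. This is a minimal sketch: `filter_names` is a hypothetical helper, not part of the app above, and it lowercases both sides so that `starts_with=M` and `starts_with=m` select the same names.

```python
from __future__ import annotations

NAMES = ["Minnie", "Margaret", "Myrtle", "Noa", "Nadia"]


def filter_names(names: list[str], starts_with: str | None = None) -> list[str]:
    """Return the names that start with `starts_with`, ignoring case."""
    if not starts_with:
        # No prefix given: every name is a candidate
        return list(names)
    prefix = starts_with.lower()
    return [n for n in names if n.lower().startswith(prefix)]


print(filter_names(NAMES, "m"))  # ['Minnie', 'Margaret', 'Myrtle']
print(filter_names(NAMES, "N"))  # ['Noa', 'Nadia']
print(filter_names(NAMES, "z"))  # []
```

The empty result is the case to watch: `random.choice([])` raises `IndexError`, so any endpoint built on this needs to handle that case before choosing a name.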

I define the dependencies in a requirements.txt file. The FastAPI documentation generally recommends uvicorn as the server, so I stuck with that.

fastapi==0.92.0
uvicorn[standard]==0.20.0

Based on FastAPI's tutorial on Docker deployment, I add this Dockerfile:

FROM python:3.11

WORKDIR /code

COPY requirements.txt .

RUN pip install --no-cache-dir --upgrade -r requirements.txt

COPY . .

EXPOSE 80

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "80", "--proxy-headers"]

That file tells Docker to start from a base image that has Python 3.11 installed, create a /code directory, install the package requirements, copy the code into the directory, expose port 80, and run the uvicorn server on port 80.

I also add a .dockerignore file to make sure Docker doesn't copy over unneeded files:

.git*
**/*.pyc
.venv/

I build the Docker image using the "Build image" option from the VS Code Docker extension. However, it can also be built from the command line:

docker build --tag fastapi-demo .

Now that the image is built, I can run a container using it:

docker run -d --name fastapi-container -p 80:80 fastapi-demo

The Dockerfile tells uvicorn to listen on port 80, so the run command publishes the container's port 80 as port 80 on the local computer. I visit localhost:80/generate_name to confirm that my API is up and running. 🏃🏽‍♀️

Deploying Option #1: az containerapp up

The Azure CLI has a single command that can take care of all the common steps of container app deployment: az containerapp up.

From the app folder, I run the up command:

az containerapp up \
  -g fastapi-aca-rg \
  -n fastapi-aca-app \
  --registry-server pamelascontainerregistry.azurecr.io \
  --ingress external \
  --target-port 80 \
  --source .

That command does the following:

  1. Creates an Azure resource group named "fastapi-aca-rg". A resource group is basically a folder for all the resources created in the following steps.
  2. Creates a Container App Environment and Log Analytics workspace inside that group.
  3. Builds the container image using the local Dockerfile.
  4. Pushes the image to my existing registry (pamelascontainerregistry.azurecr.io). I'm reusing my old registry to save costs, but if I wanted the command to create a new one, I would just remove the --registry-server argument.
  5. Creates a Container App "fastapi-aca-app" that uses the pushed image and allows external ingress on port 80 (public HTTP access).

When the steps are successful, the public URL is displayed in the output:

Browse to your container app at:
http://fastapi-aca-app.salmontree-4f877506.northcentralusstage.azurecontainerapps.io 

Whenever I update the app code, I run that command again and it repeats the last three steps. Easy peasy! Check the az containerapp up reference to see what additional options are available.

Deploying Option #2: Step-by-step az commands

If you need more customization of the deployment process than up allows, you can also do each of those steps yourself with specific Azure CLI commands.

Push image to registry

I follow this tutorial to push an image to the registry, with some customizations.

I create a resource group:

az group create --location eastus --name fastapi-aca-rg

I already had a container registry from my containerized Flask app, so I reuse that registry to save costs. If I didn't already have it, I'd run this command to create it:

az acr create --resource-group fastapi-aca-rg \
  --name pamelascontainerregistry --sku Basic

Then I log into the registry so that later commands can push images to it:

az acr login --name pamelascontainerregistry

Now comes the tricky part: pushing an image to that registry. I am working on a Mac with an M1 (ARM64) chip, but Azure Container Apps (and other cloud-hosted container runners) expect images built for the x86-64 (AMD64) architecture. That means I can't just push the image that I built in the earlier step; I have to build specifically for AMD64 and push that image.

One way to do that is with the docker buildx command, specifying the target architecture and target registry location:

docker buildx build --push --platform linux/amd64 \
    -t pamelascontainerregistry.azurecr.io/fastapi-aca:latest .

However, a much faster way to do it is with the az acr build command, which uploads the code to the cloud and builds it there:

az acr build --platform linux/amd64 \
    -t pamelascontainerregistry.azurecr.io/fastapi-aca:latest \
    -r pamelascontainerregistry .

⏱ The `docker buildx` command takes ~ 10 minutes, whereas the `az acr build` command takes only a minute. Nice!
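Whether that cross-build is needed depends on the local CPU. As a minimal sketch, assuming the values Python's `platform.machine()` typically reports (`arm64` on Apple silicon macOS, `aarch64` on ARM Linux, `x86_64` on Intel/AMD Linux and macOS, `AMD64` on Windows), a hypothetical `needs_cross_build` helper could tell you whether the local image will run on Azure Container Apps as-is:

```python
from __future__ import annotations

import platform

# Names platform.machine() commonly reports for 64-bit x86 CPUs
# (assumption: "x86_64" on Linux/macOS, "AMD64" on Windows).
AMD64_NAMES = {"x86_64", "amd64"}


def needs_cross_build(machine: str | None = None) -> bool:
    """True if images built natively on this machine won't be linux/amd64."""
    machine = (machine or platform.machine()).lower()
    return machine not in AMD64_NAMES


print(needs_cross_build("arm64"))   # True  (Apple silicon Mac)
print(needs_cross_build("x86_64"))  # False
print(needs_cross_build())          # depends on the machine running this
```

Passing `--platform linux/amd64` unconditionally, as the commands above do, also works on an AMD64 machine, so the check is mainly useful for explaining why a locally built image fails in the cloud.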

Deploy to Azure Container App

Now that I have an image uploaded to a registry, I can create a container app for that image. I followed this tutorial.

I install or upgrade the containerapp extension and register the necessary providers:

az extension add --name containerapp --upgrade
az provider register --namespace Microsoft.App
az provider register --namespace Microsoft.OperationalInsights

Then I create an environment for the container app:

az containerapp env create --name fastapi-aca-env \
    --resource-group fastapi-aca-rg --location eastus

Next, I retrieve the registry's credentials to use in the next step (this requires the registry's admin user to be enabled):

az acr credential show --name pamelascontainerregistry

Finally, I create the container app, passing in the username and password from the credentials:

az containerapp create --name fastapi-aca-app \
    --resource-group fastapi-aca-rg \
    --image pamelascontainerregistry.azurecr.io/fastapi-aca:latest \
    --environment fastapi-aca-env \
    --registry-server pamelascontainerregistry.azurecr.io \
    --registry-username pamelascontainerregistry \
    --registry-password <PASSWORD HERE> \
    --ingress external \
    --target-port 80

The command returns JSON describing the created resource, which includes the URL of the created app in the "fqdn" property. That URL is also displayed in the Azure portal, in the overview page for the container app. I followed the URL, appended the API route "/generate_name", and verified it responded successfully. 🎉 Woot!
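The same lookup can be scripted. This sketch assumes the FQDN sits at `properties.configuration.ingress.fqdn` in the JSON that `az containerapp create` prints; check your own output, since the shape may differ between CLI versions. The `sample` document is a hypothetical, trimmed stand-in, not real output, and `app_url` is a made-up helper name.

```python
import json


def app_url(create_output: str, route: str = "/generate_name") -> str:
    """Build the public URL for a route from `az containerapp create` JSON output."""
    resource = json.loads(create_output)
    # Assumed location of the FQDN; verify against your CLI version's output.
    fqdn = resource["properties"]["configuration"]["ingress"]["fqdn"]
    return f"https://{fqdn}{route}"


# Hypothetical, trimmed output for illustration only.
sample = json.dumps({
    "name": "fastapi-aca-app",
    "properties": {
        "configuration": {
            "ingress": {"fqdn": "fastapi-aca-app.example.eastus.azurecontainerapps.io"}
        }
    },
})

print(app_url(sample))
# https://fastapi-aca-app.example.eastus.azurecontainerapps.io/generate_name
```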

When I make any code updates, I rebuild the image and tell the container app to update:

az acr build --platform linux/amd64 \
    -t pamelascontainerregistry.azurecr.io/fastapi-aca:latest \
    -r pamelascontainerregistry .

az containerapp update --name fastapi-aca-app \
  --resource-group fastapi-aca-rg \
  --image pamelascontainerregistry.azurecr.io/fastapi-aca:latest 
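Since those two commands always run together, they could be wrapped in a small script. This is a sketch, not the author's tooling: it reuses the exact az arguments above, and the `dry_run` flag (a hypothetical addition) only prints the commands instead of running them.

```python
from __future__ import annotations

import shlex
import subprocess

REGISTRY = "pamelascontainerregistry"
IMAGE = f"{REGISTRY}.azurecr.io/fastapi-aca:latest"


def redeploy_commands(app: str = "fastapi-aca-app",
                      group: str = "fastapi-aca-rg") -> list[list[str]]:
    """The two az commands from above: rebuild in the cloud, then update the app."""
    return [
        ["az", "acr", "build", "--platform", "linux/amd64",
         "-t", IMAGE, "-r", REGISTRY, "."],
        ["az", "containerapp", "update", "--name", app,
         "--resource-group", group, "--image", IMAGE],
    ]


def redeploy(dry_run: bool = True) -> None:
    for cmd in redeploy_commands():
        print(shlex.join(cmd))
        if not dry_run:
            # check=True raises on failure, so a failed build skips the update
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    redeploy(dry_run=True)
```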

🐳 Now I'm off to containerize more apps!

Thursday, March 2, 2023

Rendering matplotlib charts in Flask

Way back in the day when I was working in Google developer relations, we released one of my favorite tools: the Google Charts API. It was an image API which, with the right set of query parameters, could produce everything from bar charts to "Google-o-meters". The team even added custom map markers to the API at one point. Since Google had started getting deprecation-happy, I pointedly asked the team "do you promise not to deprecate this?" before using the map marker output in Google Maps API demos. They promised not to, but alas, just a few years later, the API was indeed deprecated!

In honor of the fallen API, I created a sample app this week that serves a simple Charts API, currently outputting just bar charts and pie charts.

API URLs on left side and Chart API output on right side

To output the charts, I surveyed the landscape of Python charting options and decided to go with matplotlib, since I have at least some familiarity with it, plus it even has a documentation page showing usage in Flask.

Figure vs pyplot

When using matplotlib in Jupyter notebooks, the standard approach is to use pyplot like so:

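import matplotlib.pyplot as plt

# Example input, shaped like the API's parsed query parameters
data = {"values": [30, 50, 20], "labels": ["A", "B", "C"]}

fig, ax = plt.subplots()
ax.pie(data["values"], labels=data.get("labels"))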

However, the documentation discourages use of pyplot in web servers and instead recommends using the Figure class, like so:

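from matplotlib.figure import Figure

# Example input, shaped like the API's parsed query parameters
data = {"values": [30, 50, 20], "labels": ["A", "B", "C"]}

fig = Figure()
axes = fig.subplots(squeeze=False)[0][0]
axes.pie(data["values"], labels=data.get("labels"))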

From Figure to PNG response

The next piece of the puzzle is turning the Figure object into a Flask Response containing a binary PNG file. I first save the figure into a BytesIO buffer from Python's io module, then seek to the beginning of that buffer, and finally use send_file to return the data with the correct MIME type.

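from io import BytesIO

from flask import send_file

# (Inside the Flask view function, after building `fig`.)
# Save the figure to an in-memory buffer instead of a file on disk.
buf = BytesIO()
fig.savefig(buf, format="png")
buf.seek(0)  # Rewind so send_file reads from the beginning
return send_file(buf, download_name="chart.png", mimetype="image/png")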
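Putting the Figure and buffer pieces together outside of Flask makes the output easy to check. A minimal sketch, assuming matplotlib is installed; `pie_chart_png` is a hypothetical helper name and the sample values are made up. It returns the raw PNG bytes that send_file would stream, and every PNG file starts with the same 8-byte signature.

```python
from io import BytesIO

from matplotlib.figure import Figure

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def pie_chart_png(values, labels=None) -> bytes:
    """Render a pie chart with the Figure API (no pyplot) and return PNG bytes."""
    fig = Figure()
    axes = fig.subplots()  # a single Axes
    axes.pie(values, labels=labels)
    buf = BytesIO()
    fig.savefig(buf, format="png")
    return buf.getvalue()  # whole buffer, regardless of the current position


png = pie_chart_png([30, 50, 20], labels=["A", "B", "C"])
print(png[:8] == PNG_SIGNATURE)  # True
```

Because the figure is never registered with pyplot's global figure manager, nothing has to be closed afterwards, which is the memory-leak concern behind the matplotlib docs' advice to avoid pyplot in web servers.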

You can see it all together in __init__.py, which also uses the APIFlask framework to parameterize and validate the routes.

Fast-loading Python Dev Containers

Since discovering Dev Containers and GitHub Codespaces a few months ago, I've become a wee bit of an addict. I love having an environment customized for each project, complete with all the tools and extensions, and with no concern for polluting the global namespace of my computer. However, there is one big drawback to loading a project in a Dev Container, whether locally in VS Code or online in Codespaces: it's slow! If the container isn't already created, it has to be built, and building a Docker container takes time.

So I want my Dev Containers to open fast, and pretty much all of mine are in Python. There are two ways I've used to build Python Dev Containers, and I now realize one is much faster than the other. Let's compare!

Python base image

Perhaps the most obvious approach is to use a Python-dedicated image, like Microsoft's Python devcontainer image. That just requires a Dockerfile like this one:

FROM --platform=amd64 mcr.microsoft.com/devcontainers/python:3.11

Universal container + python feature

However, there's another option: use a Python-less base image, like Microsoft's base devcontainer image, and then add the "python" Feature on top. Features are a way to add common toolchains to a Dev Container, like for Python, Node.js, Git, etc.

For this option, we start with the Dockerfile like this:

FROM --platform=amd64 mcr.microsoft.com/devcontainers/base:bullseye

And then, inside devcontainer.json, we add a "features" key and nest this feature definition inside:

"ghcr.io/devcontainers/features/python:1": {
    "version": "3.11"
}

Off to the races!

Which one of those options is better in terms of build time? You might already have a guess, but I wanted to get some empirical data behind my Dev Container decisions, so I compared the same project with the two options. For each run, I asked VS Code to build the container without a cache, and I recorded the resulting size and time from start to end of build. The results:

                      Python image    Base image + Python feature
Base image size       1.29 GB         790.8 MB
Dev Container size    1.29 GB         1.62 GB
Time to build         12,282 ms       94,643 ms

The python image starts off larger than the base image (1.29 GB vs 790.8 MB), but after the build, the base+python container is larger at 1.62 GB. There's a huge difference in the time to build: the python container built in 12,282 ms, while the base+python container took 94,643 ms, nearly 8x as long. So, features seem to take a lot of time to build. I'm not a Docker/Dev Container expert, so I won't attempt to explain why, but for now, I'll avoid features when possible. Perhaps the Dev Container team has plans for speeding up features in the future, so I'll keep my eyes peeled for updates on that front.
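The "nearly 8x" figure comes straight from the two build times in the table; a quick check of the arithmetic:

```python
python_image_ms = 12_282        # Python image, no cache
base_plus_feature_ms = 94_643   # base image + python Feature, no cache

slowdown = base_plus_feature_ms / python_image_ms
print(f"{slowdown:.1f}x slower")  # 7.7x slower
```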

If you've been experimenting with Python Dev Containers as well and have learnings to share, let me know!

Hosting Python Web Apps on Azure: A Price-Off

When I started my job in June as a Microsoft Cloud Advocate for Python developers, one of my first goals was to port my old Google App Engine apps over to Azure. Those apps were doomed anyway, as they were using Python 2.7, so it was a great time to give them a make-over. I deploy most Azure samples to my corporate account, but I wanted to experience all the aspects of being an Azure developer, including the $$$, so I decided to host them on a personal Azure account instead. I also decided to deploy to a couple of different hosting options, to see the differences.

The two apps:

  • pamelafox.org: My personal homepage, a Flask website that pulls data from a Google Spreadsheet (it's like a database, I swear!). It was previously on App Engine with memcached to cache the data pulls, so I ported it to Azure Container Apps with a CDN in front of the views.
  • translationtelephone.com: A game to translate a phrase between many languages. It was previously on App Engine with the default GAE database (BigTable-based), so I ported it to Azure App Service with a PostgreSQL Flexible Server.

Both of these are relatively low-traffic websites, so I would expect to pay very little for their hosting costs and for my traffic to be compatible with the cheaper SKUs of each Azure resource. Let's see how the costs have worked out for each of them.

Azure Container Apps + CDN

Here are the costs for January 2023, for all the resources involved in hosting pamelafox.org:

Azure resource              SKU                        Monthly cost
Container Registry          Basic                      $5.15
Azure Container Apps        (No SKUs)                  $3.23
Content Delivery Network    Microsoft CDN (Classic)    $0.00

Perhaps surprisingly, the most expensive part is the Container Registry. Even the least expensive SKU (Basic) is billed per day and includes up to 10 GB of storage, whether or not that storage is used. My registry stores only a single 0.33 GB image, so I'm not getting much out of that resource. If I had a few (or 30!) more container apps, I could reuse the same registry and share the cost.
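Because the registry bill is flat rather than usage-based, its cost per app drops as more apps share it. A quick sketch using the January figure above; the app counts are illustrative, and `per_app_registry_cost` is a hypothetical helper:

```python
REGISTRY_MONTHLY = 5.15  # Basic Container Registry, January 2023 bill above


def per_app_registry_cost(apps: int) -> float:
    """Split the flat registry bill evenly across the apps that share it."""
    return REGISTRY_MONTHLY / apps


for apps in (1, 5, 30):
    print(f"{apps:>2} app(s): ${per_app_registry_cost(apps):.2f} per app per month")
```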

The actual host, Azure Container Apps, cost only $3.23, likely helped by the Azure CDN in front. Azure Container Apps is priced according to usage (vCPU and memory), and the CDN means fewer requests reach the app. The CDN itself cost $0, which is my kind of number!

Azure App Service + PostgreSQL

Here are the costs for January 2023, for all the resources involved in hosting translationtelephone.com:

Azure resource                       SKU             Monthly cost
App Service                          Basic (B1)      $12.11
DB for PostgreSQL Flexible Server    Basic (B1MS)    $0.00 (Free for now)

My costs for this website are higher because App Service is priced per hour, regardless of requests, memory, or CPU usage. For my region and currency, at the B1 tier, that's always around $12 per month. There is also a Free tier, but it doesn't support custom domains, so it's not an option for this site.

The PostgreSQL server is currently free for me, as I'm taking advantage of the first 12 months free. That offer only applies to one managed PostgreSQL server; any additional ones will cost me. Past the first year, or for any additional server, the cost would be ~$13 per month for the SKU and region I'm using.
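Adding up the line items from the two tables gives the comparison behind the takeaways below. The $13 PostgreSQL figure is the approximate post-free-year estimate from the previous paragraph, not an actual bill.

```python
from __future__ import annotations

aca_site = {"Container Registry": 5.15, "Container Apps": 3.23, "CDN": 0.00}
app_service_site = {"App Service (B1)": 12.11, "PostgreSQL (B1MS)": 0.00}


def monthly_total(items: dict[str, float]) -> float:
    return round(sum(items.values()), 2)


print(monthly_total(aca_site))          # 8.38
print(monthly_total(app_service_site))  # 12.11

# After the 12-month free offer ends, using the ~$13 estimate above
after_free_year = {**app_service_site, "PostgreSQL (B1MS)": 13.00}
print(monthly_total(after_free_year))   # 25.11
```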

Takeaways

Given the way that Azure Container Apps is priced, it seems like a better fit for my low-traffic websites, especially in conjunction with an aggressive CDN. I may port Translation Telephone over to ACA instead, to halve the costs there. However, I've actually already ported pamelafox.org to Azure Static Web Apps, which should be entirely free, since I've realized it's a bit of a waste to run a Flask app for a webpage that changes so infrequently. That's another great option for hobbyist websites.

If you're figuring out potential pricing for your Python apps on Azure, definitely read through the pricing pages, try the Azure pricing calculator, and make sure to set up a budget alert in your Azure account to remind you to continually monitor how much you're paying.

Monday, February 27, 2023

Testing APIFlask with schemathesis

When I asked developers on Mastodon "what framework do you use for building APIs?", the two most popular answers were FastAPI and Flask. So I set out building an API using Flask to see what the experience was like. I immediately found myself wanting a library like FastAPI to handle parameter validation for me, and I discovered APIFlask, a very similar framework that's still "100% compatible with the Flask ecosystem".


The Chart API

Using APIFlask, I wrote an API to generate Chart PNGs using matplotlib, with endpoints for both bar charts and pie charts. It could easily support many other kinds of charts, but I'll leave that as an exercise for the reader. 😉

API URLs on left side and Chart API output on right side

The following code defines the pie chart endpoint and schema:

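from io import BytesIO

from apiflask import APIFlask, Schema
from apiflask.fields import Number, String
from apiflask.validators import Length
from flask import send_file
from matplotlib.axes import Axes
from matplotlib.figure import Figure
from webargs.fields import DelimitedList

app = APIFlask(__name__)

class PieChartParams(Schema):
    title = String(required=False, validate=Length(0, 30))
    values = DelimitedList(Number, required=True)
    labels = DelimitedList(String)

@app.get("/charts/pie")
@app.input(PieChartParams, location="query")
def pie_chart(data: dict):
    # Use the Figure API instead of pyplot, as recommended for web servers
    fig = Figure()
    axes: Axes = fig.subplots(squeeze=False)[0][0]
    axes.pie(data["values"], labels=data.get("labels"))
    axes.set_title(data.get("title"))

    # Stream the PNG from an in-memory buffer
    buf = BytesIO()
    fig.savefig(buf, format="png")
    buf.seek(0)
    return send_file(buf, download_name="chart.png", mimetype="image/png")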

Using property-based testing

I like all my code to be fully tested, or at least, "fully" to my knowledge. I started off writing standard pytest unit tests for the endpoint (with much thanks to GitHub Copilot for filling in the tests). But I wasn't sure whether there were edge cases that I was missing, since my code wraps on top of matplotlib and I'm not a matplotlib expert.

What I wanted was property-based testing: automatically generated tests based on function parameter types, which make sure to include commonly missed edge cases (like negative numbers). In the past, I've used the hypothesis library for property-based testing. I wondered if there was a variant that was specially built for API frameworks, where the parameter types are declared in schemas, and lo and behold, I discovered that exact creature: schemathesis. It's built on top of hypothesis, and is perfect for FastAPI, APIFlask, or any similar framework.

The schemathesis library can be used from the command line or inside pytest, via its pytest integration. I opted for the latter since I was already using pytest, and after reading their WSGI support docs, I came up with this file:

import pytest
import schemathesis
import hypothesis

from src import app

schema = schemathesis.from_wsgi("/openapi.json", app)

@schema.parametrize()
@hypothesis.settings(print_blob=True, report_multiple_bugs=False)
@pytest.mark.filterwarnings("ignore:Glyph:UserWarning")
def test_api(case):
    response = case.call_wsgi()
    case.validate_response(response)

There are a few additional lines in that file that you may not need: the hypothesis.settings decorator and the pytest.mark.filterwarnings decorator. I added settings to increase the verbosity of the output, to help me replicate the issues, and I added filterwarnings to ignore the many glyph-related warnings coming out of matplotlib.

The API improvements

As a result of the schemathesis tests, I identified several parameters that could use additional validation.

I added both a min and max to the Number field in the values parameter:

values = DelimitedList(Number(validate=Range(min=0, max=3.4028235e38)), required=True)
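The upper bound appears to be the largest finite 32-bit (single-precision) float, written to 8 significant digits. That reading is an assumption, since the post doesn't say where the number came from, but it is easy to check with the standard library:

```python
import struct

# Largest finite IEEE 754 single-precision value: (2 - 2**-23) * 2**127
FLOAT32_MAX = (2 - 2**-23) * 2**127
print(FLOAT32_MAX)

# Packing the validator's bound as a 32-bit float lands exactly on FLOAT32_MAX
bound = 3.4028235e38
print(struct.unpack("<f", struct.pack("<f", bound))[0] == FLOAT32_MAX)  # True
```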

I also added a custom validator for situations which couldn't be addressed with the built-in validators. The custom validator checks to make sure that the number of labels matches the number of values provided, and disallows a single value of 0.

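from apiflask import Schema
from marshmallow import ValidationError, validates_schema

class PieChartParams(Schema):
    # ... title, values, and labels fields as defined earlier ...

    @validates_schema
    def validate_numbers(self, data, **kwargs):
        if "labels" in data and len(data["labels"]) != len(data["values"]):
            raise ValidationError("Labels must be specified for each value")
        if len(data["values"]) == 1 and data["values"][0] == 0:
            raise ValidationError("Cannot create a pie chart with single value of 0")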

The full code can be seen in __init__.py.

I added specific tests for the custom validators to my unit tests, but I didn't do so for the built-in min/max validators, since I'm relying on the library's tests to ensure those work as expected.

Thanks to the property-based tests, users of this API will see a useful error instead of a 500 error. I'm a fan of property-based testing, and I hope to find ways to use it in future applications.

Wednesday, February 22, 2023

Managing Python dependency versions for web projects

Though I've worked on several production Python codebases, I've never been in charge of managing the dependencies. However, I now find myself developing many templates to help Python devs get started with web apps on Azure, and I want to set those up with best practices for dependency management. After discussing with my fellow Python advocates and asking on Mastodon, this seems to be the most commonly used approach:

  1. Pin all the production requirements for the web app. (Not necessary to pin development requirements, like linters.)
  2. Use a service like PyUp or GitHub Dependabot to notify you when a dependency can be upgraded.
  3. As long as tests all pass with the newest version, update to the new version. This assumes full test coverage, and as Brian Okken says, "you’re not testing enough, probably, test more." And if you really trust your tests, use a tool like Anthony Shaw's Dependa-lot-bot to auto-merge GitHub PRs when all checks pass.

I've now gone through and made that change in my web app templates, so here's what it looks like in an example repo: flask-surveys-container-app, a containerized Flask app with a PostgreSQL database.


Pin the production requirements

My app already had a requirements.txt, but without any versions:

Flask
Flask-Migrate
Flask-SQLAlchemy
Flask-WTF
psycopg2
python-dotenv
SQLAlchemy
gunicorn
azure-keyvault-secrets
azure.identity

To figure out the current versions, I ran python3 -m pip freeze and copied the versions in:

Flask==2.2.3
Flask-Migrate==4.0.4
Flask-SQLAlchemy==3.0.3
Flask-WTF==1.1.1
psycopg2==2.9.5
python-dotenv==0.21.1
SQLAlchemy==2.0.4
gunicorn==20.1.0
azure-keyvault-secrets==4.6.0
azure-identity==1.12.0
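Copying versions by hand works for a short list, but the lookup can also be scripted. This is a minimal sketch, not a standard tool: `pin_requirements` is a hypothetical helper that reads `pip freeze` output, matches names using PEP 503 normalization (so `azure.identity` and `azure-identity` match), and pins each bare requirement using the name as pip freeze prints it.

```python
import re


def normalize(name: str) -> str:
    """PEP 503 normalization: runs of -, _ and . become one -, lowercased."""
    return re.sub(r"[-_.]+", "-", name).lower()


def pin_requirements(requirements: str, freeze_output: str) -> str:
    """Pin each bare package name to the version found in `pip freeze` output."""
    installed = {}
    for line in freeze_output.splitlines():
        if "==" in line:  # skip editable installs and "pkg @ url" lines
            name, version = line.strip().split("==", 1)
            installed[normalize(name)] = (name, version)
    pinned = []
    for line in requirements.splitlines():
        req = line.strip()
        if not req or req.startswith("#"):
            continue
        if not re.fullmatch(r"[A-Za-z0-9._-]+", req):
            pinned.append(req)  # already has a version specifier or extras
            continue
        match = installed.get(normalize(req))
        # Leave the line alone if the package isn't installed locally
        pinned.append(f"{match[0]}=={match[1]}" if match else req)
    return "\n".join(pinned)


print(pin_requirements("Flask\nazure.identity\n",
                       "Flask==2.2.3\nazure-identity==1.12.0\nJinja2==3.1.2\n"))
# Flask==2.2.3
# azure-identity==1.12.0
```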

Add GitHub Dependabot

The first time that I set up Dependabot for a repo, I used the GitHub UI to automatically create the dependabot.yaml file inside the .github folder. After that, I copied the same file into every repo, since all my repos use the same options. Here's what the file looks like for an app that uses pip and stores requirements.txt in the root folder:

version: 2
updates:
  - package-ecosystem: "pip" # See documentation for possible values
    directory: "/" # Location of package manifests
    schedule:
      interval: "weekly"

If your project has a different setup, read through the GitHub docs to learn how to configure Dependabot.


Ensure the CI runs tests

For this system to work well, it really helps if there's an automated workflow that runs tests and checks test coverage. Here's a snippet from the Python workflow file for the Flask app:

- name: Install dependencies
  run: |
    python -m pip install --upgrade pip
    pip install -r requirements-dev.txt
- name: Run Pytest tests
  run: pytest
  env:
    DBHOST: localhost
    DBUSER: postgres
    DBPASS: postgres
    DBNAME: postgres

To make sure the tests are actually testing the full codebase, I use coverage and make the tests fail if the coverage isn't high enough. Here's how I configure pyproject.toml to make that happen:

[tool.pytest.ini_options]
addopts = "-ra --cov"
testpaths = [
    "tests"
]
pythonpath = ['.']

[tool.coverage.report]
show_missing = true
fail_under = 100

Install Dependa-lot-bot

Since I have quite a few Python web app repos (for Azure samples), I expect to soon see my inbox flooded with Dependabot pull requests. To help me manage them, I installed Dependa-lot-bot, which should auto-merge GitHub PRs when all checks pass.

I added a .github/dependabot-bot.yaml file to tell the bot which packages are safe to merge:

safe:
 - Flask
 - Flask-Migrate
 - Flask-SQLAlchemy
 - Flask-WTF
 - psycopg2
 - python-dotenv
 - SQLAlchemy
 - gunicorn
 - azure-keyvault-secrets
 - azure-identity

Notably, that's every single package in this project's requirements, which may be a bit risky. Ideally, if my tests are comprehensive enough, they should notify me if there's an actual issue. If they don't, and an issue shows up in the deployed version, then that indicates my tests need to be improved.

I probably would not use this bot for a live website, like khanacademy.org, but it is helpful for the web app demo repos that I maintain.

So that's the process! Let me know if you have ideas for improvement or your own flavor of dependency management.

Monday, February 20, 2023

Loading multiple Python versions with Pyodide

As described in my last post, dis-this.com is an online tool for disassembling Python code. After I shared it last week in the Python forum, Guido asked if I could add a feature to switch Python versions, to see the difference in disassembly across versions. I was able to get it working for versions 3.9 - 3.11, but it was a little tricky due to the way Pyodide is designed.

I'm sharing my learnings here for anyone else building a similar tool in Pyodide.

Pyodide version ↔ Python version

Pyodide doesn't formally support multiple Python versions at once. The latest Pyodide version has the latest Python version that they've been able to support, as well as other architectural improvements. To get to older Python versions, you need to load older Pyodide versions that happened to support that version.

Here's a JS object mapping Python versions to Pyodide versions:

const versionMap = {
    '3.11': 'dev',
    '3.10': 'v0.22.1',
    '3.9': 'v0.19.1',
};

As you can see, 3.11 doesn't yet map to a numbered release version. According to the repository activity, v0.23 will be the numbered release once it's out. I will need to update that in the future.

Once I know what Python version the user wants, I append the script tag and call loadPyodide once loaded:

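// Look up which Pyodide release ships the requested Python version
const pyodideVersion = versionMap[pythonVersion];
const scriptUrl = `https://cdn.jsdelivr.net/pyodide/${pyodideVersion}/full/pyodide.js`;
const script = document.createElement('script');
script.src = scriptUrl;
script.onload = async () => {
    pyodide = await loadPyodide({
        indexURL: `https://cdn.jsdelivr.net/pyodide/${pyodideVersion}/full/`,
        stdout: handleStdOut,
    });
    // Enable the UI for interaction
    button.removeAttribute('disabled');
};
// Append the tag so the browser actually fetches and runs pyodide.js
document.head.appendChild(script);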

Loading multiple Pyodide versions in same page

For dis-this, I want users to be able to change the Python version and see the disassembly in that different version. For example, they could start on 3.11 and then change to 3.10 to compare the output.

Originally, I attempted just calling the above code with the new Pyodide version. Unfortunately, that resulted in some funky errors. I figured it related to Pyodide leaking globals into the window object, so my next attempt was deleting those globals before loading a different Pyodide version. That actually worked a lot better, but still failed sometimes.

So, to be on the safe side, I made it so that changing the version number reloads the page. The website already supports encoding the state in the URL (via the permalink), so it wasn't actually too much work to add the "&version=" parameter to the URL and reload.

This code listens to the dropdown's change event and reloads the window to the new permalink:

document.getElementById('version-select').addEventListener('change', async () => {
    pythonVersion = document.getElementById('version-select').value;
    permalink.setAttribute('version', pythonVersion);
    window.location.search = permalink.path;
});

That permalink element is a Web Component that knows how to compute the correct path. Once the page reloads, this code grabs the version parameter from the URL:

pythonVersion = new URLSearchParams(window.location.search).get('version') || '3.11';
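The version-selection logic can be sketched outside the browser too. This Python version is only an illustration (dis-this does it in JavaScript): it reads the `version` query parameter, falls back to 3.11 when the parameter is missing, and, as an added assumption not necessarily matching the site, also falls back for unknown versions before building the jsDelivr URL from the mapping above. `pyodide_index_url` is a hypothetical name.

```python
from urllib.parse import parse_qs

# Python version -> Pyodide release, copied from the JS versionMap above
VERSION_MAP = {"3.11": "dev", "3.10": "v0.22.1", "3.9": "v0.19.1"}
DEFAULT_VERSION = "3.11"


def pyodide_index_url(query_string: str) -> str:
    """Pick the Pyodide CDN folder for the ?version= parameter in a query string."""
    params = parse_qs(query_string.lstrip("?"))
    python_version = params.get("version", [DEFAULT_VERSION])[0]
    if python_version not in VERSION_MAP:
        # Assumption: treat unsupported versions like a missing parameter
        python_version = DEFAULT_VERSION
    return f"https://cdn.jsdelivr.net/pyodide/{VERSION_MAP[python_version]}/full/"


print(pyodide_index_url("?version=3.10"))
# https://cdn.jsdelivr.net/pyodide/v0.22.1/full/
```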

The codebase for dis-this.com is relatively small, so you can also look through it yourself or fork it if you're creating a similar tool.
