Friday, September 30, 2022

Deploying a containerized Flask app to Azure Container Apps

I've used Docker for many years as a local development environment (first for Khan Academy and then Berkeley CS61A), but until this week, I'd never personally made a Docker image or deployed one to production.

One of our offerings at Microsoft is Azure Container Apps, a way to run Docker containers in the cloud, which gives me a great excuse to get my feet wet in the world of containerization. 🐳

To share what I've learnt, I'll walk through the process of containerizing a Flask app and running it on Azure.

First, some Docker jargon:

  • A Docker image is a multi-layered environment that is exactly the environment your app thrives in, such as a Linux OS with Python 3.9 and Flask installed. You can also think of an image as a snapshot or a template.
  • A Docker container is an instance of an image, which could run locally on your machine or in the cloud.
  • A registry is a place to host images. There are cloud-hosted registries like Docker Hub and Azure Container Registry. You can pull images down from those registries, or push images up to them.

These are the high-level steps:

  1. Build an image of the Flask application locally and confirm it works when containerized.
  2. Push the image to the Azure Container Registry.
  3. Run an Azure Container App for that image.

Build image of Flask app

I started from a very simple app, the Flask app used by other Azure tutorials:
https://github.com/Azure-Samples/msdocs-python-flask-webapp-quickstart

Flask's development server is not suitable for production, so I brought in the Gunicorn server as a requirement in requirements.txt:

Flask==2.0.2
gunicorn

I put Gunicorn settings in gunicorn_config.py:

bind = "0.0.0.0:5000"
workers = 4
threads = 4
timeout = 120
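
Since gunicorn_config.py is an ordinary Python file, the worker count doesn't have to be hard-coded. Here's a minimal sketch of one alternative, assuming the (2 x cores) + 1 starting point that Gunicorn's documentation suggests. The suggested_workers helper is a name made up for illustration, and the right number really depends on the app and on how much CPU the container is given.

```python
import multiprocessing


def suggested_workers(cores: int) -> int:
    """Return the (2 x cores) + 1 starting point suggested in Gunicorn's docs."""
    return cores * 2 + 1


# Gunicorn reads these module-level names when started with -c gunicorn_config.py
bind = "0.0.0.0:5000"
workers = suggested_workers(multiprocessing.cpu_count())
threads = 4
timeout = 120
```

One caveat: inside a container, cpu_count() may report the host's cores rather than the container's CPU limit, so a fixed number like the one above can be the safer choice.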

Then I added this Dockerfile file in the root:

# syntax=docker/dockerfile:1

FROM python:3.9.13

WORKDIR /code

COPY requirements.txt .

RUN pip3 install -r requirements.txt

COPY . .

EXPOSE 5000

ENTRYPOINT ["gunicorn", "-c", "gunicorn_config.py", "app:app"]

That file tells Docker to start from a base image which has Python 3.9.13 installed, create a /code directory, install the package requirements, copy the code into the directory, and then finally use Gunicorn to run the Flask application.

I also added a .dockerignore file to make sure Docker doesn't copy over unneeded files:

.git*
**/*.pyc
.venv/

You can download the full code from this repository.

I built a Docker image using the "Build image" option from the VS Code Docker extension. However, it can also be built from the command line:

docker build --tag flask-demo .

Now that the image is built, I can run a container using it:

docker run -d -p 5000:5000 flask-demo

Gunicorn is bound to port 5000 inside the container (the same port Flask uses by default), so the run command must map that port to a port on my machine for the connections to work.
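
To confirm the container is actually answering requests without opening a browser, a small script can help. This is only a sketch using the standard library; fetch_status is a hypothetical helper name, and it assumes the app serves a page at the root path.

```python
import urllib.request


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for a GET request to url."""
    # urlopen raises an exception for connection failures and 4xx/5xx responses,
    # so reaching the return line means the server answered successfully.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status
```

With the container running, calling fetch_status("http://localhost:5000/") should return 200 if the app is healthy.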

Push image to registry

I followed this tutorial to push an image to the registry, with some customizations.

I created a resource group:

az group create --name flask-container-apps --location eastus

Then I created a registry using a unique name and logged in to that registry:

az acr create --resource-group flask-container-apps \
  --name pamelascontainerregistry --sku Basic

az acr login --name pamelascontainerregistry

Now comes the tricky part: pushing an image to that registry. I am working on a Mac with an M1 (ARM 64) chip, but Azure Container Apps (and other cloud-hosted container runners) expect images to be built for an Intel (AMD 64) chip. That means I can't just push the image that I built in the earlier step; I have to build specifically for AMD 64 and push that image.
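
If you're not sure which architecture your own machine builds for, Python can tell you. The sketch below maps the value of platform.machine() to a Docker platform string; the mapping table and the helper names are assumptions for illustration and only cover the common cases.

```python
import platform

# Map common platform.machine() values to Docker platform strings.
_DOCKER_PLATFORMS = {
    "x86_64": "linux/amd64",
    "amd64": "linux/amd64",
    "arm64": "linux/arm64",
    "aarch64": "linux/arm64",
}


def docker_platform(machine: str) -> str:
    """Return the Docker platform string for a machine name, or 'unknown'."""
    return _DOCKER_PLATFORMS.get(machine.lower(), "unknown")


def needs_cross_build(machine: str, target: str = "linux/amd64") -> bool:
    """True when images built natively on this machine won't match target."""
    return docker_platform(machine) != target


local = platform.machine()
print(f"{local} -> {docker_platform(local)}, cross-build needed: {needs_cross_build(local)}")
```

On an M1 Mac, platform.machine() reports arm64, so this would flag that a cross-build is needed for an AMD 64 target.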

One way to do that is with the docker buildx command, specifying the target architecture and target registry location:

docker buildx build --push --platform linux/amd64 \
    -t pamelascontainerregistry.azurecr.io/flask-demo:latest .

However, a much faster way to do it is with the az acr build command, which uploads the code to the cloud and builds it there:

az acr build --platform linux/amd64 \
    -t pamelascontainerregistry.azurecr.io/flask-demo:latest \
    -r pamelascontainerregistry .

⏱ The `docker buildx` command took ~ 10 minutes, whereas the `az acr build` command took only a minute. Nice!

Run Azure Container App

Now that I have an image uploaded to a registry, I can create a container app for that image. I followed this tutorial.

I upgraded the extension and registered the necessary providers:

az extension add --name containerapp --upgrade
az provider register --namespace Microsoft.App
az provider register --namespace Microsoft.OperationalInsights

⏱ That takes a few minutes.

Then I created the resource group and container app environment:

az group create --name flask-container-apps --location canadacentral

az containerapp env create --name flask-container-environment \
    --resource-group flask-container-apps --location canadacentral

Next, I retrieved the registry credentials so that the command line tool could log in to my registry:

az acr credential show --name pamelascontainerregistry
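
That command prints JSON, and the password has to end up somewhere the next command can use it. Here's a hedged sketch of pulling it out with Python, assuming the output has a "passwords" list whose entries have a "value" field. The sample below is made up to show that shape, and first_registry_password is a hypothetical helper.

```python
import json


def first_registry_password(credential_json: str) -> str:
    """Extract the first password from `az acr credential show` JSON output."""
    credentials = json.loads(credential_json)
    return credentials["passwords"][0]["value"]


# Made-up sample in the assumed output shape; real values will differ.
sample_output = """
{
  "passwords": [
    {"name": "password", "value": "not-a-real-password"},
    {"name": "password2", "value": "also-not-real"}
  ],
  "username": "pamelascontainerregistry"
}
"""
print(first_registry_password(sample_output))
```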

Finally, I created the container app:

az containerapp create --name my-container-app \
    --resource-group flask-container-apps \
    --image pamelascontainerregistry.azurecr.io/flask-demo:latest \
    --environment flask-container-environment \
    --registry-server pamelascontainerregistry.azurecr.io \
    --registry-username pamelascontainerregistry \
    --registry-password $REGISTRY_PASSWORD \
    --ingress external \
    --target-port 5000

Once that deployed, I followed the URL from the Azure portal to view the website in the browser and verified it was all working. 🎉 Woot!

If I make any Flask code updates, all I need to do is re-build the image and tell the container app to update:

az acr build --platform linux/amd64 \
    -t pamelascontainerregistry.azurecr.io/flask-demo:latest \
    -r pamelascontainerregistry .

az containerapp update --name my-container-app \
  --resource-group flask-container-apps \
  --image pamelascontainerregistry.azurecr.io/flask-demo:latest

⏱ Those commands are fairly fast, about 30 seconds each.

🐳 Now I'm off to containerize more apps!

Thursday, September 22, 2022

Returning an image from an Azure Function App in Python

I wrote a tiny icon-writer package this week that uses pillow to generate simple text-only icons, and used that to generate images to replace the fancy logo'd icons on my Mac OS X dock.

After generating the icons on my own machine, I wanted to make a website that others could use to generate them, so I started off by writing an Azure Function App.

I considered a few options for what the function could return:

  1. It could store the image in an Azure Storage container and return the URL to the created image.
  2. It could return a base-64 encoded string, and I could serve that in a webpage img tag using a data URI.
  3. It could return an actual binary image with the mime-type set appropriately, to be used directly as the src of the img.

I went for option three, since I liked the idea of being able to test the API in the browser and instantly see an image, plus I wanted to make sure it could be done.
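
For comparison, option two would have looked something like the sketch below: the function base64-encodes the PNG bytes and the page drops the resulting string into an img src. The png_data_uri name is made up for illustration, and the PNG signature bytes stand in for a real image.

```python
import base64


def png_data_uri(png_bytes: bytes) -> str:
    """Encode PNG bytes as a data URI usable as an img src."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f"data:image/png;base64,{encoded}"


# The 8-byte PNG signature stands in for a real image here.
print(png_data_uri(b"\x89PNG\r\n\x1a\n"))
```

The downside is that base64 makes the payload roughly a third larger, and the URL can't be pasted into a browser to preview the image directly.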

Here's the relevant code in the function:

img = icon_writer.write_icon(text, size=size, bgcolor=bgcolor, fontcolor=fontcolor)
img_byte_arr = io.BytesIO()
img.save(img_byte_arr, format='PNG')
img_byte_arr = img_byte_arr.getvalue()
return func.HttpResponse(img_byte_arr, mimetype='image/png')

It creates a new pillow Image using my write_icon function and saves that into a BytesIO object as a PNG. It converts that object into a bytes array using BytesIO.getvalue(). Finally, it returns an HttpResponse with the bytes array as the body and a mime-type of 'image/png'.

See the full function code on Github.

Then I can call the function in a webpage just like this:

const image = document.createElement("img");
image.src = `https://iconwriterfunction-apim.azure-api.net/icon-writer-function/IconWriter?${params}`;
document.getElementById("output").appendChild(image);

That src attribute actually uses an API Management service URL, which protects the Functions App endpoint with a CORS policy.

You can try the website here.

Wednesday, September 21, 2022

Preparing a Django app for deployment on Azure App Service

I recently went through the process of creating a simple Django app and deploying it to Azure App Service, and discovered I had to make several changes to get the app working in a production environment. Many of those changes are common across production environments and described in the Django deployment checklist, but a few details are Azure-specific.

Use environment variables

A best practice is to store settings in environment variables, especially sensitive settings like database authentication details. You can set environment variables locally with export on the command-line, but a more repeatable local strategy is to put them in a file and load them from that file using the python-dotenv package.

First, add python-dotenv to your requirements.txt. Here's what mine looked like:

Django==4.1.1
psycopg2
python-dotenv

Then create a .env file with environment variables and their local settings:

FLASK_ENV=development
DBNAME=quizsite
DBHOST=localhost
DBUSER=pamelafox
DBPASS=

Note that it's completely fine for this file to get checked in, since these are only the local values, and your production DB should definitely have a different password than your local DB. 😬
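
To make it clearer what loading a .env file means, here is a simplified stand-in written with only the standard library. It is not python-dotenv's implementation (the real package also handles quoting, export prefixes, and variable expansion), and load_env_file is a hypothetical name. Like python-dotenv's default behavior, it leaves variables that are already set untouched.

```python
import os


def load_env_file(path: str) -> None:
    """Set environment variables from KEY=VALUE lines, keeping existing values."""
    with open(path, encoding="utf-8") as env_file:
        for line in env_file:
            line = line.strip()
            # Skip blank lines, comments, and anything without an equals sign.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # setdefault leaves variables that are already set untouched.
            os.environ.setdefault(key.strip(), value.strip())
```

That "don't override" behavior is why a value exported in the shell wins over the one in the file.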

Now adjust your current settings.py to use these environment variables (if it wasn't already):

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': os.environ['DBNAME'],
        'HOST': os.environ['DBHOST'],
        'USER': os.environ['DBUSER'],
        'PASSWORD': os.environ['DBPASS']
    }
}

To make sure those environment variables are actually loaded in when running locally, you need to edit manage.py. Only the local environment should get its variables from that file, so add a check to see if 'WEBSITE_HOSTNAME' is a current environment variable. That variable gets set by the Azure build system when it deploys an app, so it will always be set on production and it should not get set locally.

import os
import sys

from dotenv import load_dotenv

def main():
    """Run administrative tasks."""
    
    is_prod = 'WEBSITE_HOSTNAME' in os.environ
    if not is_prod:
        print("Loading environment variables from .env file")
        load_dotenv('./.env')

    try:
        from django.core.management import execute_from_command_line
    except ImportError as exc:
        raise ImportError(
            "Couldn't import Django. Are you sure it's installed and "
            "available on your PYTHONPATH environment variable? Did you "
            "forget to activate a virtual environment?"
        ) from exc
    execute_from_command_line(sys.argv)


if __name__ == '__main__':
    main()

Use production settings

After those changes, the app should run fine locally, but it's not ready for production. There are a number of settings that should be different in production mode, such as DEBUG, SECRET_KEY, ALLOWED_HOSTS, CSRF_TRUSTED_ORIGINS, and DATABASES.

A typical way to customize the settings for production is to add a new file that imports all the previous settings and overrides only the handful needed for production. Here's what my production.py looks like:

from .settings import *
import os

DEBUG = False
SECRET_KEY = os.environ['SECRET_KEY']

# Configure the domain name using the environment variable
# that Azure automatically creates for us.
ALLOWED_HOSTS = [os.environ['WEBSITE_HOSTNAME']]
CSRF_TRUSTED_ORIGINS = ['https://'+ os.environ['WEBSITE_HOSTNAME']] 


DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': os.environ['DBNAME'],
        # DBHOST is only the server name, not the full URL
        'HOST': os.environ['DBHOST'] + ".postgres.database.azure.com",
        'USER': os.environ['DBUSER'],
        'PASSWORD': os.environ['DBPASS']
    }
}

That file also uses the 'WEBSITE_HOSTNAME' environment variable, this time for setting the values of 'ALLOWED_HOSTS' and 'CSRF_TRUSTED_ORIGINS'.

Now you need to make sure the production settings get used when the app is running in a production environment.

First modify wsgi.py, since Azure uses WSGI to serve Django applications.

import os

from django.core.wsgi import get_wsgi_application

is_prod = 'WEBSITE_HOSTNAME' in os.environ
settings_module = 'quizsite.production' if is_prod else 'quizsite.settings'
os.environ.setdefault("DJANGO_SETTINGS_MODULE", settings_module)

application = get_wsgi_application()

You also need to modify manage.py since Azure calls it to run Django commands on the production server, like manage.py collectstatic.

import os
import sys

from dotenv import load_dotenv

def main():
    is_prod = 'WEBSITE_HOSTNAME' in os.environ
    if not is_prod:
        print("Loading environment variables from .env file")
        load_dotenv('./.env')

    settings_module = "quizsite.production" if is_prod else 'quizsite.settings'
    os.environ.setdefault('DJANGO_SETTINGS_MODULE', settings_module)
    
    try:
        from django.core.management import execute_from_command_line
    except ImportError as exc:
        raise ImportError(
            "Couldn't import Django. Are you sure it's installed and "
            "available on your PYTHONPATH environment variable? Did you "
            "forget to activate a virtual environment?"
        ) from exc
    execute_from_command_line(sys.argv)

if __name__ == '__main__':
    main()
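
One detail worth knowing about both files: setdefault only fills in DJANGO_SETTINGS_MODULE when it isn't already set. The sketch below walks through the three cases using plain dictionaries in place of os.environ. The choose_settings_module helper and the example hostname are made up for illustration.

```python
def choose_settings_module(environ) -> str:
    """Pick the settings module based on whether WEBSITE_HOSTNAME is set."""
    is_prod = "WEBSITE_HOSTNAME" in environ
    return "quizsite.production" if is_prod else "quizsite.settings"


# Plain dicts stand in for os.environ so the behavior is easy to see.
local_env = {}
local_env.setdefault("DJANGO_SETTINGS_MODULE", choose_settings_module(local_env))

azure_env = {"WEBSITE_HOSTNAME": "example.azurewebsites.net"}
azure_env.setdefault("DJANGO_SETTINGS_MODULE", choose_settings_module(azure_env))

# setdefault never replaces a value that was already set explicitly.
preset_env = {
    "WEBSITE_HOSTNAME": "example.azurewebsites.net",
    "DJANGO_SETTINGS_MODULE": "quizsite.settings",
}
preset_env.setdefault("DJANGO_SETTINGS_MODULE", choose_settings_module(preset_env))
```

So if DJANGO_SETTINGS_MODULE is ever set as an app setting on the server, that value takes precedence over the WEBSITE_HOSTNAME check.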

Deploy!

Your app should be ready for deployment, code-wise. You can follow the instructions in a tutorial like Deploy a Python web app with PostgreSQL in Azure, but with your own app instead of the sample app. That tutorial includes instructions for setting environment variables on production, using either the VS Code extension or the Azure portal.

You may decide to store the SECRET_KEY value inside the Azure Key vault and retrieve it from there instead, following this KeyVault tutorial.

If you want to see all the recommended changes in context, check out the repository for my sample app.

Tuesday, September 20, 2022

How I setup a Python project

As I prepare for my new role at Microsoft Cloud Advocacy for Python, I've been spinning up a bunch of Python projects over the last few months and learning about tools to improve my development workflow. I'm sharing my project setup process in this post to help other developers start new Python projects and to find out how my process can improve.

My setup process always takes place in a Mac OS X environment, but much of this should be the same across operating systems.

Setup IDE

I use vim when editing single files, but when working on multi-file projects I much prefer VS Code. It's fast, it's got great built-in capabilities, and the extensions are incredibly helpful for domain specific tasks.

If it's my first time on a machine, then I need to:

Create folder

Now I make a folder for my project, either via the command line:

mkdir my-python-project 

Or in VS Code, by selecting File > Open Folder … and clicking New Folder in the window that pops up.

Initialize version control

I always use Git for version control in my projects for local source code history, and then use Github to back up the code to the cloud.

First I create the local Git repository:

git init

Then I create a new Github repo and add it as a remote for my local repo:

git remote add origin https://github.com/pamelafox/my-python-project-sept7.git 

Finally I create a .gitignore file to tell Git what files/folders to exclude from source control. I'll start it off with a single line to ignore any __pycache__ folders created automatically by the interpreter.

__pycache__/ 

You can learn more on Git/Github from this Github tutorial.

Setup virtual environment

A virtual environment makes it easy to install the dependencies of one Python project without worrying about conflict with dependencies of other projects. It is possible to develop in Python outside of a virtual environment (in the global environment of your system), but that has bitten me so many times that I now always jump into a virtual environment.

Python 3 has built-in support for virtual environments using the venv module. I run this command to create a virtual environment:

python3 -m venv .venv

That creates the environment in the .venv folder, so that's where all the project dependencies will get installed. You can name that folder something else, but .venv or venv are the most standard.

Next I need to actually start the virtual environment, to make sure my terminal is issuing all commands from inside it:

source .venv/bin/activate 
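
If I'm ever unsure whether the environment is really active, Python itself can answer. This is a small sketch, with in_virtualenv as a made-up helper name, that relies on how venv sets sys.prefix and sys.base_prefix.

```python
import sys


def in_virtualenv() -> bool:
    """True when the interpreter is running inside a venv-created environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the Python installation it was made from.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter:", sys.executable)
```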

I don't want to check the project's third-party dependencies into my own Git repo, however, both because that's a lot of files to track and because that code should stay in the third-party repos, so I add another line to the .gitignore file:

.venv/

Install requirements

I always install project dependencies with pip, which downloads them from the Python package index at pypi.org. However, it's best practice to declare dependencies in a requirements.txt file and install from that file, so that anyone else setting up the project will end up with the same dependencies.

I create the requirements.txt file and seed it with any packages I know I need:

numpy
opencv-python
scikit-image
matplotlib

It's also possible to specify the version of each package, but I typically use the latest stable version.

Once that file exists, I ask pip to install everything in it:

pip install -r requirements.txt 

Coding time!

This might be the point where I start writing code, or I may wait until I've done the full setup.

Regardless of when I start code writing, this is a typical folder layout for a non-web Python project:

I have one main module, the entry point, which imports functions or classes from helper modules. Then I put all my tests in a folder, with one test file per module.
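
As a rough illustration of that split, here is a tiny sketch with all three pieces shown in one block for brevity. The file names in the comments and the normalize_name function are hypothetical, not taken from a real project of mine.

```python
# helpers.py (hypothetical helper module)
def normalize_name(name: str) -> str:
    """Lowercase a name and collapse runs of whitespace to single hyphens."""
    return "-".join(name.lower().split())


# main.py (hypothetical entry point) would start with:
#     from helpers import normalize_name
def main() -> None:
    print(normalize_name("My Python Project"))


# tests/test_helpers.py (hypothetical): pytest collects functions named test_*
def test_normalize_name():
    assert normalize_name("My Python  Project") == "my-python-project"
```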

At this point, I should be able to actually run the code I've written:

python main.py 

I don't have to worry about distinguishing between the python command vs. python3 command anymore, as I'm running that command inside my virtual environment, and it will always use the version of Python that initialized it.

Install development requirements

Our code might work, but is it good code?

Fortunately, there are many tools in the Python ecosystem to help us write better code with a consistent code style.

My favorite tools are all third-party packages, so they need to be installed before I can use them. A best practice is to list them in a separate file, like requirements-dev.txt, to make a distinction between the packages needed for production and the additional packages needed for local development.

I create requirements-dev.txt and list the tool dependencies:

-r requirements.txt 
black==22.3.0
pytest==7.1.2 
coverage==6.4.1 
pytest-cov==3.0.0 
pre-commit 
flake8 

*I only include version numbers for the tools here so that you can see how that's specified.

The first line tells pip to also install everything in requirements.txt. By including that line, I can now run this single command to install both the prod and dev dependencies:

pip install -r requirements-dev.txt 

Run linter

A linter is a tool that looks for "lint" in your code: common coding errors or style issues. The most commonly used Python linter is flake8, which is actually a wrapper around three smaller tools. It's possible to just use flake8 out of the box with its default configuration, but many developers (including myself) like to customize it.

I add a .flake8 file in the project root with some initial options:

[flake8] 
ignore = D203
max-complexity = 10
max-line-length = 100
exclude = .venv
count = True
statistics = True
show_source = True

  • The ignore option tells it what error codes to ignore (that particular one is just a whitespace style nit).
  • The max-complexity option tells it how complex a function can be, measured by its McCabe complexity (roughly, how many branching paths it contains).
  • The max-line-length option specifies the maximum number of characters for each line (a very personal decision!).
  • The exclude option tells it not to scan the .venv folder, since we don't want to know about issues in the third-party dependencies. Hopefully they're running a linter themselves!
  • The count option says we want to see the number of errors, while statistics says we also want a breakdown, and then show_source makes it show the code responsible for any errors.

Now I can run flake8. I actually run it twice, for different levels of errors, first checking for the most serious issues:

flake8 . --select=E9,F63,F7,F82 

The error code E9 comes from the wrapped pycodestyle library error codes and refers to any parse errors (IndentationError/SyntaxError) or detectable IO errors.

The other error codes are from the wrapped pyflakes library: F63 checks for common issues with comparison operators, F7 covers misplaced statements, and F82 looks for undefined names.
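
To make those codes more concrete, here are two of the patterns they catch, shown as comments next to corrected versions. This is just an illustrative sketch; the function names are made up.

```python
# F632 (part of the F63 group): comparing to a literal with `is`, such as
#     if status is "done": ...
# checks identity rather than equality. The fix is to use ==.
def is_done(status: str) -> bool:
    return status == "done"


# F821 (part of the F82 group): using a name that was never defined, such as
#     return totl / count      # typo for `total`
# would raise NameError at runtime. The fix is to use the defined name.
def average(total: float, count: int) -> float:
    return total / count
```

Both are genuine bugs rather than style preferences, which is why I check for them first.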

Now I run it again, this time checking for all issues; the remaining ones are more likely to be style issues. The style violations are based on the official PEP 8 style guide, so they are fairly agreed upon in the Python community.

flake8 . 

If I see style issues at this point, I might fix them or I might wait until I introduce the next tool which can autofix many style issues for me. 😀

Tool configuration

The Python community recently decided on a best practice for tool configuration: putting all options for all tools inside a single file, pyproject.toml. Unfortunately, flake8 doesn't yet follow that practice, but the rest of my tools do, so I now create an empty pyproject.toml in my root folder.

Run formatter

A formatter is a tool that will reformat code according to its own standards and your configured options. The most popular formatter is black, which is PEP 8 compliant and highly opinionated.

I configure black by adding this section to pyproject.toml:

[tool.black] 
line-length = 100 
target-version = ['py39'] 

Those options tell black that I'd like code lines to be no longer than 100 characters and that I'm writing code in Python 3.9.

Now, before I run black for the first time, I like to make sure all my code is committed, so that I can easily see the effect of black. I might also run it in --check mode first, to make sure it doesn't make any undesirable changes or edit unexpected files:

black --verbose --check . tests 

Sometimes I discover I need to explicitly exclude some Python files from black's formatting rampage. In that case, I add an exclude option to pyproject.toml:

extend-exclude = ''' 
( 
  ^/migrations\/ 
) 
''' 

If all looks good, I'll run black for real:

black --verbose . tests/ 

Run tests

The built-in testing framework for Python is unittest, but another popular framework is pytest. That's my new preferred framework.

I configure that by adding another section to pyproject.toml:

[tool.pytest.ini_options] 
addopts = "-ra --cov"
testpaths = ["tests"] 
pythonpath = ['.']

The addopts option adds arguments to the pytest command: -ra for displaying extra test summary info and --cov for running coverage.py while running tests. That --cov option works only because I installed coverage and pytest-cov earlier. The testpaths option tells pytest where to find the tests, and pythonpath helps tests figure out where to import the modules from.

Now I can run all the tests:

pytest 

That will show me which tests passed or failed. If I want to see the coverage report, I can run:

coverage report

Install pre-commit hook

All of those tools are great, but it's way too easy to forget to run them. That's why pre-commit hooks are handy; they'll run commands before allowing a git commit to go through.
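
Under the hood, a Git pre-commit hook is just an executable at .git/hooks/pre-commit, and Git aborts the commit if it exits with a non-zero code. Here's a bare-bones sketch of that mechanism in Python. It is not how the pre-commit framework itself is implemented, and run_checks and CHECKS are hypothetical names.

```python
import subprocess
import sys


def run_checks(commands) -> int:
    """Run each command in order; return the first non-zero exit code, else 0."""
    for command in commands:
        result = subprocess.run(command)
        if result.returncode != 0:
            return result.returncode
    return 0


# Hypothetical check list; a real hook script would end with
# sys.exit(run_checks(CHECKS)) so that a failing check aborts the commit.
CHECKS = [
    [sys.executable, "-m", "black", "--check", "."],
    [sys.executable, "-m", "flake8", "."],
]
```

Writing hooks by hand like this works, but they live outside version control, which is one reason a shared config file is nicer for a team.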

I like to use the pre-commit framework to set up hooks as it supports a wide range of commands across multiple languages (and many of my projects are multi-language, like Python + JS).

I create a .pre-commit-config.yaml file to describe my desired hooks:

repos:
-   repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v2.3.0
    hooks:
    -   id: check-yaml
    -   id: end-of-file-fixer
    -   id: trailing-whitespace
-   repo: https://github.com/psf/black
    rev: 22.3.0
    hooks:
    -   id: black
        args: ['--config=./pyproject.toml']
-   repo: https://github.com/pycqa/flake8
    rev: 4.0.1
    hooks:
    -   id: flake8
        exclude: '.venv/'
        args: ['--config=.flake8']

Those hooks run black, flake8, as well as some generally useful checks for YAML syntax and trailing whitespace issues.

Now I have to actually install the hooks, which will update my local Git configuration:

pre-commit install

I can do a test commit now to see if it spots any issues. Most likely I'll have already seen any issues in earlier commits, but it's good to make sure the hooks are run.

You might have noticed that I did not run any tests in the pre-commit hooks. That's because tests can often take a number of minutes to run, and for many repositories, running tests on every commit would significantly slow down development and discourage frequent commits. However, I will run tests using the next tool.

Setup Github action

Using Github actions, I can make sure certain commands are always run upon updates to the Github repo.

To configure an action, I create a python.yaml file inside a newly created .github/workflows folder and describe the desired action following their format:

name: Python checks
on: [push, pull_request]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
        - uses: actions/checkout@v3
        - name: Set up Python 3
          uses: actions/setup-python@v3
          with:
            python-version: "3.9"
        - name: Install dependencies
          run: |
            python -m pip install --upgrade pip
            pip install -r requirements-dev.txt
        - name: Lint with flake8
          run: |
            # stop the build if there are Python syntax errors or undefined names
            flake8 . --select=E9,F63,F7,F82
            # exit-zero treats all errors as warnings.
            flake8 . --exit-zero
        - name: Check formatting with black
          uses: psf/black@stable
          with:
            src: ". tests/"
            options: "--check --verbose"
        - name: Run unit tests
          run: |
            pytest

The on: [push, pull_request] tells Github to run the action whenever code is pushed to any branch or whenever pull requests are opened. The jobs section describes just one job, "build", which runs on an Ubuntu box with Python 3.9 installed, installs my project requirements, runs flake8, runs black, and finally, runs pytest.

All together now

You can see my sample repository here:
https://github.com/pamelafox/mypythonproject-sept7th

You can also see how I've used this sort of setup in apps which involve other dependencies as well, like:

Let me know what your project setup is like and if you have suggestions for improving my workflow. I'm always learning! 😀
