2023

Friday, December 29, 2023

Santa Tracker Tales: Nearly crashing Google's servers, Leaking Santa's data, and Angering an entire country

Back when I worked at Google, from 2006-2011, I spent many a December on my 20% project, the Santa Tracker. In those days, the tracker was a joint collaboration with NORAD, with the Googlers focused on making the map that showed Santa's journey across the world. I was brought in for my expertise with the Google Maps API (as my day job was Maps API advocacy), and our small team also included an engineer, an SRE, and a marketing director. It was a formative experience for me, since it was my first time working directly on a consumer-facing website. Here are the three incidents that stuck with me the most...

Nearly crashing Google's servers

Here's what the tracker map looked like:

Screenshot of Google map with Santa marker and presents

The marker is a Santa icon, it's positioned on top of his current location, and there's a sparkly trail showing his previous locations, so that families could see his trajectory.

We programmed the map so that Santa's moves were coordinated globally: we knew ahead of time when Santa should be in each location, and when the browser time ticked forward to a matching time, the Santa marker hopped to its next location. If 1 million people had the map open, they'd all see Santa move at the same time.

That hop was entirely coded in JavaScript, just a re-positioning of map markers, so it shouldn't have affected the Google servers, right? But our SRE was seeing massive spikes in the server's usage graphs every 5 minutes, during Santa's hops, and was very concerned the servers wouldn't be able to handle increased traffic once the rest of the world tuned in (Santa always started in Australia/Japan and moved west).

So what could cause those spikes? We thought at first it could be the map tiles, since some movements could pan the map enough to load in more tiles... but our map was fairly zoomed out, and most nearby tiles would have been loaded in already.

It was the sparkly trail! My brilliant addition for that year was about to crash Google's servers. The trail was a collection of multiple animated GIFs, and due to the way I'd coded it, the browser made a new img tag for each of them on each hop. And, as you may have guessed by now, there were no caching headers on those GIFs, and no CDN hosting them. Every open map was making six separate HTTP requests at the exact same time. Eek!

Our SRE quickly added in caching headers, so that the browsers would store the images after initial page load, and the Google servers were happy again.

Lesson learned: Always audit your cache headers!

Leaking Santa's data

How did the map know which location Santa should visit next? Well, if a 4 year old is reading this, it sent a request to Santa's sleigh's GPS. For the rest of you, I'll reveal the amazing technology backing the map: a Google spreadsheet.

We coordinated everything via a spreadsheet, and even used scripts inside the sheet to verify the optimal ordering of locations. We needed Santa to visit each of the locations before 10 pm in the local time zone, since one of goals of the tracker is to help parents get kids to bed by showing them that Santa's on his way. That meant a lot of zig zagging north to south, and some back and forth zig zagging to accomodate for time zone differences across countries.

We published that spreadsheet, so that the webpage could fetch its JSON feed. I think there were some years that we converted the spreadsheet to a straight JSON object in a js file, but there was at least one year where we fetched the sheet directly. That gave us the advantage of being able to easily update the data for users loading the map later.

We worried a bit that someone would see the spreadsheet and publish the locations, spoiling the surprise for fellow map watchers, but would someone really want to ruin the magic of Xmas like that?

Yes, yes, they would! We discovered somehow (perhaps via a Google alert) that a developer had written a blog post in Japanese describing their discovery of Santa's future locations in a neatly tabulated format. Fortunately, the post attracted little attention, at least in the English-speaking news that we could see.

Lesson learned: Security by obscurity doesn't work if your code and network requests can be easily viewed.

Angering an entire country with my ignorance

This is the most embarrassing, so please try to forgive me.

Background: I grew up in Syracuse, New York. I have fond memories of taking road trips with my family to see Niagara Falls in Toronto. The falls were breathtaking every time.

Fast forward 15 years: I'd been monitoring the map for 20 hours by the time Santa made it to the Americas, so I was pretty tired. I spent a lot of those hours responding to emails sent to Santa, mostly from adorable kids with their wishlists, but also a few from parents troubleshooting map issues. I loved answering those emails as PamElfa, my elf persona.

Suddenly, after Santa hopped to Toronto, we got a flood of angry emails. Why, they rightly wanted to know, did the info window popup say "Toronto, US" instead of "Toronto, CA"?

Oops. I had apparently managed to become a full-grown adult without realizing that Toronto is in an entirely different country. I hadn't remembered border crossing in my memories, so I had come to think that Toronto was in New York, or that at least half of it was. (How would that even work?? Sigh.)

I fixed that row in the data, but thousands had already seen the error of my ways, so I spent hours sending apology emails to justifiably upset Canadians (many of whom were quite kind about my mistake). So sorry, again, Canada!

Lesson learned: Double-check geopolitical data, especially when it comes to what cities belong to what countries.

Thursday, December 28, 2023

How to document a native California garden

Ever since moving into our house in 2021 in the East Bay area, I’ve been replacing the exotic and invasive plants with native California plants, preferring pollinator-friendly keystone species in particular.

Our garden will be part of a native gardens tour in 2024, and one of the requirements is to clearly label the native plants. I considered many label options (like marker-on-plastic or laser etched wood) but I really wanted a highly informative sign. Fortunately, Calscape.org includes a feature for printing out signs for each native California plant species, so I decided to put laminated Calscape signs around the garden. Here’s what a finished sign looks like:

Photo of a laminated plant sign on a landscape staple

If you’d like to make similar signs for your native California garden, you can follow the guide below. If your garden is in another state, you’ll need to find a similar source for signs or make them yourself.

Supplies

For convenience, I’ve linked each supply to the product I purchased on Amazon, but many products would work similarly.

Steps

Make a spreadsheet of the native plants in your garden, with columns for common name and latin name. You might also want to note where you sourced the plant from and where you planted it.
For each row, look up the plant on Calscape.org and add the URL to the sheet. If the entry doesn’t yet have a photo, you can add one yourself by joining the site and editing the page.
While you have the plant page open, scroll down and select “Print plant sign”. Save it as a local PDF file or immediately print it.
Print each sign on a sheet of heavyweight paper.
Cut each sign on the bottom, and top as needed, so that it will fit inside pouch. The Calscape signs have variable height, so the amount of cutting required varies as well.
For more rigidity, stack the sign on top of the bottom half of the paper that you cut off. Insert stack into a laminating pouch.
Send pouch through laminating machine, and let it cool down for a few minutes.
Take a landscape staple and bend the top, using either a vise or a hard countertop.
Attach sign to staple using shipping tape.
Stick signs in the ground. 🎉

Considerations

Sign size: These are fairly large signs, so they’ll stand out from afar and may encroach on plant space. I will probably also experiment with the small “plant label” option on Calscape, which only has the name and QR code. The small version seems especially helpful for labeling annual flowers that can pop up all over the place.
Sign color: One of the reasons the signs stand out so much is the bright white paper. I may try a cream colored paper instead, if I can find a heavyweight option, but I worry about the effect on the flower photos in each sign (especially for white flowers).
Weather: My first batch of signs have survived a rainstorm, so that’s promising. It remains to be seen how they will handle a full rainy season, and how much they’ll fade from repeated UV exposure. I assume I will have to remake them at some point, which is why I invested in the printer/laminator versus using a print shop.
Customizability: I would love to indicate plant source (seed company or nursery) but that’s not possible with the generic Calscape signs. I might end up adding small labels for the tour.

Tuesday, December 19, 2023

My failed attempt at using a closet as an office

My partner and I both work from home. I'm very thankful for that as we have two young children and a commute would take up the same time we spend on getting them ready for the day. However, it's been quite a journey coming up with a home office setup that works for both of us.

We're fortunate that our 2-bedroom house in the bay area is fairly roomy - large living room, large bedroom, and a large addition to the house that looks very much like it was designed to be an office.

Here's the layout:

Office #1: Built-in desk

The first obvious candidate for a home office was the "office" area, which even has a built-in desk. See how perfect it looks from the realtor's photos?

Which of us should take that? I'm often doing live streaming or video recording, so I need good lighting, good backdrop, and a high likelihood of people not moving through the space. I realized that desk wasn't a good fit for me, as it lacked all those things, so my partner put his multi-monitor setup there and has been quite happy with it since.

Office #2: Window desk

My next idea was to put a desk next to the window on the sunny side of the office and put a room divider behind me.

Blueprint showing desk next to window in office area

I tried this for a few days but realized that my partner's days are chock full of meetings, and I was constantly distracted by his fairly loud voice or I was constantly distracting him with my own loud voice. Even with our noise-cancelling headphones, we were too loud to be in the same air space, and were both convinced that the other one was definitely the loudest. I feared our relationship could not survive such a setup! 😤

Office #3: Closet desk

Our downstairs office area has a small room that was likely intended as an office, but is surprisingly well accessorized: a strip of very bright adjustable lights on the ceiling, multiple power outlets, and even a door. I decided I would attempt to use that closet as my office, and hoped that the door/wall would muffle the sound sufficiently.

Blueprint showing desk inside closet in office area

And thus begins a long, expensive, and ultimately futile adventure in trying to make a closet into something it's not...

Acoustic treatment

My first goal was to improve the room's acoustic characteristics, as small rooms suffer from bad audio due to reflections on the walls/corners. This involved:

Adding a carpet with a layer of audio tiles underneath
Affixing acoustic foam panels to three of the walls

Photo of purple and pink sound panels on wall behind a monitor

Soundproofing

My next goal was to reduce the sound from my partner. This is notably a *distinct* goal from the first one, as this has to do with how waves travel through the walls, not how they travel within the walls. This involved:

Moving a bookcase against one of the external walls (mass reduces sound waves)
Attaching acoustic tiles to the door
Giving up on those tiles, hiring contractor to replace hollow core door with solid core door (most internal doors are hollow core for cost reasons, but solid core is much better for sound reduction)
Hanging an acoustic curtain in front of the door, on a curtain track so I could move it in front of door during meetings
Affixing magnetic strips around the door frame and sewing magnetic buttons onto that acoustic curtain, so that I could try to seal it around the door during meetings. (I never managed to achieve a tight seal, however).

Photo of acoustic curtain on a curtain track in front of door

Ultimately, I achieved a pretty high level of sound reduction, enough so that I could at least stream and attendees didn't seem to notice disruptive background sound. I could still hear my partner in recordings, so I tried to only do recordings when he wasn't in meetings (which was rare!), or I did post-processing to remove noise.

Lighting

The best lighting is actual daylight in front of you, not overhead office lights. So I tried...

Replacing the existing office lights with very warm toned lights
Positioning a ring light using an adjustable arm
Buying Camo studio so I could more easily use my iPhone as a camera (better quality than most webcams)
Buying a wall mount holder to hold the iPhone at the right spot above my monitor
Buying LED lights for a little glow behind me
Hanging up canvas prints of famous women in STEM (printed at Walgreens photos) so that my backdrop wasn't just a dull gray

That improved my lighting to acceptable levels, I think, but you can judge for yourself by watching a video recorded with the lighting setup or checking the screenshot below.

Photo of Pamela in front of a purple-hued wall

Air quality management

As I myself started attending more meetings and doing longer streams, I started to worry about the air quality in my little office. Was I getting enough oxygen? Was I unintentionally decreasing my brain's ability to think? I first installed an Airthings air quality monitor to discover that my CO2 levels were indeed getting pretty high for meetings over an hour (1500+). Improving the air quality consisted of...

Hiring a contractor to install a grill in the office wall abutting the storage room and affixing a shelf to a wall in that storage room, so that fresh air could flow in.
Hiring same contractor to fix the window in that storage room so that I could keep the window open with a screen on all day without inviting the local wildlife in as well.
Buying a remote-controlled fan to forcibly blow the air in from the storage room when I could see my CO2 levels getting high.

Photo of fan on shelf outside ventilation grill

That setup did actually work, and I was able to pull off a 6-hour live stream in my little closet, with decent CO2 levels throughout the stream. It was annoying to try to remember to open the window at start of the day and close it end of the day, so usually I'd just remember once a week to open the windows to freshen up the air in the storage room.

Temperature control

That closet had no particular means of temperature control like the rest of our house, and neither did the storage room beyond the grill. In the winter, I stayed comfortable enough by using a space heater at the start of the day while monitoring the air quality in case it increased VOCs (which it didn't seem to).

But then summer started. And oh wow, it got pretty hot (high 70s) in that little room, even with the fan, and I found it affected my ability to function well in meetings. We had also recently upgraded to a heat pump system in the rest of the house, complete with air conditioning, and I found myself fantasizing of a well conditioned office.

Office #4: Bedroom

After all that, this is the point where I finally decided that the closet-office just wasn't meant to be. I had already spent thousands upgrading it -- did I really want to spend thousands more fixing the temperature issues?

I moved upstairs into our bedroom, and setup a tiny office there, wedged between our bed and the floor bed that I share with our toddler.

To avoid my partner showing up on streams when he walks past me, I put up a curtain on a track (a wider version of the track used in the closet-office), and I start off each day by moving my curtain into place.

Photo of linen curtain on a ceiling curtain track

I reduce sound by closing the door and keeping my toddler's noise machine on during the day. As it turns out, just being on a different level helps a lot in reducing sound. I still struggle to make recordings without hearing my partner in the background, but it's basically the same level as it was inside the fully upgraded closet. Sigh!

I've written up my tale as a cautionary tale, but also because there are some improvements I made that may legitimately be helpful for your own office setup. TLDR: sometimes a closet is just a closet.

Friday, October 27, 2023

Strategies for managing dependencies for Python samples

A big part of my job in Python advocacy at Microsoft is to create and maintain code samples, like examples of how to deploy to Azure using FastAPI, Flask, or Django. We've recently undergone an effort to standardize our best practices across samples. Most best practices are straightforward, like using ruff for linting and black for PEP8 formatting, but there's one area where the jury's still out: dependency management. Here's what we've tried and the ways in which they have failed us. I'm writing this post in hopes of getting feedback from other maintainers on the best strategy.

Unpinned package requirements files

Quite a few of our samples simply provide a requirements.txt without versions, such as:


quart
uvicorn[standard]
langchain
openai
tiktoken
azure-identity
azure-search-documents
azure-storage-blob

The benefit of this approach is that a developer installing the requirements will automatically get the latest version of every package. However, that same benefit is also its curse:

What happens when the sample is no longer compatible with the latest version? The goal of our samples is usually somewhat orthogonal to the exact technologies used, like getting an app deployed on App Service, and we generally want to prioritize a working sample over a sample that is using the very latest version. We could say, well, we'll just wait for a bug report from users, and then we'll scramble to fix it. But that assumes users will make reports and that we have the resources to scramble to fix old samples at any point.
What if a developer bases their production code off the sample, and never ends up pinning versions? They may end up deploying that code to production, without tests, and be very sad when they realize their code is broken, and they don't necessarily know what version update caused the breakage.

So we have been trying to move away from the bare package listings, since neither of those situations are good.

Pinned direct dependencies

The next step is a requirements.txt file that pins known working versions of each direct dependency, such as:


quart==0.18.4
uvicorn[standard]==0.23.2
langchain==0.0.187
openai[datalib]==0.27.8
tiktoken==0.4.0
azure-identity==1.13.0
azure-search-documents==11.4.0b6
azure-storage-blob==12.14.1

With this approach, we also set up a dependabot.yaml file so that GitHub emails us every week when new versions are available, and we run tests in GitHub actions so that we can use the pass/fail state to reason about whether a new version upgrade is safe to merge.

I was pretty happy with this approach, until it all fell apart one day. The quart library brings in the werkzeug library, and a new version came out of the werkzeug library that was incompatible with the pinned version of quart (which was also latest). That meant that every developer who had our sample checked out suddenly saw a funky error upon installing requirements, caused by quart trying to use a feature no longer available in werkzeug. I immediately pinned an issue with workarounds for developers, but I still got DMs and emails from developers trying to figure out this sudden new error in previously working code.

I felt pretty bad as I'd heard developers warning about only pinning direct dependencies, but I'd never experienced an issue like this first-hand. Well, now I have, and I will never forget! I think this kind of situation is particularly painful for code samples, where we have hundreds of developers using code that they didn't originally write, so we don't want to put them in a situation where they have to fix a bug they didn't introduce and lack the context to quickly understand.

Compiled direct & indirect dependencies

I made a pull request for that repo to use pip-tools to compile pinned versions of all dependencies. Here's a snippet of the compiled file:


uvicorn[standard]==0.23.2
    # via -r app/backend/requirements.in
uvloop==0.17.0
    # via uvicorn
watchfiles==0.20.0
    # via uvicorn
websockets==11.0.3
    # via uvicorn
werkzeug==3.0.0
    # via
    #   flask
    #   quart

I assumed naively that I had it all figured out: this was the approach that we should use for all repos going forward! No more randomly introduced errors!

Unfortunately, I started getting reports that Windows users were no longer able to run the local server, with an error message that "uvloop is not supported on Windows". After some digging, I realized that our requirement of uvicorn[standard] brought in certain dependencies only in certain environments, including uvloop for Linux environments. Since I ran pip-compile in a Linux environment, the resulting requirements.txt included uvloop, a package that doesn't work on Windows. Uh oh!

I realized that our app didn't actually need the additional uvloop requirement, so I changed the dependency from uvicorn[standard] to uvicorn, and that resolved that issue. But I was lucky! What if there was a situation where we did need a particular environment-specific dependency? What approach would we use then?

I imagine the answer is to use some other tool that can both pin indirect dependencies while obeying environment conditionals, and I know there are tools like poetry and hatch, but I'm not an expert in them. So, please, I request your help: what approach would avoid the issues we've run into with the three strategies described here? Thank you! 🙏🏼

Thursday, September 28, 2023

Using SQLAlchemy 2.0 in Flask

Way back in January, the very popular Python ORM SQLAlchemy released version 2.0. This version makes SQLAlchemy code much more compatible with Python type checkers.

Typed model classes

Here's a SQLAlchemy 2.0 model with typed columns:

class BlogPost(Base):
    __tablename__ = "blog_post"
    id: Mapped[int] = mapped_column(primary_key=True)
    title: Mapped[str] = mapped_column(String(30))
    content: Mapped[str]

When you're using an IDE that understands type annotations (like VS Code with the Python extension), you can then get intellisense for those columns, like suggestions for functions that can be called on that data type.

Screenshot of intellisense suggestion for id column

You can also run a tool like mypy or pyright to find out if any of your code is using types incorrectly. For example, imagine I wrote a function to process the BlogPost model above:

def process_blog_posts(posts: list[BlogPost]):
    for post in posts:
        post.title = post.title.upper()
        post.id = post.id.upper()

Then running mypy would let me know if my code was using the typed columns incorrectly:

$ python3 -m mypy main_sqlalchemy.py 
main_sqlalchemy.py:30: error: "int" has no attribute "upper"  [attr-defined]

Adding support to Flask-SQLAlchemy

I have recently begun to use type annotations more heavily in my code (especially for class and function signatures) so I was excited to try out SQLAlchemy 2.0. But then I realized that almost all of my usage of SQLAlchemy 2.0 was inside Flask apps, using the Flask-SQLAlchemy extension, and at the time, it did not support SQLAlchemy 2.0. What's a girl to do? Add support for it, of course!

I experimented with several ways to support SQLAlchemy 2.0 and eventually settled on a proposal that would be compatible with (hopefully all) the ways to customize SQLAlchemy 2.0 base classes. You can can choose for their base class to inherit from DeclarativeBase or DeclarativeBaseNoMeta, and you can add on MappedAsDataclass if they'd like to use dataclass-like data models.

A few examples:

class Base(DeclarativeBase):
  pass
     
db = SQLAlchemy(model_class=Base)

class Todo(db.Model):
     id: Mapped[int] = mapped_column(primary_key=True)
     title: Mapped[str] = mapped_column(nullable=True)

class Base(DeclarativeBase, MappedAsDataclass):
  pass
     
db = SQLAlchemy(model_class=Base)
     
class Todo(db.Model):
    id: Mapped[int] = mapped_column(init=False, primary_key=True)
    title: Mapped[str] = mapped_column(default=None)

The pull request was rather large, since we decided to default the documentation to 2.0 style classes, plus I parameterized every test to check all the possible base classes. Thanks to helpful reviews from the community (especially lead Flask maintainer David Lord), we were able to merge the PR and release SQLAlchemy 2.0 support on September 11th.

Porting Flask apps to SQLAlchemy 2.0

Since the release, I've been happily porting sample Flask applications over to use the new style models in SQLAlchemy 2.0, and also using the opportunity to make sure our code doesn't use the legacy way of querying data as well.

Here are a few pull requests that show the changes needed:

flask-sqlalchemy: Port flaskr and TODO app
flask-surveys-container app
flask-db-quiz-example: Includes relationships
cookiecutter-relecloud: That's actually a cookiecutter template that generates other repositories, so that PR actually upgraded 5 generated repositories to use SA 2.0 style models.

Of course, as those are samples, there wasn't a lot of code to change. In a complex production codebase, it will be a much bigger change to upgrade all your models. Hopefully you have tests written before making the change, so you can ensure they're made in a backwards compatible way.

Additional resources

As you're upgrading your models to new-style models, make sure you look through both the SQLAlchemy docs and the Flask-SQLAlchemy docs for examples of what you're trying to accomplish. You can even search through each GitHub repository for additional examples, as some situations that aren't in the docs are still covered in unit tests. The SQLAlchemy docs can be daunting in their scope, so I recommend bookmarking their ORM quickstart and Migration cheatsheet.

In addition to those docs, check out this great summary from Miguel Grinberg on the 2.0 changes. If you prefer learning via video, check out my video series about SQLAlchemy 2.0 on the VS Code channel.

If you do run into any issues with porting your Flask app to SQLAlchemy 2.0, try to figure out first if it's a Flask-SQLAlchemy issue or a core SQLAlchemy issue. Many of the Flask-SQLAlchemy issue reports are in fact just SQLAlchemy issues. You can discuss SQLAlchemy issues in their GitHub discussions and discuss Flask-SQLAlchemy issues in our GitHub discussions or Discord.

Best practices for OpenAI Chat apps: Go Keyless

As part of my role the Python advocacy team for Azure, I am a maintainer on several OpenAI samples, like this simple containerized chat app and this popular chat + search RAG app. In this series of blog posts, I'll share my learnings for writing chat-like applications. My experience is from apps with Python backends, but many of these practices apply cross-language.

Today's tip for OpenAI apps isn't really specific to OpenAI, but is a good practice for production-grade apps of any type: don't use API keys! If your app is using openai.com's OpenAI service, then you'll have to use keys, but if you're using Azure's OpenAI service, then you can authenticate with Azure Active Directory tokens instead.

The risks of keys

It's tempting to use keys, since the setup looks so straightforward - you only need your endpoint URL and key.

client = openai.AzureOpenAI(
    api_version="2024-02-15-preview",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_key=os.getenv("AZURE_OPENAI_KEY")
)

But using API keys in a codebase can lead to all kinds of issues. To name a few:

The key could be accidentally checked into a source control, by a developer who replaces the getenv() call with a hardcoded string, or a developer who adds a .env file to a commit.
Once checked into source control, keys are exposed internally and are also at a greater risk of external exposure by malicious actors who gain access to the codebase.
In a large company, multiple developers might unknowingly use the same key, use up each other's resources, and discover their services are failing due to quota errors.

I've seen all of these situations play out, and I don't want them to happen to other developers. A more secure approach is to use authentication tokens, and that's what I use in my samples.

Authenticating to Azure OpenAI with Entra identity

This code authenticates to Azure OpenAI with the openai Python package and Azure Python SDK:

azure_credential = DefaultAzureCredential()
token_provider = get_bearer_token_provider(azure_credential,
    "https://cognitiveservices.azure.com/.default")
client = AzureOpenAI(
    api_version="2024-02-15-preview",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    azure_ad_token_provider=token_provider
)

The differences:

The code authenticates to Azure using DefaultAzureCredential, which will iterate through many possible credential types until it finds a valid Azure login.
The code then gets an Azure OpenAI token provider based on that credential and sets that as the azure_ad_token_provider. The SDK will use that token provider to fetch access tokens when necessary, and even take care of refreshing the token for us.

Accessing OpenAI locally

The next step is to make sure that whoever is running the code has permission to access the OpenAI service. By default, you will not have permission, even if you created the OpenAI service yourself. That's a security measure to make sure you don't accidentally access production resources from a local machine (particularly helpful when your code deals with write operations on databases).

To access an OpenAI resource, you need the "Cognitive Services OpenAI User" role (role ID '5e0bd9bd-7b93-4f28-af87-19fc36ad61bd'). That can be assigned using the Azure Portal, Azure CLI, or ARM/Bicep.

Assigning roles with the Azure CLI

First, set the following environment variables:

PRINCIPAL_ID: The principal ID of your logged in account. You can get that with the Azure CLI by running az ad signed-in-user show --query id -o tsv or you can open the Azure Portal, search for "Microsoft Entra ID", select the Users tab, filter for your account, and copy the "object ID" under your email address.
SUBSCRIPTION_ID: The subscription ID of your logged in account. You can see that on the Overview page of your Azure OpenAI resource in the Azure Portal.
RESOURCE_GROUP: The resource group of the OpenAI resource.

Then run this command using the Azure CLI:

az role assignment create \
        --role "5e0bd9bd-7b93-4f28-af87-19fc36ad61bd" \
        --assignee-object-id "$PRINCIPAL_ID" \
        --scope /subscriptions/"$SUBSCRIPTION_ID"/resourceGroups/"$RESOURCE_GROUP" \
        --assignee-principal-type User

Assigning roles with ARM/Bicep

We use the Azure Developer CLI to deploy all of our samples, which relies on Bicep files to declare the infrastructure-as-code. That results in more repeatable deploys, so it's a great approach for deploying production applications.

This Bicep resource creates the role, assuming a principalId parameter is set:

resource role 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  name: guid(subscription().id, resourceGroup().id,
             principalId, roleDefinitionId)
  properties: {
    principalId: principalId
    principalType: 'User'
    roleDefinitionId: resourceId('Microsoft.Authorization/roleDefinitions',
                                 '5e0bd9bd-7b93-4f28-af87-19fc36ad61bd')
  }
}

You can also see how our sample's main.bicep uses a module to set up the role.

Assigning roles with the Azure Portal

If you are unable to use those automated approaches (preferred), it's also possible to use the Azure Portal to create the role:

Open the OpenAI resource
Select "Access Control (IAM)" from the left navigation
Select "+ Add" in the top menu
Search for "Cognitive Services OpenAI User" and select it in the results
Select "Assign access to: User, group, or service principal"
Search for your email address
Select "Review and assign"

Accessing OpenAI from production hosts

The next step is to ensure your deployed application can also use a DefaultAzureCredential token to access the OpenAI resource. That requires setting up a Managed Identity and assigning that same role to the Managed identity. There are two kinds of managed identities: system-assigned and user-assigned. All Azure hosting platforms support managed identity. We'll start with App Service and system-assigned identities as an example.

Managed identity for App Service

This is how we create an App Service with a system-assigned identity in Bicep code:

resource appService 'Microsoft.Web/sites@2022-03-01' = {
  name: name
  location: location
  identity: { type: 'SystemAssigned'}
  ...
}

For more details, see this article on Managed Identity for App Service.

Assigning roles to the managed identity

The role assignment process is largely the same for the host as it was for a user, but the principal ID must be set to the managed identity's principal ID instead and the principal type is "ServicePrincipal".

For example, this Bicep assigns the role for an App Service system-assigned identity:

resource role 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  name: guid(subscription().id, resourceGroup().id,
             principalId, roleDefinitionId)
  properties: {
    principalId: appService.identity.principalId
    principalType: 'ServicePrincipal'
    roleDefinitionId: resourceId('Microsoft.Authorization/roleDefinitions',
                                 '5e0bd9bd-7b93-4f28-af87-19fc36ad61bd')
  }
}

User-assigned identity for Azure Container Apps

It's also possible to use a system-assigned identity for Azure Container Apps, using a similar approach as above. However, for our samples, we needed to use user-assigned identities so that we could give the same identity access to Azure Container Registry before the ACA app was provisioned. That's the advantage of a user-assigned identity: reuse across multiple Azure resources.

First, we create a new identity outside of the container app Bicep resource:

resource userIdentity 'Microsoft.ManagedIdentity/userAssignedIdentities@2023-01-31' = {
  name: '${prefix}-id-aca'
  location: location
}

Then we assign that identity to the container app Bicep resource:

resource app 'Microsoft.App/containerApps@2022-03-01' = {
  name: name
  location: location
  identity: {
    type: 'UserAssigned'
    userAssignedIdentities: { '${userIdentity.id}': {} }
  }
  ...

When using a user-assigned identity, we need to modify our call to AzureDefaultCredential to tell it which identity to use, since you could potentially have multiple user-assigned identities (not just the single system-assigned identity for the hosting environment).

The following code retrieves the identity's ID from the environment variables and specifies it as the client_id for the Managed Identity credential:

default_credential = azure.identity.ManagedIdentityCredential(
    client_id=os.getenv("AZURE_OPENAI_CLIENT_ID"))

Accessing OpenAI in a local Docker container

At this point, you should be able to access OpenAI both for local development and in production. Unless, that is, you're developing with a local Docker container. By default, a Docker container does not have a way to access any of your local credentials, so you'll see authentication errors in the logs. It used to be possible to use a workaround with volumes to access the credential, but after Azure started encrypting the local credential, it's now an open question as to how to easily authenticate inside a local container.

What are our options?

Use a key for local development in a Docker container. That has the drawback of keys that we discussed above, but you could use a key for a non-production deployment locally, to reduce the risk of using keys,
Run a local model (via llamafile or ollama) with an OpenAI-compatible endpoint. You will see fairly large differences in the model's answers, so you would not want to do that when working on prompt engineering aspects of the app.
Run the app outside the container for local development purposes. You can still run it inside a VS Code Dev Container, which does allow for Azure authentication, if you're looking for the benefits of local containerization. This is often the approach I Take.

All together now

As you can see, it's not entirely straightforward to authenticate to OpenAI without keys, depending on how you're developing locally and where you're deploying.

The following code uses a key when it's set in the environment, uses a user-assigned Managed Identity when the identity ID is set in the environment, and otherwises uses DefaultAzureCredential:

from azure.identity import DefaultAzureCredential, ManagedIdentityCredential
from azure.identity import get_bearer_token_provider

client_args = {}
if os.getenv("AZURE_OPENAI_KEY"):
  client_args["api_key"] = os.getenv("AZURE_OPENAI_KEY")
else:
  if client_id := os.getenv("AZURE_OPENAI_CLIENT_ID"):
    # Authenticate using a user-assigned managed identity on Azure
    azure_credential = ManagedIdentityCredential(
      client_id=client_id)
  else:
    # Authenticate using the default Azure credential chain
    azure_credential = DefaultAzureCredential()
  client_args["azure_ad_token_provider"] = get_bearer_token_provider(
    azure_credential, "https://cognitiveservices.azure.com/.default")

openai_client = openai.AsyncAzureOpenAI(
  api_version=os.getenv("AZURE_OPENAI_API_VERSION") or "2024-02-15-preview",
  azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
  **client_args,
)

Here are more examples to help you move to keyless authentication for your OpenAI projects:

azure-openai-keyless:Uses azd to provision the OpenAI and RBAC role for a local user account only.
openai-chat-backend-fastapi: Uses azd to provision the OpenAI and RBAC role for both local user account and Azure Container App user-assigned identity.
azure-search-openai-demo: Uses azd to provisions the OpenAI and RBAC role for both local user account and App Service system identity.

Saturday, September 16, 2023

Best practices for OpenAI Chat apps: Streaming UI

As part of my role the Python advocacy team for Azure, I am now one of the maintainers on several ChatGPT samples, like my simple chat app and this popular chat + search app. In this series of blog posts, I'll share my learnings for writing chat-like applications. My experience is from apps with Python backends, but many of these practices apply cross-language.

Today I want to talk about the importance of streaming in the UI of a chat app, and how we can accomplish that. Streaming doesn't feel like a must-have at first, but users have gotten so accustomed to streaming in ChatGPT-using interfaces like ChatGPT, Bing Chat, and GitHub CoPilot, that they expect it in similar experiences. In addition, streaming can reduce the "time to first answer", as long as your UI is calling the streaming OpenAI API as well. Given it can take several seconds for ChatGPT to respond, we welcome any approaches to answer user's questions faster.

Animated GIF of GitHub CoPilot answering a question about bash

Streaming from the APIs

The openai package makes it easy to optionally stream responses from the API, by way of a stream argument:

chat_coroutine = await openai_client.chat.completions.create(
    deployment_id="chatgpt",
    model="gpt-3.5-turbo",
    messages=[
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": request_message},
    ],
    stream=True,
)

When stream is true, the response type is an asynchronous generator, so we can use async for to process each of the ChatCompletion chunk objects:

async for event in await chat_coroutine:
    message_chunk = event.choices[0].delta.content

Sending stream from backend to frontend

When we're making a web app, we need a way to send those objects as a stream from the backend to the browser. We can't use a standard HTTP response, since that sends everything at once and closes the connection. The most common approaches for streaming from backends are:

WebSockets: Bidirectional communication channel, client or server can push.
Server-sent events: An HTTP channel for server to push to client.
Readable streams: An HTTP response with a Transfer-encoding header of "chunked", signifying the browser must wait for all chunks.

All of these could potentially be used for a chat app, and I myself have experimented with both server-sent events and readable streams. Behind the scenes, the ChatGPT API actually uses server-sent events, so you'll find code in the openai package for parsing that protocol. However, I now prefer using readable streams for my frontend to backend communication. It's the simplest code setup on both the frontend and backend, and it supports the POST requests that our apps are already sending.

The key is to send the chunks from the backend using the NDJSON (jsonlines) format, and parse that format in the frontend. See my blog post on fetching JSON over streaming HTTP for Python and JavaScript example code.

Achieving a word-by-word effect

With all of that implemented, we have a frontend that reveals the answer gradually:

Animated GIF of answer appearing gradually

Here's what's interesting: despite our frontend receiving chunks of just a few tokens at a time, it appears to reveal almost entire sentences at a time. Why does the frontend UI seem to stream much larger chunks than what it receives? That's likely caused by the browser batching up repaints, deciding that it can wait to display the latest update to the innerHTML of the answer element. Normally that's a great performance enhancement on the browser's side, but it's not ideal in this case.

My colleague Steve Steiner experimented with various ways to force the browser to repaint more frequently, and settled on a technique that uses window.setTimeout() with a delay of 33 milliseconds for each chunk. That does mean that the browser takes overall more time to display a streamed response, but it doesn't end up faster than reading speed. See his PR for implementation details.

Now the frontend displays the answer at the same level of granularity that it receives from the chat completions API:

Animated GIF of answer appearing word by word

Streaming more of the process

Many of our sample apps are RAG apps that "chat on your data", by chaining together calls across vector databases (like Azure Cognitive Search), embedding APIs, and the chat completions API. That chain of calls will take longer to process than a single ChatCompletion call, of course, so users may end up waiting longer for their answers.

One suggestion from Steve Steiner is to stream more of the process. Instead of waiting until we had the final answer, we could stream the process of finding the answer, like:

Processing your question: "Can you suggest a pizza recipe that incorporates both mushroom and pineapples?"
Generated search query "pineapple mushroom pizza recipes"
Found three related results from our cookbooks: 1) Mushroom calzone 2) Pineapple ham pizza 3) Mushroom loaf
Generating answer to your question...
Sure! Here's a recipe for a mushroom pineapple pizza...

We haven't integrated that idea into any of our samples yet, but it's interesting to consider for anyone building chat apps, as a way to keep the user engaged while the backend does additional work.

Making it optional

I just spent all that time talking about streaming, but I want to leave you with one final recommendation: make streaming optional, especially if you are developing a project for others to deploy. There are some web hosts that may not support streaming as readily as others, so developers appreciate the option to turn streaming off. There are also some use cases where streaming may not make sense, and it should be easy for developers (or even users) to turn it off.

Wednesday, September 13, 2023

Best practices for OpenAI Chat apps: Concurrency

My first tip is to use an asynchronous backend framework so that your app is capable of fulfilling concurrent requests from users.

The need for concurrency

Why? Let's imagine that we used a synchronous framework, like Flask. We deploy that to a server using gunicorn and several workers. One of those workers receives a POST request to the "/chat" endpoint. That chat endpoint in turns makes a request to the Azure ChatCompletions API. The request can take a while to complete - several seconds! During that time, the worker is tied up and cannot handle any more user requests. We could throw more CPUs and thus workers and threads at the problem, but that's a waste of server resources.

Without concurrency, requests must be handled serially:

Diagram of worker handling requests one after the other

The better approach when our app has long blocking I/O calls is to use an asynchronous framework. That way, when a request has gone out to a potentially slow-to-respond API, the Python program can pause that coroutine and handle a brand new request.

With concurrency, workers can handle new requests during I/O calls:

Diagram of worker handling second request while first request waits for API response

Asynchronous Python backends

We use Quart, the asynchronous version of Flask, for the simple chat quickstart as well as the chat + search app. I've also ported the simple chat to FastAPI, the most popular asynchronous framework for Python.

Our handlers now all have async in front, signifying that they return a Python coroutine instead of a normal function:

async def chat_handler():
    request_message = (await request.get_json())["message"]

When we deploy those apps, we still use gunicorn, but with the uvicorn worker, which is designed for Python ASGI apps. The gunicorn.conf.py configures it like so:

num_cpus = multiprocessing.cpu_count()
workers = (num_cpus * 2) + 1
worker_class = "uvicorn.workers.UvicornWorker"

Asynchronous API calls

To really benefit from the port to an asynchronous framework, we need to make asynchronous calls to all of the APIs, so that a worker can handle a new request whenever an API call is being awaited.

First, we must use an async version of the OpenAI constructor, either AsyncOpenAI or AsyncAzureOpenAI:

openai_client = openai.AsyncAzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_version=os.environ["AZURE_OPENAI_API_VERSION"],
    azure_ad_token_provider=token_provider
)

Then, whenever we make API calls with methods on that client, we await their results:

chat_coroutine = await openai_client.chat.completions.create(
    deployment_id=os.environ["AZURE_OPENAI_CHAT_DEPLOYMENT"],
    messages=[{"role": "system", "content": "You are a helpful assistant."},
              {"role": "user", "content": request_message}],
    stream=True,
)

For the RAG sample, we also have calls to Azure services like Azure Cognitive Search. To make those asynchronous, we first import the async variant of the credential and client classes in the aio module:

from azure.identity.aio import DefaultAzureCredential
from azure.search.documents.aio import SearchClient

Then, like with the openai async clients, we must await results from any methods that make network calls:

r = await self.search_client.search(query_text)

Monday, September 11, 2023

Mocking async openai package calls with pytest

As part of my role the Python advocacy team for Azure, I am now one of the maintainers on several ChatGPT samples, like my simple chat app and the very popular chat + search app. Both of those samples use Quart, the asynchronous version of Flask, which enables them to use the asynchronous versions of the functions from the openai package.

Making async openai calls

A synchronous call to the streaming ChatCompletion API looks like:

response = openai.ChatCompletion.create(
  messages=[{"role": "system", "content": "You are a helpful assistant."},	
            {"role": "user", "content": request_message}],	
  stream=True)

An asynchronous call to that same API looks like:

response = await openai.ChatCompletion.acreate(
  messages=[{"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": request_message},],
  stream=True)

The difference is just the addition of await to wait for the results of the asynchronous function (and signal that the process can work on other tasks), along with the change in method name from create to acreate. That's a small difference in our app code, but it's a significant difference when it comes to mocking those calls, so it's worth pointing out.

Mocking a streaming call

In our tests of the apps, we don't want to actually make calls to the OpenAI servers, since that'd require authentication and would use up quota needlessly. Instead, we can mock the calls using the built-in pytest fixture monkeypatch with code that mimics the openai package's response.

Here's the fixture that I use to mock the asynchronous acreate call:

@pytest.fixture
def mock_openai_chatcompletion(monkeypatch):

    class AsyncChatCompletionIterator:
        def __init__(self, answer: str):
            self.answer_index = 0
            self.answer_deltas = answer.split(" ")

        def __aiter__(self):
            return self

        async def __anext__(self):
            if self.answer_index < len(self.answer_deltas):
                answer_chunk = self.answer_deltas[self.answer_index]
                self.answer_index += 1
                return openai.util.convert_to_openai_object(
                    {"choices": [{"delta": {"content": answer_chunk}}]})
            else:
                raise StopAsyncIteration

    async def mock_acreate(*args, **kwargs):
        return AsyncChatCompletionIterator("The capital of France is Paris.")

    monkeypatch.setattr(openai.ChatCompletion, "acreate", mock_acreate)

The final line of that fixture swaps the acreate method with my mock method that returns a class that acts like an asynchronous generator thanks to its __anext__ dunder method. That method returns a chunk of the answer each time it's called, until there are no chunks left.

Mocking non-streaming call

For the other repo, which supports both streaming and non-streaming response, the mock acreate method must account for the non-streaming case by immediately returning the full answer.

    async def mock_acreate(*args, **kwargs):
        messages = kwargs["messages"]
        answer = "The capital of France is Paris."
        if "stream" in kwargs and kwargs["stream"] is True:
            return AsyncChatCompletionIterator(answer)
        else:
            return openai.util.convert_to_openai_object(
                {"choices": [{"message": {"content": answer}}]})

Mocking multiple answers

If necessary, it's possible to make the mock respond with different answers based off the passed on the last message passed in. We need that for the chat + search app, since we also use a ChatGPT call to generate keyword searches based on the user question.

Just change the answer based off the messages keyword arg:

    async def mock_acreate(*args, **kwargs):
        messages = kwargs["messages"]
        if messages[-1]["content"] == "Generate search query for: What is the capital of France?":
            answer = "capital of France"
        else:
            answer = "The capital of France is Paris."

Mocking other openai calls

We also make other calls through the openai package, like to create embeddings. That's a much simpler mock, since there's no streaming involved:

@pytest.fixture
def mock_openai_embedding(monkeypatch):
    async def mock_acreate(*args, **kwargs):
        return {"data": [{"embedding": [0.1, 0.2, 0.3]}]}

    monkeypatch.setattr(openai.Embedding, "acreate", mock_acreate)

More resources

For more context and example tests, view the full tests in the repos:

Monday, August 14, 2023

Fetching JSON over streaming HTTP

Recently, as part of my work on Azure OpenAI code samples, I've been experimenting with different ways of streaming data from a server into a website. The most well known technique is web sockets, but there are also other approaches, like server-sent events and readable streams. A readable stream is the simplest of the options, and works well if your website only needs to stream a response from the server (i.e. it doesn't need bi-directional streaming).

HTTP streaming in Python

To stream an HTTP response, your backend needs to set the "Transfer Encoding" to "chunked". Most web frameworks provide documentation about streaming responses, such as Flask: Streaming and Quart: Streaming responses. In both Flask and Quart, the response must be a Python generator, so that the server can continually get the next data from the generator until it's exhausted.

This example from the Flask doc streams data from a CSV:

@app.route('/large.csv')
def generate_large_csv():
    def generate():
        for row in iter_all_rows():
            yield f"{','.join(row)}\n"
    return generate(), {"Content-Type": "text/csv"}

This example from the Quart docs is an infinite stream of timestamps:

@app.route('/')
async def stream_time():
    async def async_generator():
        time = datetime.isoformat()
        yield time.encode()
    return async_generator(), 200

Consuming streams in JavaScript

The standard way to consume HTTP requests in JavaScript is the fetch() function, and fortunately, that function can also be used to consume HTTP streams. When the browser sees that the data is chunked, it sets response.body to a ReadableStream.

This example fetches a URL, treats the response body as a stream, and logs out the output until it's done streaming:

const response = await fetch(url);
const readableStream = response.body;
const reader = readableStream.getReader();
while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    var text = new TextDecoder("utf-8").decode(value);
    console.log("Received ", text);
}

Streaming JSON

You might think it'd be super straightforward to stream JSON: just generate a JSON string on the server, and then JSON.parse the received text on the client. But there's a gotcha: the client could receive multiple JSON objects in the same chunk, and then an attempt to parse as JSON will fail.

The solution: JSON objects separated by new lines, known either as NDJSON or JSONlines.

This expression converts a Python dict to NDJSON, using the std lib json module:

json.dumps(some_dict) + "\n"

Here's how I actually used that, for one of the ChatGPT samples:

@bp.post("/chat")
def chat_handler():
    request_message = request.json["message"]

    def response_stream():
        response = openai.ChatCompletion.create(
            engine=os.getenv("AZURE_OPENAI_CHATGPT_DEPLOYMENT", "chatgpt"),
            messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": request_message},
            ],
            stream=True,
        )
        for event in response:
            yield json.dumps(event) + "\n"

    return Response(response_stream())

Consuming NDJSON streams in JavaScript

Once the server is outputting NDJSON, then we can write parsing code in JavaScript that splits by newlines and attempts to parse the resulting objects as JSON objects.

const response = await fetch(url);
const readableStream = response.body;
const reader = readableStream.getReader();
while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    var text = new TextDecoder("utf-8").decode(value);
    const objects = text.split("\n");
    for (const obj of objects) {
        try {
            runningText += obj;
            let result = JSON.parse(runningText);
            console.log("Received", result);
            runningText = "";
        } catch (e) {
           // Not a valid JSON object
        }
     }
}

Since I need to use this same processing code in multiple Azure OpenAI samples, I packaged that into a tiny npm package called ndjson-readablestream.

Here's how you can use the package from JavaScript to make NDJSON parsing easier:

import readNDJSONStream from "ndjson-readablestream";

const response = await chatApi(request);
if (!response.body) {
    throw Error("No response body");
}
for await (const event of readNDJSONStream(response.body)) {
    console.log("Received", event);
}

For more examples of using the package, see this PR that uses it in a TypeScript component to render ChatGPT responses or usage in an HTML page, for a non-React ChatGPT sample.

I hope this helps other developers use NDJSON streams in your projects. Please let me know if you have suggestions for improving my approach!

Monday, August 7, 2023

Accessibility snapshot testing for Python web apps (Part 2)

In my previous post, I showed a technique that used axe-core along with pytest and Playwright to make sure that pages in your web apps have no accessibility violations. That's a great approach if it works for you, but realistically, most webpages have a non-zero number of accessibility violations, due to limited engineering resources or dependence on third-party libraries. Do we just give up on being able to test them? No! Instead, we use a different approach: snapshot testing.

Snapshot testing

Snapshot testing is a way to test that the output of a function matches a previously saved snapshot.

Here's an example using pytest-snapshot:

def emojify(s):
    return s.replace('love', '❤️').replace('python', '🐍')

def test_function_output_with_snapshot(snapshot):
    snapshot.assert_match(emojify('I love python'), 'snapshot.txt')

The first time we run the test, it will save the output to a file. We check the generated snapshots into source control. Then, the next time anyone runs that test, it will compare the output to the saved snapshot.

Snapshot testing + axe-core

First, a big kudos to Michael Wheeler from UMich and his talk on Automated Web Accessibility Testing for the idea of using snapshot testing with axe-core.

Here's the approach: We save snapshots of the axe-core violations and check them into source control. That way, our tests will let us know when new violations come up, and our snapshot files keep track of which parts of the codebase need accessibility improvements.

To make it as easy as possible, I made a pytest plugin that combines Playwright, axe-core, and snapshot testing.

python3 -m pip install pytest-axe-playwright-snapshot
python3 -m playwright install --with-deps

Here's an example test from a Flask app:

from flask import url_for
from playwright.sync_api import Page

def test_index(page: Page, axe_pytest_snapshot):
    page.goto(url_for("index", _external=True))
    axe_pytest_snapshot(page)

Running the snapshot tests

First run: We specify the --snapshot-update argument to tell the plugin to save the snapshots to file.

python3 -m pytest --snapshot-update

That saves a file like this one to a directory named after the test and browser engine, like snapshots/test_violations/chromium/snapshot.txt:

color-contrast (serious) : 2
empty-heading (minor) : 1
link-name (serious) : 1

Subsequent runs: The plugin compares the new snapshot to the saved snapshot, and asserts if they differ.

python3 -m pytest

Let's look through some example outputs next.

Test results

New accessibility issue 😱

If there are violations in the new snapshot that weren't in the old, the test will fail with a message like this:

E  AssertionError: New violations found: html-has-lang (serious)
E  That's bad news! 😱 Either fix the issue or run `pytest --snapshot-update` to update the snapshots.
E  html-has-lang - Ensures every HTML document has a lang attribute
E    URL: https://dequeuniversity.com/rules/axe/4.4/html-has-lang?application=axeAPI
E    Impact Level: serious
E    Tags: ['cat.language', 'wcag2a', 'wcag311', 'ACT']
E    Elements Affected:
E    1) Target: html
E       Snippet: <html>
E       Messages:
E       * The <html> element does not have a lang attribute

Fixed accessibility issue 🎉

If there are less violations in the new snapshot than the old one, the test will also fail, but with a happy message like this:

E  AssertionError: Old violations no longer found: html-has-lang (serious).
E  That's good news! 🎉 Run `pytest --snapshot-update` to update the snapshots.

CI/CD integration

Once you've got snapshot testing setup, it's a great idea to run it on every potential change to your codebase.

Here's an example of a failing GitHub action due to an accessibility violation, using this workflow file:

Fixing accessibility issues

What should you do if you realize you've introduced an accessibility violation, or if you are tasked with reducing existing violations? You can read the reports from pytest to get a gist for the accessibility violations, but it's often easier to use a browser extension that uses the same Axe-core rules.

Axe DevTools for Edge
Axe DevTools for Chrome

Also consider an IDE extension like VS Code Axe Linter

Don't rely on automation to find all issues

I think it's really important for web apps to measure their accessibility violations, so that they can avoid introducing accessibility regressions and eventually resolve existing violations. However, it's really important to note that these automated tools can only go so far. According to the axe-core docs, it finds about 57% of WCAG issues automatically. There can still be many issues with your site, like with tab order or keyboard access.

In addition to automation, please consider other ways to discover issues, such as paying for an external accessibility audit, engaging with your disabled users, and hiring engineers with disabilities.

Friday, July 21, 2023

Automated accessibility audits for Python web apps (Part 1)

We all know by now the importance of accessibility for webpages. But it's surprisingly easy to create inaccessible web experiences, and unknowingly deploy those to production. How do we check for accessibility issues? One approach is to install a browser extension like Accessibility Insights and run that on changed webpages. I love that extension, but I don't trust myself to remember to run it. So I've been working on tools for running accessibility tests on Python web apps, which I'll present at next week's North Bay Python.

In this post, I'm going to share a way to automatically verify that a Python web app has *zero* accessibility issues -- or at least, zero issues that can be caught by automated testing. One should always do additional manual tests (like keyboard tests) and work with disabled users to discover all issues.

Setup

Here's what we'll need:

Playwright: A tool for end-to-end testing in various browser engines. Similar to Selenium, if you're familiar with that.
Axe-core: An accessibility engine for automated Web UI testing, built with JavaScript. Used by many other tools, like the Accessibility Insights browser extension.
axe-playwright-python: A package that I developed to connect the two together, running axe-core on Playwright pages and returning the results in useful formats.

For this example, I'll also use Flask, Pytest, and pytest-flask to run a local server during testing. However, you could easily use other frameworks (like Django and unittest).

The test

Here's the full code for a test of the four main routes on my personal website (pamelafox.org):

from axe_playwright_python.sync_playwright import Axe
from flask import url_for
from playwright.sync_api import Page

def test_a11y(app, live_server, page: Page):
    page.goto(url_for("home_page", _external=True))
    results = Axe().run(page)
    assert results.violations_count == 0, results.generate_report()

Let's break that down:

```
def test_a11y(app, live_server, page: Page):
```
The app and live_server fixtures take care of starting up the app at a local URL. The app fixture comes from my conftest.py and the live_server fixture comes from pytest-flask.
```
page.goto(url_for("home_page", _external=True))
```
I use the Page fixture from Playwright to navigate to a route from my app.
```
results = Axe().run(page)
```
Using the Axe object from my axe-playwright-python package, I run axe-core on the page.
```
assert results.violations_count == 0, results.generate_report()
```
I assert that the violations count is zero, but I also provide a human-friendly report as the assertion message. That way, if any violations were found, I'll see the report in the pytest output.

For the full code, see the tests/ folder in the GitHub repository.

The output

When there are no violations found, the test passes! 🎉

When there are any violations found, the pytest output looks like this:

    def test_a11y(app, live_server, page: Page):
        axe = Axe()
        page.goto(url_for("home_page", _external=True))
        results = axe.run(page)
>       assert results.violations_count == 0, results.generate_report()
E       AssertionError: Found 1 accessibility violations:
E         Rule Violated:
E         image-alt - Ensures  elements have alternate text or a role of none or presentation
E             URL: https://dequeuniversity.com/rules/axe/4.4/image-alt?application=axeAPI
E             Impact Level: critical
E             Tags: ['cat.text-alternatives', 'wcag2a', 'wcag111', 'section508', 'section508.22.a', 'ACT']
E             Elements Affected:
E             
E         
E               1)      Target: img
E                       Snippet: <img src="bla.jpg">
E                       Messages:
E                       * Element does not have an alt attribute
E                       * aria-label attribute does not exist or is empty
E                       * aria-labelledby attribute does not exist, references elements that do not exist or references elements that are empty
E                       * Element has no title attribute
E                       * Element's default semantics were not overridden with role="none" or role="presentation"
E         
E       assert 1 == 0

I can then read the report, look for the HTML matching the snippet, and make it accessible. In the case above, there's an img tag missing an alt attribute. Once I fix that, the test passes.

Checking more routes

To check additional routes, I can either add more tests or I can parameterize the current test like so:

@pytest.mark.parametrize("route", ["home_page", "projects", "talks", "interviews"])
def test_a11y(app, live_server, page: Page, route: str):
    axe = Axe()
    page.goto(url_for(route, _external=True))
    results = axe.run(page)
    assert results.violations_count == 0, results.generate_report()

For testing a route where user interaction causes a change in the page, I can use Playwright to interact with the page and then run Axe after the interaction. Here's an example of that from another app:

def test_quiz_submit(page: Page, snapshot, fake_quiz):
    page.goto(url_for("quizzes.quiz", quiz_id=fake_quiz.id, _external=True))
    page.get_by_label("Your name:").click()
    page.get_by_label("Your name:").fill("Pamela")
    page.get_by_label("Ada Lovelace").check()
    page.get_by_label("pip").check()
    page.get_by_role("button", name="Submit your score!").click()
    expect(page.locator("#score")).to_contain_text("You scored 25% on the quiz.")
    results = Axe().run(page)
    assert results.violations_count == 0, results.generate_report()

Is perfection possible?

Fortunately, I was able to fix all of the accessibility violations for my very small personal website. However, many webpages are much bigger and more complicated, and it may not be possible to address all the violations. Is it possible to run tests like this in that situation? Yes, but we need to do something like snapshot testing: tracking the violations over time and ensuring that changes don't introduce additional violations. I'll show an approach for that in Part 2 of this blog post series. Stay tuned!