Wednesday, July 10, 2024

Should you use Quart or FastAPI for an AI app?

As I have discussed previously, it is very important to use an async framework when developing apps that make calls to generative AI APIs, so that your backend processes can concurrently handle other requests while they wait for the (relatively slow) response from the AI API.

Diagram of worker handling second request while first request waits for API response
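
To make that concrete, here is a minimal sketch using only the standard library. The function `fake_ai_call` is a hypothetical stand-in for a slow AI API call (simulated with `asyncio.sleep`), not a real client; the point is only that one worker can overlap several waits.

```python
import asyncio
import time


async def fake_ai_call(prompt: str, delay: float = 0.5) -> str:
    """Hypothetical stand-in for a slow generative AI API call."""
    # Awaiting hands control back to the event loop, so other requests can run.
    await asyncio.sleep(delay)
    return f"response to {prompt!r}"


async def handle_requests(prompts: list[str]) -> tuple[list[str], float]:
    """Handle several simulated requests concurrently in a single worker."""
    start = time.perf_counter()
    responses = await asyncio.gather(*(fake_ai_call(p) for p in prompts))
    elapsed = time.perf_counter() - start
    return list(responses), elapsed


if __name__ == "__main__":
    responses, elapsed = asyncio.run(handle_requests(["first", "second", "third"]))
    print(responses)
    # The waits overlap, so the total is close to one delay rather than three.
    print(f"total wait: {elapsed:.2f}s")
```

With a synchronous framework, the same worker would sit idle through each wait in turn, so the three simulated calls would take roughly three times as long.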

Async frameworks

There are a few options for asynchronous web frameworks for Python developers:

  • FastAPI: A framework that was designed for async from the beginning (it is ASGI-only, though route handlers can still be plain `def` functions), and an increasingly popular option for Python web developers. It's particularly well suited to APIs, because it auto-generates OpenAPI documentation, served through Swagger UI, from type annotations.
  • Quart: The async version of the popular Flask framework. Quart is now built on top of Flask: it brings Flask in as a dependency and reuses what it can. It tries to mimic the Flask interface as much as possible, with exceptions only when needed for better async support.
  • Django: The default for Django is a WSGI app with synchronous views, but it is now possible to write async views as well and run the Django app as an ASGI app.
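
What these options have in common is ASGI: each one ultimately exposes an async callable that an ASGI server invokes per request. The sketch below is a bare ASGI app written by hand, with no framework, to show that shared interface. The `call_app` helper is a hypothetical test harness that plays the role of the server; it is not part of any framework.

```python
import asyncio


async def app(scope, receive, send):
    """A bare ASGI application: the interface an ASGI server calls."""
    if scope["type"] != "http":
        return  # this sketch ignores lifespan and websocket scopes
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"Hello, ASGI!"})


async def call_app(path: str) -> tuple[int, bytes]:
    """Drive the app by hand, roughly as a server would (illustration only)."""
    scope = {"type": "http", "method": "GET", "path": path, "headers": []}
    sent = []

    async def receive():
        # An empty request body, delivered in a single message.
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await app(scope, receive, send)
    status = sent[0]["status"]
    body = b"".join(m.get("body", b"") for m in sent[1:])
    return status, body


if __name__ == "__main__":
    print(asyncio.run(call_app("/")))
```

In practice you would never write this by hand; the frameworks add routing, request parsing, and templating on top of the same callable.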

Quart vs. FastAPI

So which framework should you choose? Since I have not personally used Django with async views, I'm going to focus on comparing Quart vs. FastAPI, as I have used them for a number of AI-on-Azure samples.

  • If you already have Flask apps, it is much easier to turn them into Quart apps than FastAPI apps, given the purposeful similarity of Quart to Flask. You may run into issues if you are using many Flask extensions, however, since not all of them have been ported to Quart.
  • In my experience, Quart is easier to use if your app serves static files or HTML routes. It is possible to use FastAPI for a full webapp, but it is harder. That said, I've figured it out in a few projects, such as rag-postgres-openai-python, so you can look at that approach for inspiration.
  • FastAPI has built-in API documentation. To do that with Quart, you need to use Quart-Schema. That extension is fairly straightforward to use, and I have successfully used it with Quart apps, but it is certainly easier with FastAPI.
  • Quart has a good number of extensions available, largely due to many extensions being forked from Flask extensions. There is less of an extension ecosystem for FastAPI, perhaps because there is not an established extension mechanism. There are many tutorials and discussion posts that show how to implement features in FastAPI, however, thanks to the popularity of FastAPI.
  • The performance of Quart and FastAPI should be fairly similar, though I haven't done tests to directly compare the two. The most common way to run either one is with gunicorn and a uvicorn worker, but as of the latest uvicorn release, running uvicorn directly is also an option. Another server is hypercorn, created by the Quart creator, but I haven't used that in production myself.
  • Quart is an open-source project that is part of the Pallets ecosystem, and primarily maintained by @pgjones. FastAPI is also an open-source project, primarily maintained by @tiangolo, who recently received funding to work on monetization strategies. Both are actively maintained as of this writing.
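
For the gunicorn-plus-uvicorn-worker setup mentioned above, the configuration can live in a Python file. This is a sketch with illustrative values, not a tuned production config: the port and worker count are assumptions, and `uvicorn.workers.UvicornWorker` is the long-standing worker class path, so check the uvicorn docs for your version in case it has moved.

```python
# gunicorn.conf.py (sketch): the values below are illustrative assumptions.
import multiprocessing

# Address and port to listen on.
bind = "0.0.0.0:8000"

# Starting heuristic suggested in the gunicorn docs: (2 x cores) + 1 workers.
workers = multiprocessing.cpu_count() * 2 + 1

# Use uvicorn's gunicorn worker class so each worker process runs an asyncio
# event loop and can serve an ASGI app such as a Quart or FastAPI app.
worker_class = "uvicorn.workers.UvicornWorker"
```

You would then start the server with something like `gunicorn -c gunicorn.conf.py app:app`, where `app:app` is a hypothetical module and variable name for your Quart or FastAPI application. The same file works for either framework, since both are served as ASGI apps.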

Both frameworks are solid options, with different benefits. Share any experiences you've had in the comments!
