Thursday, March 12, 2026

Can MCP choose my outfit?

When I was a kid, one of my first Java applets was a UI for choosing outfits by mixing and matching different articles of clothing. Now, with the advent of agents and MCP, I realized that I could make a modern, more dynamic version: an MCP server that can find relevant clothing based on a user query and render matching clothing as a slideshow. Let's walk through the experience and the code powering it.


Searching for relevant clothing

After connecting VS Code to my closet MCP server, I ask a query like:

i am presenting at PyAI about MCP, do I have MCP themed clothing? show me the best option.

GitHub Copilot decides that it can use the closet MCP server to answer that question, and it calls the image_search tool with these arguments:

{
  "query": "MCP Model Context Protocol themed clothing",
  "max_results": 5
}

The tool call returns a mix of binary files (thumbnails for each matching article of clothing) and structured data (a JSON object containing the filename, display name, and description for each article):

{
  "results": [
    {
      "filename": "IMG_3234.jpg",
      "display_name": "IMG_3234.jpg",
      "description": "The image shows a black sleeveless dress hanging on a white hanger against a plain wall. The dress has a printed text on the front that reads: \"YOU DOWN WITH MCP? Yeah, you know me!\" The first line is in large white uppercase letters, and the second line is in smaller pink cursive letters. The dress has a fitted top and a flared skirt."
    },...

Here's what that looks like in the GitHub Copilot chat interface. Notice that Copilot attaches the images, so I can actually click on them to see each result directly in VS Code, as if they were a file in the workspace.

Now let's look at the code powering that tool call. I built the server using FastMCP, so I declare my tools by wrapping functions in the @mcp.tool() decorator and annotating the arguments with types and helpful descriptions. Inside the function, I use Azure AI Search with hybrid retrieval on both the text query and the query's vector, against a target index that has multimodal embeddings plus LLM-generated descriptions for the images. The tool returns a result that contains both the binary files and the structured content.

@mcp.tool()
async def image_search(
  query: Annotated[
    str, "Text description of images to find (e.g., 'red dress')"
  ],
  max_results: Annotated[int, "Max number of images to return (1-20)"] = 5,
) -> ToolResult:
  """
  Search for images matching a natural language query.
  Returns the image data and descriptions.
  """
  results = await search_client.search(
    search_text=query,
    top=max_results,
    vector_queries=[VectorizableTextQuery(
        k_nearest_neighbors=max_results, fields="embedding", text=query)],
    select="metadata_storage_path,verbalized_image")

  blob_service_client = get_blob_service_client()

  files: list[File] = []
  image_results: list[dict[str, str]] = []
  async for result in results:
    url = result["metadata_storage_path"]
    description = result.get("verbalized_image")
    container_name, blob_name = get_blob_reference_from_url(url)
    blob_client = blob_service_client.get_blob_client(container=container_name, blob=blob_name)
    stream = await blob_client.download_blob()
    image_bytes = await stream.readall()
    image_format = get_image_format(url)
    display_name = os.path.basename(blob_name)
    file_basename = Path(display_name).stem
    thumbnail_bytes = resize_image_bytes(image_bytes, image_format)
    files.append(File(data=thumbnail_bytes, format=image_format, name=file_basename))
    image_results.append({
      "filename": blob_name,
      "display_name": display_name,
      "description": description})

  return ToolResult(
    content=files,
    structured_content={
      "query": query,
      "results": image_results})

Displaying selected clothing

Once the agent finds possible matching clothing, it reasons over the results and selects the best of them. If the agent is using a multimodal LLM, like most modern frontier models, it can reason over both the image content and the image descriptions. It can then render its top choices directly in the UI, using an MCP app that renders a JavaScript-powered slideshow of images.

Here's what that looks like in GitHub Copilot chat:

Let's check out the code that powers that MCP app. An app is actually a kind of tool, so we once again wrap a Python function in @mcp.tool. However, this time, we specify that it's an app via an AppConfig with an associated resource for the image viewer HTML. Inside that function, we fetch the images from Azure Blob Storage based on their filenames, and return both the binary data for the images and structured content that includes the filename and MIME type of each image.

@mcp.tool(
  app=AppConfig(resource_uri=IMAGE_VIEW_URI)
)
async def display_image_files(
  filenames: Annotated[list[str], "List of image filenames to retrieve"]
) -> ToolResult:
  """Fetch images by filename and render them in a carousel display."""
  blob_service_client = get_blob_service_client()

  image_blocks: list[types.ImageContent] = []
  image_results: list[dict[str, str]] = []
  for filename in filenames:
    blob_client = blob_service_client.get_blob_client(container=IMAGE_CONTAINER_NAME, blob=filename)
    stream = await blob_client.download_blob()
    image_bytes = await stream.readall()
    mime_type = get_image_mime_type(filename)
    image_blocks.append(
      types.ImageContent(
        type="image",
        data=base64.b64encode(image_bytes).decode("utf-8"),
        mimeType=mime_type))
    image_results.append({
      "filename": filename,
      "mimeType": mime_type})

  return ToolResult(
    content=image_blocks,
    structured_content={
      "images": image_results,
  })

Next we need to define the resource that serves up the image viewer HTML page. We wrap a Python function in @mcp.resource, assign it a "ui://" URI that is unique within our MCP server, and declare which domains are allowed in its Content-Security-Policy (CSP):

@mcp.resource(
    IMAGE_VIEW_URI,
    app=AppConfig(csp=ResourceCSP(resource_domains=["https://unpkg.com"])),
)
def image_view() -> str:
    """Render images returned by display_image_files as an MCP App."""
    return load_image_viewer_html()

Finally, we need the actual HTML that will render inside the iframed app. This tiny webpage brings in ext-apps, a JavaScript package that manages bidirectional communication with the MCP client. In our JavaScript, we declare an App instance, define the ontoolresult callback, and connect the app. That callback receives the images from the tool result and renders them inside the HTML. Note that apps can also communicate back to the client, but that wasn't necessary for this UI.

<!DOCTYPE html>
<html>
<body>
  <div id="carousel">
    <button id="prev" type="button" aria-label="Previous">&#8249;</button>
    <div id="frame"></div>
    <button id="next" type="button" aria-label="Next">&#8250;</button>
    <span id="counter" aria-live="polite"></span>
  </div>
  <script type="module">
    import { App } from "https://unpkg.com/@modelcontextprotocol/ext-apps@0.4.0/app-with-deps";

    const app = new App({ name: "Image Viewer", version: "1.0.0" });

    let images = [];
    let index = 0;

    const frame = document.getElementById("frame");
    const prevBtn = document.getElementById("prev");
    const nextBtn = document.getElementById("next");
    const counter = document.getElementById("counter");

    function show(i) {
      index = i;
      const img = images[index];
      frame.innerHTML = "";
      const el = document.createElement("img");
      el.src = `data:${img.mimeType || "image/jpeg"};base64,${img.data}`;
      el.alt = "Blob image";
      frame.appendChild(el);
      prevBtn.disabled = index === 0;
      nextBtn.disabled = index === images.length - 1;
      counter.textContent = images.length > 1 ? `${index + 1} / ${images.length}` : "";
    }

    prevBtn.addEventListener("click", () => { if (index > 0) show(index - 1); });
    nextBtn.addEventListener("click", () => { if (index < images.length - 1) show(index + 1); });

    app.ontoolresult = ({ content }) => {
      images = (content || []).filter((block) => block.type === "image");
      if (images.length > 0) show(0);
    };

    await app.connect();
  </script>
</body>
</html>

Putting together the final outfit

If I want more ideas of how to put together my outfit, I can keep asking questions that will prompt the agent to call the MCP server. For example, my first follow-up question was:

great, i love the pink, matches pydantic-ai colors. can you find some pink accessories to go with it?

Then, after it suggested some nice accessories, I finished with:

sounds good. i also need a jacket to keep me warm. show me my final outfit.

To show me my final outfit, it called the display_image_files tool with only the selected articles of clothing - jacket, dress, and earrings. I can navigate through them with the arrows:

MCP app rendering a jacket inside VS Code

How'd the outfit work out? Pretty great!

Try it yourself!

The full MCP server code is available in the Azure-Samples/image-search-aisearch repository, along with a minimal frontend for image searching, plus data ingestion via an Azure AI Search indexer with Azure OpenAI LLMs (for describing the images) and Azure AI Vision (for multimodal embeddings of the images). The code can be used for any images, not just pictures of your clothing.

Here are ways you could improve it:

  • Use an image-generation model: visualize the head-to-toe outfit on a mannequin (instead of showing each item separately in the carousel)
  • Optimize token consumption: since the search tool currently returns a thumbnail for each result, and images require many tokens to represent, conversations can easily exceed the context window. You could experiment with smaller images, higher compression, or other approaches.
  • Add user login: my MCP server is a public endpoint, but most people don't want their closet (or private images) to be public knowledge. You can add on key-based auth or OAuth using the FastMCP auth providers, as I described in the MCP auth livestream.
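For the smaller-images idea, the thumbnail shrinking could look something like this sketch with Pillow. The dimensions and JPEG quality are arbitrary assumptions, and shrink_for_context is a hypothetical helper in the spirit of the server's resize_image_bytes:

```python
from io import BytesIO

from PIL import Image


def shrink_for_context(image_bytes: bytes, max_dim: int = 256, quality: int = 60) -> bytes:
    """Downscale and recompress an image so it costs fewer tokens as a thumbnail."""
    img = Image.open(BytesIO(image_bytes))
    img.thumbnail((max_dim, max_dim))  # resizes in place, preserving aspect ratio
    buffer = BytesIO()
    # JPEG with moderate quality trades a little fidelity for a much smaller payload
    img.convert("RGB").save(buffer, format="JPEG", quality=quality)
    return buffer.getvalue()
```

Measuring the base64 size of the output before and after is a quick way to estimate the token savings per image.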

Have fun, and let me know if you build your own version!

Wednesday, March 11, 2026

Learnings from the PyAI conference

I recently spoke at the PyAI conference, put on by the good folks at Prefect and Pydantic, and I learnt so much from the talks I attended. Here are my top takeaways from the sessions that I watched:


AI Evals Pitfalls

Hamel Husain

  • View slides
  • Hamel cautioned against blindly using automated evaluation frameworks and built-in evaluators (like helpfulness and coherence).
  • Instead, we should adopt a data science approach to evaluation: explore the data, discover what's actually breaking, identify the most important metric, and iterate as new data comes in.
  • We shouldn't just trust an LLM-as-a-judge to give accurate scores. Instead, we should validate it like we would validate an ML classifier: with labeled data, train/dev/test splits, and precision/recall metrics. LLM judges should always give pass/fail results instead of 1-5 scores, so that there's no ambiguity in their judgment.
  • When generating synthetic data, first come up with dimensions (such as persona), generate combinations of those dimensions, and convert the combinations into realistic queries.
  • Hamel created evals-skills, a collection of skills for coding agents that can be run against evaluation pipelines to find issues like poorly designed LLM-judges.
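The judge-validation idea above can be sketched in a few lines, assuming each example carries a human pass/fail label alongside the judge's pass/fail verdict (the function and variable names here are mine, not Hamel's):

```python
def judge_metrics(human_labels: list[bool], judge_labels: list[bool]) -> dict[str, float]:
    """Score a pass/fail LLM-judge against human labels, like a binary classifier."""
    # Treat "pass" as the positive class
    tp = sum(h and j for h, j in zip(human_labels, judge_labels))          # both say pass
    fp = sum((not h) and j for h, j in zip(human_labels, judge_labels))    # judge passes a human fail
    fn = sum(h and (not j) for h, j in zip(human_labels, judge_labels))    # judge fails a human pass
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"precision": precision, "recall": recall}
```

Because the judge emits only pass/fail, these metrics are unambiguous; there's no need to decide whether a "3 out of 5" counts as agreement.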

Build Reasonable Software

Jeremiah Lowin (FastMCP/Prefect)

  • Write your Python programs in a way that coding agents can reason about, so that they can more easily maintain and extend them. For example, the FastMCP v2 SDK was not well designed (bad abstractions), so a new CodeMod feature required 4,000 lines of code. In the new FastMCP v3 SDK (same functional API, different abstractions backing it), the same feature required only 500 lines of code.
  • To make Python FastMCP servers more Pythonic, Jeremiah is developing a new package for MCP apps which includes the most common UIs (forms/tables/charts), called PreFab: https://github.com/PrefectHQ/prefab

Panel: Open Source in the Age of AI

Guido van Rossum (CPython), Samuel Colvin (Pydantic), Sebastián Ramírez (FastAPI), Jeremiah Lowin (FastMCP)

  • OSS maintainers are overwhelmed by AI Slop PRs. As one maintainer said, "Don't expect someone else to be the first one to read your code". Each maintainer is coming up with different systems/bots/heuristics to detect and triage PRs (like FastMCP auto-rejects PRs that are too long!). Some maintainers are going to turn off PRs entirely, as now permitted by GitHub.
  • Samuel's opinion: GitHub should add a "human identity" vs "user identity", as well as a user reputation system where reputation is based on how many useful contributions you've made (or a "sloppiness" metric).

Do developer tools matter to agents?

Zanie Blue (Astral)

  • Astral is considering ways to make their tools more agent-friendly. For example, their error messages for ty are currently fairly long and include ASCII arrows pointing to the code in question, and they suspect the agents may not need all of that in their context.
  • Astral is also re-prioritizing based on the move towards 100% agentic coding, with less emphasis on tools that would be used solely by a developer who is manually typing. For example, they were once considering adding a "review" feature to review each ruff suggestion one-by-one, but that seems unlikely to be used by developers these days.
  • Astral may now be able to take advantage of agents' ability to reason over whether proposed ruff fixes are safe. Currently, ruff only auto-fixes code when it knows that the change can't introduce any unwanted side effects (like comment deletions), and it marks other fixes as "unsafe". Now ruff could add more unsafe fixes, knowing that an LLM could decide whether a given change is actually safe.

Context Engineering for MCP Servers

Till Döhmen (MotherDuck)

  • Till walked through the multi-step process of developing MCP servers to allow developers to interact with their MotherDuck databases. The server started with a single "query" tool, which later split into multiple tools, including "list_databases" and "list_tables". They had to offer dedicated schema-exploration tools since DuckDB uses a different syntax than PostgreSQL, and the agents kept suggesting PostgreSQL syntax that didn't work.
  • They also added a tool to search the documentation (powered by the same search used by their website) and a tool that teaches the agent how to create "dive"s, a visualization of the database state.
  • One of their big struggles is the lack of MCP spec support across clients: the MCP spec is so rich and full of features, but only a handful of clients support those features. It's hard for them to take advantage of the new features, knowing their users may be using a client that does not support them.

Controlling the wild: from tool calling to computer use

Samuel Colvin (Pydantic)

  • Samuel built Monty to be a minimal implementation of Python for agents to use. It intentionally does not support all of the Python standard lib (like sockets/file open), but it does include a way to call back to functions on the host. When using Monty, you do not need to set up a separate sandbox.
  • Monty is not designed to run full applications - it's designed to run Python code generated by agents.
  • The models vary in how successfully they drive Monty in a REPL loop: Opus 4.5 works the best, Opus 4.6 works worse, presumably due to the RLHF process teaching 4.6 to execute code in a particular way.
  • github.com/pydantic/monty

What's new in FastAPI for AI

Sebastián Ramírez (FastAPI)

  • View slides
  • There's now a VS Code extension for FastAPI, built by my brilliant former colleague, Savannah Ostrowski. It makes it easy to navigate to different routes in your app, and it adds a CodeLens for navigating from pytest tests back to the route that they're testing.
  • FastAPI has built-in support for streaming JSON lines! Just yield an AsyncIterable. I plan to port my FastAPI streaming chat apps to this approach, pronto.
  • In pyproject.toml, you can now specify the FastAPI entrypoint, so that the fastapi command knows exactly where your FastAPI app is.

Context Engineering 2.0: MCP, Agentic RAG & Memory

Simba Khadder (Redis)

  • Redis is adding many features specifically to help developers who are creating apps with generative AI. For example, they've added semantic caching of queries, based on a fine-tuned BERT model, so that developers don't have to pay every time someone says "good morning" to a chatbot. Anyone can use semantic caching in open-source Redis by bringing their own LLMs, but the fine-tuned model is available only on Redis Cloud.

Friday, January 16, 2026

Using on-behalf-of flow for Entra-based MCP servers

In December, we presented a series about MCP, culminating in a session about adding authentication to MCP servers. I demoed a Python MCP server that uses Microsoft Entra for authentication, requiring users to first log in to the Microsoft tenant before they could use a tool. Many developers asked how they could take the Entra integration further, like to check the user's group membership or query their OneDrive. That requires using an "on-behalf-of" flow, also known as "delegation" in OAuth, where the MCP server uses the user's identity to call another API, like the Microsoft Graph API. In this blog post, I will explain how to use Entra with OBO flow in a Python FastMCP server.

How MCP servers can use Entra authentication

The MCP authorization specification is based on OAuth2, but with some additional features tacked on top. Every MCP client is actually an OAuth2 client, and each MCP server is an OAuth2 resource server.

Diagram of OAuth 2.1 entities with MCP client and server

MCP auth adds these features to help clients determine how to authorize a server:

  • Protected resource metadata (PRM): Implemented on the MCP server, provides details about the authorization server and method
  • Authorization server metadata: Implemented on the authorization server, gives URLs for OAuth2 endpoints
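For example, the protected resource metadata document (specified in RFC 9728), served from a well-known URI such as /.well-known/oauth-protected-resource, might look roughly like this for an Entra-backed server (all values here are illustrative placeholders):

```json
{
  "resource": "https://my-mcp-server.example.com/mcp",
  "authorization_servers": [
    "https://login.microsoftonline.com/<tenant-id>/v2.0"
  ],
  "scopes_supported": ["mcp-access"],
  "bearer_methods_supported": ["header"]
}
```

An MCP client fetches this document to discover which authorization server to talk to and which scopes to request.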

Additionally, to allow MCP servers to work with arbitrary MCP clients, MCP auth supports either of these client registration methods:

  • Dynamic Client Registration (DCR): Implemented on the authorization server, it can register new MCP clients as OAuth2 clients, even if it hasn't seen them before.
  • Client ID Metadata Documents (CIMD): An alternative to DCR, this requires the MCP client to make a CIMD document available at a URL, and the authorization server to fetch that document for details about the client.

Microsoft Entra does support authorization server metadata, but it does not support either DCR or CIMD. That's actually fine if you are building an MCP server that's only going to be used with pre-authorized clients, like if the server will only be used with VS Code or with a specific internal MCP client. But, if you are building an MCP server that can be used with arbitrary MCP clients, then either DCR or CIMD is required. So what do we do?

Fortunately, the FastMCP SDK implements DCR on top of Entra using an OAuth proxy pattern. FastMCP acts as the authorization server, intercepting requests and forwarding them to Entra when needed, and storing OAuth client information in a designated store (in-memory or Cosmos DB).

⚠️ Warning: This proxy approach is intended only for development and testing scenarios. For production deployments, Microsoft recommends using pre‑registered client applications where client identifiers and permissions are explicitly created, reviewed, and approved on a per-app basis.

Diagram of OAuth proxy pattern

Let's walk through the steps to set that up.

Registering the server with Entra

Before the server can use Entra to authorize users, we need to register the server with Entra via an app registration. We can do registration using the Azure Portal, Azure CLI, Microsoft Graph SDK, or even Bicep. In this case, I use the Python MS Graph SDK as it allows me to specify everything programmatically.

First, I create the Entra app registration, specifying the sign-in audience (single-tenant), redirect URIs (including local MCP server and VS Code redirect URIs), and the scopes for the exposed API.

request_app = Application(
  display_name="FastMCP Server App",
  sign_in_audience="AzureADMyOrg",  # Single tenant
  web=WebApplication(
   redirect_uris=[
        "http://localhost:8000/auth/callback",
        "https://vscode.dev/redirect",
        "http://127.0.0.1:33418",
        "https://deployedurl.com/auth/callback"
    ],
  ),
  api=ApiApplication(
    oauth2_permission_scopes=[
      PermissionScope(
        id=uuid.uuid4(),
        admin_consent_display_name="Access FastMCP Server",
        admin_consent_description="Allows access to the FastMCP server as the signed-in user.",
        user_consent_display_name="Access FastMCP Server",
        user_consent_description="Allow access to the FastMCP server on your behalf",
        is_enabled=True,
        value="mcp-access",
        type="User",
      )],
    requested_access_token_version=2,  # Required by FastMCP
  )
)
app = await graph_client.applications.post(request_app)

await graph_client.applications.by_application_id(app.id).patch(
  Application(identifier_uris=[f"api://{app.app_id}"]))

Thanks to that configuration, when an MCP client like VS Code requests an OAuth2 token, it will request a token with the scope "api://{app.app_id}/mcp-access", and the FastMCP server will validate that incoming tokens contain that scope.

Next, I create a Service Principal for that Entra app registration, which represents the Entra app in my tenant:

request_principal = ServicePrincipal(app_id=app.app_id, display_name=app.display_name)
await graph_client.service_principals.post(request_principal)

I need a way for the server to prove that it can use that Entra app registration, so I register a secret:

password_credential = await graph_client.applications.by_application_id(app.id).add_password.post(
  AddPasswordPostRequestBody(
    password_credential=PasswordCredential(display_name="FastMCPSecret")))

Ideally, I would like to move away from secrets, as Entra now supports federated identity credentials for app registrations, but that form of credential isn't yet supported in the FastMCP SDK. If you do use a secret, make sure that you store it securely.

Granting admin consent

This next step is only necessary when our MCP server wants to use an OBO flow to exchange access tokens for other resource server tokens (Graph API tokens, in this case). For the OBO flow to work, the Entra app registration needs permission to call the Graph API on behalf of users. If we controlled the client, we could force it to request the required scopes as part of the initial login dialog. However, since we are configuring this server to work with arbitrary MCP clients, we don't have that option. Instead, we grant admin consent to the Entra app for the necessary scopes, such that no Graph API consent dialog is needed.

This code grants the admin consent to the associated service principal for the Graph API resource and scopes:

server_principal = await graph_client.service_principals_with_app_id(app.app_id).get()
grant = GrantDefinition(
    principal_id=server_principal.id,
    resource_app_id="00000003-0000-0000-c000-000000000000", # Graph API
    scopes=["User.Read", "email", "offline_access", "openid", "profile"],
    target_label="server application")
resource_principal = await graph_client.service_principals_with_app_id(grant.resource_app_id).get()
desired_scope = grant.scope_string()
await graph_client.oauth2_permission_grants.post(
  OAuth2PermissionGrant(
    client_id=grant.principal_id,
    consent_type="AllPrincipals",
    resource_id=resource_principal.id,
    scope=desired_scope))

If our MCP server needed to use an OBO flow with another resource server, we could request additional grants for those resources and scopes.

Our Entra app registration is now ready for the MCP server, so let's move on to see the server code.

Using FastMCP servers with Entra

In our MCP server code, we configure FastMCP's built-in AzureProvider based on the details from the Entra app registration process:

auth = AzureProvider(
    client_id=os.environ["ENTRA_PROXY_AZURE_CLIENT_ID"],
    client_secret=os.environ["ENTRA_PROXY_AZURE_CLIENT_SECRET"],
    tenant_id=os.environ["AZURE_TENANT_ID"],
    base_url=entra_base_url, # MCP server URL
    required_scopes=["mcp-access"],
    client_storage=oauth_client_store, # in-memory or Cosmos DB
)

To make it easy for our MCP tools to access an identifier for the currently logged in user, we define a middleware that inspects the claims of the current token using FastMCP's get_access_token() and sets the "oid" (Entra object identifier) in the state:

class UserAuthMiddleware(Middleware):
    def _get_user_id(self):
        token = get_access_token()
        if not (token and hasattr(token, "claims")):
            return None
        return token.claims.get("oid")

    async def on_call_tool(self, context: MiddlewareContext, call_next):
        user_id = self._get_user_id()
        if context.fastmcp_context is not None:
            context.fastmcp_context.set_state("user_id", user_id)
        return await call_next(context)

    async def on_read_resource(self, context: MiddlewareContext, call_next):
        user_id = self._get_user_id()
        if context.fastmcp_context is not None:
            context.fastmcp_context.set_state("user_id", user_id)
        return await call_next(context)

When we initialize the FastMCP server, we set the auth provider and include that middleware:

mcp = FastMCP("Expenses Tracker",
  auth=auth,
  middleware=[UserAuthMiddleware()])

Now, every request made to the MCP server will require authentication. The server will return a 401 if a valid token isn't provided, and that 401 will prompt the MCP client to kick off the MCP authorization flow.

Inside each tool, we can grab the user id from the state, and use that to customize the response for the user, like to store or query items in a database.

@mcp.tool
async def add_user_expense(
    date: Annotated[date, "Date of the expense in YYYY-MM-DD format"],
    amount: Annotated[float, "Positive numeric amount of the expense"],
    description: Annotated[str, "Human-readable description of the expense"],
    ctx: Context,
):
  """Add a new expense to Cosmos DB."""
  user_id = ctx.get_state("user_id")
  if not user_id:
    return "Error: Authentication required (no user_id present)"
  expense_item = {
    "id": str(uuid.uuid4()),
    "user_id": user_id,
    "date": date.isoformat(),
    "amount": amount,
    "description": description
  }
  await cosmos_container.create_item(body=expense_item)
  return f"Added expense {expense_item['id']}"

Using OBO flow in FastMCP server

Now we have everything we need to use an OBO flow inside the MCP tools, when desired. To make it easier to exchange and validate tokens, we use the Python MSAL SDK, configuring a ConfidentialClientApplication similarly to how we set up the FastMCP auth provider:

confidential_client = ConfidentialClientApplication(
    client_id=os.environ["ENTRA_PROXY_AZURE_CLIENT_ID"],
    client_credential=os.environ["ENTRA_PROXY_AZURE_CLIENT_SECRET"],
    authority=f"https://login.microsoftonline.com/{os.environ['AZURE_TENANT_ID']}",
    token_cache=TokenCache(),
)

Inside the tool that requires OBO, we ask MSAL to exchange the MCP access token for a Graph API access token:

access_token = get_access_token()
obo_result = confidential_client.acquire_token_on_behalf_of(
  user_assertion=access_token.token, scopes=["https://graph.microsoft.com/.default"]
)
if "access_token" not in obo_result:
  # MSAL returns an error dict instead of raising, so check before indexing
  raise RuntimeError(f"OBO token exchange failed: {obo_result.get('error_description')}")
graph_token = obo_result["access_token"]

Once we successfully acquire the token, we can use that token with the Graph API, for any operations permitted by the scopes in the admin consent granted earlier. For this example, we call the Graph API to check whether the logged in user is a member of a particular Entra group, and restrict tool usage if not:

async with httpx.AsyncClient() as client:
  url = ("https://graph.microsoft.com/v1.0/me/transitiveMemberOf/microsoft.graph.group"
    f"?$filter=id eq '{group_id}'&$count=true")
  response = await client.get(
    url,
    headers={
      "Authorization": f"Bearer {graph_token}",
      "ConsistencyLevel": "eventual",
  })
  data = response.json()
  membership_count = data.get("@odata.count", 0)

You could imagine many other ways to use an OBO flow, however: querying for more details from the Graph API, uploading documents to OneDrive/SharePoint/Notes, sending emails, and more!

All together now

For the full code, check out the open source python-mcp-demos repository, and follow the deployment steps for Entra. The most relevant code files are:

  • auth_init.py: Creates the Entra app registration, service principal, client secret, and grants admin consent for OBO flow.
  • auth_update.py: Updates the app registration's redirect URIs after deployment, adding the deployed server URL.
  • auth_entra_mcp.py: The MCP server itself, configured with FastMCP's AzureProvider and tools that use OBO for group membership checks.

I want to reiterate once more that the OAuth proxy approach is intended only for development and testing scenarios. For production deployments, Microsoft recommends using pre‑registered client applications where client identifiers and permissions are explicitly created, reviewed, and approved on a per-app basis. I hope that in the future, Entra will formally support MCP authorization via the CIMD protocol, so that we can build MCP servers with Entra auth that work with MCP clients in a fully secure and production-ready way.

As always, please let me know if you have further questions or ideas for other Entra integrations.

Friday, December 19, 2025

Watch the recordings from my Python + MCP series

MCP is one of the fastest-growing technologies in the generative AI space this year, and the first AI-related standard that the industry has really embraced wholeheartedly. I just gave a three-part live stream series all about Python + MCP. I showed how to:

  • Build MCP servers in Python using FastMCP
  • Deploy them into production on Azure (Container Apps and Functions)
  • Add authentication, using either Keycloak or Microsoft Entra as the OAuth provider

All of the materials from our series are available and linked below:

  • Video recordings of each stream
  • Powerpoint slides
  • Open-source code samples complete with Azure infrastructure and 1-command deployment

If you're an instructor, feel free to use the slides and code examples in your own classes. 
Spanish speaker? My colleague delivered a fantastic Spanish version of the series.

Building MCP servers with FastMCP

📺 Watch YouTube recording

In the intro session of our Python + MCP series, we dive into MCP (Model Context Protocol). This open protocol makes it easy to extend AI agents and chatbots with custom functionality, making them more powerful and flexible. We demonstrate how to use the Python FastMCP SDK to build an MCP server running locally. Then we consume that server from chatbots like GitHub Copilot in VS Code, using its tools, resources, and prompts. Finally, we discover how easy it is to connect AI agent frameworks like LangChain and Microsoft agent-framework to the MCP server.

Deploying MCP servers to the cloud

📺 Watch YouTube recording

In our second session of the Python + MCP series, we deploy MCP servers to the cloud! We walk through the process of containerizing a FastMCP server with Docker and deploying to Azure Container Apps. Then we instrument the MCP server with OpenTelemetry and observe the tool calls using Azure Application Insights and Logfire. Finally, we explore private networking options for MCP servers, using virtual networks that restrict external access to internal MCP tools and agents.

Authentication for MCP servers

📺 Watch YouTube recording

In our third session of the Python + MCP series, we explore the best ways to build authentication layers on top of your MCP servers. We start off simple, with an API key to gate access, and demonstrate a key-restricted FastMCP server deployed to Azure Functions. Then we move on to OAuth-based authentication for MCP servers that provide user-specific data. We dive deep into MCP authentication, which is built on top of OAuth2 but with additional requirements like PRM and DCR/CIMD, which can make it difficult to implement fully. We demonstrate the full MCP auth flow in the open-source identity provider Keycloak, and show how to use an OAuth proxy pattern to implement MCP auth on top of Microsoft Entra.

Friday, October 31, 2025

Watch the recordings from my Python + AI series

My colleague and I just wrapped up a live series on Python + AI, a nine-part journey diving deep into how to use generative AI models from Python. I gave the English streams while my colleague Gwen gave the Spanish streams (and I hung out in her live chat, working on my technical Spanish!).

The series introduced multiple types of models, including LLMs, embedding models, and vision models. We dug into popular techniques like RAG, tool calling, and structured outputs. We assessed AI quality and safety using automated evaluations and red-teaming. Finally, we developed AI agents using popular Python agents frameworks and explored the new Model Context Protocol (MCP).

To apply the concepts, we put together code examples that run for free thanks to GitHub Models, a service that provides free models to every GitHub account holder for experimentation and education. The examples are also compatible with local models (via Ollama), Azure OpenAI, or OpenAI.com models.

Even if you missed the live series, you can still access all the material using the links below! If you're an instructor, feel free to use the slides and code examples in your own classes.


Python + AI: Large Language Models

📺 Watch recording

In this session, we explore Large Language Models (LLMs), the models that power ChatGPT and GitHub Copilot. We use Python to interact with LLMs using popular packages like the OpenAI SDK and LangChain. We experiment with prompt engineering and few-shot examples to improve outputs. We also demonstrate how to build a full-stack app powered by LLMs and explain the importance of concurrency and streaming for user-facing AI apps.


Python + AI: Vector embeddings

📺 Watch recording

In our second session, we dive into a different type of model: the vector embedding model. A vector embedding is a way to encode text or images as an array of floating-point numbers. Vector embeddings enable similarity search across many types of content. In this session, we explore different vector embedding models, such as the OpenAI text-embedding-3 series, through both visualizations and Python code. We compare distance metrics, use quantization to reduce vector size, and experiment with multimodal embedding models.
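The distance metrics we compare in that session are simple to compute by hand. Here's a stdlib-only sketch of cosine similarity, the metric most commonly used with the OpenAI embedding models; the three-dimensional vectors are toy values for illustration, not real embeddings:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (real models return hundreds to thousands of dimensions):
dog = [0.9, 0.2, 0.1]
puppy = [0.8, 0.3, 0.1]
car = [0.1, 0.1, 0.95]

print(cosine_similarity(dog, puppy))  # close to 1.0: similar meaning
print(cosine_similarity(dog, car))    # much lower: unrelated
```

Similarity search is just this computation repeated across a corpus: embed the query, then rank every stored vector by its similarity score.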


Python + AI: Retrieval Augmented Generation

📺 Watch recording

In our third session, we explore one of the most popular techniques used with LLMs: Retrieval Augmented Generation. RAG is an approach that provides context to the LLM, enabling it to deliver well-grounded answers for a particular domain. The RAG approach works with many types of data sources, including CSVs, webpages, documents, and databases. In this session, we walk through RAG flows in Python, starting with a simple flow and culminating in a full-stack RAG application based on Azure AI Search.
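A simple RAG flow like the one we start with can be sketched in pure Python: retrieve the most relevant document, then build a grounded prompt. This sketch uses a naive keyword-overlap retriever and invented documents; a real flow would use vector or hybrid search and then send the prompt to an LLM:

```python
def retrieve(query: str, docs: list[str], top_k: int = 1) -> list[str]:
    """Naive keyword-overlap retrieval; real systems use vector or hybrid search."""
    query_words = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(query_words & set(d.lower().split())), reverse=True)
    return scored[:top_k]

def build_prompt(query: str, sources: list[str]) -> str:
    """Assemble a grounded prompt: sources first, then the question."""
    context = "\n".join(f"- {s}" for s in sources)
    return f"Answer using ONLY these sources:\n{context}\n\nQuestion: {query}"

# Invented documents standing in for a real knowledge base:
docs = [
    "The parental leave policy grants 12 weeks of paid leave.",
    "The office gym is open from 6am to 10pm on weekdays.",
]
sources = retrieve("how many weeks of parental leave?", docs)
prompt = build_prompt("how many weeks of parental leave?", sources)
# The prompt would then be sent to an LLM to get a well-grounded answer.
```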


Python + AI: Vision models

📺 Watch recording

Our fourth session is all about vision models! Vision models are LLMs that can accept both text and images, such as GPT-4o and GPT-4o mini. You can use these models for image captioning, data extraction, question answering, classification, and more! We use Python to send images to vision models, build a basic chat-with-images app, and create a multimodal search engine.
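Sending an image to a vision model typically means base64-encoding it into a data URL inside the chat message. Here's a stdlib-only sketch of building that payload in the style of the OpenAI chat API's multimodal message format; the image bytes are fake placeholder data standing in for a real file read from disk:

```python
import base64

def image_message(question: str, image_bytes: bytes, mime: str = "image/jpeg") -> dict:
    """Build a multimodal chat message mixing text and an inline image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }

# Placeholder bytes stand in for a real JPEG read from disk:
msg = image_message("What is in this image?", b"\xff\xd8fake-jpeg-bytes")
# msg is ready to include in the `messages` list of a chat completion call.
```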


Python + AI: Structured outputs

📺 Watch recording

In our fifth session, we discover how to get LLMs to output structured responses that adhere to a schema. In Python, all you need to do is define a Pydantic BaseModel to get validated output that perfectly meets your needs. We focus on the structured outputs mode available in OpenAI models, but you can use similar techniques with other model providers. Our examples demonstrate the many ways you can use structured responses, such as entity extraction, classification, and agentic workflows.
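The schema side of structured outputs really is just a Pydantic model. This hypothetical sketch shows a model for entity extraction; with the OpenAI SDK you would pass a model like this as the response format, and the LLM's JSON output gets validated into typed instances (here we validate a hand-written JSON string to keep the example self-contained):

```python
from pydantic import BaseModel

class CalendarEvent(BaseModel):
    """Hypothetical schema for extracting an event from free-form text."""
    name: str
    date: str
    participants: list[str]

# In a real flow, this JSON would come from the LLM's structured response:
event = CalendarEvent.model_validate_json(
    '{"name": "PyAI talk", "date": "2026-03-12", "participants": ["Pamela"]}'
)
print(event.name, event.participants)
```

If the JSON is missing a field or has the wrong type, validation raises an error instead of silently passing bad data downstream.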


Python + AI: Quality and safety

📺 Watch recording

This session covers a crucial topic: how to use AI safely and how to evaluate the quality of AI outputs. There are multiple mitigation layers when working with LLMs: the model itself, a safety system on top, the prompting and context, and the application user experience. We focus on Azure tools that make it easier to deploy safe AI systems into production. We demonstrate how to configure the Azure AI Content Safety system when working with Azure AI models and how to handle errors in Python code. Then we use the Azure AI Evaluation SDK to evaluate the safety and quality of output from your LLM.


Python + AI: Tool calling

📺 Watch recording

In our seventh session, we begin building toward AI agents with their foundation: tool calling (also known as function calling). We define tool call specifications using both JSON schema and Python function definitions, then send these definitions to the LLM. We demonstrate how to properly handle tool call responses from LLMs, enable parallel tool calling, and iterate over multiple tool calls. Understanding tool calling is absolutely essential before diving into agents, so don't skip over this foundational session.
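The core mechanics are straightforward: you describe each tool with a JSON schema, the LLM replies with a tool name and JSON-encoded arguments, and your code dispatches the call. A stdlib-only sketch, where the weather tool and the simulated LLM response are invented for illustration:

```python
import json

def get_weather(city: str) -> str:
    """Toy tool implementation; a real one would call a weather API."""
    return f"Sunny in {city}"

# Tool specification in the JSON-schema style that LLM APIs expect:
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Simulated tool call, shaped like what an LLM response contains:
tool_call = {"name": "get_weather", "arguments": '{"city": "Seattle"}'}

# Dispatch: look up the function by name, parse the arguments, call it.
registry = {"get_weather": get_weather}
result = registry[tool_call["name"]](**json.loads(tool_call["arguments"]))
# result would be sent back to the LLM as a "tool" role message.
```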


Python + AI: Agents

📺 Watch recording

In the penultimate session, we build AI agents! We use Python AI agent frameworks such as the new agent-framework from Microsoft and the popular LangGraph framework. Our agents start simple and then increase in complexity, demonstrating different architectures such as multiple tools, supervisor patterns, graphs, and human-in-the-loop workflows.


Python + AI: Model Context Protocol

📺 Watch recording

In the final session, we dive into the hottest technology of 2025: MCP (Model Context Protocol). This open protocol makes it easy to extend AI agents and chatbots with custom functionality, making them more powerful and flexible. We demonstrate how to use the Python FastMCP SDK to build an MCP server running locally and consume that server from chatbots like GitHub Copilot. Then we build our own MCP client to consume the server. Finally, we discover how easy it is to connect AI agent frameworks like LangGraph and Microsoft agent-framework to MCP servers. With great power comes great responsibility, so we briefly discuss the security risks that come with MCP, both as a user and as a developer.

Thursday, September 18, 2025

Filter the tools from MCP servers

What I like about MCP servers: they give me lots of great tools that can make my agents more powerful, with very little work on my side. 🎉

What I don't like about MCP servers: they give me TOO many tools! I usually only need a handful of tools for a task, but a server can expose dozens. 😿

The problems with too many tools:

  • LLM confusion. The LLM will be presented with the tool definition for every single tool in the server, and it needs to decide which tool (if any) is the best for the job. That's a hard decision for an LLM - it's always better to make it easier for the LLM by narrowing the tool list.
  • Increased tokens. The tool call definitions require more tokens, which can cost more money, increase latency, and potentially even go over the context window limit of the model.
  • Destructive actions. A server may include tools that are read-only, just sending down data to serve as context, but many servers expose tools that perform write operations, like the GitHub MCP server's tools for creating issues, closing issues, pushing branches, and many more. It's possible your task requires some of those write ops, but you generally want to be very explicit about whether an agent is allowed to take actions that can actually change something about your accounts and environments. Otherwise, you can be in for a nasty surprise when the agent takes actions you weren't expecting. (Ask me how I know...)
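To get a feel for the token cost, you can roughly estimate it from the serialized size of the tool definitions. This sketch uses the common ~4-characters-per-token heuristic, which is only approximate (exact counts require the model's tokenizer), and two invented tool definitions standing in for a real server's list:

```python
import json

def estimate_tool_tokens(tool_defs: list[dict]) -> int:
    """Rough token estimate: ~4 characters per token on the serialized JSON."""
    return len(json.dumps(tool_defs)) // 4

# Invented tool definitions for illustration:
tool_defs = [
    {"name": "get_issue", "description": "Fetch a GitHub issue by number",
     "parameters": {"type": "object", "properties": {"number": {"type": "integer"}}}},
    {"name": "create_issue", "description": "Create a new GitHub issue",
     "parameters": {"type": "object", "properties": {"title": {"type": "string"}}}},
]
print(estimate_tool_tokens(tool_defs))  # every extra tool adds to every single request
```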

Fortunately, there is almost always a way to configure agents to only allow a subset of the tools from an MCP server. In this blog post, I'll share ways to filter tools in my favorite agentic coder, GitHub Copilot in VS Code, plus two popular AI agent frameworks, Langchain v1 and Pydantic AI.


Agentic coding with GitHub Copilot in VS Code

Global configuration

When you are using agent mode in VS Code, configure the tools by selecting the gear icon near the chat input window.

That pops up a window showing all your available tools, coming from both installed MCP servers and VS Code extensions. You can select or de-select at the extension/server level, the toolset level, and the individual tool level.

When you configure that tool selection, that affects all the interactions in Agent mode.

What if you want different tool subsets for different kinds of tasks - like when you're planning a feature versus fixing a bug?

Custom modes

That's where custom chat modes come in. We define a modename.mode.md file that provides a prompt, a preferred model, and an allowed list of tools.

For example, here's the start of my fixer.mode.md file for fixing issues:

---
description: 'Fix and verify issues in app'
model: GPT-5
tools: ['extensions', 'codebase', 'usages', 'vscodeAPI', 'problems', 'changes', 'testFailure', 'fetch', 'findTestFiles', 'searchResults', 'githubRepo', 'runTests', 'runCommands', 'runTasks', 'editFiles', 'runNotebooks', 'search', 'new', 'create_pull_request', 'get_issue', 'get_issue_comments', 'get-library-docs', 'playwright', 'pylance mcp server']
---

# Fixer Mode Instructions

You are in fixer mode. When given an issue to fix, follow these steps:

1. **Gather context**: Read error messages/stack traces/related code. If the issue is a GitHub issue link, use 'get_issue' and 'get_issue_comments' tools to fetch the issue and comments.
2. **Make targeted fix**: Make minimal changes to fix the issue. Do not fix any issues that weren't identified. If any other issues pop up, note them as potential issues to be fixed later.
3. **Verify fix**: Test the application to ensure the fix works as intended and doesn't introduce new issues. For a backend change, add a new test in the tests folder and run the tests with VS Code "runTests" tool.

VS Code detects all of the custom chat modes in my project, and then lists custom modes as options in the mode picker in the chat window:

When I'm in that mode, I have full confidence that the agent will only use the tools from that list. For optimal results, I often also customize the mode prompt with additional guidance on using the allowed tools.

Python AI agent frameworks

At this point, most AI agent frameworks have built-in support for pointing an agent at an MCP server, giving that agent the ability to use the tools from the server. A growing number of the frameworks also make it possible to filter the list of tools from the server. If you use a framework that doesn't yet make it possible, file an issue and let them know it's important to you. Let's look at two examples from popular frameworks.

Langchain v1

Langchain v1 is the latest major version of Langchain, and it is a very agent-centric SDK. It's still in alpha testing as of September 2025, so we need to explicitly install the alpha version of the langchain package in order to use it.

To use a langchain agent with MCP, we install the langchain_mcp_adapters package, and then construct an MCP client for the server.

The client below connects to the GitHub MCP server using a fine-grained personal access token:

mcp_client = MultiServerMCPClient(
  {
    "github": {
      "url": "https://api.githubcopilot.com/mcp/",
      "transport": "streamable_http",
      "headers": {"Authorization": f"Bearer {os.getenv('GITHUB_TOKEN', '')}"},
    }
  }
)

Note that I configured that access token with the minimal access needed for the desired tools, which is another best practice to avoid unintended actions.

Then I fetch the list of tools from the server, and filter that list to only keep the 4 tools needed for the task:

tools = await mcp_client.get_tools()
allowed_tool_names = ("list_issues", "search_code", "search_issues", "search_pull_requests")
filtered_tools = [t for t in tools if t.name in allowed_tool_names]

Finally, I create an agent with my prompt and filtered tool list:

agent = create_agent(base_model, prompt=prompt, tools=filtered_tools)

For a full example, see langchainv1_mcp_github.py. If you're using Langchain v1, you may also be interested in their human-in-the-loop middleware, which can prompt human confirmation only when certain tools are called. That way, you could give your agent access to write operations, but ensure that a human approved each of those actions.


Pydantic AI

Pydantic AI is an agents framework from the Pydantic team that puts a big focus on type safety and observability.

Pydantic AI includes MCP support out of the box, so we only need to install pydantic-ai. We configure the target MCP server with the URL and authorization headers:

server = MCPServerStreamableHTTP(
  url="https://api.githubcopilot.com/mcp/",
  headers={"Authorization": f"Bearer {os.getenv('GITHUB_TOKEN', '')}"}
)

Next we create a FilteredToolset on the MCP server by defining a lambda function that filters by name:

allowed_tool_names = ("list_issues", "search_code", "search_issues", "search_pull_requests")
filtered_tools = server.filtered(
  lambda ctx, tool: tool.name in allowed_tool_names)

Finally, we point the agent at the filtered tool set:

agent = Agent(model, system_prompt=prompt, toolsets=[filtered_tools])

For a full example, see pydantic_mcp_github.py.

As you can see, we can achieve the same tool filtering in multiple frameworks with a similar approach.
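The pattern is identical everywhere: fetch the server's tool list, then keep only an allow-list. Here's a framework-agnostic sketch of that core step, where the `Tool` dataclass is a simplified stand-in for whatever tool objects your framework returns:

```python
from dataclasses import dataclass

@dataclass
class Tool:
    """Simplified stand-in for a framework's tool object."""
    name: str

def filter_tools(tools: list[Tool], allowed: set[str]) -> list[Tool]:
    """Keep only tools whose names are on the allow-list."""
    return [t for t in tools if t.name in allowed]

# Invented tool names standing in for a real MCP server's tool list:
server_tools = [Tool("list_issues"), Tool("create_issue"), Tool("search_code")]
allowed = {"list_issues", "search_code"}
safe_tools = filter_tools(server_tools, allowed)  # no write tools get through
```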

Monday, August 18, 2025

How I learn about generative AI

I do not consider myself an expert in generative AI, but I now know enough to build full-stack web applications on top of generative AI models, evaluate the quality of those applications, and decide whether new models or frameworks will be useful. These are the resources that I personally used for getting up to speed with generative AI.

AI foundation

Let's start first with the long-form content: books and videos that gave me a more solid foundation.


AI Engineering

By Chip Huyen

This book is a fantastic high-level overview of the AI Engineering industry from an experienced ML researcher. I recommend that everybody read this book at some point in their learning journey. Despite Chip's background in ML, the book is very accessible - no ML background is needed, though a bit of programming with LLMs would be a good warm-up for the book. I loved how Chip included both research and industry insights, and her focus on the need for evaluation in the later chapters. Please, read this book!
Build a Large Language Model

By Sebastian Raschka

This book is a deep dive into building LLMs from scratch using Python and Pytorch, and includes a GitHub repository with runnable code. I found it helpful to see that LLMs are all about matrix manipulation, and to wrap my head around how the different layers in the LLM architecture map to matrices. I recommend it to Python developers who want to understand concepts like the transformer architecture, or even just common LLM parameters like temperature and top-p. If you're new to Pytorch, this book thankfully includes an intro in the appendix, but I also liked the Deep Learning with PyTorch book.


Zero to Hero

By Andrej Karpathy

This video series builds neural networks from scratch, entirely in Jupyter notebooks. Andrej is a fantastic teacher, and has a great way of explaining complex topics. Admittedly, I have not watched every video from start to finish, but every time I do watch a video from Andrej, I learn so much. Andrej also gives great talks at conferences, like his recent one about how software is changing due to LLMs.

Vector Similarity Search

By James Briggs

This video series goes into technical details of vector search and database technologies. I watched several of the videos when I was trying to understand the different indexes (like HNSW/IVF), and I liked the explanations from James more than any others I found. James also actually tried out the different indexes and parameters, and shared performance findings, so I came away with both conceptual and practical knowledge.

AI news

Now that I have a foundation, how do I find out what's happening in the generative AI space?

  • Company chats: We have one at Microsoft specifically around Generative AI, and another around AI-assisted Coding (with GitHub Copilot, in our case). We share external news and our own experiences in those channels. If you don't have such a channel, start one!
  • Newsletters from AI companies like Langchain and Pydantic AI. Even if I'm not actively using a framework, seeing what they're working on is a good indication of what's happening in the space.
  • LinkedIn
  • HackerNews

Plus a few folks in particular...

  • Simon Willison: Whenever a new model comes out, Simon finds time to experiment with the model and publish his findings on his blog. Simon is also the creator of the popular llm CLI (which I played with recently), and is quick to integrate new model support into the CLI. Check his about page for the many ways you can subscribe to his posts.
  • Gergely Orosz: Gergely sends a newsletter with deep dives into industry topics, and many of the topics this year have been around generative AI and the use of AI for software development. I finally upgraded to a paid subscription so I can read his full posts. Fortunately, my employer reimburses the subscription as "Professional Development", and your employer may as well.
  • Hamel Husain: An expert in LLM evaluation and RAG systems. I first discovered him via his massive Evals FAQ landing on the frontpage of HackerNews, and then subscribed to his blog, which is chock full of deep technical techniques. Hamel co-teaches a paid course on evals with a UC Berkeley professor, and often shares notes for free from that course on his blog.
  • Eleanor Berger and Isaac Flath: This team of software developers has been digging deep into agentic coding for the last year, experimenting with every new tool, model, and workflow, and sharing their learnings in blog posts. They also run a full paid course, which I haven't tried, but it is highly reviewed.
  • Drew Breunig: A technologist with thoughtful blog posts about LLMs and their practicality in industry.
  • Gary Marcus: Gary is an ML researcher, and is skeptical of the current hype around generative AI. His posts are about the current flaws with transformer-based LLMs, security issues, and quality failures. I like to read Gary's posts as a healthy counterbalance to everything else I consume.

Practice, practice, practice

How do I practice what I've learnt? With projects, of course, the best form of learning! I have learnt the most from maintaining our RAG-on-Azure solution, as so many developers have shared their RAG trials and tribulations in the issue tracker. However, I've also learnt from the many other projects I've put on my GitHub, like trying out the myriad Python AI frameworks, and building agents to automate boring everyday tasks.

I recommend starting with low-risk projects that are personally useful for you, and where you have domain expertise, so that you can reason about whether your LLM-based solution is truly giving you high quality output. LLMs can be very helpful, but they can also be very inaccurate: the trick is to find the scenarios where they are both helpful and accurate.

I don't always have time to create a whole new project using a new AI technology, but I at least try to spend an hour trying things out. I ran an internal "AI Study Hour" with colleagues for several months, where we would just poke around documentation and get the basic examples working. Lately I've been doing similar study hours on my YouTube channel, since I figure other people may want to study along with me. 😊

Sharing what I've learnt

When I learn new technologies, my goal is then to share what I learn with others - that's why I like being a developer advocate, as it gives me an excuse to continually learn and share. I recently put on a Python + AI video series with my colleague (who gave it in Spanish), which is designed to be a great introductory series for Python developers who are new to generative AI. We followed that with a Python + MCP video series, and are planning to go deep into agents in our 2026 series. You can find my other talks on my website. There's always more to learn!