Thursday, April 2, 2026

Building MCP servers with Entra ID and pre-authorized clients

The Model Context Protocol (MCP) gives AI agents a standard way to call external tools, but things get more complicated when those tools need to know who the user is. In this post, I’ll show how to build an MCP server with the Python FastMCP package that authenticates users with Microsoft Entra ID when they connect from a pre-authorized client such as VS Code.

If you need to build a server that works with any MCP client, read my previous blog post. With Microsoft Entra as the authorization server, supporting arbitrary clients currently requires adding an OAuth proxy in front, which increases security risk. This post focuses on the simpler pre-authorized-client path instead.

MCP auth

Let’s start by digging into the MCP auth spec, since that explains both the shape of the flow and the constraints we run into with Entra.

The MCP specification includes an authorization protocol based on OAuth 2.1, so an MCP client can send a request that includes a Bearer token from an authorization server, and the MCP server can validate that token.

Diagram showing an MCP client sending a request with a bearer token in the Authorization header to an MCP server
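To make the bearer-token part concrete, here's a minimal sketch of how a server could pull the token out of the Authorization header before validating it. The function name is my own for illustration; FastMCP handles this step for you, so treat it as an explanation of the header format rather than code you need to write.

```python
from typing import Optional


def extract_bearer_token(authorization_header: Optional[str]) -> Optional[str]:
    """Return the token from an 'Authorization: Bearer <token>' value, else None."""
    if not authorization_header:
        return None
    scheme, _, token = authorization_header.strip().partition(" ")
    # The scheme name is case-insensitive; the token itself is not.
    if scheme.lower() != "bearer" or not token.strip():
        return None
    return token.strip()


print(extract_bearer_token("Bearer abc.def.ghi"))
print(extract_bearer_token("Basic dXNlcjpwYXNz"))
```

A request without a usable bearer token is the case where the MCP server answers with a 401.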

In OAuth 2.1 terms, the MCP client is acting as the OAuth client, the MCP server is the resource server, the signed-in user is the resource owner, and the authorization server issues an access token. In this case, Entra will be our authorization server. We can't necessarily use just any OAuth-compatible authorization server, because MCP auth requires more than the core OAuth 2.1 functionality.

Diagram mapping MCP roles to OAuth roles: MCP client as OAuth client, MCP server as resource server, signed-in user as resource owner, and Entra as authorization server

In OAuth, the authorization server needs a relationship with the client. MCP auth describes three options:

  • Pre-registration: the auth server has a pre-existing relationship with the client and already has the client ID in its database.
  • CIMD (Client ID Metadata Document): the MCP client sends the URL of its CIMD, a JSON document that describes its attributes, and the auth server bases its interactions on that information.
  • DCR (Dynamic Client Registration): when the auth server sees a new client, it explicitly registers it and stores the client information in its own data. DCR is now considered a "legacy" path, as the hope is for CIMD to be the supported path in the future.

For each MCP scenario - each combination of MCP server, MCP client, and authorization server - we need to determine which of those options are viable and optimal. Here's one way of thinking through it:

Comparison diagram showing which MCP client and authorization server combinations support pre-registration, CIMD, or DCR

VS Code supports all of MCP auth, so its MCP client includes both CIMD and DCR support. However, the Microsoft Entra authorization server does not support CIMD or DCR. That leaves us with only one official option: pre-registration. If we desperately need support for arbitrary clients, it is possible to put a CIMD/DCR proxy in front of Entra, as discussed in my previous blog post, but the Entra team discourages that approach due to increased security risks.
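One way to express that reasoning in code is a small helper that intersects what the client and the authorization server each support. This is only a sketch: the option labels and the preference order are my own assumptions for illustration, not something defined by the MCP spec.

```python
# Assumed preference order, for illustration only.
PREFERENCE = ("pre-registration", "CIMD", "DCR")


def viable_registration_options(client_supports, auth_server_supports):
    """Return the options both sides support, in the assumed preference order."""
    return [
        option
        for option in PREFERENCE
        if option in client_supports and option in auth_server_supports
    ]


vscode = {"pre-registration", "CIMD", "DCR"}
entra = {"pre-registration"}
print(viable_registration_options(vscode, entra))
```

For the VS Code and Entra combination, the only entry left is pre-registration, which matches the conclusion above.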

When using pre-registration, the auth flow is relatively simple (but still complex, because hey, this is OAuth!):

  • User asks to use auth-restricted MCP server
  • MCP client makes a request to MCP server without a bearer token
  • MCP server responds with an HTTP 401 and a pointer to its PRM (Protected Resource Metadata) document
  • MCP client reads PRM to discover the authorization server and options
  • MCP client redirects to authorization server, including its client ID
  • User signs into authorization server
  • Authorization server returns authorization code
  • MCP client exchanges authorization code for access token
  • Authorization server returns access token
  • MCP client retries original request, but now with bearer token included
  • MCP server validates bearer token and returns successfully

Here's what that looks like:

Sequence diagram of the pre-registered OAuth flow between the user, VS Code MCP client, MCP server, and Microsoft Entra authorization server
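To illustrate the discovery steps in that flow, here's a rough sketch of how a client could find the PRM document from the 401 response and then read the authorization server from it. The header and document below are made-up sample values (the hostnames and tenant placeholder are hypothetical), and the helper names are mine; real MCP clients like VS Code do this for you.

```python
import re
from typing import Optional

_RESOURCE_METADATA = re.compile(r'resource_metadata="([^"]+)"')


def resource_metadata_url(www_authenticate: str) -> Optional[str]:
    """Extract the PRM URL from a WWW-Authenticate header value, if present."""
    match = _RESOURCE_METADATA.search(www_authenticate)
    return match.group(1) if match else None


def first_authorization_server(prm: dict) -> str:
    """Pick the first authorization server listed in a PRM document."""
    servers = prm.get("authorization_servers") or []
    if not servers:
        raise ValueError("PRM document lists no authorization servers")
    return servers[0]


# Hypothetical sample values, for illustration only.
sample_header = (
    'Bearer resource_metadata="https://mcp.example.com/.well-known/oauth-protected-resource"'
)
sample_prm = {
    "resource": "https://mcp.example.com/mcp",
    "authorization_servers": ["https://login.microsoftonline.com/TENANT_ID/v2.0"],
    "scopes_supported": ["api://APP_ID/user_impersonation"],
}

print(resource_metadata_url(sample_header))
print(first_authorization_server(sample_prm))
```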

Now let's dig into the code for implementing MCP auth with the pre-registered VS Code client.

Registering the MCP server with Entra

Before the server can use Entra to authorize users, we need to register the server with Entra via an app registration. We can create the registration using the Azure Portal, Azure CLI, Microsoft Graph SDK, or even Bicep. In this case, I use the Python Microsoft Graph SDK, since it lets me specify everything programmatically.

First, I create the Entra app registration, specifying the sign-in audience (single-tenant) and configuring the MCP server as a protected resource:

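import uuid

from msgraph.generated.models.api_application import ApiApplication
from msgraph.generated.models.application import Application
from msgraph.generated.models.permission_scope import PermissionScope
from msgraph.generated.models.pre_authorized_application import PreAuthorizedApplication

scope_id = str(uuid.uuid4())
request_app = Application(
  display_name="Entra App for MCP server",
  sign_in_audience="AzureADMyOrg",
  api=ApiApplication(
    requested_access_token_version=2,
    oauth2_permission_scopes=[
      PermissionScope(
        admin_consent_description="Allows access to the MCP server as the signed-in user.",
        admin_consent_display_name="Access MCP Server",
        id=scope_id,
        is_enabled=True,
        type="User",
        user_consent_description="Allow access to the MCP server on your behalf.",
        user_consent_display_name="Access MCP Server",
        value="user_impersonation")
    ],
    pre_authorized_applications=[
      PreAuthorizedApplication(
        app_id=VSCODE_CLIENT_ID,
        delegated_permission_ids=[scope_id],
      )]))
# Create the registration; the returned object carries the id and app_id used in later steps
app = await graph_client.applications.post(request_app)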

The api parameter is doing the heavy lifting, ensuring that other applications (like VS Code) can request permission to access the server on behalf of a user. Here's what each parameter does:

  • requested_access_token_version=2: Entra ID has two token formats (v1.0 and v2.0). We need v2.0 because that's what FastMCP's token validator expects.
  • oauth2_permission_scopes: This defines a permission called user_impersonation that MCP clients can request when connecting to your server. It's the server saying: "I accept tokens that let an MCP client act on behalf of a signed-in user." Without at least one scope defined, no MCP client can obtain a token for your server — Entra wouldn't know what permission to grant. The name user_impersonation is a convention (we could call it anything), but it clearly signals that the MCP client is accessing your server as the user, not as itself.
  • pre_authorized_applications: This list tells Entra which client applications are pre-approved to request tokens for this server’s API without showing an extra consent prompt to the user. In this case, I list VS Code’s application ID and tie it to the user_impersonation scope, so VS Code can request a token for the MCP server as the signed-in user.

Thanks to that configuration, when VS Code requests a token, it will request a token with the scope "api://{app_id}/user_impersonation", and the FastMCP server will validate that incoming tokens contain that scope.
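Here's a small sketch of how those two views of the scope relate: the full name the client requests, and the short value that shows up in the token's "scp" claim (a space-separated string in Entra v2.0 tokens). Both helper names are hypothetical, and the real check is done by FastMCP's verifier, so this is only meant to show the shape of the data.

```python
def full_scope_name(app_id: str, scope_value: str = "user_impersonation") -> str:
    """Build the scope string a client requests for this server's API."""
    return f"api://{app_id}/{scope_value}"


def has_required_scopes(claims: dict, required: list) -> bool:
    """Check that every required scope appears in the space-separated 'scp' claim."""
    granted = set(str(claims.get("scp", "")).split())
    return set(required) <= granted


print(full_scope_name("APP_ID"))
print(has_required_scopes({"scp": "user_impersonation"}, ["user_impersonation"]))
```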

Next, I create a service principal for that Entra app registration, which represents the Entra app in my tenant:

request_principal = ServicePrincipal(app_id=app.app_id, display_name=app.display_name)
await graph_client.service_principals.post(request_principal)

Securing credentials for Entra app registrations

I also need a way for the server to prove that it can use that Entra app registration. There are three options:

  • Client secret: Easiest to set up, but since it's a secret, it must be stored securely, protected carefully, and rotated regularly.
  • Certificate: Stronger than a client secret and generally better suited for production, but it still requires certificate storage, renewal, and lifecycle management.
  • Managed identity as Federated Identity Credential (MI-as-FIC): No stored secret, no certificate to manage, and usually the best choice when your app is hosted on Azure. However, it doesn't work for local development.

I wanted the best of both worlds: easy local development on my machine, but the most secure production story for deployment on Azure Container Apps. So I actually created two Entra app registrations, one for local with client secret, and one for production with managed identity.

Here's how I set up the password for the local Entra app:

password_credential = await graph_client.applications.by_application_id(app.id).add_password.post(
  AddPasswordPostRequestBody(
    password_credential=PasswordCredential(display_name="FastMCPSecret")))

It's a bit trickier to set up the MI-as-FIC, since we first need to provision the managed identity and associate that with our Azure Container Apps resource. I set all of that up in Bicep, and then after provisioning completes, I run this code to configure a FIC using the managed identity:

fic = FederatedIdentityCredential(
    name="miAsFic",
    issuer=f"https://login.microsoftonline.com/{tenant_id}/v2.0",
    subject=managed_identity_principal_id,
    audiences=["api://AzureADTokenExchange"],
)

await graph_client.applications.by_application_id(
    prod_app_id
).federated_identity_credentials.post(fic)

Since I now have two Entra app registrations, I make sure that the environment variables in my local .env point to the secret-secured local Entra app registration, and the environment variables on my Azure Container App point to the FIC-secured prod Entra app registration.
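A simple way to keep that split explicit is to decide the credential mode from the environment at startup. The sketch below assumes the variable names ENTRA_DEV_CLIENT_SECRET and AZURE_CLIENT_ID, which mirror the ones in the MSAL snippets later in this post; the decision logic itself is my own illustration, not necessarily how the sample repo does it.

```python
from typing import Mapping


def credential_mode(env: Mapping[str, str]) -> str:
    """Decide which Entra credential to use, based on which variables are set."""
    if env.get("ENTRA_DEV_CLIENT_SECRET"):
        # Local development: secret-secured app registration
        return "client_secret"
    if env.get("AZURE_CLIENT_ID"):
        # Azure Container Apps: user-assigned managed identity used as a FIC
        return "managed_identity"
    raise RuntimeError("No Entra credential configuration found in the environment")


print(credential_mode({"ENTRA_DEV_CLIENT_SECRET": "dev-secret"}))
print(credential_mode({"AZURE_CLIENT_ID": "mi-client-id"}))
```

Failing fast when neither variable is set avoids discovering a misconfigured deployment only at the first OBO call.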

Granting admin consent

This next step is only necessary if the MCP server uses the on-behalf-of (OBO) flow to exchange the incoming access token for a token to a downstream API, such as Microsoft Graph. In this case, my demo server uses OBO so it can query Microsoft Graph to check the signed-in user's group membership.

The earlier code added VS Code as a pre-authorized application, but that only allows VS Code to obtain a token for the MCP server itself; it does not grant the MCP server permission to call Microsoft Graph on the user's behalf. Because the MCP sign-in flow in VS Code does not include a separate consent step for those downstream Graph scopes, I grant admin consent up front so the OBO exchange can succeed.

This code grants the admin consent to the associated service principal for the Graph API resource and scopes:

server_principal = await graph_client.service_principals_with_app_id(app.app_id).get()
graph_principal = await graph_client.service_principals_with_app_id(
    "00000003-0000-0000-c000-000000000000" # Graph API
).get()
await graph_client.oauth2_permission_grants.post(
    OAuth2PermissionGrant(
        client_id=server_principal.id,
        consent_type="AllPrincipals",
        resource_id=graph_principal.id,
        scope="User.Read email offline_access openid profile",
    )
)

If our MCP server needed to use an OBO flow with another resource server, we could request additional grants for those resources and scopes.
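As a rough sketch of what that could look like, the helper below builds one grant body per downstream resource, using the property names from the Graph oauth2PermissionGrant resource. The helper name and the placeholder IDs are hypothetical; each body would still need to be posted, for example by mapping it onto OAuth2PermissionGrant as in the code above.

```python
def build_grant_bodies(client_sp_id: str, resource_scopes: dict) -> list:
    """Build one admin-consent grant body per downstream resource service principal."""
    return [
        {
            "clientId": client_sp_id,
            "consentType": "AllPrincipals",
            "resourceId": resource_sp_id,
            # Graph expects delegated scopes as a single space-separated string
            "scope": " ".join(scopes),
        }
        for resource_sp_id, scopes in resource_scopes.items()
    ]


# Placeholder service principal IDs, for illustration only.
bodies = build_grant_bodies(
    "SERVER_SP_ID",
    {"GRAPH_SP_ID": ["User.Read", "email"], "OTHER_API_SP_ID": ["Data.Read"]},
)
for body in bodies:
    print(body)
```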

Our Entra app registration is now ready for the MCP server, so let's move on to see the server code.

Using FastMCP servers with Entra

In our MCP server code, we configure FastMCP's RemoteAuthProvider based on the details from the Entra app registration process:

from fastmcp.server.auth import RemoteAuthProvider
from fastmcp.server.auth.providers.azure import AzureJWTVerifier

verifier = AzureJWTVerifier(
    client_id=ENTRA_CLIENT_ID,
    tenant_id=AZURE_TENANT_ID,
    required_scopes=["user_impersonation"],
)
auth = RemoteAuthProvider(
    token_verifier=verifier,
    authorization_servers=[f"https://login.microsoftonline.com/{AZURE_TENANT_ID}/v2.0"],
    base_url=base_url,
)

Notice that we do not need to pass in a client secret at this point, even when using the local Entra app registration. FastMCP validates the tokens using Entra's public keys - no Entra app credentials needed.
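To give a feel for what "validating" means here, this sketch checks the standard claims of an already-decoded Entra v2.0 token: issuer, audience, and expiry. It deliberately skips signature verification against the keys published at the tenant's JWKS endpoint, and it is not FastMCP's actual implementation; the helper names are mine.

```python
import time
from typing import Optional


def entra_v2_issuer(tenant_id: str) -> str:
    return f"https://login.microsoftonline.com/{tenant_id}/v2.0"


def entra_v2_jwks_uri(tenant_id: str) -> str:
    # Public signing keys for the tenant; no app credentials are needed to read them
    return f"https://login.microsoftonline.com/{tenant_id}/discovery/v2.0/keys"


def claims_look_valid(claims: dict, tenant_id: str, client_id: str,
                      now: Optional[float] = None) -> bool:
    """Check issuer, audience, and expiry of decoded claims (signature NOT checked)."""
    now = time.time() if now is None else now
    return (
        claims.get("iss") == entra_v2_issuer(tenant_id)
        # In v2.0 access tokens, the audience is the client ID of the API
        and claims.get("aud") == client_id
        and float(claims.get("exp", 0)) > now
    )


sample_claims = {
    "iss": entra_v2_issuer("TENANT_ID"),
    "aud": "CLIENT_ID",
    "exp": 2_000_000_000,
}
print(claims_look_valid(sample_claims, "TENANT_ID", "CLIENT_ID", now=1_900_000_000))
```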

To make it easy for our MCP tools to access an identifier for the currently logged in user, we define a middleware that inspects the claims of the current token using FastMCP's get_access_token() and sets the "oid" (Entra object identifier) in the state:

from fastmcp.server.dependencies import get_access_token
from fastmcp.server.middleware import Middleware, MiddlewareContext

class UserAuthMiddleware(Middleware):
    def _get_user_id(self):
        # Read the "oid" claim from the validated token, if there is one
        token = get_access_token()
        if not (token and hasattr(token, "claims")):
            return None
        return token.claims.get("oid")

    async def on_call_tool(self, context: MiddlewareContext, call_next):
        user_id = self._get_user_id()
        if context.fastmcp_context is not None:
            await context.fastmcp_context.set_state("user_id", user_id)
        return await call_next(context)

    async def on_read_resource(self, context: MiddlewareContext, call_next):
        user_id = self._get_user_id()
        if context.fastmcp_context is not None:
            await context.fastmcp_context.set_state("user_id", user_id)
        return await call_next(context)

When we initialize the FastMCP server, we set the auth provider and include that middleware:

mcp = FastMCP("Expenses Tracker", auth=auth, middleware=[UserAuthMiddleware()])

Now, every request made to the MCP server will require authentication. The server will return a 401 if a valid token isn't provided, and that 401 will prompt the VS Code MCP client to kick off the MCP authorization flow.

Screenshot of the VS Code prompt asking the user to sign in before using the authenticated MCP server

Inside each tool, we can grab the user ID from the state and use it to customize the response for that user, for example to store or query items in a database:

@mcp.tool
async def add_user_expense(
    date: Annotated[date, "Date of the expense in YYYY-MM-DD format"],
    amount: Annotated[float, "Positive numeric amount of the expense"],
    description: Annotated[str, "Human-readable description of the expense"],
    ctx: Context,
):
  """Add a new expense to Cosmos DB."""
  user_id = await ctx.get_state("user_id")
  if not user_id:
    return "Error: Authentication required (no user_id present)"
  expense_item = {
    "id": str(uuid.uuid4()),
    "user_id": user_id,
    "date": date.isoformat(),
    "amount": amount,
    "description": description
  }
  await cosmos_container.create_item(body=expense_item)
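The read side follows the same pattern: filter by the user ID from the state so each user only sees their own items. Here's a minimal sketch of a parameterized Cosmos DB query for that. The helper name is hypothetical, and I'm assuming the items use the same user_id field as the tool above.

```python
def user_expenses_query(user_id: str) -> dict:
    """Build a parameterized query that only matches the given user's expenses."""
    return {
        "query": "SELECT * FROM c WHERE c.user_id = @user_id",
        # Parameters keep the user ID out of the query text itself
        "parameters": [{"name": "@user_id", "value": user_id}],
    }


spec = user_expenses_query("00000000-0000-0000-0000-000000000001")
print(spec["query"])
print(spec["parameters"])
```

The two values could then be passed as the query and parameters arguments of the container's query_items method.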

Using OBO flow in FastMCP server

Remember when we granted admin consent for the Entra app registration earlier? That means we can use an OBO flow inside the MCP server, to make calls to the Graph API on behalf of the signed-in user.

To make it easier to exchange and validate tokens, we use the Python MSAL SDK and configure a ConfidentialClientApplication.

When using the local secret-secured Entra app registration, this is all we need to set it up:

import os

from msal import ConfidentialClientApplication, TokenCache

confidential_client = ConfidentialClientApplication(
  client_id=entra_client_id,
  client_credential=os.environ["ENTRA_DEV_CLIENT_SECRET"],
  authority=f"https://login.microsoftonline.com/{os.environ['AZURE_TENANT_ID']}",
  token_cache=TokenCache(),
)

When using the production FIC-secured Entra app registration, we need a function that returns tokens for the managed identity:

import os

import requests
from msal import (
  ConfidentialClientApplication,
  ManagedIdentityClient,
  TokenCache,
  UserAssignedManagedIdentity,
)

mi_client = ManagedIdentityClient(
  UserAssignedManagedIdentity(client_id=os.environ["AZURE_CLIENT_ID"]),
  http_client=requests.Session(),
  token_cache=TokenCache())

def _get_mi_assertion():
  result = mi_client.acquire_token_for_client(resource="api://AzureADTokenExchange")
  if "access_token" not in result:
    raise RuntimeError(f"Failed to get MI assertion: {result.get('error_description', 'unknown error')}")
  return result["access_token"]

confidential_client = ConfidentialClientApplication(
  client_id=entra_client_id,
  client_credential={"client_assertion": _get_mi_assertion},
  authority=f"https://login.microsoftonline.com/{os.environ['AZURE_TENANT_ID']}",
  token_cache=TokenCache())

Inside any code that requires OBO, we ask MSAL to exchange the MCP access token for a Graph API access token:

graph_resource_access_token = confidential_client.acquire_token_on_behalf_of(
  user_assertion=access_token.token,
  scopes=["https://graph.microsoft.com/.default"]
)
graph_token = graph_resource_access_token["access_token"]
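MSAL returns a dictionary either way, with "error" and "error_description" keys when the exchange fails, so it's worth checking the result before indexing into it. A minimal sketch of that check, with a helper name of my own choosing:

```python
def access_token_from_msal_result(result: dict) -> str:
    """Return the access token from an MSAL result, or raise with the reported error."""
    if "access_token" in result:
        return result["access_token"]
    error = result.get("error", "unknown_error")
    description = result.get("error_description", "no description")
    raise RuntimeError(f"OBO token exchange failed: {error}: {description}")


print(access_token_from_msal_result({"access_token": "sample-token"}))
```

A failed exchange is often a consent problem, which is why the admin consent step earlier matters.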

Once we successfully acquire the token, we can use that token with the Graph API, for any operations permitted by the scopes in the admin consent granted earlier. For this example, we call the Graph API to check whether the logged in user is a member of a particular Entra group:

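url = ("https://graph.microsoft.com/v1.0/me/transitiveMemberOf/microsoft.graph.group"
      f"?$filter=id eq '{group_id}'&$count=true")
# Use a context manager so the HTTP client is closed after the request
async with httpx.AsyncClient() as client:
  response = await client.get(
    url,
    headers={
      "Authorization": f"Bearer {graph_token}",
      # Graph requires this header for advanced queries that use $count
      "ConsistencyLevel": "eventual",
    })
# Surface HTTP errors instead of treating an error payload as "not a member"
response.raise_for_status()
data = response.json()
membership_count = data.get("@odata.count", 0)
is_admin = membership_count > 0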
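If you want to unit test that logic without calling Graph, one option is to split it into two pure functions: one that builds the URL and one that interprets the response. This is just a sketch with hypothetical helper names; the fallback to the length of the "value" list is my own addition for responses that omit the count.

```python
from urllib.parse import quote

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def membership_check_url(group_id: str) -> str:
    """Build the transitiveMemberOf URL that filters on a single group ID."""
    filter_expr = quote(f"id eq '{group_id}'", safe="'")
    return (
        f"{GRAPH_BASE}/me/transitiveMemberOf/microsoft.graph.group"
        f"?$filter={filter_expr}&$count=true"
    )


def is_member(payload: dict) -> bool:
    """Interpret the Graph response: any matching group means the user is a member."""
    count = payload.get("@odata.count")
    if count is None:
        # Assumed fallback: count the returned items if no count was included
        count = len(payload.get("value", []))
    return count > 0


print(membership_check_url("GROUP_ID"))
print(is_member({"@odata.count": 1, "value": [{"id": "GROUP_ID"}]}))
```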

FastMCP 3.0 now provides a way to restrict tool visibility based on authorization checks, so I wrapped the above code in a function and set it as the auth constraint for the admin tool:

async def require_admin_group(ctx: AuthContext) -> bool:
  graph_token = exchange_for_graph_token(ctx.token.token)
  return await check_user_in_group(graph_token, admin_group_id)

@mcp.tool(auth=require_admin_group)
async def get_expense_stats(ctx: Context):
    """Get expense statistics. Only accessible to admins."""
    ...

FastMCP will run that function both when an MCP client requests the list of tools, to determine which tools can be seen by the current user, and again when a user tries to use that tool, for an added just-in-time security check.
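To show why running the check twice matters, here's a toy simulation of the idea in plain Python. It is not FastMCP's API or internals: the ToyToolRegistry class and everything in it is hypothetical, and it only models "filter at list time, re-check at call time".

```python
import asyncio


class ToyToolRegistry:
    """Toy model: each tool has an optional async auth check taking a user dict."""

    def __init__(self):
        self._tools = {}

    def register(self, name, func, auth=None):
        self._tools[name] = (func, auth)

    async def _allowed(self, name, user):
        _, auth = self._tools[name]
        return auth is None or await auth(user)

    async def list_tools(self, user):
        # First check: hide tools the user is not allowed to see
        visible = []
        for name in self._tools:
            if await self._allowed(name, user):
                visible.append(name)
        return visible

    async def call_tool(self, name, user):
        # Second check: re-run at call time, in case access changed since listing
        if not await self._allowed(name, user):
            raise PermissionError(f"{name} is not available to this user")
        func, _ = self._tools[name]
        return func()


async def require_admin(user):
    return bool(user.get("is_admin"))


registry = ToyToolRegistry()
registry.register("add_user_expense", lambda: "added")
registry.register("get_expense_stats", lambda: "stats", auth=require_admin)

print(asyncio.run(registry.list_tools({"is_admin": False})))
print(asyncio.run(registry.list_tools({"is_admin": True})))
```

Even if a client guesses the name of a hidden tool, the call-time check still rejects it.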

This is just one way to use an OBO flow, however. You can also use it directly inside tools, for example to query the Graph API for more details, upload documents to OneDrive/SharePoint/Notes, or send emails.

All together now

For the full code, check out the open source azure-cosmosdb-identity-aware-mcp-server repository. The most relevant files for the Entra authentication setup are:

  • auth_init.py: Creates the Entra app registrations for production and local development, defines the delegated user_impersonation scope, pre-authorizes VS Code, creates the service principal, and grants admin consent for the Microsoft Graph scopes used in the OBO flow.
  • auth_postprovision.py: Adds the federated identity credential (FIC) after deployment so the container app's managed identity can act as the production Entra app without storing a client secret.
  • main.py: Implements the MCP server using FastMCP's RemoteAuthProvider and AzureJWTVerifier for direct Entra authentication, plus OBO-based Microsoft Graph calls for admin group membership checks.

As always, please let me know if you have further questions or ideas for other Entra integrations.

Acknowledgements: Thank you to Matt Gotteiner for his guidance in implementing the OBO flow and review of the blog post.
