Thursday, March 12, 2026

Can MCP choose my outfit?

When I was a kid, one of my first Java applets was a UI for choosing outfits by mixing and matching different articles of clothing. Now, with the advent of agents and MCP, I realized that I could make a modern, more dynamic version: an MCP server that can find relevant clothing based on a user query and render matching clothing as a slideshow. Let's walk through the experience and the code powering it.


Searching for relevant clothing

After connecting VS Code to my closet MCP server, I ask a query like:

i am presenting at PyAI about MCP, do I have MCP themed clothing? show me the best option.

GitHub Copilot decides that it can use the closet MCP server to answer that question, and it calls the image_search tool with these arguments:

{
  "query": "MCP Model Context Protocol themed clothing",
  "max_results": 5
}

The tool call returns a mix of binary files (thumbnails for each matching article of clothing) and structured data (a JSON object containing the filename, display name, and description for each article).

{
  "results": [
    {
      "filename": "IMG_3234.jpg",
      "display_name": "IMG_3234.jpg",
      "description": "The image shows a black sleeveless dress hanging on a white hanger against a plain wall. The dress has a printed text on the front that reads: \"YOU DOWN WITH MCP? Yeah, you know me!\" The first line is in large white uppercase letters, and the second line is in smaller pink cursive letters. The dress has a fitted top and a flared skirt."
    },...

Here's what that looks like in the GitHub Copilot chat interface. Notice that Copilot attaches the images, so I can actually click on them to see each result directly in VS Code, as if they were a file in the workspace.

Now let's look at the code powering that tool call. I built the server using FastMCP, so I declare my tools by wrapping functions in the @mcp.tool() decorator and annotating the arguments with types and helpful descriptions. Inside the function, I use Azure AI Search with hybrid retrieval, searching on both the text query and the query's vector, against a target index that has multimodal embeddings of the images plus LLM-generated descriptions of the images. The tool returns a result that contains both the binary files and the structured content.

@mcp.tool()
async def image_search(
  query: Annotated[
    str, "Text description of images to find (e.g., 'red dress')"
  ],
  max_results: Annotated[int, "Max number of images to return (1-20)"] = 5,
) -> ToolResult:
  """
  Search for images matching a natural language query.
  Returns the image data and descriptions.
  """
  results = await search_client.search(
    search_text=query,
    top=max_results,
    vector_queries=[VectorizableTextQuery(
        k_nearest_neighbors=max_results, fields="embedding", text=query)],
    select="metadata_storage_path,verbalized_image")

  blob_service_client = get_blob_service_client()

  files: list[File] = []
  image_results: list[dict[str, str]] = []
  async for result in results:
    url = result["metadata_storage_path"]
    description = result.get("verbalized_image")
    container_name, blob_name = get_blob_reference_from_url(url)
    blob_client = blob_service_client.get_blob_client(container=container_name, blob=blob_name)
    stream = await blob_client.download_blob()
    image_bytes = await stream.readall()
    image_format = get_image_format(url)
    display_name = os.path.basename(blob_name)
    file_basename = Path(display_name).stem
    thumbnail_bytes = resize_image_bytes(image_bytes, image_format)
    files.append(File(data=thumbnail_bytes, format=image_format, name=file_basename))
    image_results.append({
      "filename": blob_name,
      "display_name": display_name,
      "description": description})

  return ToolResult(
    content=files,
    structured_content={
      "query": query,
      "results": image_results})

Displaying selected clothing

Once the agent finds possible matching clothing, it reasons over the results and selects the best option. If the agent is using a multimodal LLM, like most modern frontier models, it can reason over both the image content and the image descriptions. It can then render its top choices directly in the UI, using an MCP app that renders a JavaScript-powered slideshow of images.

Here's what that looks like in GitHub Copilot chat:

Let's check out the code that powers that MCP app. An app is actually a kind of tool, so we once again wrap a Python function in @mcp.tool. However, this time we specify that it's an app by passing an AppConfig with an associated resource for the image viewer HTML. Inside the function, we fetch the images from Azure Blob Storage based on their filenames, and return both the binary data for the images and structured content that includes the filename and MIME type of each image.

@mcp.tool(
  app=AppConfig(resource_uri=IMAGE_VIEW_URI)
)
async def display_image_files(
  filenames: Annotated[list[str], "List of image filenames to retrieve"]
) -> ToolResult:
  """Fetch images by filename and render them in a carousel display."""
  blob_service_client = get_blob_service_client()

  image_blocks: list[types.ImageContent] = []
  image_results: list[dict[str, str]] = []
  for filename in filenames:
    blob_client = blob_service_client.get_blob_client(container=IMAGE_CONTAINER_NAME, blob=filename)
    stream = await blob_client.download_blob()
    image_bytes = await stream.readall()
    mime_type = get_image_mime_type(filename)
    image_blocks.append(
      types.ImageContent(
        type="image",
        data=base64.b64encode(image_bytes).decode("utf-8"),
        mimeType=mime_type))
    image_results.append({
      "filename": filename,
      "mimeType": mime_type})

  return ToolResult(
    content=image_blocks,
    structured_content={
      "images": image_results,
  })
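This tool also uses a small helper, get_image_mime_type, that isn't shown. A plausible implementation (an assumption on my part) is a thin wrapper around the standard library's mimetypes module:

```python
import mimetypes


def get_image_mime_type(filename: str) -> str:
    """Guess the MIME type from the filename's extension, defaulting to JPEG."""
    mime_type, _ = mimetypes.guess_type(filename)
    return mime_type or "image/jpeg"
```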

Next we need to define the resource that serves up the image viewer HTML page. We wrap a Python function in @mcp.resource, assign it a "ui://" URI that is unique within our MCP server, and declare which external domains are allowed to serve resources per its Content-Security-Policy (CSP):

@mcp.resource(
    IMAGE_VIEW_URI,
    app=AppConfig(csp=ResourceCSP(resource_domains=["https://unpkg.com"])),
)
def image_view() -> str:
    """Render images returned by display_image_files as an MCP App."""
    return load_image_viewer_html()
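The load_image_viewer_html helper isn't shown either; presumably it just reads the HTML file from disk. A minimal version, assuming the file is named image_viewer.html and sits alongside the server module:

```python
from pathlib import Path


def load_image_viewer_html() -> str:
    """Read the image viewer HTML from a file next to the server module."""
    return (Path(__file__).parent / "image_viewer.html").read_text(encoding="utf-8")
```

Keeping the HTML in a separate file, instead of an inline string, makes it easier to edit with proper syntax highlighting.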

Finally, we need the actual HTML that will render inside the iframed app. This tiny webpage brings in ext-apps, a JavaScript package that manages bidirectional communication with the MCP client. In our JavaScript, we declare an App instance, define the ontoolresult callback, and connect the app. That callback receives the images from the tool result and renders them inside the HTML. Note that apps can also send messages back to the client, but that wasn't necessary for this UI.

<!DOCTYPE html>
<html>
<body>
  <div id="carousel">
    <button id="prev" type="button" aria-label="Previous">&#8249;</button>
    <div id="frame"></div>
    <button id="next" type="button" aria-label="Next">&#8250;</button>
    <span id="counter" aria-live="polite"></span>
  </div>
  <script type="module">
    import { App } from "https://unpkg.com/@modelcontextprotocol/ext-apps@0.4.0/app-with-deps";

    const app = new App({ name: "Image Viewer", version: "1.0.0" });

    let images = [];
    let index = 0;

    const frame = document.getElementById("frame");
    const prevBtn = document.getElementById("prev");
    const nextBtn = document.getElementById("next");
    const counter = document.getElementById("counter");

    function show(i) {
      index = i;
      const img = images[index];
      frame.innerHTML = "";
      const el = document.createElement("img");
      el.src = `data:${img.mimeType || "image/jpeg"};base64,${img.data}`;
      el.alt = "Blob image";
      frame.appendChild(el);
      prevBtn.disabled = index === 0;
      nextBtn.disabled = index === images.length - 1;
      counter.textContent = images.length > 1 ? `${index + 1} / ${images.length}` : "";
    }

    prevBtn.addEventListener("click", () => { if (index > 0) show(index - 1); });
    nextBtn.addEventListener("click", () => { if (index < images.length - 1) show(index + 1); });

    app.ontoolresult = ({ content }) => {
      images = (content || []).filter((block) => block.type === "image");
      if (images.length > 0) show(0);
    };

    await app.connect();
  </script>
</body>
</html>

Try it yourself!

The full MCP server code is available in the Azure-Samples/image-search-aisearch repository, along with a minimal frontend for image searching and data ingestion via an Azure AI Search indexer with Azure OpenAI LLMs (for describing the images) and Azure AI Vision (for multimodal embeddings of the images). The code can be used for any images, not just pictures of your clothing.
