Thursday, July 24, 2025

Automated repo maintenance via GitHub Copilot coding agent

I have a problem: I'm addicted to making new repositories on GitHub. As part of my advocacy role at Microsoft, my goal is to show developers how to combine technology X with technology Y, and a repository is a great way to prove it. But that means I now have hundreds of repositories that I am trying to keep working, and they require constant upgrades:

  • Upgraded Python packages, npm packages, GitHub Actions
  • Improved Python tooling (like moving from pip to uv, or black to ruff)
  • Hosted API changes (versions, URLs, deprecations)
  • Infrastructure upgrades (Bicep/Terraform changes)

All of those changes are necessary to keep the repositories working well, but they're pretty boring to make and highly repetitive. In theory, GitHub already offers Dependabot to manage package upgrades, but unfortunately Dependabot hasn't worked for my more complex Python setups, so I often have to manually take over the Dependabot PRs. These are the kinds of changes that I want to delegate, so that I can focus on new features and technologies.

Fortunately, GitHub has introduced the GitHub Copilot coding agent, an autonomous agent powered by LLMs and MCP servers that can be assigned issues in your repositories. When you assign an issue to the agent, it will create a PR for the issue, put a plan in that PR, and ask for a review when it's made all the changes necessary. If you have comments, it can continue to iterate, asking for a review each time it thinks it's got it working.

I started off with some manual experimentation to see if GitHub Copilot could handle repo maintenance tasks, like tricky package upgrades. It did well enough that I then coded GitHub Repo Maintainer, a tool that searches for all my repos that require a particular maintenance task and creates issues for @Copilot in those repos with detailed task descriptions.

Here's what an example issue looks like:

Screenshot of issue assigned to Copilot agent

A few minutes after filing the issue, the Copilot agent opens a pull request to address it:

Screenshot of PR from Copilot agent

To give you a feel for the kinds of issues that I've assigned to Copilot, here are more examples:

  • Update GitHub Actions workflow to use ubuntu-latest: This was an easy task. The only issue was with a more complex workflow where the latest ubuntu had a conflict with an additional service, and it came up with a roundabout way of fixing that.
  • Update Bicep to new syntax: This worked well when I provided the exact new syntax to use. When I only told it that the old Bicep syntax was deprecated, it came up with a more convoluted fix and also tried to fix all the Bicep warnings. It got sidetracked because the agent uses "az bicep build" to check Bicep syntax validity, that tool reports warnings by default, and Copilot generally likes to be a do-gooder and fix warnings too. For Bicep-related tasks, I now explicitly tell it "ignore the warnings, just fix the errors".
  • Upgrade a tricky Python package: This was a harder upgrade as it required upgrading another package at the same time, something Dependabot had failed to do. Copilot was able to work it out, but only once I pointed out that the CI failed and reminded it to make sure to pip install the requirements file.
  • Update a deprecated URL: This was easy for it, especially because my tool tells it exactly which files it found the old URLs in.

A good strategy has been to verify the right fix in one repo first, and then send that well-crafted issue to the other affected repos.

How to assign issues to GitHub Copilot

The GitHub documentation has a great guide on using the UI, API, or CLI to assign issues to the GitHub Copilot coding agent. When using the API, we first have to check whether the Copilot agent is enabled, by querying whether the repository's suggestedActors includes copilot-swe-agent. If so, we grab the agent's id and use it when creating a new issue.

Here's what it looks like in Python to find the ID for the agent:

async def get_repo_and_copilot_ids(self, repo):
  # Requires httpx, plus GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"
  headers = {"Authorization": f"Bearer {self.auth_token}", "Accept": "application/vnd.github+json"}
  # Fetch the repository ID and the actors that can be assigned to issues in it
  query = '''
    query($owner: String!, $name: String!) {
      repository(owner: $owner, name: $name) {
        id
        suggestedActors(capabilities: [CAN_BE_ASSIGNED], first: 100) {
          nodes {
            login
            __typename
            ... on Bot { id }
          }
        }
      }
    }
  '''
  variables = {"owner": repo.owner, "name": repo.name}

  async with httpx.AsyncClient(timeout=self.timeout) as client:
    resp = await client.post(GITHUB_GRAPHQL_URL, headers=headers, json={"query": query, "variables": variables})
    resp.raise_for_status()
    data = resp.json()
    repo_id = data["data"]["repository"]["id"]
    # The Copilot coding agent shows up as the "copilot-swe-agent" bot among the assignable actors
    copilot_node = next((n for n in data["data"]["repository"]["suggestedActors"]["nodes"]
        if n["login"] == "copilot-swe-agent"), None)
    if not copilot_node or not copilot_node.get("id"):
      raise RuntimeError("Copilot is not assignable in this repository.")
    return repo_id, copilot_node["id"]

The issue creation function uses that ID in the assigneeIds field of the createIssue mutation:

async def create_issue_graphql(self, repo, issue):
  repo_id, copilot_id = await self.get_repo_and_copilot_ids(repo)
  headers = {"Authorization": f"Bearer {self.auth_token}", "Accept": "application/vnd.github+json"}
  mutation = '''
  mutation($input: CreateIssueInput!) {
    createIssue(input: $input) {
      issue {
        id
        number
        title
        url
      }
    }
  }
  '''
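  # Assign the Copilot bot's ID so the coding agent picks up the issue as soon as it's created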
  input_obj = {
    "repositoryId": repo_id,
    "title": issue.title,
    "body": issue.body,
    "assigneeIds": [copilot_id],
  }
  async with httpx.AsyncClient(timeout=self.timeout) as client:
    resp = await client.post(GITHUB_GRAPHQL_URL, headers=headers,
        json={"query": mutation, "variables": {"input": input_obj}})
    resp.raise_for_status()
    data = resp.json()
  # GraphQL errors come back with HTTP 200, so check that an issue was actually created
  issue_data = ((data.get("data") or {}).get("createIssue") or {}).get("issue")
  if not issue_data:
    raise RuntimeError(f"Issue creation failed: {data.get('errors')}")
  return {
    "number": issue_data["number"],
    "html_url": issue_data["url"]
  }

Lessons learned (so far!)

I've discovered that there are some intentional limitations on the behavior of the @Copilot agent:

  • Workflows must be approved before running: Typically, when an existing contributor submits a pull request, the workflows run automatically on their PR, and the contributor can quickly see whether they need to fix any CI failures. For security reasons, GitHub requires a human to press "Approve and run workflows" on each push to a @Copilot PR. I will often press that, see that the CI failed, and comment @Copilot to address the CI failures. I would love to skip that manual step on my side, but I understand why GitHub is erring on the side of security here. See more details in their Copilot risk mitigation docs.
  • PRs must be marked "ready to review": Once again, typically a human contributor would start a PR in draft and mark it as "ready for review" before requesting a review. The @Copilot agent does not mark it as ready, and instead requires a human reviewer to mark it for them. According to my discussion with the GitHub team in the Copilot agent issue tracker, this is intentional to avoid triggering required reviews. However, I am hoping that GitHub adds a repository setting to allow the agent itself to mark PRs as ready, so that I can skip that trivial manual step.

I've also noticed a few common ways that the @Copilot agent produces unsatisfactory PRs, and I've started crafting better issue descriptions to improve its success rate. My issue descriptions now include the items below (with a sketch of a full issue body after the list)...

  • Validation steps: The agent will try to execute any validation steps, so if there are any that make sense, like running a pip install, a linter, or a script, I include those in the issue description. For example, for Bicep changes, issues include "After making this change, run `az bicep build` on `infra/main.bicep` to ensure the Bicep syntax is valid."
  • How to make a venv: While testing its changes, the agent kept making Python virtual environments in directories other than ".venv", which is the only directory name that I use and the one that's consistently in my .gitignore files. I would then see PRs with 4,000 changed files, due to an accidentally checked-in virtual environment folder. Now, in my descriptions, I tell it explicitly to create the venv in ".venv".
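Here's a sketch of a full issue body that combines those elements. This is illustrative, not the exact template my tool generates:

# Illustrative issue body; the file list comes from whatever my tool finds in each repo
ISSUE_BODY = """\
Update the Bicep files in this repository to the new syntax.

Files to update:
- infra/main.bicep

Validation steps:
- After making this change, run `az bicep build` on `infra/main.bicep` to ensure the Bicep syntax is valid.
- Ignore the warnings, just fix the errors.

Environment notes:
- If you need a Python virtual environment, create it in ".venv" (that directory is already in .gitignore).
"""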

It's early days, but I'm pretty excited that there's a way that I can keep making a ridiculous number of repositories and keep them well maintained. Definitely check out the GitHub Copilot coding agent to see if there are ways that it can help you automate the boring parts of repository maintenance.

Monday, July 21, 2025

To MCP or not to MCP?

When we're building automation tools in 2025, I see two main approaches:

  1. Agent + MCP: Point an LLM-powered Agent at MCP servers, give the Agent a detailed description of the task, and let the Agent decide which tools to use to complete the task. For this approach, we can use an existing Agent from agentic frameworks like PydanticAI, OpenAI-Agents, Semantic Kernel, etc., and we can either use an existing MCP server or build a custom MCP server depending on what tools are necessary to complete the range of tasks.
  2. Old school with LLM sprinkles: This is the way we would build it before LLMs: directly script the actions that are needed to complete the task, using APIs and SDKs, and then bring in an LLM for fuzzy decision/analysis points, where we might previously have used regular expressions or lovingly handcrafted if statements.

There's a big obvious benefit to approach #1: we can theoretically give the agent any task that is possible with the tools at its disposal, and the agent can complete that task. So why do I keep writing my tools using approach #2??

  1. Control: I am a bit of a control freak. I like knowing exactly what's going on in a system, figuring out where a bug is happening, and fixing it so that bug never happens again. The more that my tools rely on LLMs for control flow, the less control I have, and that gives me the heebie jeebies. What if the agent only succeeds in the task 90% of the time, as it goes down the wrong path 10% of the time? What if I can't get the agent to execute the task exactly the way I envisioned it? What if it makes a horrible mistake, and I am blamed for its incompetence?

  2. Accuracy: Very related to the last point -- the more LLM calls are added to a system, the harder it is to guarantee accuracy. The impossibility of high accuracy from multi-LLM workflows is discussed in detail in this blog post from an agent developer.
  3. Cost: The MCP-powered approach requires far more tokens, and thus more cost and more energy consumption, all things that I'd like to reduce. The agent needs tokens for the list of MCP servers and tool definitions, plus tokens for every additional step it decides to take. In my experience, agents generally do not take the most efficient path to a solution. For example, since they don't know exactly what context they'll need or where the answer to a question lives, they would rather over-research than under-research. An agent can often accomplish a task eventually, but at what cost? How many tokens did it have to use to arrive at the same path that I could have hand-coded?

My two most recent "agents" both use approach #2, hand-coded automation with a single call to an LLM where it is most needed:

Personal LinkedIn Agent

github.com/pamelafox/personal-linkedin-agent

This tool triages my inbound connection requests by automating the browser with Playwright to open my account, check requests, open profiles, and click Accept/Ignore as needed. It uses the LLM only to decide whether a connection meets my personal criteria ("are they technical? accept!"), and that's it.
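Here's a minimal sketch of that "LLM sprinkles" pattern, using the Python playwright package plus an OpenAI-style chat completion call for the single fuzzy decision. The selectors, model name, and prompt are illustrative, not what the real tool uses:

import asyncio
from openai import AsyncOpenAI
from playwright.async_api import async_playwright

openai_client = AsyncOpenAI()  # assumes OPENAI_API_KEY is set; the real tool may target a different model host

async def is_technical(profile_text: str) -> bool:
  # The single "LLM sprinkle": a fuzzy yes/no decision that would be painful to hand-code
  response = await openai_client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[
      {"role": "system", "content": "Answer YES or NO: does this LinkedIn profile describe a technical person?"},
      {"role": "user", "content": profile_text},
    ],
  )
  return response.choices[0].message.content.strip().upper().startswith("YES")

async def triage_requests():
  async with async_playwright() as p:
    browser = await p.chromium.launch(headless=False)
    page = await browser.new_page()
    # Hand-coded control flow: navigation and clicking are scripted, not decided by an agent
    await page.goto("https://www.linkedin.com/mynetwork/invitation-manager/")
    # ... log in, loop over pending invitations, and pull out each profile's text (selectors omitted) ...
    profile_text = "Example: senior software engineer working on distributed systems."
    if await is_technical(profile_text):
      print("Would click Accept")
    else:
      print("Would click Ignore")
    await browser.close()

asyncio.run(triage_requests())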

When I first started coding the agent, I did try using approach #1 with the Playwright MCP server, but it required so many tokens that it exceeded the capacity of the LLM I was using at the time. I wanted my solution to work for models with low rate limits, so I switched over to the Python playwright package instead.

GitHub Repo Maintainer Agent

github.com/pamelafox/github-repo-maintainer-agent

The goal of this tool is to automate the maintenance of my hundreds of repositories, handling upgrades like Python package dependencies and tool modernization. Currently, the tool uses the GitHub API to check my repositories for failed Dependabot PRs, analyzes the failure logs using an LLM call, creates issues referencing those PRs, and assigns those issues to the GitHub Copilot coding agent.
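Here's a simplified, self-contained sketch of the first two steps: finding Dependabot PRs with failing checks via the GitHub REST API, then using a single LLM call to summarize the failure. The structure, prompts, and model choice are illustrative; the real tool lives at the repo linked above:

import os
import httpx
from openai import AsyncOpenAI

GITHUB_API = "https://api.github.com"
HEADERS = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}", "Accept": "application/vnd.github+json"}

async def find_failed_dependabot_prs(owner: str, repo: str) -> list[dict]:
  # Hand-coded control flow: list open PRs, keep the Dependabot ones whose check runs failed
  async with httpx.AsyncClient(headers=HEADERS) as client:
    prs = (await client.get(f"{GITHUB_API}/repos/{owner}/{repo}/pulls", params={"state": "open"})).json()
    failed = []
    for pr in prs:
      if pr["user"]["login"] != "dependabot[bot]":
        continue
      checks = (await client.get(f"{GITHUB_API}/repos/{owner}/{repo}/commits/{pr['head']['sha']}/check-runs")).json()
      if any(run["conclusion"] == "failure" for run in checks["check_runs"]):
        failed.append(pr)
    return failed

async def analyze_failure(pr_title: str, log_snippet: str) -> str:
  # The LLM sprinkle: turn a noisy CI log into a short explanation that Copilot can act on
  llm = AsyncOpenAI()  # illustrative; the real tool may target a different model host
  response = await llm.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[{"role": "user", "content": f"Summarize why CI failed for '{pr_title}':\n{log_snippet}"}],
  )
  return response.choices[0].message.content

The analysis then gets folded into an issue body that is assigned to @Copilot, using a GraphQL createIssue mutation like the one shown in the July 24 post.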

That's where the truly agentic part of this tool happens, since the GitHub Copilot coding agent does use an MCP-based approach to resolve the issue. Those Copilot agent sessions can be quite long, and often unsuccessful, so my hope is that I can improve my tool's LLM calls to give that agent additional context and reduce its costs.

I am tempted to add in an MCP-based option to this tool, to help me with more general maintenance tasks, but I am not sure I am ready to give up control... are you?

Friday, July 18, 2025

MCP: Bringing mashups back!

In the summer of 2006, I discovered the blossoming world of web APIs: HTTP APIs like the Flickr API, JavaScript APIs like Google Maps API, and platform APIs like the iGoogle gadgets API. I spent my spare time making "mashups": programs that connected together multiple APIs to create new functionality. For example:

  • A search engine that found song lyrics from Google and their videos from YouTube
  • A news site that combined RSS feeds from multiple sources
  • A map plotting Flickr photos alongside travel recommendations

I adored the combinatorial power of APIs, and felt like the world was my mashable oyster. Mashups were actually the reason that I got back into web development, after having left it for a few years.

And now, with the growing popularity of MCP servers, I am getting a sense of deja vu.

An MCP server is an API: it exposes functionality that another program can use. An MCP server must expose that API in a very strict way, outputting its tool definitions according to the MCP schema. That allows MCP clients to use the tools (API) from any MCP server, since the interface is predictable.
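As a small illustration, here's a toy MCP server built with the FastMCP helper from the official MCP Python SDK. The tool itself is a stub; the point is that the decorator is what publishes a schema-conformant tool definition that any MCP client can discover:

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("lyrics-mashup")

@mcp.tool()
def search_lyrics(song_title: str, artist: str) -> str:
  """Return the lyrics for a song (stubbed out for this example)."""
  return f"(lyrics for {song_title} by {artist} would go here)"

if __name__ == "__main__":
  mcp.run()  # serves over stdio by default, ready to register with an MCP client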

But now it is no longer the programmers that are making the mashups: it's the agents. When using Claude Desktop, you can register MCP servers for searching videos, and Claude can match song lyrics to videos for you. When using GitHub Copilot Agent Mode, you can register the Playwright MCP server for browser automation, and it can write full documentation with screenshots for you. When using any of the many agent frameworks (AutoGen, OpenAI-Agents, PydanticAI, LangGraph, etc.), you can point your agent at MCP servers, and the agent will call the most relevant tools as needed, weaving them together with calls to LLMs to extract or summarize information. To really empower an agent to make the best mashups, give it access to a code interpreter, and then it can write and run code to put all the tool outputs together.

And so, we have brought mashups back, but we programmers are no longer the mashers. That is a good thing for non-programmers, as it is empowering people of all backgrounds to weave together their favorite data and functionality. It makes me a bit sad as a programmer, because I love to make direct API calls and control the exact flow of a program. But it is time for me to start managing the mashers and to see what they can create in the hands of others.