Friday, April 11, 2025

Use any Python AI agent framework with free GitHub Models

I ❤️ when companies offer free tiers for developer services, since it gives everyone a way to learn new technologies without breaking the bank. Free tiers are especially important for students and people between jobs, where the desire to learn is high but the available cash is low.

That's why I'm such a fan of GitHub Models: free, high-quality generative AI models available to anyone with a GitHub account. The available models include the latest OpenAI LLMs (like o3-mini), LLMs from the research community (like Phi and Llama), LLMs from other popular providers (like Mistral and Jamba), multimodal models (like gpt-4o and llama-vision-instruct) and even a few embedding models (from OpenAI and Cohere). So cool! With access to such a range of models, you can prototype complex multi-model workflows to improve your productivity or heck, just make something fun for yourself. 🤗

To use GitHub Models, you can start off in no-code mode: open the playground for a model, send a few requests, tweak the parameters, and check out the answers. When you're ready to write code, select "Use this model". A screen will pop up where you can select a programming language (Python/JavaScript/C#/Java/REST) and select an SDK (which varies depending on model). Then you'll get instructions and code for that model, language, and SDK.

But here's what's really cool about GitHub Models: you can use them with all the popular Python AI frameworks, even if the framework has no specific integration with GitHub Models. How is that possible?

  1. The vast majority of Python AI frameworks support the OpenAI Chat Completions API, since that API became a defacto standard supported by many LLM API providers besides OpenAI itself.
  2. GitHub Models also provide OpenAI-compatible endpoints for chat completion models.
  3. Therefore, any Python AI framework that supports OpenAI-like models can be used with GitHub Models as well. 🎉

To prove my claim, I've made a new repository with examples from eight different Python AI agent packages, all working with GitHub Models: python-ai-agent-frameworks-demos. There are examples for AutoGen, LangGraph, Llamaindex, OpenAI Agents SDK, OpenAI standard SDK, PydanticAI, Semantic Kernel, and SmolAgents. You can open that repository in GitHub Codespaces, install the packages, and get the examples running immediately.

GitHub models plus 8 package names

Now let's walk through the API connection code for GitHub Models for each framework. Even if I missed your favorite framework, I hope my tips here will help you connect any framework to GitHub Models.

OpenAI sdk

I'll start with openai, the package that started it all!

import openai

client = openai.OpenAI(
  api_key=os.environ["GITHUB_TOKEN"],
  base_url="https://models.inference.ai.azure.com")

The code above demonstrates the two key parameters we'll need to configure for all frameworks:

  • api_key: When using OpenAI.com, you pass your OpenAI API key here. When using GitHub Models, you pass in a Personal Access Token (PAT). If you open the repository (or any repository) in GitHub Codespaces, a PAT is already stored in the GITHUB_TOKEN environment variable. However, if you're working locally with GitHub Models, you'll need to generate a PAT yourself and store it. PATs expire after a while, so you need to generate new PATs every so often.
  • base_url: This parameter tells the OpenAI client to send all requests to "https://models.inference.ai.azure.com" instead of the OpenAI.com API servers. That's the domain that hosts the OpenAI-compatible endpoint for GitHub Models, so you'll always pass that domain as the base URL.

If we're working with the new openai-agents SDK, we use very similar code, but we must use the AsyncOpenAI client from openai instead. Lately, Python AI packages are defaulting to async, because it's so much better for performance.

import agents
import openai

client = openai.AsyncOpenAI(
  base_url="https://models.inference.ai.azure.com",
  api_key=os.environ["GITHUB_TOKEN"])

spanish_agent = agents.Agent(
    name="Spanish agent",
    instructions="You only speak Spanish.",
    model=OpenAIChatCompletionsModel(model="gpt-4o", openai_client=client))

PydanticAI

Now let's look at all of the packages that make it really easy for us, by allowing us to directly bring in an instance of either OpenAI or AsyncOpenAI.

For PydanticAI, we configure an AsyncOpenAI client, then construct an OpenAIModel object from PydanticAI, and pass that model to the agent:

import openai
import pydantic_ai
import pydantic_ai.models.openai


client = openai.AsyncOpenAI(
    api_key=os.environ["GITHUB_TOKEN"],
    base_url="https://models.inference.ai.azure.com")

model = pydantic_ai.models.openai.OpenAIModel(
    "gpt-4o", provider=OpenAIProvider(openai_client=client))

spanish_agent = pydantic_ai.Agent(
    model,
    system_prompt="You only speak Spanish.")

Semantic Kernel

For Semantic Kernel, the code is very similar. We configure an AsyncOpenAI client, then construct an OpenAIChatCompletion object from Semantic Kernel, and add that object to the kernel.

import openai
import semantic_kernel.connectors.ai.open_ai
import semantic_kernel.agents

chat_client = openai.AsyncOpenAI(
  api_key=os.environ["GITHUB_TOKEN"],
  base_url="https://models.inference.ai.azure.com")

chat_completion_service = semantic_kernel.connectors.ai.open_ai.OpenAIChatCompletion(
  ai_model_id="gpt-4o",
  async_client=chat_client)

kernel.add_service(chat_completion_service)
  
spanish_agent = semantic_kernel.agents.ChatCompletionAgent(
  kernel=kernel,
  name="Spanish agent"
  instructions="You only speak Spanish")

AutoGen

Next, we'll check out a few frameworks that have their own wrapper of the OpenAI clients, so we won't be using any classes from openai directly.

For AutoGen, we configure both the OpenAI parameters and the model name in the same object, then pass that to each agent:

import autogen_ext.models.openai
import autogen_agentchat.agents

client = autogen_ext.models.openai.OpenAIChatCompletionClient(
  model="gpt-4o",
  api_key=os.environ["GITHUB_TOKEN"],
  base_url="https://models.inference.ai.azure.com")

spanish_agent = autogen_agentchat.agents.AssistantAgent(
    "spanish_agent",
    model_client=client,
    system_message="You only speak Spanish")

LangGraph

For LangGraph, we configure a very similar object, which even has the same parameter names:

import langchain_openai
import langgraph.graph

model = langchain_openai.ChatOpenAI(
  model="gpt-4o",
  api_key=os.environ["GITHUB_TOKEN"],
  base_url="https://models.inference.ai.azure.com", 
)

def call_model(state):
    messages = state["messages"]
    response = model.invoke(messages)
    return {"messages": [response]}

workflow = langgraph.graph.StateGraph(MessagesState)
workflow.add_node("agent", call_model)

SmolAgents

Once again, for SmolAgents, we configure a similar object, though with slightly different parameter names:

import smolagents

model = smolagents.OpenAIServerModel(
  model_id="gpt-4o",
  api_key=os.environ["GITHUB_TOKEN"],
  api_base="https://models.inference.ai.azure.com")
  
agent = smolagents.CodeAgent(model=model)

Llamaindex

I saved Llamaindex for last, as it is the most different. The Llamaindex Python package has a different constructor for OpenAI.com versus OpenAI-like servers, so I opted to use that OpenAILike constructor instead. However, I also needed an embeddings model for my example, and the package doesn't have an OpenAIEmbeddingsLike constructor, so I used the standard OpenAIEmbedding constructor.

import llama_index.embeddings.openai
import llama_index.llms.openai_like
import llama_index.core.agent.workflow

Settings.llm = llama_index.llms.openai_like.OpenAILike(
  model="gpt-4o",
  api_key=os.environ["GITHUB_TOKEN"],
  api_base="https://models.inference.ai.azure.com",
  is_chat_model=True)

Settings.embed_model = llama_index.embeddings.openai.OpenAIEmbedding(
  model="text-embedding-3-small",
  api_key=os.environ["GITHUB_TOKEN"],
  api_base="https://models.inference.ai.azure.com")

agent = llama_index.core.agent.workflow.ReActAgent(
  tools=query_engine_tools,
  llm=Settings.llm)

Choose your models wisely!

In all of the examples above, I specified the "gpt-4o" model. The "gpt-4o" model is a great choice for agents because it supports function calling, and many agent frameworks only work (or work best) with models that natively support function calling.

Fortunately, GitHub Models includes multiple models that support function calling, at least in my basic experiments:

  • gpt-4o
  • gpt-4o-mini
  • o3-mini
  • AI21-Jamba-1.5-Large
  • AI21-Jamba-1.5-Mini
  • Codestral-2501
  • Cohere-command-r
  • Ministral-3B
  • Mistral-Large-2411
  • Mistral-Nemo
  • Mistral-small

You might find that some models work better than others, especially if you're using agents with multiple tools. With GitHub Models, it's very easy to experiment and see for yourself, by simply changing the model name and re-running the code.

So, have you started prototyping AI agents with GitHub Models yet?! Go on, experiment, it's fun!

No comments: