The relatively new structured outputs mode from the OpenAI gpt-4o model makes it easy for us to define an object schema and get a response from the LLM that conforms to that schema.
Here's the most basic example from the Azure OpenAI tutorial about structured outputs:
from pydantic import BaseModel

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]
completion = client.beta.chat.completions.parse(
    model="MODEL_DEPLOYMENT_NAME",
    messages=[
        {"role": "system", "content": "Extract the event information."},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
    ],
    response_format=CalendarEvent,
)
output = completion.choices[0].message.parsed
The code first defines the CalendarEvent class, a Pydantic model (a subclass of BaseModel). Then it sends a request to the GPT model, specifying a response_format of CalendarEvent. The parsed output will contain a name, date, and participants.
We can even go a step further and validate the parsed output as a CalendarEvent instance, using the Pydantic model_validate method:
event = CalendarEvent.model_validate(output)
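Once validated, the event behaves like any other Pydantic object, so (as a quick hypothetical follow-up, not part of the tutorial code) we can access its typed fields directly:
# Hypothetical usage: typed attribute access on the validated model
print(event.name)          # e.g. "Science Fair"
print(event.date)          # e.g. "Friday"
print(event.participants)  # e.g. ["Alice", "Bob"]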
With this structured outputs capability, it's easier than ever to use GPT models for "entity extraction" tasks: give it some data, tell it what sorts of entities to extract from that data, and constrain it as needed.
Extracting from GitHub READMEs
Let's look at a way that I actually used structured outputs: summarizing the submissions we received for a recent hackathon. I can feed the README of a repository to the GPT model and ask it to extract key details like the project title and technologies used.
First I define the Pydantic models:
from enum import Enum
from pydantic import BaseModel, Field

class Language(str, Enum):
    JAVASCRIPT = "JavaScript"
    PYTHON = "Python"
    DOTNET = ".NET"

class Framework(str, Enum):
    LANGCHAIN = "Langchain"
    SEMANTICKERNEL = "Semantic Kernel"
    LLAMAINDEX = "Llamaindex"
    AUTOGEN = "Autogen"
    SPRINGBOOT = "Spring Boot"
    PROMPTY = "Prompty"

class RepoOverview(BaseModel):
    name: str
    summary: str = Field(..., description="A 1-2 sentence description of the project")
    languages: list[Language]
    frameworks: list[Framework]
In the code above, I asked for lists of Python enums, which constrains the model to return only options from those lists. I could have instead asked for a list[str] to give it more flexibility, but I wanted to constrain it in this case. I also annotated the summary field with a description, using the Pydantic Field class, so that I could specify the desired length; without that annotation, the summaries are often much longer. We can use a description whenever we want to give additional guidance to the model about a field.
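If you're curious what the model actually sees, one way to check (my own aside, not from the original code) is to print the JSON schema that Pydantic generates, since that description travels along with it as part of the response format:
import json

# Inspect the generated JSON schema; the Field description appears
# under the "summary" property and is sent along with the request.
print(json.dumps(RepoOverview.model_json_schema(), indent=2))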
Next, I fetch the GitHub readme, storing it as a string:
import base64
import requests

url = "https://api.github.com/repos/shank250/CareerCanvas-msft-raghack/contents/README.md"
response = requests.get(url)
readme_content = base64.b64decode(response.json()["content"]).decode("utf-8")
Finally, I send off the request and convert the result into a RepoOverview instance:
completion = client.beta.chat.completions.parse(
    model=os.getenv("AZURE_OPENAI_GPT_DEPLOYMENT"),
    messages=[
        {
            "role": "system",
            "content": "Extract info from the GitHub issue markdown about this hack submission.",
        },
        {"role": "user", "content": readme_content},
    ],
    response_format=RepoOverview,
)
output = completion.choices[0].message.parsed
repo_overview = RepoOverview.model_validate(output)
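With a typed RepoOverview in hand, a short (hypothetical) follow-up could format each submission as one line of the hackathon summary:
# Hypothetical usage: one summary line per submission
languages = ", ".join(lang.value for lang in repo_overview.languages)
frameworks = ", ".join(fw.value for fw in repo_overview.frameworks)
print(f"{repo_overview.name}: {repo_overview.summary} [{languages}] [{frameworks}]")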
You can see the full code in extract_github_repo.py.
Extracting from PDFs
I talk to many customers who want to extract details from PDFs, like locations and dates, often to store as metadata in their RAG search index. The first step is to extract the text from the PDF, and we have a couple of options: a hosted service like Azure Document Intelligence, or a local Python package like pymupdf. For this example, I'm using the latter, since I wanted to try out its specialized pymupdf4llm package, which converts the PDF to LLM-friendly markdown.
First I load in a PDF of an order receipt and convert it to markdown:
md_text = pymupdf4llm.to_markdown("example_receipt.pdf")
Then I define the Pydantic models for a receipt:
class Item(BaseModel):
    product: str
    price: float
    quantity: int

class Receipt(BaseModel):
    total: float
    shipping: float
    payment_method: str
    items: list[Item]
    order_number: int
In this example, I'm using a nested Pydantic model, Item, so that I can get detailed information about each item in the receipt.
And then, as before, I send the text off to the GPT model and convert the response back to a Receipt instance:
completion = client.beta.chat.completions.parse(
    model=os.getenv("AZURE_OPENAI_GPT_DEPLOYMENT"),
    messages=[
        {"role": "system", "content": "Extract the information from the receipt"},
        {"role": "user", "content": md_text},
    ],
    response_format=Receipt,
)
output = completion.choices[0].message.parsed
receipt = Receipt.model_validate(output)
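Because the fields come back typed, it's also easy to add a quick consistency check (my own addition, and assuming the receipt has no separate tax line) before trusting the extraction:
# Hypothetical sanity check: do the line items plus shipping add up to the total?
items_subtotal = sum(item.price * item.quantity for item in receipt.items)
if abs(items_subtotal + receipt.shipping - receipt.total) > 0.01:
    print(f"Mismatch: items {items_subtotal:.2f} + shipping {receipt.shipping:.2f} != total {receipt.total:.2f}")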
You can see the full code in extract_pdf_receipt.py.
Extracting from images
Since the gpt-4o model is also a multimodal model, it can accept both images and text. That means that we can send it an image and ask it for a structured output that extracts details from that image. Pretty darn cool!
First I load in a local image as a base-64 encoded data URI:
def open_image_as_base64(filename):
    with open(filename, "rb") as image_file:
        image_data = image_file.read()
        image_base64 = base64.b64encode(image_data).decode("utf-8")
        return f"data:image/png;base64,{image_base64}"
image_url = open_image_as_base64("example_graph_treecover.png")
For this example, my image is a graph, so I'm going to have it extract details about the graph. Here are the Pydantic models:
class Graph(BaseModel):
    title: str
    description: str = Field(..., description="1 sentence description of the graph")
    x_axis: str
    y_axis: str
    legend: list[str]
Then I send off the base-64 image URI to the GPT model, inside an "image_url" type message, and convert the response back to a Graph object:
completion = client.beta.chat.completions.parse(
    model=os.getenv("AZURE_OPENAI_GPT_DEPLOYMENT"),
    messages=[
        {"role": "system", "content": "Extract the information from the graph"},
        {
            "role": "user",
            "content": [
                {"image_url": {"url": image_url}, "type": "image_url"},
            ],
        },
    ],
    response_format=Graph,
)
output = completion.choices[0].message.parsed
graph = Graph.model_validate(output)
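As with the other examples, the validated Graph is just a Pydantic object, so (hypothetically) you could serialize the extracted metadata and store it alongside the image:
# Hypothetical usage: save the extracted graph metadata next to the image
with open("example_graph_treecover.json", "w") as f:
    f.write(graph.model_dump_json(indent=2))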
More examples
You can use this same general approach for entity extraction across many file types, as long as they can be represented in either text or image form. See more examples in my azure-openai-entity-extraction repository. As always, remember that large language models are probabilistic next-word predictors that won't always get things right, so definitely evaluate the accuracy of the outputs before you use this approach for a business-critical task.
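One lightweight way to do that evaluation is to run the extraction over a handful of hand-labeled documents and check the fields you care about. Here's a minimal sketch, where labeled_receipts and extract_receipt are hypothetical stand-ins for your own labeled data and a wrapper around the code above:
# Minimal evaluation sketch (hypothetical helpers, not from the repo):
# labeled_receipts is a list of (markdown_text, expected_total) pairs,
# and extract_receipt wraps the parse call shown earlier.
correct = sum(
    1
    for md_text, expected_total in labeled_receipts
    if abs(extract_receipt(md_text).total - expected_total) < 0.01
)
print(f"Total-field accuracy: {correct / len(labeled_receipts):.0%}")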