# Structured Outputs with Google's genai SDK

**Recommended SDK**

The `genai` SDK is Google's recommended Python client for working with Gemini models. It provides a unified interface to both the Gemini API and Vertex AI. For detailed setup instructions, including how to use it with Vertex AI, see the official Google AI documentation for the GenAI SDK.

This guide demonstrates how to use Instructor with Google's `genai` SDK to extract structured data from Gemini models.

Gemini currently supports two modes:

- `Mode.GENAI_TOOLS`: uses function calling under the hood and returns structured responses
- `Mode.GENAI_STRUCTURED_OUTPUTS`: supplies Gemini with a JSON schema, which Gemini uses to respond in a structured format
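As a rough illustration of what the structured-outputs mode does conceptually, the snippet below builds a hand-written, simplified JSON schema for a `User(name, age)` model and checks a model reply against it. This is a hypothetical sketch, not the SDK's actual wire format or Instructor's real validation path:

```python
import json

# Hypothetical, simplified version of the schema Instructor would derive
# from a `User(name: str, age: int)` Pydantic model.
user_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}


def conforms(payload: str, schema: dict) -> bool:
    """Minimal structural check: every required key exists with the right type."""
    type_map = {"string": str, "integer": int}
    data = json.loads(payload)
    return all(
        isinstance(data.get(key), type_map[schema["properties"][key]["type"]])
        for key in schema["required"]
    )


print(conforms('{"name": "Jason", "age": 25}', user_schema))  # True
print(conforms('{"name": "Jason"}', user_schema))  # False
```

In practice Instructor generates the schema from your Pydantic model and validates the reply with Pydantic itself; this sketch only shows the shape of the contract.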
## Installation
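The original section doesn't spell out the install command. Assuming the standard Instructor extras naming for this provider (an assumption worth verifying against the Instructor installation docs), it would look like:

```shell
pip install "instructor[google-genai]"
```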
## Basic Usage

**Union and Optional types**

Gemini does not support union or optional types in its structured-output and tool-calling integrations. We currently raise an error when these types are detected in your response model.

Getting started with Instructor and the genai SDK is straightforward. Create a Pydantic model that defines your output structure, patch the genai client, and make your request with the `response_model` parameter.
```python
from google import genai
import instructor
from pydantic import BaseModel


# Define your Pydantic model
class User(BaseModel):
    name: str
    age: int


# Initialize and patch the client
client = genai.Client()
client = instructor.from_genai(client, mode=instructor.Mode.GENAI_TOOLS)

# Extract structured data
response = client.chat.completions.create(
    model="gemini-2.0-flash-001",
    messages=[{"role": "user", "content": "Extract: Jason is 25 years old"}],
    response_model=User,
)
print(response)  # User(name='Jason', age=25)
```
## Message Formats

Genai supports multiple message formats, and Instructor works seamlessly with all of them. This flexibility lets you use whichever format is most convenient for your application.
```python
from google import genai
import instructor
from pydantic import BaseModel
from google.genai import types


# Define your Pydantic model
class User(BaseModel):
    name: str
    age: int


# Initialize and patch the client
client = genai.Client()
client = instructor.from_genai(client, mode=instructor.Mode.GENAI_TOOLS)

# Single string (converted to user message)
response = client.chat.completions.create(
    model="gemini-2.0-flash-001",
    messages="Jason is 25 years old",
    response_model=User,
)
print(response)
# > name='Jason' age=25

# Standard format
response = client.chat.completions.create(
    model="gemini-2.0-flash-001",
    messages=[
        {"role": "user", "content": "Jason is 25 years old"}
    ],
    response_model=User,
)
print(response)
# > name='Jason' age=25

# Using genai's Content type
response = client.chat.completions.create(
    model="gemini-2.0-flash-001",
    messages=[
        genai.types.Content(
            role="user",
            parts=[genai.types.Part.from_text(text="Jason is 25 years old")]
        )
    ],
    response_model=User,
)
print(response)
# > name='Jason' age=25
```
## System Messages

System messages help set context and instructions for the model. With Gemini models, you can provide them in two different ways.
```python
from google import genai
import instructor
from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int


client = genai.Client()
client = instructor.from_genai(client, mode=instructor.Mode.GENAI_TOOLS)

# As a parameter
response = client.chat.completions.create(
    model="gemini-2.0-flash-001",
    system="You are a data extraction assistant",
    messages=[{"role": "user", "content": "Jason is 25 years old"}],
    response_model=User,
)
print(response)
# > name='Jason' age=25

# Or as a message with role "system"
response = client.chat.completions.create(
    model="gemini-2.0-flash-001",
    messages=[
        {"role": "system", "content": "You are a data extraction assistant"},
        {"role": "user", "content": "Jason is 25 years old"},
    ],
    response_model=User,
)
print(response)
# > name='Jason' age=25
```
## Template Variables

Template variables make it easy to reuse prompts with different values. This is especially useful for dynamic content or when testing different inputs.
```python
from google import genai
import instructor
from pydantic import BaseModel
from google.genai import types


# Define your Pydantic model
class User(BaseModel):
    name: str
    age: int


# Initialize and patch the client
client = genai.Client()
client = instructor.from_genai(client, mode=instructor.Mode.GENAI_TOOLS)

# Single string (converted to user message)
response = client.chat.completions.create(
    model="gemini-2.0-flash-001",
    messages=["{{name}} is {{ age }} years old"],
    response_model=User,
    context={
        "name": "Jason",
        "age": 25,
    },
)
print(response)
# > name='Jason' age=25

# Standard format
response = client.chat.completions.create(
    model="gemini-2.0-flash-001",
    messages=[{"role": "user", "content": "{{ name }} is {{ age }} years old"}],
    response_model=User,
    context={
        "name": "Jason",
        "age": 25,
    },
)
print(response)
# > name='Jason' age=25

# Using genai's Content type
response = client.chat.completions.create(
    model="gemini-2.0-flash-001",
    messages=[
        genai.types.Content(
            role="user",
            parts=[genai.types.Part.from_text(text="{{name}} is {{age}} years old")],
        )
    ],
    response_model=User,
    context={
        "name": "Jason",
        "age": 25,
    },
)
print(response)
# > name='Jason' age=25
```
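Conceptually, Instructor renders the `{{ ... }}` placeholders with the values from `context` before the prompt is sent. The stdlib snippet below is a simplified stand-in for that substitution step, not Instructor's actual (Jinja-style) renderer:

```python
import re


def render(template: str, context: dict) -> str:
    """Replace {{ name }}-style placeholders with values from context.

    A simplified stand-in for Instructor's template rendering.
    """
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda m: str(context[m.group(1)]),
        template,
    )


prompt = render("{{ name }} is {{ age }} years old", {"name": "Jason", "age": 25})
print(prompt)  # Jason is 25 years old
```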
## Validation and Retries

When validation fails, Instructor can automatically retry the request, ensuring you get properly formatted data. This is particularly useful for enforcing specific data requirements.
```python
from typing import Annotated

from google import genai
import instructor
from pydantic import AfterValidator, BaseModel


def uppercase_validator(v: str) -> str:
    if v.islower():
        raise ValueError("Name must be ALL CAPS")
    return v


class UserDetail(BaseModel):
    name: Annotated[str, AfterValidator(uppercase_validator)]
    age: int


client = instructor.from_genai(genai.Client())

response = client.chat.completions.create(
    model="gemini-2.0-flash-001",
    messages=[{"role": "user", "content": "Extract: jason is 25 years old"}],
    response_model=UserDetail,
    max_retries=3,
)
print(response)  # UserDetail(name='JASON', age=25)
```
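The retry loop works roughly like this: each validation failure is fed back to the model as an error message, and the request is retried until the output passes or `max_retries` is exhausted. A stdlib-only sketch of that control flow, where `fake_model` is a hypothetical stand-in for the real LLM call:

```python
def uppercase_validator(name: str) -> str:
    if name.islower():
        raise ValueError("Name must be ALL CAPS")
    return name


def extract_with_retries(model_call, max_retries: int = 3) -> str:
    """Call the model, validate, and retry with the error as feedback."""
    feedback = None
    for _ in range(max_retries):
        candidate = model_call(feedback)
        try:
            return uppercase_validator(candidate)
        except ValueError as exc:
            feedback = str(exc)  # surfaced to the model on the next attempt
    raise RuntimeError("validation failed after all retries")


# Stand-in for the LLM: answers lowercase first, corrects itself after feedback.
def fake_model(feedback):
    return "jason" if feedback is None else "JASON"


print(extract_with_retries(fake_model))  # JASON
```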
## Multimodal Capabilities

We've provided a few sample files for you to test these features. All of the examples below use them:

- (Audio): A recording of the Gettysburg Address: gettysburg.wav
- (Image): A picture of some blueberry plants: image.jpg
- (PDF): A sample PDF file containing a fake invoice: invoice.pdf

Instructor provides a unified, provider-agnostic interface for working with multimodal inputs such as images, PDFs, and audio files. With Instructor's multimodal objects, you can easily load media from URLs, local files, or base64 strings using a consistent API that works across different AI providers (OpenAI, Anthropic, Mistral, etc.).

Instructor handles all provider-specific formatting requirements behind the scenes, keeping your code clean and future-proof as provider APIs evolve.

Let's look at how to use the Image, Audio, and PDF classes.
## Image Processing

**Autodetecting images**

For convenience, you can enable automatic image conversion with the `autodetect_images` parameter. When enabled, Instructor automatically detects file paths and HTTP URLs passed as strings and converts them into the format the Google GenAI SDK requires, making image handling seamless. (See the example below.)

Instructor makes it easy to analyze and extract semantic information from images using the Gemini family of models. Click here to check whether the model you want to use has vision capabilities.

Let's look at an example using the sample image above, which we'll load with the `from_url` method.

Note that we also support loading local files and base64 strings via the `from_path` and `from_base64` class methods.
```python
from instructor.multimodal import Image
from pydantic import BaseModel, Field
import instructor
from google.genai import Client


class ImageDescription(BaseModel):
    objects: list[str] = Field(..., description="The objects in the image")
    scene: str = Field(..., description="The scene of the image")
    colors: list[str] = Field(..., description="The colors in the image")


client = instructor.from_genai(Client())
url = "https://raw.githubusercontent.com/instructor-ai/instructor/main/tests/assets/image.jpg"

# Multiple ways to load an image:
response = client.chat.completions.create(
    model="gemini-2.0-flash",
    response_model=ImageDescription,
    messages=[
        {
            "role": "user",
            "content": [
                "What is in this image?",
                # Option 1: Direct URL with autodetection
                Image.from_url(url),
                # Option 2: Local file
                # Image.from_path("path/to/local/image.jpg")
                # Option 3: Base64 string
                # Image.from_base64("base64_encoded_string_here")
                # Option 4: Autodetect
                # Image.autodetect(<url|path|base64>)
            ],
        },
    ],
)
print(response)
# Example output:
# ImageDescription(
#     objects=['blueberries', 'leaves'],
#     scene='A blueberry bush with clusters of ripe blueberries and some unripe ones against a cloudy sky',
#     colors=['green', 'blue', 'purple', 'white']
# )
```
## Audio Processing

Instructor makes it easy to analyze and extract semantic information from audio files using the Gemini family of models. Let's look at an example using the sample audio file above, which we'll load with the `from_url` method.

Note that we also support loading local files via the `from_path` class method.
```python
from instructor.multimodal import Audio
from pydantic import BaseModel
import instructor
from google.genai import Client


class AudioDescription(BaseModel):
    transcript: str
    summary: str
    speakers: list[str]
    key_points: list[str]


url = "https://raw.githubusercontent.com/instructor-ai/instructor/main/tests/assets/gettysburg.wav"
client = instructor.from_genai(Client())

response = client.chat.completions.create(
    model="gemini-2.0-flash",
    response_model=AudioDescription,
    messages=[
        {
            "role": "user",
            "content": [
                "Please transcribe and analyze this audio:",
                # Option 1: Direct URL
                Audio.from_url(url),
                # Option 2: Local file
                # Audio.from_path("path/to/local/audio.mp3")
            ],
        },
    ],
)
print(response)
# > transcript='Four score and seven years ago our fathers...'
```
## PDF

Instructor makes it easy to analyze and extract semantic information from PDFs using Gemini's newer models.

Let's look at an example using the sample PDF above, which we'll load with the `from_url` method. This integration passes the raw bytes directly to Gemini itself; we also support the Files API via the `PDFWithGenaiFile` class.

Note that this method also supports loading local files and base64 strings via the `from_path` and `from_base64` class methods.
```python
from instructor.multimodal import PDF
from pydantic import BaseModel
import instructor
from google.genai import Client


class Receipt(BaseModel):
    total: int
    items: list[str]


client = instructor.from_genai(Client())
url = "https://raw.githubusercontent.com/instructor-ai/instructor/main/tests/assets/invoice.pdf"

# Multiple ways to load a PDF:
response = client.chat.completions.create(
    model="gemini-2.0-flash",
    response_model=Receipt,
    messages=[
        {
            "role": "user",
            "content": [
                "Extract out the total and line items from the invoice",
                # Option 1: Direct URL
                PDF.from_url(url),
                # Option 2: Local file
                # PDF.from_path("path/to/local/invoice.pdf"),
                # Option 3: Base64 string
                # PDF.from_base64("base64_encoded_string_here")
                # Option 4: Autodetect
                # PDF.autodetect(<url|path|base64>)
            ],
        },
    ],
)
print(response)
# > Receipt(total=220, items=['English Tea', 'Tofu'])
```
We also support using PDFs with the Gemini `Files` API through the `PDFWithGenaiFile` class, which lets you work with either previously uploaded files or local files.

Note that `PDFWithGenaiFile.from_new_genai_file` is a blocking operation. You can configure the retry delay and number of retries that we use while waiting for the upload to register as complete.
```python
PDFWithGenaiFile.from_new_genai_file(
    "./invoice.pdf",
    retry_delay=1,  # Time to wait before checking if file is ready to use
    max_retries=20,  # Number of times to check before throwing an error
),
```
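The waiting behaviour amounts to a simple poll-until-ready loop over `retry_delay` and `max_retries`. A stdlib sketch of that pattern, where `check_ready` is a hypothetical stand-in for querying the Files API for the upload's state:

```python
import time


def wait_until_ready(check_ready, retry_delay: float = 1.0, max_retries: int = 20):
    """Poll check_ready() up to max_retries times, sleeping retry_delay between tries."""
    for _ in range(max_retries):
        if check_ready():
            return True
        time.sleep(retry_delay)
    raise TimeoutError("file never became ready")


# Simulate an upload that becomes ready on the third poll.
state = {"polls": 0}


def check_ready():
    state["polls"] += 1
    return state["polls"] >= 3


print(wait_until_ready(check_ready, retry_delay=0))  # True
```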
This makes it easier to work with the Gemini Files API. You can use it in a normal chat completion as shown below.
```python
from instructor.multimodal import PDFWithGenaiFile
from pydantic import BaseModel
import instructor
from google.genai import Client


class Receipt(BaseModel):
    total: int
    items: list[str]


client = instructor.from_genai(Client())

# Multiple ways to load a PDF:
response = client.chat.completions.create(
    model="gemini-2.0-flash",
    response_model=Receipt,
    messages=[
        {
            "role": "user",
            "content": [
                "Extract out the total and line items from the invoice",
                # Option 1: New file upload
                PDFWithGenaiFile.from_new_genai_file("./invoice.pdf"),
                # Option 2: Existing Genai file
                # PDFWithGenaiFile.from_existing_genai_file("invoice.pdf"),
            ],
        },
    ],
)
print(response)
```
If you'd like finer-grained control over the files you use, you can also work with the `Files` API directly, as shown below.

## Working with Files

Our API integration also supports working with files.
```python
from google import genai
import instructor
from pydantic import BaseModel


class Summary(BaseModel):
    summary: str


client = genai.Client()
client = instructor.from_genai(client, mode=instructor.Mode.GENAI_TOOLS)

file1 = client.files.upload(
    file="./gettysburg.wav",
)

# As a parameter
response = client.chat.completions.create(
    model="gemini-2.0-flash-001",
    system="Summarise the audio file.",
    messages=[
        file1,
    ],
    response_model=Summary,
)
print(response)
# > summary="Abraham Lincoln's Gettysburg Address commences by stating that 87 years prior, the founding fathers created a new nation based on liberty and equality. It goes on to say that the Civil War is testing whether a nation so conceived can survive."
```
## Streaming Responses

Note: streaming is currently only available with Gemini models when using `Mode.GENAI_STRUCTURED_OUTPUTS`. Other modes, such as tools, do not currently support streaming.

Streaming lets you process responses incrementally rather than waiting for the complete result. This is useful for making UI updates feel instant and responsive.

### Partial Streaming

Receive a stream of complete, validated objects as they're generated.
```python
from pydantic import BaseModel
import instructor
from google import genai


client = instructor.from_genai(
    genai.Client(), mode=instructor.Mode.GENAI_STRUCTURED_OUTPUTS
)


class Person(BaseModel):
    name: str
    age: int


class PersonList(BaseModel):
    people: list[Person]


stream = client.chat.completions.create_partial(
    model="gemini-2.0-flash-001",
    system="You are a helpful assistant. You must return a function call with the schema provided.",
    messages=[
        {
            "role": "user",
            "content": "Ivan is 20 years old, Jason is 25 years old, and John is 30 years old",
        }
    ],
    response_model=PersonList,
)

for extraction in stream:
    print(extraction)
# > people=[PartialPerson(name='Ivan', age=None)]
# > people=[PartialPerson(name='Ivan', age=20), PartialPerson(name='Jason', age=25), PartialPerson(name='John', age=None)]
# > people=[PartialPerson(name='Ivan', age=20), PartialPerson(name='Jason', age=25), PartialPerson(name='John', age=30)]
```
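Partial streaming works by re-parsing the growing response buffer on each chunk and reporting fields that haven't arrived yet as `None`. The stdlib-only sketch below illustrates the idea on a raw JSON stream; a real implementation such as Instructor's is considerably more robust:

```python
import json


def parse_partial(buffer: str, fields: tuple) -> dict:
    """Best-effort parse of an incomplete JSON object: try a few ways of
    closing the buffer, then report every expected field (missing ones as None)."""
    for suffix in ("", '"}', '": null}', "}"):
        try:
            data = json.loads(buffer + suffix)
            return {f: data.get(f) for f in fields}
        except json.JSONDecodeError:
            continue
    return {f: None for f in fields}


chunks = ['{"name": "Iv', 'an", "age', '": 20}']
buffer = ""
for chunk in chunks:
    buffer += chunk
    print(parse_partial(buffer, ("name", "age")))
# > {'name': 'Iv', 'age': None}
# > {'name': 'Ivan', 'age': None}
# > {'name': 'Ivan', 'age': 20}
```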
## Async Support

Instructor provides full async support for the genai SDK, allowing you to make non-blocking requests in async applications.
```python
import asyncio

import instructor
from google import genai
from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int


async def extract_user():
    client = genai.Client()
    client = instructor.from_genai(
        client, mode=instructor.Mode.GENAI_TOOLS, use_async=True
    )
    response = await client.chat.completions.create(
        model="gemini-2.0-flash-001",
        messages=[{"role": "user", "content": "Extract: Jason is 25 years old"}],
        response_model=User,
    )
    return response


print(asyncio.run(extract_user()))
# > name='Jason' age=25
```
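The payoff of async support is concurrency: several extractions can be awaited at once with `asyncio.gather` instead of running back to back. A stdlib sketch of the pattern, with `fake_extract` as a hypothetical stand-in for the awaited Instructor call:

```python
import asyncio


async def fake_extract(text: str) -> dict:
    # Stand-in for an awaited client.chat.completions.create(...) call.
    await asyncio.sleep(0)
    name, _, rest = text.partition(" is ")
    return {"name": name, "age": int(rest.split()[0])}


async def main() -> list:
    texts = ["Jason is 25 years old", "Ivan is 20 years old"]
    # Run both extractions concurrently instead of one after another.
    return await asyncio.gather(*(fake_extract(t) for t in texts))


print(asyncio.run(main()))
# > [{'name': 'Jason', 'age': 25}, {'name': 'Ivan', 'age': 20}]
```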