
Structured Outputs with Google's genai SDK

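Recommended SDK

The `genai` SDK is Google's recommended Python client for working with Gemini models. It provides a unified interface to both the Gemini API and Vertex AI. For detailed setup instructions, including how to use it with Vertex AI, see the official Google AI documentation for the GenAI SDK.

This guide shows how to use Instructor with Google's `genai` SDK to extract structured data from Gemini models.

Gemini currently supports two modes:

  • `Mode.GENAI_TOOLS`: uses function calling under the hood and returns a structured response
  • `Mode.GENAI_STRUCTURED_OUTPUTS`: passes a JSON Schema to Gemini, which uses it to respond in a structured format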

Installation

pip install "instructor[google-genai]"

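Basic Usage

Union and Optional Types

Gemini does not support Union and Optional types in its structured outputs and tool calling integrations. We currently raise an error when we detect these types in your response model.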

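Getting started with Instructor and the genai SDK is straightforward: define a Pydantic model that describes your output structure, patch the genai client, and make a request with the `response_model` parameter.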

from google import genai
import instructor
from pydantic import BaseModel

# Define your Pydantic model
class User(BaseModel):
    name: str
    age: int

# Initialize and patch the client
client = genai.Client()
client = instructor.from_genai(client, mode=instructor.Mode.GENAI_TOOLS)

# Extract structured data
response = client.chat.completions.create(
    model="gemini-2.0-flash-001",
    messages=[{"role": "user", "content": "Extract: Jason is 25 years old"}],
    response_model=User,
)

print(response)  # > name='Jason' age=25

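Message Formats

The genai SDK supports several message formats, and Instructor works with all of them. This flexibility lets you use whichever format is most convenient for your application.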

from google import genai
import instructor
from pydantic import BaseModel
from google.genai import types

# Define your Pydantic model
class User(BaseModel):
    name: str
    age: int

# Initialize and patch the client
client = genai.Client()
client = instructor.from_genai(client, mode=instructor.Mode.GENAI_TOOLS)

# Single string (converted to user message)
response = client.chat.completions.create(
    model="gemini-2.0-flash-001",
    messages="Jason is 25 years old",
    response_model=User,
)

print(response)
# > name='Jason' age=25

# Standard format
response = client.chat.completions.create(
    model="gemini-2.0-flash-001",
    messages=[
        {"role": "user", "content": "Jason is 25 years old"}
    ],
    response_model=User,
)

print(response)
# > name='Jason' age=25

# Using genai's Content type
response = client.chat.completions.create(
    model="gemini-2.0-flash-001",
    messages=[
        genai.types.Content(
            role="user",
            parts=[genai.types.Part.from_text(text="Jason is 25 years old")]
        )
    ],
    response_model=User,
)

print(response)
# > name='Jason' age=25

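System Messages

System messages set the context and instructions for the model. With Gemini models, you can provide a system message in two ways: through the `system` parameter, or as a message with the role `"system"`.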

from google import genai
import instructor
from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int


client = genai.Client()
client = instructor.from_genai(client, mode=instructor.Mode.GENAI_TOOLS)

# As a parameter
response = client.chat.completions.create(
    model="gemini-2.0-flash-001",
    system="You are a data extraction assistant",
    messages=[{"role": "user", "content": "Jason is 25 years old"}],
    response_model=User,
)

print(response)
# > name='Jason' age=25

# Or as a message with role "system"
response = client.chat.completions.create(
    model="gemini-2.0-flash-001",
    messages=[
        {"role": "system", "content": "You are a data extraction assistant"},
        {"role": "user", "content": "Jason is 25 years old"},
    ],
    response_model=User,
)

print(response)
# > name='Jason' age=25

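Template Variables

Template variables make it easy to reuse a prompt with different values: placeholders such as `{{ name }}` in your messages are filled in from the `context` argument. This is especially useful for dynamic content or for testing different inputs.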

from google import genai
import instructor
from pydantic import BaseModel
from google.genai import types


# Define your Pydantic model
class User(BaseModel):
    name: str
    age: int


# Initialize and patch the client
client = genai.Client()
client = instructor.from_genai(client, mode=instructor.Mode.GENAI_TOOLS)

# Single string (converted to user message)
response = client.chat.completions.create(
    model="gemini-2.0-flash-001",
    messages=["{{name}} is {{ age }} years old"],
    response_model=User,
    context={
        "name": "Jason",
        "age": 25,
    },
)

print(response)
# > name='Jason' age=25

# Standard format
response = client.chat.completions.create(
    model="gemini-2.0-flash-001",
    messages=[{"role": "user", "content": "{{ name }} is {{ age }} years old"}],
    response_model=User,
    context={
        "name": "Jason",
        "age": 25,
    },
)

print(response)
# > name='Jason' age=25

# Using genai's Content type
response = client.chat.completions.create(
    model="gemini-2.0-flash-001",
    messages=[
        genai.types.Content(
            role="user",
            parts=[genai.types.Part.from_text(text="{{name}} is {{age}} years old")],
        )
    ],
    response_model=User,
    context={
        "name": "Jason",
        "age": 25,
    },
)

print(response)
# > name='Jason' age=25
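The placeholders use Jinja-style `{{ name }}` syntax, with or without inner spaces. The stdlib-only sketch below mimics plain variable substitution to show what the `context` dictionary does. The `render` helper is hypothetical, and a real template engine supports far more (filters, loops, conditionals).

```python
import re

# Matches {{name}} and {{ name }}: optional whitespace around an identifier.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def render(template: str, context: dict[str, object]) -> str:
    """Replace each {{ var }} placeholder with str(context[var]).

    Hypothetical helper; raises KeyError if a variable is missing from context.
    """
    return PLACEHOLDER.sub(lambda match: str(context[match.group(1)]), template)


context = {"name": "Jason", "age": 25}
print(render("{{name}} is {{ age }} years old", context))  # Jason is 25 years old
```

Raising `KeyError` on a missing variable is a deliberate choice for the sketch: Jinja's default is to render undefined variables as empty strings, which can silently produce a misleading prompt.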

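Validation and Retries

When validation fails, Instructor can automatically retry the request (up to `max_retries` times) so that you get correctly formatted data. This is especially useful for enforcing specific data requirements.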

from typing import Annotated
from pydantic import AfterValidator, BaseModel
import instructor
from google import genai


def uppercase_validator(v: str) -> str:
    # Reject any name containing lowercase letters, e.g. "jason" or "Jason"
    if v != v.upper():
        raise ValueError("Name must be ALL CAPS")
    return v


class UserDetail(BaseModel):
    name: Annotated[str, AfterValidator(uppercase_validator)]
    age: int


client = instructor.from_genai(genai.Client())

response = client.chat.completions.create(
    model="gemini-2.0-flash-001",
    messages=[{"role": "user", "content": "Extract: jason is 25 years old"}],
    response_model=UserDetail,
    max_retries=3,
)

print(response)  # > name='JASON' age=25
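Conceptually, the retry is a loop: validate the model's output and, if validation fails, send the error message back so the next attempt can correct it. The sketch below reproduces that loop with the standard library only. `fake_model` is a hypothetical stand-in for the Gemini call and `extract_with_retries` is not Instructor's API; Instructor runs this kind of loop for you when you pass `max_retries`.

```python
def uppercase_validator(v: str) -> str:
    # Reject names that are not entirely upper case.
    if v != v.upper():
        raise ValueError("Name must be ALL CAPS")
    return v


def fake_model(messages: list[dict[str, str]]) -> dict[str, object]:
    """Hypothetical model: returns a lowercase name until it sees the error."""
    saw_error = any("ALL CAPS" in m["content"] for m in messages)
    return {"name": "JASON" if saw_error else "jason", "age": 25}


def extract_with_retries(
    messages: list[dict[str, str]], max_retries: int = 3
) -> dict[str, object]:
    messages = list(messages)  # don't mutate the caller's list
    last_error = None
    for _ in range(max_retries):
        output = fake_model(messages)
        try:
            output["name"] = uppercase_validator(output["name"])
            return output
        except ValueError as error:
            last_error = error
            # Feed the validation error back so the next attempt can fix it.
            messages.append(
                {"role": "user", "content": f"Validation failed: {error}. Please fix it."}
            )
    raise RuntimeError(f"Gave up after {max_retries} attempts: {last_error}")


print(extract_with_retries([{"role": "user", "content": "Extract: jason is 25 years old"}]))
# {'name': 'JASON', 'age': 25}
```

The first attempt returns `"jason"`, which fails validation; the error text is appended to the conversation, and the second attempt succeeds.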

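Multimodal Capabilities

We provide a few sample files for testing these features. All of the examples below use them.

  • (Audio): an original recording of the Gettysburg Address: gettysburg.wav
  • (Image): a picture of some blueberry plants: image.jpg
  • (PDF): a sample PDF file containing a fake invoice: invoice.pdf

Instructor provides a unified, provider-agnostic interface for multimodal inputs such as images, PDFs, and audio files. With Instructor's multimodal objects, you can load media from URLs, local files, or base64 strings through a single API that works across AI providers (OpenAI, Anthropic, Mistral, and others).

Instructor handles each provider's formatting requirements behind the scenes, so your code stays clean and future-proof as provider APIs evolve.

Let's look at how to use the Image, Audio, and PDF classes.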

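Image Processing

Autodetecting Images

For convenience, you can enable automatic image conversion with the `autodetect_images` parameter. When it is enabled, Instructor automatically detects file paths and HTTP URLs passed as strings and converts them into the format the Google GenAI SDK requires, so you can pass images without wrapping them yourself.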
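To picture what autodetection involves, the hypothetical `classify_source` helper below sorts a string into an HTTP(S) URL, a base64 data URI, a local file path, or plain text using only the standard library. It is not Instructor's implementation, whose exact rules may differ. Treating anything unrecognized as text is the assumption that keeps ordinary prompt text from being mistaken for media.

```python
import os
from urllib.parse import urlparse


def classify_source(value: str) -> str:
    """Guess what kind of media reference a string is (hypothetical helper)."""
    if urlparse(value).scheme in ("http", "https"):
        return "url"
    if value.startswith("data:") and ";base64," in value:
        return "base64"
    if os.path.isfile(value):
        return "path"
    # Anything else is left alone as ordinary prompt text.
    return "text"


print(classify_source("https://example.com/image.jpg"))  # url
print(classify_source("data:image/jpeg;base64,/9j/4AAQ"))  # base64
print(classify_source("What is in this image?"))  # text
```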

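Instructor makes it easy to analyze images and extract semantic information from them with the Gemini family of models. Check that the model you want to use supports vision before you start.

The example below uses the sample image above, which we load with the `from_url` method.

Note that we also support loading local files and base64 strings with the `from_path` and `from_base64` class methods.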

from instructor.multimodal import Image
from pydantic import BaseModel, Field
import instructor
from google.genai import Client


class ImageDescription(BaseModel):
    objects: list[str] = Field(..., description="The objects in the image")
    scene: str = Field(..., description="The scene of the image")
    colors: list[str] = Field(..., description="The colors in the image")


client = instructor.from_genai(Client())
url = "https://raw.githubusercontent.com/instructor-ai/instructor/main/tests/assets/image.jpg"
# Multiple ways to load an image:
response = client.chat.completions.create(
    model="gemini-2.0-flash",
    response_model=ImageDescription,
    messages=[
        {
            "role": "user",
            "content": [
                "What is in this image?",
                # Option 1: Direct URL
                Image.from_url(url),
                # Option 2: Local file
                # Image.from_path("path/to/local/image.jpg")
                # Option 3: Base64 string
                # Image.from_base64("base64_encoded_string_here")
                # Option 4: Autodetect
                # Image.autodetect(<url|path|base64>)
            ],
        },
    ],
)

print(response)
# Example output:
# ImageDescription(
#     objects=['blueberries', 'leaves'],
#     scene='A blueberry bush with clusters of ripe blueberries and some unripe ones against a cloudy sky',
#     colors=['green', 'blue', 'purple', 'white']
# )
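If your image lives in memory rather than at a URL or path, you need a base64 string for the `from_base64` option. The stdlib-only sketch below builds a `data:` URI from raw bytes, guessing the MIME type from a filename. The helper `to_data_uri` is hypothetical, and whether `from_base64` expects a full data URI or a bare base64 payload is an assumption to check against Instructor's multimodal reference.

```python
import base64
import mimetypes


def to_data_uri(data: bytes, filename: str) -> str:
    """Encode raw bytes as a data URI, guessing the MIME type from the filename.

    Hypothetical helper; falls back to application/octet-stream when the
    extension is unknown.
    """
    mime_type, _ = mimetypes.guess_type(filename)
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type or 'application/octet-stream'};base64,{payload}"


print(to_data_uri(b"hello", "image.jpg"))  # data:image/jpeg;base64,aGVsbG8=
```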

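Audio Processing

Instructor makes it easy to analyze audio files and extract semantic information from them with the Gemini family of models. The example below uses the sample audio file above, which we load with the `from_url` method.

Note that we also support loading local files with `from_path`, as well as base64 strings.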

from instructor.multimodal import Audio
from pydantic import BaseModel
import instructor
from google.genai import Client


class AudioDescription(BaseModel):
    transcript: str
    summary: str
    speakers: list[str]
    key_points: list[str]


url = "https://raw.githubusercontent.com/instructor-ai/instructor/main/tests/assets/gettysburg.wav"

client = instructor.from_genai(Client())

response = client.chat.completions.create(
    model="gemini-2.0-flash",
    response_model=AudioDescription,
    messages=[
        {
            "role": "user",
            "content": [
                "Please transcribe and analyze this audio:",
                # Multiple loading options:
                Audio.from_url(url),
                # Option 2: Local file
                # Audio.from_path("path/to/local/audio.mp3")
            ],
        },
    ],
)

print(response)
# > transcript='Four score and seven years ago our fathers...' ...

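PDF

Instructor makes it easy to analyze PDFs and extract semantic information from them with Gemini's newer models.

The example below uses the sample PDF above, which we load with the `from_url` method. This integration passes the raw bytes to Gemini directly; we also support the Files API through the `PDFWithGenaiFile` class.

Note that with this approach, we also support loading local files and base64 strings with the `from_path` and `from_base64` class methods.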

from instructor.multimodal import PDF
from pydantic import BaseModel
import instructor
from google.genai import Client


class Receipt(BaseModel):
    total: int
    items: list[str]


client = instructor.from_genai(Client())
url = "https://raw.githubusercontent.com/instructor-ai/instructor/main/tests/assets/invoice.pdf"
# Multiple ways to load a PDF:
response = client.chat.completions.create(
    model="gemini-2.0-flash",
    response_model=Receipt,
    messages=[
        {
            "role": "user",
            "content": [
                "Extract out the total and line items from the invoice",
                # Option 1: Direct URL
                PDF.from_url(url),
                # Option 2: Local file
                # PDF.from_path("path/to/local/invoice.pdf"),
                # Option 3: Base64 string
                # PDF.from_base64("base64_encoded_string_here")
                # Option 4: Autodetect
                # PDF.autodetect(<url|path|base64>)
            ],
        },
    ],
)

print(response)
# > total=220 items=['English Tea', 'Tofu']

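We also support using PDFs with the Gemini `Files` API through the `PDFWithGenaiFile` class, which lets you use either an already uploaded file or a local file.

Note that `PDFWithGenaiFile.from_new_genai_file` is blocking: it waits until the upload is registered as complete. You can configure the delay between checks and the maximum number of checks it makes while waiting.

from instructor.multimodal import PDFWithGenaiFile

pdf = PDFWithGenaiFile.from_new_genai_file(
    "./invoice.pdf",
    retry_delay=1,  # Time to wait between checks that the file is ready to use
    max_retries=20,  # Number of checks before raising an error
)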
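The blocking behavior described above comes down to a polling loop: check whether the upload is ready, wait `retry_delay`, and give up after `max_retries` checks. The sketch below shows that pattern with a hypothetical `is_ready` callable standing in for the real file-status lookup. It illustrates the pattern only and is not Instructor's implementation.

```python
import time
from typing import Callable


def wait_until_ready(
    is_ready: Callable[[], bool], retry_delay: float = 1, max_retries: int = 20
) -> int:
    """Poll is_ready() until it returns True; return the number of checks used.

    Raises TimeoutError if still not ready after max_retries checks.
    """
    for attempt in range(1, max_retries + 1):
        if is_ready():
            return attempt
        if attempt < max_retries:
            time.sleep(retry_delay)  # wait before the next check
    raise TimeoutError(f"Not ready after {max_retries} checks")


# Hypothetical upload that becomes ready on the third check.
states = iter([False, False, True])
print(wait_until_ready(lambda: next(states), retry_delay=0.01))  # 3
```

With the values shown earlier (`retry_delay=1`, `max_retries=20`), the call would block for roughly twenty seconds at most before raising.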

This makes the Gemini Files API easier to use. You can use it in a normal chat completion, as shown below.

from instructor.multimodal import PDFWithGenaiFile
from pydantic import BaseModel
import instructor
from google.genai import Client


class Receipt(BaseModel):
    total: int
    items: list[str]


client = instructor.from_genai(Client())
# Two ways to reference a PDF through the Files API:
response = client.chat.completions.create(
    model="gemini-2.0-flash",
    response_model=Receipt,
    messages=[
        {
            "role": "user",
            "content": [
                "Extract out the total and line items from the invoice",
                # Option 1: Upload a new local file
                PDFWithGenaiFile.from_new_genai_file("./invoice.pdf"),

                # Option 2 : Existing Genai File
                # PDFWithGenaiFile.from_existing_genai_file("invoice.pdf"),
            ],
        },
    ],
)

print(response)

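If you want finer-grained control over the files you use, you can also use the `Files` API directly, as shown below.

Using Files

Our API integration also supports files uploaded through the genai SDK.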

from google import genai
import instructor
from pydantic import BaseModel


class Summary(BaseModel):
    summary: str


genai_client = genai.Client()
client = instructor.from_genai(genai_client, mode=instructor.Mode.GENAI_TOOLS)

# Upload with the underlying genai client
file1 = genai_client.files.upload(
    file="./gettysburg.wav",
)

# As a parameter
response = client.chat.completions.create(
    model="gemini-2.0-flash-001",
    system="Summarise the audio file.",
    messages=[
        file1,
    ],
    response_model=Summary,
)

print(response)
# > summary="Abraham Lincoln's Gettysburg Address commences by stating that 87 years prior, the founding fathers created a new nation based on liberty and equality. It goes on to say that the Civil War is testing whether a nation so conceived can survive."

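Streaming Responses

Note: streaming is currently available only when using `Mode.GENAI_STRUCTURED_OUTPUTS` with Gemini models. Other modes, such as `Mode.GENAI_TOOLS`, do not support streaming yet.

Streaming lets you process a response incrementally instead of waiting for the complete result. This is useful for making UI updates feel instant and responsive.

Partial Streaming

Receive a stream of partially populated objects that fill in as the response is generated.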

from pydantic import BaseModel
import instructor
from google import genai


client = instructor.from_genai(
    genai.Client(), mode=instructor.Mode.GENAI_STRUCTURED_OUTPUTS
)


class Person(BaseModel):
    name: str
    age: int


class PersonList(BaseModel):
    people: list[Person]


stream = client.chat.completions.create_partial(
    model="gemini-2.0-flash-001",
    system="You are a helpful assistant. You must return a function call with the schema provided.",
    messages=[
        {
            "role": "user",
            "content": "Ivan is 20 years old, Jason is 25 years old, and John is 30 years old",
        }
    ],
    response_model=PersonList,
)

for extraction in stream:
    print(extraction)
    # > people=[PartialPerson(name='Ivan', age=None)]
    # > people=[PartialPerson(name='Ivan', age=20), PartialPerson(name='Jason', age=25), PartialPerson(name='John', age=None)]
    # > people=[PartialPerson(name='Ivan', age=20), PartialPerson(name='Jason', age=25), PartialPerson(name='John', age=30)]
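Partial objects like these come from parsing JSON that is still incomplete. The simplified sketch below shows the core idea: close any open string, object, or array in a truncated prefix and try to parse the result. `parse_partial_json` is a hypothetical helper written for illustration, not the parser Instructor uses.

```python
import json


def parse_partial_json(fragment: str):
    """Parse a truncated prefix of valid JSON by closing open strings and brackets.

    Returns None when the prefix cannot be completed this simply, e.g. when it
    ends right after a key. Simplified illustration, not Instructor's parser.
    """
    closers = []
    in_string = False
    escaped = False
    for char in fragment:
        if in_string:
            if escaped:
                escaped = False
            elif char == "\\":
                escaped = True
            elif char == '"':
                in_string = False
        elif char == '"':
            in_string = True
        elif char == "{":
            closers.append("}")
        elif char == "[":
            closers.append("]")
        elif char in "}]":
            closers.pop()
    completed = fragment + ('"' if in_string else "") + "".join(reversed(closers))
    try:
        return json.loads(completed)
    except json.JSONDecodeError:
        return None


full = '{"people": [{"name": "Ivan", "age": 20}, {"name": "Jason", "age": 25}]}'
print(parse_partial_json(full[:25]))  # {'people': [{'name': 'Iva'}]}
print(parse_partial_json(full[:33]))  # None
print(parse_partial_json(full[:37]))  # {'people': [{'name': 'Ivan', 'age': 2}]}
```

The last line shows why this naive approach is not enough in practice: the age is reported as `2` while the number is still being streamed, so a production parser has to treat a trailing number as incomplete.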

Async Support

Instructor provides full async support for the genai SDK, so you can make non-blocking requests in asynchronous applications.

import asyncio

import instructor
from google import genai
from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int


async def extract_user():
    client = genai.Client()
    client = instructor.from_genai(
        client, mode=instructor.Mode.GENAI_TOOLS, use_async=True
    )

    response = await client.chat.completions.create(
        model="gemini-2.0-flash-001",
        messages=[{"role": "user", "content": "Extract: Jason is 25 years old"}],
        response_model=User,
    )
    return response


print(asyncio.run(extract_user()))
# > name='Jason' age=25
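A main benefit of the async client is running several extractions concurrently. The sketch below shows the `asyncio.gather` pattern with a hypothetical `fake_extract` coroutine in place of `client.chat.completions.create`, plus an `asyncio.Semaphore` that caps how many requests are in flight. Both the helper and the limit of 2 are assumptions for illustration.

```python
import asyncio


async def fake_extract(text: str, limit: asyncio.Semaphore) -> dict[str, object]:
    """Hypothetical stand-in for an awaited Instructor extraction call."""
    async with limit:  # only a limited number of calls run at the same time
        await asyncio.sleep(0.01)  # simulate network latency
        name, _, rest = text.partition(" is ")
        return {"name": name, "age": int(rest.split()[0])}


async def extract_all(texts: list[str]) -> list[dict[str, object]]:
    limit = asyncio.Semaphore(2)
    # gather runs the coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(fake_extract(t, limit) for t in texts))


results = asyncio.run(extract_all(["Jason is 25 years old", "Ivan is 20 years old"]))
print(results)  # [{'name': 'Jason', 'age': 25}, {'name': 'Ivan', 'age': 20}]
```

In real code, each `fake_extract` body would instead await the patched client's `chat.completions.create(...)` with your `response_model`.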