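Structured outputs with Azure OpenAI

This guide shows how to use Azure OpenAI with instructor to get structured outputs. Azure OpenAI offers the same models as OpenAI, with enterprise-grade security and compliance features provided through Microsoft Azure.

Installation

Installation is the same as for OpenAI, because the default openai client ships with an AzureOpenAI client.

First, install the required dependency: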

pip install instructor

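Next, make sure Azure OpenAI is enabled in your Azure account and that you have created a deployment for the model you want to use. A getting-started guide is available for this step.

Once that is done, you will have an endpoint and an API key that you can use to configure the client.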

instructor.exceptions.InstructorRetryException: Error code: 401 - {'statusCode': 401, 'message': 'Unauthorized. Access token is missing, invalid, audience is incorrect (https://cognitiveservices.azure.com), or have expired.'}

If you see an error like the one above, check that the client is configured with the correct endpoint and API key.
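A quick way to rule out one common cause of the 401 above is to confirm that both settings are present before building the client. The sketch below is only an illustration: the helper name `missing_azure_settings` is hypothetical, and it assumes the settings live in the `AZURE_OPENAI_API_KEY` and `AZURE_OPENAI_ENDPOINT` environment variables, as in the examples in this guide.

```python
import os

# Environment variables read by the examples in this guide.
REQUIRED_VARS = ("AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT")


def missing_azure_settings(env=None):
    """Return the names of required settings that are unset or blank.

    Hypothetical helper: pass a dict to check it instead of os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_azure_settings()
    if missing:
        print("Missing settings: " + ", ".join(missing))
    else:
        print("Endpoint and API key are both set.")
```

This only detects missing or blank values. A key that is present but wrong, or an endpoint that points at a different resource, will still produce the 401.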

Authentication

To use Azure OpenAI, you need:

An Azure OpenAI endpoint
An API key
A deployment name

import os
from openai import AzureOpenAI
import instructor

# Configure Azure OpenAI client
client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"]
)

# Patch the client with instructor
client = instructor.from_openai(client)

Basic usage

Here is a simple example using a Pydantic model:

import os
import instructor
from openai import AzureOpenAI
from pydantic import BaseModel

client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
)
client = instructor.from_openai(client)


class User(BaseModel):
    name: str
    age: int


# Synchronous usage
user = client.chat.completions.create(
    model="gpt-4o-mini",  # Your deployment name
    messages=[{"role": "user", "content": "John is 30 years old"}],
    response_model=User,
)

print(user)
# > name='John' age=30

Async implementation

Azure OpenAI supports asynchronous operations:

import os
import instructor
import asyncio
from openai import AsyncAzureOpenAI
from pydantic import BaseModel

client = AsyncAzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-15-preview",
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
)
client = instructor.from_openai(client)


class User(BaseModel):
    name: str
    age: int


async def get_user_async():
    return await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "John is 30 years old"}],
        response_model=User,
    )


# Run async function
user = asyncio.run(get_user_async())
print(user)
# > name='John' age=30

Nested models

Azure OpenAI can handle complex nested structures:

import os
import instructor
from openai import AzureOpenAI
from pydantic import BaseModel

client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
)
client = instructor.from_openai(client)


class Address(BaseModel):
    street: str
    city: str
    country: str


class UserWithAddress(BaseModel):
    name: str
    age: int
    addresses: list[Address]


resp = client.chat.completions.create(
    model="gpt-4o-mini",  # Your deployment name
    messages=[
        {
            "role": "user",
            "content": """
        John is 30 years old and has two addresses:
        1. 123 Main St, New York, USA
        2. 456 High St, London, UK
        """,
        }
    ],
    response_model=UserWithAddress,
)

print(resp)
# > name='John' age=30 addresses=[Address(street='123 Main St', city='New York', country='USA'), Address(street='456 High St', city='London', country='UK')]

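Streaming support

Instructor provides two main ways to stream responses:

Iterables: useful when you want to stream a list of objects of the same type (for example, extracting multiple users with structured outputs).
Partial streaming: useful when you want to stream a single object and start processing it as soon as the response begins to arrive.

Partial

You can stream a single object with the create_partial method. Note that when streaming an object, you should not declare validators in the response model, because they will break the streaming process.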

from instructor import from_openai
from openai import AzureOpenAI
from pydantic import BaseModel
import os

client = from_openai(
    AzureOpenAI(
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-02-01",
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    )
)


class User(BaseModel):
    name: str
    age: int
    bio: str


# Stream partial objects as they're generated
user = client.chat.completions.create_partial(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "Create a user profile for Jason, age 25"},
    ],
    response_model=User,
)

for user_partial in user:
    print(user_partial)

# > name='Jason' age=None bio=None
# > name='Jason' age=25 bio='A tech'
# > name='Jason' age=25 bio='A tech enthusiast'
# > name='Jason' age=25 bio='A tech enthusiast who loves coding, gaming, and exploring new'
# > name='Jason' age=25 bio='A tech enthusiast who loves coding, gaming, and exploring new technologies'
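Because validators should stay out of the streamed response model, one possible workaround is to stream with a validator-free model and re-validate the final object once the stream has finished. The sketch below shows only that second step and uses Pydantic alone, with no API call. `StreamedUser`, `CheckedUser`, `check_final`, and the age rule are all hypothetical names and choices made for illustration, and a directly constructed `StreamedUser` stands in for the last object yielded by the stream.

```python
from pydantic import BaseModel, ValidationError, field_validator


class StreamedUser(BaseModel):
    # Validator-free model: the kind you would pass as response_model
    # when streaming.
    name: str
    age: int
    bio: str


class CheckedUser(StreamedUser):
    # Same fields plus a validator, applied only after streaming is done.
    @field_validator("age")
    @classmethod
    def age_must_be_plausible(cls, value: int) -> int:
        if not 0 <= value <= 150:
            raise ValueError("age must be between 0 and 150")
        return value


def check_final(final: BaseModel) -> CheckedUser:
    """Re-validate the final streamed object against the stricter model."""
    return CheckedUser.model_validate(final.model_dump())


if __name__ == "__main__":
    final = StreamedUser(name="Jason", age=25, bio="A tech enthusiast")
    try:
        print(check_final(final))
    except ValidationError as exc:
        print(exc)
```

Keeping the strict model as a subclass means the two field lists cannot drift apart, at the cost of validating the data a second time.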

Iterable responses

from instructor import from_openai
from openai import AzureOpenAI
from pydantic import BaseModel
import os

client = from_openai(
    AzureOpenAI(
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-02-01",
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    )
)


class User(BaseModel):
    name: str
    age: int


# Extract multiple users from text
users = client.chat.completions.create_iterable(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": """
            Extract users:
            1. Jason is 25 years old
            2. Sarah is 30 years old
            3. Mike is 28 years old
        """,
        },
    ],
    response_model=User,
)

for user in users:
    print(user)
# > name='Jason' age=25
# > name='Sarah' age=30
# > name='Mike' age=28

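Instructor modes

Instructor provides several modes that make it easy to work with the different response models OpenAI supports:

instructor.Mode.TOOLS : Uses the tool calling API to return structured outputs to the client.
instructor.Mode.JSON : Forces the model to return JSON by using OpenAI's JSON mode.
instructor.Mode.FUNCTIONS : Uses OpenAI's function calling API to return structured outputs. It will be deprecated in the future.
instructor.Mode.PARALLEL_TOOLS : Uses the parallel tool calling API to return structured outputs to the client. This lets the model generate multiple calls in a single response.
instructor.Mode.MD_JSON : Makes a plain call to the OpenAI chat completions API and parses the raw response as JSON.
instructor.Mode.TOOLS_STRICT : Uses the newer OpenAI structured outputs API, which returns structured outputs through constrained grammar sampling. This restricts users to a subset of JSON Schema.
instructor.Mode.JSON_O1 : A mode for the O1 models. It exists because O1 does not support system messages, tool calling, or streaming, so you need this mode to use Instructor with O1.

In general, we recommend Mode.TOOLS, because it is the most flexible and future-proof. It has the broadest feature set for specifying your schema and makes things considerably simpler.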