# Getting Started with Instructor
This guide walks you through the basics of using Instructor to extract structured data from language models. By the end, you will know how to:

- Install and set up Instructor
- Extract basic structured data
- Handle validation and errors
- Work with streaming responses
- Use different LLM providers
## Installation
First, install Instructor:

```bash
pip install instructor
```
To use a specific provider, install the corresponding extras:

```bash
# For OpenAI (included by default)
pip install instructor

# For Anthropic
pip install "instructor[anthropic]"

# For other providers
pip install "instructor[google-generativeai]"  # For Google/Gemini
pip install "instructor[vertexai]"             # For Vertex AI
pip install "instructor[cohere]"               # For Cohere
pip install "instructor[litellm]"              # For LiteLLM (multiple providers)
pip install "instructor[mistralai]"            # For Mistral
```
## Setting Up Your Environment

Set your API keys as environment variables:
```bash
# For OpenAI
export OPENAI_API_KEY=your_openai_api_key

# For Anthropic
export ANTHROPIC_API_KEY=your_anthropic_api_key

# For other providers, set relevant API keys
```
## Your First Structured Output

Let's start with a simple example using OpenAI:
```python
import instructor
from openai import OpenAI
from pydantic import BaseModel

# Define your output structure
class UserInfo(BaseModel):
    name: str
    age: int

# Create an instructor-patched client
client = instructor.from_openai(OpenAI())

# Extract structured data
user_info = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=UserInfo,
    messages=[
        {"role": "user", "content": "John Doe is 30 years old."}
    ],
)

print(f"Name: {user_info.name}, Age: {user_info.age}")
# Output: Name: John Doe, Age: 30
```
This example demonstrates the core workflow:

1. Define a Pydantic model for your output structure
2. Patch your LLM client with Instructor
3. Request structured output via the `response_model` parameter
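The `response_model` you pass is an ordinary Pydantic model, so you can inspect the JSON schema Instructor derives from it locally, without any API call. This is a simplified view; the exact payload sent to the provider depends on the patching mode:

```python
from pydantic import BaseModel

class UserInfo(BaseModel):
    name: str
    age: int

# The JSON schema is the basis for the tool/function definition
# that the LLM is asked to fill in
schema = UserInfo.model_json_schema()
print(sorted(schema["properties"]))  # ['age', 'name']
print(schema["required"])            # ['name', 'age']
```

Renaming a field or adding one to the model changes this schema, and therefore what the model is asked to produce.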
## Validation and Error Handling

Instructor leverages Pydantic's validation to ensure your data meets your requirements:
```python
from pydantic import BaseModel, Field, field_validator

class User(BaseModel):
    name: str
    age: int = Field(gt=0, lt=120)  # Age must be between 0 and 120

    @field_validator('name')
    def name_must_have_space(cls, v):
        if ' ' not in v:
            raise ValueError('Name must include first and last name')
        return v

# This will make the LLM retry if validation fails
user = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=User,
    messages=[
        {"role": "user", "content": "Extract: Tom is 25 years old."}
    ],
)
```
## Working with Complex Models

Instructor handles nested Pydantic models seamlessly:
```python
from pydantic import BaseModel
from typing import List

class Address(BaseModel):
    street: str
    city: str
    state: str
    zip_code: str

class Person(BaseModel):
    name: str
    age: int
    addresses: List[Address]

person = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=Person,
    messages=[
        {"role": "user", "content": """
            Extract: John Smith is 35 years old.
            He has homes at 123 Main St, Springfield, IL 62704 and
            456 Oak Ave, Chicago, IL 60601.
        """}
    ],
)
```
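Under the hood, the model returns JSON matching the nested schema, and Pydantic parses it into the object graph. You can reproduce the parsing step locally with `model_validate_json`; the JSON below is hand-written for illustration, standing in for an LLM response:

```python
from typing import List
from pydantic import BaseModel

class Address(BaseModel):
    street: str
    city: str
    state: str
    zip_code: str

class Person(BaseModel):
    name: str
    age: int
    addresses: List[Address]

# Hand-written JSON standing in for an LLM response
raw = """
{
  "name": "John Smith",
  "age": 35,
  "addresses": [
    {"street": "123 Main St", "city": "Springfield", "state": "IL", "zip_code": "62704"},
    {"street": "456 Oak Ave", "city": "Chicago", "state": "IL", "zip_code": "60601"}
  ]
}
"""
person = Person.model_validate_json(raw)
print(len(person.addresses))     # 2
print(person.addresses[1].city)  # Chicago
```

Each `Address` in the list is validated individually, so a malformed nested entry triggers the same retry mechanism as a top-level field.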
## Streaming Responses

For larger responses or a better user experience, use streaming:
```python
from instructor import Partial

# Stream the response as it's being generated
stream = client.chat.completions.create_partial(
    model="gpt-3.5-turbo",
    response_model=Person,
    messages=[
        {"role": "user", "content": "Extract a detailed person profile for John Smith, 35, who lives in Chicago and Springfield."}
    ],
)

for partial in stream:
    # This will incrementally show the response being built
    print(partial)
```
## Using Different Providers

Instructor supports many LLM providers. Here's an example using Anthropic:
```python
import instructor
from anthropic import Anthropic
from pydantic import BaseModel

class UserInfo(BaseModel):
    name: str
    age: int

# Create an instructor-patched Anthropic client
client = instructor.from_anthropic(Anthropic())

user_info = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    response_model=UserInfo,
    messages=[
        {"role": "user", "content": "John Doe is 30 years old."}
    ],
)

print(f"Name: {user_info.name}, Age: {user_info.age}")
```
## Next Steps
Now that you have the basics down, here are some next steps:

- Learn about patching modes for different LLM providers
- Explore advanced validation to ensure data quality
- Check out the cookbook examples for real-world applications
- Learn how to use hooks for monitoring and debugging

For more details on any topic, visit the Concepts section.

If you have questions or need help, join our Discord community or check out the GitHub repository.