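Validation in Instructor
This guide covers validation concepts and best practices for structured outputs with Instructor.
Overview
Validation in Instructor ensures that a language model's output matches the schema you expect. This matters for:
- Data consistency
- Error handling
- Type safety
- Business logic enforcement
Validation Flow
The following diagram shows how validation works in Instructor: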
flowchart TD
A[Define Pydantic Model] --> B[Send Request to LLM]
B --> C[LLM Generates Response]
C --> D{Validate Response}
D -->|Valid| E[Return Pydantic Object]
D -->|Invalid| F{Auto-Retry Enabled?}
F -->|Yes| G[Send Error Context to LLM]
F -->|No| H[Raise ValidationError]
G --> I[LLM Generates New Response]
I --> J{Validate Again}
J -->|Valid| E
J -->|Invalid| K{Max Retries Reached?}
K -->|No| G
K -->|Yes| H
classDef success fill:#d4edda,stroke:#c3e6cb,color:#155724;
classDef error fill:#f8d7da,stroke:#f5c6cb,color:#721c24;
classDef process fill:#e2f0fb,stroke:#b8daff,color:#004085;
classDef decision fill:#fff3cd,stroke:#ffeeba,color:#856404;
class A,B,C,G,I process
class D,F,J,K decision
class E success
class H error
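This process ensures that LLM output conforms to the schema you defined, with a built-in retry mechanism that handles validation failures.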
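The loop in the diagram can be sketched with Pydantic alone. This is a minimal illustration under stated assumptions, not Instructor's internal implementation: `Person`, `extract_with_retries`, and `fake_llm` are hypothetical names, and `fake_llm` stands in for a real model call so the snippet runs offline.

```python
from typing import Callable, List, Optional

from pydantic import BaseModel, Field, ValidationError


class Person(BaseModel):
    """Hypothetical response model used only for this sketch."""

    name: str
    age: int = Field(ge=0)


def extract_with_retries(
    generate: Callable[[Optional[str]], str], max_retries: int = 2
) -> Person:
    """Validate generated JSON, feeding errors back until retries run out.

    `generate` stands in for the LLM call. It receives the previous
    validation error text, or None on the first attempt.
    """
    feedback: Optional[str] = None
    for attempt in range(max_retries + 1):
        raw = generate(feedback)
        try:
            return Person.model_validate_json(raw)
        except ValidationError as exc:
            if attempt == max_retries:
                raise  # retries exhausted: surface the validation error
            feedback = str(exc)  # error context for the next attempt
    raise RuntimeError("max_retries must be zero or greater")


# A fake "model" that corrects its answer once it sees an error message.
calls: List[Optional[str]] = []


def fake_llm(feedback: Optional[str]) -> str:
    calls.append(feedback)
    if feedback is None:
        return '{"name": "John Doe", "age": -5}'
    return '{"name": "John Doe", "age": 5}'


person = extract_with_retries(fake_llm)
print(person)      # name='John Doe' age=5
print(len(calls))  # 2
```

The first attempt fails the `ge=0` constraint, the error text is passed back as context, and the second attempt validates.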
Basic Validation
Instructor uses Pydantic for validation, which provides:
1. Type checking
2. Data coercion
3. Custom validators
4. Field constraints
# Standard library imports
from typing import List

# Third-party imports
from pydantic import BaseModel, Field, field_validator


class User(BaseModel):
    """Model representing a user with validation rules."""

    name: str = Field(
        ..., min_length=2, description="User's full name, minimum 2 characters"
    )
    age: int = Field(..., ge=0, le=150, description="User's age between 0 and 150")
    emails: List[str] = Field(description="List of user's email addresses")

    @field_validator('emails')
    @classmethod
    def validate_emails(cls, v):
        """Validate that all email addresses contain an @ symbol."""
        if not all('@' in email for email in v):
            raise ValueError('Invalid email format')
        return v
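To see these features in action, the sketch below validates one good and one bad payload directly with Pydantic, without calling an LLM. The model is repeated in compact form so the snippet runs on its own; the sample values are made up for illustration.

```python
from typing import List

from pydantic import BaseModel, Field, ValidationError, field_validator


class User(BaseModel):
    name: str = Field(min_length=2)
    age: int = Field(ge=0, le=150)
    emails: List[str]

    @field_validator("emails")
    @classmethod
    def validate_emails(cls, v):
        if not all("@" in email for email in v):
            raise ValueError("Invalid email format")
        return v


# Data coercion: the string "42" is converted to the integer 42.
user = User(name="Ada Lovelace", age="42", emails=["ada@example.com"])
print(user.age, type(user.age).__name__)  # 42 int

# Every failing field is reported together in a single ValidationError.
try:
    User(name="A", age=200, emails=["not-an-email"])
    failed = []
except ValidationError as exc:
    failed = sorted(err["loc"][0] for err in exc.errors())

print(failed)  # ['age', 'emails', 'name']
```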
Validation Strategies
1. Field Validation
Use Field() for basic constraints:
# Third-party imports
from pydantic import BaseModel, Field


class Product(BaseModel):
    """Model representing a product with field validation constraints."""

    name: str = Field(
        ...,
        min_length=1,
        max_length=100,
        description="Product name between 1-100 characters",
    )
    price: float = Field(..., gt=0, description="Product price, must be greater than 0")
    quantity: int = Field(
        ..., ge=0, description="Available quantity, must be 0 or greater"
    )
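Each constraint produces its own machine-readable error type, which can be useful when deciding how to react to a failure. The sketch below assumes Pydantic v2; `constraint_errors` is a hypothetical helper, and the model is a compact copy of `Product` so the snippet is self-contained.

```python
from pydantic import BaseModel, Field, ValidationError


class Product(BaseModel):
    name: str = Field(min_length=1, max_length=100)
    price: float = Field(gt=0)
    quantity: int = Field(ge=0)


def constraint_errors(data: dict) -> dict:
    """Return {field: error type} for a payload, or {} if it is valid."""
    try:
        Product.model_validate(data)
    except ValidationError as exc:
        return {err["loc"][0]: err["type"] for err in exc.errors()}
    return {}


valid = constraint_errors({"name": "Widget", "price": 9.99, "quantity": 0})
invalid = constraint_errors({"name": "", "price": 0, "quantity": -1})

print(valid)    # {}
print(invalid)
# {'name': 'string_too_short', 'price': 'greater_than',
#  'quantity': 'greater_than_equal'}
```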
2. Custom Validators
Use @field_validator for complex validation:
# Standard library imports
from typing import List

# Third-party imports
from pydantic import BaseModel, Field, field_validator


class Order(BaseModel):
    """Model representing an order with custom validation logic."""

    items: List[str] = Field(description="List of item names in the order")
    total: float = Field(description="Total order amount")

    @field_validator('total')
    @classmethod
    def validate_total(cls, v):
        """Validate that the total amount is not negative."""
        if v < 0:
            raise ValueError('Total cannot be negative')
        return v
3. Pre-Validation Hooks
Use pre-validation hooks to transform data before it is validated:
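# Third-party imports
from pydantic import BaseModel, Field, field_validator


class UserProfile(BaseModel):
    """Model representing a user profile with pre-validation transformation."""

    username: str = Field(description="User's unique username")

    # mode='before' is the Pydantic v2 equivalent of @validator(..., pre=True)
    @field_validator('username', mode='before')
    @classmethod
    def lowercase_username(cls, v):
        """Transform username to lowercase before validation."""
        # Leave non-string input untouched so type checking can reject it
        return v.lower() if isinstance(v, str) else v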
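The ordering matters when a hook is combined with a field constraint: the hook sees the raw input first, and the constraint then checks the transformed value. The sketch below assumes Pydantic v2; `Handle` and its pattern are hypothetical and exist only to illustrate that ordering.

```python
from pydantic import BaseModel, Field, ValidationError, field_validator


class Handle(BaseModel):
    """Hypothetical model: the pattern only allows lowercase handles."""

    username: str = Field(pattern=r"^[a-z0-9_]+$")

    @field_validator("username", mode="before")
    @classmethod
    def normalize(cls, v):
        # Runs on the raw input, before the pattern check.
        return v.strip().lower() if isinstance(v, str) else v


# Mixed case and padding are normalized first, so the pattern check passes.
handle = Handle(username="  Alice_01 ")
print(handle.username)  # alice_01

# Normalization does not remove inner spaces, so the pattern still rejects this.
try:
    Handle(username="Alice 01")
    rejected = False
except ValidationError:
    rejected = True

print(rejected)  # True
```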
Error Handling
Instructor provides robust error handling for validation failures:
# Standard library imports
import os

# Third-party imports
import instructor
from openai import OpenAI
from pydantic import BaseModel, Field, field_validator

# Set up environment (typically handled before script execution)
# os.environ["OPENAI_API_KEY"] = "your-api-key"  # Uncomment and replace with your API key if not set


# Define model with validation
class User(BaseModel):
    """Model representing a user with validation rules."""

    name: str = Field(description="User's full name")
    age: int = Field(description="User's age in years")

    @field_validator('age')
    @classmethod
    def validate_age(cls, v):
        """Validate that age is not negative."""
        if v < 0:
            raise ValueError("Age cannot be negative")
        return v


# Initialize client with explicit mode
client = instructor.from_openai(
    OpenAI(api_key=os.environ.get("OPENAI_API_KEY")), mode=instructor.Mode.JSON
)

try:
    # Attempt to extract with validation
    user = client.chat.completions.create(
        model="gpt-4o",  # Use latest stable model
        response_model=User,
        messages=[
            {
                "role": "system",
                "content": "Extract structured user information from the text.",
            },
            {"role": "user", "content": "Extract: John Doe, age: -5"},
        ],
    )
    print(user.model_dump_json(indent=2))
    """
    {
      "name": "John Doe",
      "age": 0
    }
    """
except instructor.exceptions.InstructorValidationError as e:
    print(f"Validation error: {e}")
    # Expected output (exact wording varies with the Pydantic version):
    # Validation error: 1 validation error for User
    # age
    #   Age cannot be negative (type=value_error)
except Exception as e:
    print(f"Other error: {e}")
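Best Practices
- Start simple: begin with basic type validation before adding complex rules
- Use type hints: always specify types to keep the code clear
- Document constraints: add clear descriptions to Field() definitions
- Handle errors: implement appropriate error handling for validation failures
- Test edge cases: verify that validation holds up against unexpected input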
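As one way to test edge cases, the sketch below checks values at and around the boundaries of a constraint. `AgeOnly` and `accepts` are hypothetical names, and the case table is illustrative rather than exhaustive.

```python
from pydantic import BaseModel, Field, ValidationError


class AgeOnly(BaseModel):
    """Hypothetical one-field model for exercising boundaries."""

    age: int = Field(ge=0, le=150)


def accepts(value) -> bool:
    """Return True if the value passes validation for `age`."""
    try:
        AgeOnly(age=value)
        return True
    except ValidationError:
        return False


# Input value -> whether validation should accept it.
cases = {0: True, 150: True, -1: False, 151: False, "abc": False, None: False}
results = {value: accepts(value) for value in cases}

for value, ok in results.items():
    print(repr(value), "accepted" if ok else "rejected")
```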
Common Patterns
Optional Fields
# Standard library imports
from typing import Optional

# Third-party imports
from pydantic import BaseModel, Field


class Profile(BaseModel):
    """Model representing a user profile with optional fields."""

    name: str = Field(description="User's full name")
    bio: Optional[str] = Field(None, description="Optional user biography")
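A short usage sketch, assuming Pydantic v2: when the input omits `bio`, the field falls back to its default of None, and it can be left out again when serializing. The JSON payload is made up for illustration.

```python
from typing import Optional

from pydantic import BaseModel, Field


class Profile(BaseModel):
    name: str = Field(description="User's full name")
    bio: Optional[str] = Field(None, description="Optional user biography")


# The payload has no "bio" key, so the default applies.
profile = Profile.model_validate_json('{"name": "Ada"}')
print(profile.bio)  # None

# exclude_none drops unset optional fields from the serialized output.
dumped = profile.model_dump(exclude_none=True)
print(dumped)  # {'name': 'Ada'}
```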
Nested Validation
# Standard library imports
from typing import List

# Third-party imports
from pydantic import BaseModel, Field


class Address(BaseModel):
    """Model representing a physical address."""

    street: str = Field(description="Street address including number")
    city: str = Field(description="City name")
    country: str = Field(description="Country name")


class User(BaseModel):
    """Model representing a user with nested address validation."""

    name: str = Field(description="User's full name")
    addresses: List[Address] = Field(description="List of user's addresses")
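When a nested model fails validation, the error location identifies the exact path to the offending value. The sketch below assumes Pydantic v2 and uses compact copies of the two models; the payload is made up for illustration.

```python
from typing import List

from pydantic import BaseModel, ValidationError


class Address(BaseModel):
    street: str
    city: str
    country: str


class User(BaseModel):
    name: str
    addresses: List[Address]


payload = {
    "name": "Ada",
    "addresses": [
        {"street": "1 Main St", "city": "London", "country": "UK"},
        {"street": "2 Side St", "country": "UK"},  # city is missing
    ],
}

try:
    User.model_validate(payload)
    locations = []
except ValidationError as exc:
    locations = [err["loc"] for err in exc.errors()]

print(locations)  # [('addresses', 1, 'city')]
```

The path reads as field name, list index, then nested field name.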
Complex Validation
# Standard library imports
from datetime import datetime

# Third-party imports
from pydantic import BaseModel, Field, field_validator


class Transaction(BaseModel):
    """Model representing a financial transaction with complex validation."""

    amount: float = Field(description="Transaction amount")
    currency: str = Field(description="Three-letter currency code (USD, EUR, GBP)")
    timestamp: datetime = Field(description="Transaction timestamp")

    @field_validator('currency')
    @classmethod
    def validate_currency(cls, v):
        """Validate that currency is one of the supported codes."""
        valid_currencies = ['USD', 'EUR', 'GBP']
        if v not in valid_currencies:
            raise ValueError(f'Currency must be one of {valid_currencies}')
        return v

    @field_validator('timestamp')
    @classmethod
    def validate_timestamp(cls, v):
        """Validate that timestamp is not in the future."""
        # Compare in the timestamp's own timezone so naive and aware values both work
        if v > datetime.now(v.tzinfo):
            raise ValueError("Transaction timestamp cannot be in the future")
        return v
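For a fixed set of allowed values such as the currency codes, a `Literal` type is a possible alternative to a custom validator. This is a sketch of that alternative under Pydantic v2, not the approach used in the model above; `Payment` is a hypothetical model.

```python
from typing import Literal

from pydantic import BaseModel, ValidationError


class Payment(BaseModel):
    """Hypothetical model using Literal instead of a custom validator."""

    amount: float
    currency: Literal["USD", "EUR", "GBP"]


payment = Payment(amount=10, currency="EUR")
print(payment.currency)  # EUR

try:
    Payment(amount=10, currency="JPY")
    error_type = None
except ValidationError as exc:
    error_type = exc.errors()[0]["type"]

print(error_type)  # literal_error

# The allowed values also appear in the generated JSON schema.
allowed = Payment.model_json_schema()["properties"]["currency"]["enum"]
print(allowed)  # ['USD', 'EUR', 'GBP']
```

One trade-off: a custom validator can carry a tailored error message and extra logic, while `Literal` keeps the rule declarative and visible in the schema.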
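Updates and Compatibility
- Works with all supported LLM providers
- Compatible with the latest Pydantic versions
- Regularly updated to support new validation features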