
Simple Object Extraction

This guide shows how to extract simple objects with defined fields from text, the most common pattern in structured data extraction.

Basic Example

from pydantic import BaseModel
import instructor
from openai import OpenAI

# Define the structure you want to extract
class Person(BaseModel):
    name: str
    age: int
    occupation: str

# Extract the structured data
client = instructor.from_openai(OpenAI())
person = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "John Smith is a 35-year-old software engineer."}
    ],
    response_model=Person
)

print(f"Name: {person.name}")
print(f"Age: {person.age}")
print(f"Occupation: {person.occupation}")

┌─────────────────┐            ┌────────────────────┐
│ Define Model    │            │ Extracted          │
│ name: str       │  Extract   │ name: "John Smith" │
│ age: int        │ ─────────> │ age: 35            │
│ occupation: str │            │ occupation:        │
└─────────────────┘            │ "software..."      │
                               └────────────────────┘
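
The extraction call itself needs an API key, but the validation half of the flow can be tried offline. The following is a minimal sketch, assuming Pydantic v2; the raw_output string is a hypothetical stand-in for the JSON the model would return, not real API output.

```python
from pydantic import BaseModel, ValidationError


class Person(BaseModel):
    name: str
    age: int
    occupation: str


# Hypothetical raw model output; in a real run Instructor obtains this from the API.
raw_output = '{"name": "John Smith", "age": 35, "occupation": "software engineer"}'

# Parse and validate the JSON against the Person model.
person = Person.model_validate_json(raw_output)
print(person)

# A payload whose age cannot be read as an integer is rejected.
try:
    Person.model_validate_json(
        '{"name": "John Smith", "age": "unknown", "occupation": "engineer"}'
    )
    bad_payload_rejected = False
except ValidationError:
    bad_payload_rejected = True
```

The result is a typed Person instance, so person.age is an int rather than a string.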

Using Field Descriptions

Adding descriptions helps the model understand what to extract:

from pydantic import BaseModel, Field

class Book(BaseModel):
    title: str = Field(description="The full title of the book")
    author: str = Field(description="The author's full name")
    publication_year: int = Field(description="The year the book was published")

Field descriptions act as instructions for the extraction process.
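
To see where the descriptions end up, you can inspect the JSON schema that Pydantic generates for the model. This is a small sketch assuming Pydantic v2 (model_json_schema); that this schema is what the LLM ultimately sees is an assumption about how Instructor works internally, and the descriptions dictionary is only an illustrative helper.

```python
from pydantic import BaseModel, Field


class Book(BaseModel):
    title: str = Field(description="The full title of the book")
    author: str = Field(description="The author's full name")
    publication_year: int = Field(description="The year the book was published")


# Build the JSON schema for the model and collect each field's description.
schema = Book.model_json_schema()
descriptions = {
    field_name: field_schema["description"]
    for field_name, field_schema in schema["properties"].items()
}

for field_name, text in descriptions.items():
    print(f"{field_name}: {text}")
```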

Handling Optional Fields

Sometimes the text does not contain every piece of information:

from typing import Optional
from pydantic import BaseModel

class MovieReview(BaseModel):
    title: str
    director: Optional[str] = None  # Optional field
    rating: float

By using Optional and providing a default value, a missing field does not cause an error.
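
The behavior can be checked without calling a model. The sketch below assumes Pydantic v2 and validates a hypothetical payload that omits director; the movie title, rating, and the director_label fallback are made up for illustration.

```python
from typing import Optional

from pydantic import BaseModel


class MovieReview(BaseModel):
    title: str
    director: Optional[str] = None  # Optional field
    rating: float


# Hypothetical extraction result in which the text never mentioned a director.
review = MovieReview.model_validate({"title": "Inception", "rating": 4.5})

# The missing field falls back to its default instead of raising an error.
director_label = review.director or "unknown director"
print(f"{review.title} ({director_label}): {review.rating}")
```

Downstream code should still check for None before using the field, as the fallback label does here.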

Adding Simple Validation

You can add basic validation rules:

from pydantic import BaseModel, Field

class Product(BaseModel):
    name: str
    price: float = Field(gt=0, description="The product price in USD")
    in_stock: bool

This example ensures that price must be greater than zero.
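
A quick way to confirm the rule is to construct the model directly with a valid and an invalid price. This is a minimal sketch assuming Pydantic v2; the product values and the error_fields list are illustrative only.

```python
from pydantic import BaseModel, Field, ValidationError


class Product(BaseModel):
    name: str
    price: float = Field(gt=0, description="The product price in USD")
    in_stock: bool


# A positive price passes validation.
valid_product = Product(name="Widget", price=9.99, in_stock=True)

# A non-positive price violates gt=0 and raises ValidationError.
try:
    Product(name="Widget", price=-5, in_stock=True)
    error_fields = []
except ValidationError as exc:
    # Each error records the location of the offending field.
    error_fields = [error["loc"][0] for error in exc.errors()]

print(error_fields)
```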

Practical Example

Here is a more realistic example:

from typing import Optional

import instructor
from openai import OpenAI
from pydantic import BaseModel

class Address(BaseModel):
    street: str
    city: str
    state: str
    zip_code: str

class ContactInfo(BaseModel):
    name: str
    email: str
    phone: Optional[str] = None
    address: Optional[Address] = None

# Extract structured data
client = instructor.from_openai(OpenAI())
contact = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": """
        Contact information:
        Name: Sarah Johnson
        Email: sarah.j@example.com
        Phone: (555) 123-4567
        Address: 123 Main St, Boston, MA 02108
        """}
    ],
    response_model=ContactInfo
)

print(f"Name: {contact.name}")
print(f"Email: {contact.email}")
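
Because phone and address are optional, code that consumes the result should guard against None before reading nested fields. The following sketch assumes Pydantic v2 and builds ContactInfo objects by hand in place of a real extraction; format_contact is a hypothetical helper, not part of Instructor.

```python
from typing import Optional

from pydantic import BaseModel


class Address(BaseModel):
    street: str
    city: str
    state: str
    zip_code: str


class ContactInfo(BaseModel):
    name: str
    email: str
    phone: Optional[str] = None
    address: Optional[Address] = None


def format_contact(contact: ContactInfo) -> str:
    """Render a contact as text, skipping optional fields that are missing."""
    lines = [f"Name: {contact.name}", f"Email: {contact.email}"]
    if contact.phone is not None:
        lines.append(f"Phone: {contact.phone}")
    if contact.address is not None:
        addr = contact.address
        lines.append(
            f"Address: {addr.street}, {addr.city}, {addr.state} {addr.zip_code}"
        )
    return "\n".join(lines)


# Stand-ins for extraction results: one complete, one with only required fields.
full_contact = ContactInfo(
    name="Sarah Johnson",
    email="sarah.j@example.com",
    phone="(555) 123-4567",
    address=Address(street="123 Main St", city="Boston", state="MA", zip_code="02108"),
)
minimal_contact = ContactInfo(name="Sarah Johnson", email="sarah.j@example.com")

print(format_contact(full_contact))
print(format_contact(minimal_contact))
```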

Next Steps