
Simple Nested Structures

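This guide explains how to extract nested structured data with Instructor. Nested structures let you represent complex, hierarchical relationships in your data.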

Understanding Nested Structures

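A nested structure is an object that contains other objects as fields. Nested structures are useful for representing:

  1. Parent-child relationships
  2. Complex entities with sub-components
  3. Hierarchical data
  4. Related data that belongs to the same group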

Basic Nested Structure Example

Here is a simple example of extracting a nested structure:

from pydantic import BaseModel, Field
import instructor
from openai import OpenAI
from typing import List, Optional

# Initialize the client
client = instructor.from_openai(OpenAI())

# Define nested models
class Address(BaseModel):
    street: str
    city: str
    state: str
    zip_code: str

class Person(BaseModel):
    name: str
    age: int
    address: Address  # Nested structure

# Extract the nested data
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": """
        John Smith is 35 years old.
        He lives at 123 Main Street, Boston, MA 02108.
        """}
    ],
    response_model=Person
)

# Access the nested data
print(f"Name: {response.name}")
print(f"Age: {response.age}")
print(f"Address: {response.address.street}, {response.address.city}, " 
      f"{response.address.state} {response.address.zip_code}")
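
To see the shape of the nested model without making an API call, the sketch below uses only Pydantic. It builds the JSON schema for Person and validates a hand-written dictionary that stands in for an extracted response. How Instructor passes this schema to the provider depends on the mode in use, so treat this as an illustration of the model's shape, not of the request itself.

```python
from pydantic import BaseModel


class Address(BaseModel):
    street: str
    city: str
    state: str
    zip_code: str


class Person(BaseModel):
    name: str
    age: int
    address: Address  # Nested structure


# JSON schema generated from the model; nested models are listed under "$defs"
schema = Person.model_json_schema()
nested_model_names = sorted(schema["$defs"])
required_fields = schema["required"]

# Hand-written stand-in for an extracted result (assumption: no LLM call here)
person = Person.model_validate(
    {
        "name": "John Smith",
        "age": 35,
        "address": {
            "street": "123 Main Street",
            "city": "Boston",
            "state": "MA",
            "zip_code": "02108",
        },
    }
)

# Nested models become nested dictionaries when dumped
as_dict = person.model_dump()
```

The nested dictionary is validated into a real Address instance, so attribute access such as person.address.city works the same way as on the extracted response.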

Multiple Levels of Nesting

You can nest models several levels deep to represent more complex structures:

from pydantic import BaseModel, Field
import instructor
from openai import OpenAI
from typing import List, Optional

client = instructor.from_openai(OpenAI())

class EmployeeDetails(BaseModel):
    department: str
    position: str
    start_date: str

class ContactInfo(BaseModel):
    phone: str
    email: str

class Address(BaseModel):
    street: str
    city: str
    state: str
    zip_code: str

class Person(BaseModel):
    name: str
    age: int
    contact: ContactInfo  # First level nesting
    address: Address      # First level nesting
    employment: Optional[EmployeeDetails] = None  # Optional nested structure

# Extract deeply nested data
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": """
        Employee Profile:
        Name: Jane Doe
        Age: 32
        Phone: (555) 123-4567
        Email: jane.doe@example.com
        Address: 456 Oak Avenue, Chicago, IL 60601
        Department: Engineering
        Position: Senior Developer
        Start Date: 2021-03-15
        """}
    ],
    response_model=Person
)
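
In the example above, every nested model sits directly on Person. As a minimal sketch of a second level, the code below nests a hypothetical Office model inside EmployeeDetails. Office and its fields are assumptions made for illustration and are not part of the original example; the data is validated from a dictionary instead of an LLM response.

```python
from pydantic import BaseModel


class Office(BaseModel):  # hypothetical model, introduced only for this sketch
    city: str
    floor: int


class EmployeeDetails(BaseModel):
    department: str
    position: str
    office: Office  # Second level of nesting (assumed field)


class Person(BaseModel):
    name: str
    employment: EmployeeDetails  # First level of nesting


# Hand-written stand-in for an extracted result
person = Person.model_validate(
    {
        "name": "Jane Doe",
        "employment": {
            "department": "Engineering",
            "position": "Senior Developer",
            "office": {"city": "Chicago", "floor": 12},
        },
    }
)

# Each level is reached by chaining attribute access
office_city = person.employment.office.city
office_floor = person.employment.office.floor
```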

Nested Lists

You can combine nesting with lists to represent complex collections:

from pydantic import BaseModel, Field
import instructor
from openai import OpenAI
from typing import List

client = instructor.from_openai(OpenAI())

class Ingredient(BaseModel):
    name: str
    amount: str
    unit: str

class Recipe(BaseModel):
    title: str
    description: str
    ingredients: List[Ingredient]  # Nested list of ingredients
    steps: List[str]  # List of strings

# Extract nested list data
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": """
        Recipe: Chocolate Chip Cookies

        Description: Classic homemade chocolate chip cookies that are soft in the middle and crispy on the edges.

        Ingredients:
        - 2 1/4 cups all-purpose flour
        - 1 teaspoon baking soda
        - 1 teaspoon salt
        - 1 cup butter
        - 3/4 cup white sugar
        - 3/4 cup brown sugar
        - 2 eggs
        - 2 teaspoons vanilla extract
        - 2 cups chocolate chips

        Instructions:
        1. Preheat oven to 375°F (190°C)
        2. Mix flour, baking soda, and salt
        3. Cream butter and sugars, then add eggs and vanilla
        4. Gradually add dry ingredients
        5. Stir in chocolate chips
        6. Drop by rounded tablespoons onto ungreased baking sheets
        7. Bake for 9 to 11 minutes or until golden brown
        8. Cool on wire racks
        """}
    ],
    response_model=Recipe
)
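
Once extracted, a nested list is an ordinary Python list of model instances. The sketch below shows one way to iterate over it, using a shortened, hand-written Recipe in place of the LLM response. How the model splits each ingredient into amount and unit is an assumption here (for example, an empty unit for "2 eggs").

```python
from typing import List

from pydantic import BaseModel


class Ingredient(BaseModel):
    name: str
    amount: str
    unit: str


class Recipe(BaseModel):
    title: str
    ingredients: List[Ingredient]  # Nested list of ingredients
    steps: List[str]


# Hand-written stand-in for an extracted result (values are assumed)
recipe = Recipe.model_validate(
    {
        "title": "Chocolate Chip Cookies",
        "ingredients": [
            {"name": "all-purpose flour", "amount": "2 1/4", "unit": "cups"},
            {"name": "baking soda", "amount": "1", "unit": "teaspoon"},
            {"name": "eggs", "amount": "2", "unit": ""},
        ],
        "steps": ["Preheat oven to 375°F (190°C)", "Mix flour, baking soda, and salt"],
    }
)

# Join the non-empty parts of each ingredient into one display line
ingredient_lines = [
    " ".join(part for part in (item.amount, item.unit, item.name) if part)
    for item in recipe.ingredients
]

# Number the steps starting from 1
numbered_steps = [f"{n}. {step}" for n, step in enumerate(recipe.steps, start=1)]
```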

For more information on working with lists, see the List Extraction guide.

Handling Optional Nested Fields

Sometimes parts of a nested structure may be missing. Use Optional to handle these cases:

from pydantic import BaseModel, Field
import instructor
from openai import OpenAI
from typing import Optional

client = instructor.from_openai(OpenAI())

class SocialMedia(BaseModel):
    twitter: Optional[str] = None
    linkedin: Optional[str] = None
    instagram: Optional[str] = None

class ContactInfo(BaseModel):
    email: str
    phone: Optional[str] = None
    social: Optional[SocialMedia] = None  # Optional nested structure

class Person(BaseModel):
    name: str
    contact: ContactInfo
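
The models above do not show what happens when a nested part is absent. The following sketch validates two hand-written dictionaries (stand-ins for extracted responses) and reads an optional nested value safely. The helper twitter_handle is a hypothetical name introduced for this illustration.

```python
from typing import Optional

from pydantic import BaseModel


class SocialMedia(BaseModel):
    twitter: Optional[str] = None
    linkedin: Optional[str] = None


class ContactInfo(BaseModel):
    email: str
    phone: Optional[str] = None
    social: Optional[SocialMedia] = None  # Optional nested structure


class Person(BaseModel):
    name: str
    contact: ContactInfo


def twitter_handle(person: Person) -> Optional[str]:
    """Return the Twitter handle, or None if the social block is missing."""
    social = person.contact.social
    return social.twitter if social is not None else None


full = Person.model_validate(
    {"name": "Ada", "contact": {"email": "ada@example.com", "social": {"twitter": "@ada"}}}
)
minimal = Person.model_validate({"name": "Bob", "contact": {"email": "bob@example.com"}})

# exclude_none drops unset optional fields at every nesting level
compact = minimal.model_dump(exclude_none=True)
```

Checking the nested object for None before reading its fields avoids an AttributeError when the source text contains no social media details.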

For more information on optional fields, see the Optional Fields guide.

Validating Nested Structures

You can add validation to nested structures at any level:

from pydantic import BaseModel, Field, field_validator, model_validator
import instructor
from openai import OpenAI
import re

client = instructor.from_openai(OpenAI())

class EmailContact(BaseModel):
    email: str

    @field_validator('email')
    @classmethod
    def validate_email(cls, v):
        pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
        if not re.match(pattern, v):
            raise ValueError("Invalid email format")
        return v

class Customer(BaseModel):
    name: str
    contact: EmailContact  # Nested structure with its own validation

    @model_validator(mode='after')
    def validate_name_email_match(self):
        name_part = self.name.lower().split()[0]
        if name_part not in self.contact.email.lower():
            print(f"Warning: Email {self.contact.email} may not match name {self.name}")
        return self
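
When a nested validator fails, Pydantic reports where in the structure the problem occurred. The sketch below triggers the email validator with a hand-written invalid value and collects the error location and message. It uses only Pydantic and leaves the cross-field check out for brevity.

```python
import re

from pydantic import BaseModel, ValidationError, field_validator


class EmailContact(BaseModel):
    email: str

    @field_validator("email")
    @classmethod
    def validate_email(cls, v: str) -> str:
        pattern = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
        if not re.match(pattern, v):
            raise ValueError("Invalid email format")
        return v


class Customer(BaseModel):
    name: str
    contact: EmailContact  # Nested structure with its own validation


error_locations = []
error_messages = []
try:
    # The nested email is deliberately invalid
    Customer.model_validate({"name": "Jane Doe", "contact": {"email": "not-an-email"}})
except ValidationError as exc:
    for err in exc.errors():
        error_locations.append(err["loc"])  # path to the failing field
        error_messages.append(err["msg"])
```

The location is a tuple that walks from the outer model to the failing field, which makes it possible to tell which nested object caused the error.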

For more information on validation, see Field Validation and Validation Basics.

Using Recursive Structures

For more complex hierarchical data, you can use recursive structures:

from typing import List, Optional
from pydantic import BaseModel, Field
import instructor
from openai import OpenAI

client = instructor.from_openai(OpenAI())

class Comment(BaseModel):
    text: str
    author: str
    replies: List["Comment"] = []  # Recursive structure

# Update the Comment class reference for Pydantic
Comment.model_rebuild()

class Post(BaseModel):
    title: str
    content: str
    author: str
    comments: List[Comment] = []

# Extract recursive nested data
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": """
        Blog Post: "Python Tips and Tricks"
        Author: John Smith
        Content: Here are some helpful Python tips for beginners...

        Comments:
        1. Alice: "Great post! Very helpful."
           - Bob: "I agree, I learned a lot."
             - Alice: "Bob, did you try the last example?"
           - Charlie: "Thanks for sharing this."
        2. David: "Could you explain the second tip more?"
           - John: "Sure, I'll add more details."
        """}
    ],
    response_model=Post
)
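
A recursive result is usually consumed with a recursive function. As a minimal sketch, the code below rebuilds the comment thread from the prompt by hand (standing in for the extracted response) and computes the total number of comments and the deepest reply chain. The helpers count_comments and max_depth are hypothetical names introduced for this illustration.

```python
from typing import List

from pydantic import BaseModel, Field


class Comment(BaseModel):
    text: str
    author: str
    replies: List["Comment"] = Field(default_factory=list)  # Recursive structure


Comment.model_rebuild()


def count_comments(comments: List[Comment]) -> int:
    """Count every comment, including all nested replies."""
    return sum(1 + count_comments(c.replies) for c in comments)


def max_depth(comments: List[Comment]) -> int:
    """Return the length of the longest reply chain (0 for an empty list)."""
    return max((1 + max_depth(c.replies) for c in comments), default=0)


# Hand-written stand-in for the extracted comments
thread = [
    Comment.model_validate(
        {
            "text": "Great post! Very helpful.",
            "author": "Alice",
            "replies": [
                {
                    "text": "I agree, I learned a lot.",
                    "author": "Bob",
                    "replies": [
                        {"text": "Bob, did you try the last example?", "author": "Alice"}
                    ],
                },
                {"text": "Thanks for sharing this.", "author": "Charlie"},
            ],
        }
    ),
    Comment.model_validate(
        {
            "text": "Could you explain the second tip more?",
            "author": "David",
            "replies": [{"text": "Sure, I'll add more details.", "author": "John"}],
        }
    ),
]

total = count_comments(thread)
depth = max_depth(thread)
```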

For more advanced recursive structures, see the Recursive Structures guide.

Real-World Example: Organizational Structure

Here is a more complete example that extracts an organizational structure:

from typing import List, Optional
from pydantic import BaseModel, Field
import instructor
from openai import OpenAI

client = instructor.from_openai(OpenAI())

class Employee(BaseModel):
    name: str
    title: str

class Department(BaseModel):
    name: str
    head: Employee
    employees: List[Employee]
    sub_departments: List["Department"] = []

# Update for Pydantic's recursive model support
Department.model_rebuild()

class Organization(BaseModel):
    name: str
    ceo: Employee
    departments: List[Department]

# Extract organization structure
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": """
        Acme Corporation
        CEO: Jane Smith, Chief Executive Officer

        Departments:

        1. Engineering
           Head: Bob Johnson, CTO
           Employees:
           - Sarah Lee, Senior Engineer
           - Tom Brown, Software Developer

           Sub-departments:
           - Frontend Team
             Head: Lisa Wang, Frontend Lead
             Employees:
             - Mike Chen, UI Developer
             - Ana Garcia, UX Designer

           - Backend Team
             Head: David Kim, Backend Lead
             Employees:
             - James Wright, Database Engineer
             - Rachel Patel, API Developer

        2. Marketing
           Head: Michael Davis, CMO
           Employees:
           - Jennifer Miller, Marketing Specialist
           - Robert Chen, Content Creator
        """}
    ],
    response_model=Organization
)
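
One common follow-up is flattening the extracted hierarchy into a list. The sketch below walks a hand-written, shortened Department (in place of the LLM response) and yields each person together with the path of the department they belong to. The generator iter_staff is a hypothetical helper, not part of Instructor.

```python
from typing import Iterator, List, Tuple

from pydantic import BaseModel, Field


class Employee(BaseModel):
    name: str
    title: str


class Department(BaseModel):
    name: str
    head: Employee
    employees: List[Employee]
    sub_departments: List["Department"] = Field(default_factory=list)


Department.model_rebuild()


def iter_staff(dept: Department, path: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (department path, person name) pairs, depth-first, head first."""
    full_path = f"{path} / {dept.name}" if path else dept.name
    yield full_path, dept.head.name
    for employee in dept.employees:
        yield full_path, employee.name
    for sub in dept.sub_departments:
        yield from iter_staff(sub, full_path)


# Hand-written stand-in for part of the extracted organization
engineering = Department.model_validate(
    {
        "name": "Engineering",
        "head": {"name": "Bob Johnson", "title": "CTO"},
        "employees": [{"name": "Sarah Lee", "title": "Senior Engineer"}],
        "sub_departments": [
            {
                "name": "Frontend Team",
                "head": {"name": "Lisa Wang", "title": "Frontend Lead"},
                "employees": [{"name": "Mike Chen", "title": "UI Developer"}],
            }
        ],
    }
)

staff = list(iter_staff(engineering))
```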

For more information on organizational structures, see the Dependency Tree guide.

Next Steps