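Simple Nested Structures
This guide explains how to extract nested structured data with Instructor. Nested structures let you represent complex, hierarchical relationships in your data.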
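Understanding Nested Structures
A nested structure is an object that contains other objects as fields. Nested structures are useful for representing:
- Parent-child relationships
- Complex entities with sub-components
- Hierarchical data
- Related data that belongs together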
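To make the idea concrete without calling a model, the sketch below validates a plain dictionary against a parent model that contains a child model. The `Team` and `Member` classes and the sample data are hypothetical and exist only for illustration; the sketch assumes Pydantic v2.

```python
from pydantic import BaseModel


class Member(BaseModel):
    name: str
    role: str


class Team(BaseModel):
    name: str
    lead: Member  # a nested object used as a field


# A nested dict is converted into nested model instances during validation.
team = Team.model_validate(
    {"name": "Platform", "lead": {"name": "Ada", "role": "Tech Lead"}}
)

print(type(team.lead).__name__)
print(team.lead.role)
```

The inner dictionary becomes a `Member` instance, so nested fields are reached with ordinary attribute access.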
Basic Nested Structure Example
Here is a simple example of extracting a nested structure:
from pydantic import BaseModel, Field
import instructor
from openai import OpenAI
from typing import List, Optional

# Initialize the client
client = instructor.from_openai(OpenAI())

# Define nested models
class Address(BaseModel):
    street: str
    city: str
    state: str
    zip_code: str

class Person(BaseModel):
    name: str
    age: int
    address: Address  # Nested structure

# Extract the nested data
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": """
        John Smith is 35 years old.
        He lives at 123 Main Street, Boston, MA 02108.
        """}
    ],
    response_model=Person
)

# Access the nested data
print(f"Name: {response.name}")
print(f"Age: {response.age}")
print(f"Address: {response.address.street}, {response.address.city}, "
      f"{response.address.state} {response.address.zip_code}")
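It can help to look at the JSON schema that Pydantic generates for a nested model, since a schema of this kind describes the shape the extracted data must take. The following is a sketch that assumes Pydantic v2 and runs offline; how Instructor hands the schema to a particular provider is not shown here.

```python
from pydantic import BaseModel


class Address(BaseModel):
    street: str
    city: str
    state: str
    zip_code: str


class Person(BaseModel):
    name: str
    age: int
    address: Address  # Nested structure


schema = Person.model_json_schema()

# Nested models are defined once under "$defs" and referenced from the parent.
print(sorted(schema["$defs"]))
print(schema["required"])
```

Because `address` has no default, it appears in the parent's `required` list alongside `name` and `age`.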
Multiple Levels of Nesting
You can nest models several levels deep to describe more complex structures:
from pydantic import BaseModel, Field
import instructor
from openai import OpenAI
from typing import List, Optional

client = instructor.from_openai(OpenAI())

class EmployeeDetails(BaseModel):
    department: str
    position: str
    start_date: str

class ContactInfo(BaseModel):
    phone: str
    email: str

class Address(BaseModel):
    street: str
    city: str
    state: str
    zip_code: str

class Person(BaseModel):
    name: str
    age: int
    contact: ContactInfo  # First level nesting
    address: Address  # First level nesting
    employment: Optional[EmployeeDetails] = None  # Optional nested structure

# Extract deeply nested data
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": """
        Employee Profile:
        Name: Jane Doe
        Age: 32
        Phone: (555) 123-4567
        Email: jane.doe@example.com
        Address: 456 Oak Avenue, Chicago, IL 60601
        Department: Engineering
        Position: Senior Developer
        Start Date: 2021-03-15
        """}
    ],
    response_model=Person
)
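Once a multi-level model has been populated, two things come up often: guarding optional branches before reaching into them, and turning the result back into plain dictionaries. The sketch below uses a trimmed-down copy of the models above and a hand-built instance in place of a real response, so no API call is made; it assumes Pydantic v2.

```python
from typing import Optional

from pydantic import BaseModel


class EmployeeDetails(BaseModel):
    department: str
    position: str
    start_date: str


class ContactInfo(BaseModel):
    phone: str
    email: str


class Person(BaseModel):
    name: str
    contact: ContactInfo
    employment: Optional[EmployeeDetails] = None


# Hand-built stand-in for an extracted response (no API call is made).
person = Person(
    name="Jane Doe",
    contact=ContactInfo(phone="(555) 123-4567", email="jane.doe@example.com"),
)

# Guard optional nested objects before reaching into them.
department = person.employment.department if person.employment else "unknown"

# model_dump() returns nested dicts; exclude_none drops unset optional branches.
data = person.model_dump(exclude_none=True)

print(department)
print(data)
```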
Nested Lists
You can combine nesting with lists to represent complex collections:
from pydantic import BaseModel, Field
import instructor
from openai import OpenAI
from typing import List

client = instructor.from_openai(OpenAI())

class Ingredient(BaseModel):
    name: str
    amount: str
    unit: str

class Recipe(BaseModel):
    title: str
    description: str
    ingredients: List[Ingredient]  # Nested list of ingredients
    steps: List[str]  # List of strings

# Extract nested list data
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": """
        Recipe: Chocolate Chip Cookies
        Description: Classic homemade chocolate chip cookies that are soft in the middle and crispy on the edges.
        Ingredients:
        - 2 1/4 cups all-purpose flour
        - 1 teaspoon baking soda
        - 1 teaspoon salt
        - 1 cup butter
        - 3/4 cup white sugar
        - 3/4 cup brown sugar
        - 2 eggs
        - 2 teaspoons vanilla extract
        - 2 cups chocolate chips
        Instructions:
        1. Preheat oven to 375°F (190°C)
        2. Mix flour, baking soda, and salt
        3. Cream butter and sugars, then add eggs and vanilla
        4. Gradually add dry ingredients
        5. Stir in chocolate chips
        6. Drop by rounded tablespoons onto ungreased baking sheets
        7. Bake for 9 to 11 minutes or until golden brown
        8. Cool on wire racks
        """}
    ],
    response_model=Recipe
)
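Each element of a nested list is a full model instance, so the list can be iterated like any other Python list. The sketch below validates a hand-written sample (standing in for an extracted response, with the `description` field omitted for brevity) and builds a shopping list from it; it assumes Pydantic v2.

```python
from typing import List

from pydantic import BaseModel


class Ingredient(BaseModel):
    name: str
    amount: str
    unit: str


class Recipe(BaseModel):
    title: str
    ingredients: List[Ingredient]
    steps: List[str]


# Hand-written sample standing in for an extracted response.
recipe = Recipe.model_validate(
    {
        "title": "Chocolate Chip Cookies",
        "ingredients": [
            {"name": "all-purpose flour", "amount": "2 1/4", "unit": "cups"},
            {"name": "baking soda", "amount": "1", "unit": "teaspoon"},
        ],
        "steps": ["Preheat oven to 375°F (190°C)", "Mix flour, baking soda, and salt"],
    }
)

# Each list element is an Ingredient instance, not a dict.
shopping_list = [f"{i.amount} {i.unit} {i.name}" for i in recipe.ingredients]
for line in shopping_list:
    print(line)
```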
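For more information on working with lists, see the List Extraction guide.
Handling Optional Nested Fields
Sometimes parts of a nested structure are missing from the source text. Use Optional to handle this: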
from pydantic import BaseModel, Field
import instructor
from openai import OpenAI
from typing import Optional

client = instructor.from_openai(OpenAI())

class SocialMedia(BaseModel):
    twitter: Optional[str] = None
    linkedin: Optional[str] = None
    instagram: Optional[str] = None

class ContactInfo(BaseModel):
    email: str
    phone: Optional[str] = None
    social: Optional[SocialMedia] = None  # Optional nested structure

class Person(BaseModel):
    name: str
    contact: ContactInfo
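When a whole nested object is optional, every access further down the chain needs a `None` check. The sketch below wraps that check in a small helper; `twitter_handle` and the sample people are hypothetical names introduced for illustration, and the instances are built offline with Pydantic v2 rather than extracted.

```python
from typing import Optional

from pydantic import BaseModel


class SocialMedia(BaseModel):
    twitter: Optional[str] = None
    linkedin: Optional[str] = None
    instagram: Optional[str] = None


class ContactInfo(BaseModel):
    email: str
    phone: Optional[str] = None
    social: Optional[SocialMedia] = None


class Person(BaseModel):
    name: str
    contact: ContactInfo


def twitter_handle(person: Person) -> Optional[str]:
    """Return the Twitter handle, or None if the social block is missing."""
    social = person.contact.social
    return social.twitter if social is not None else None


with_social = Person.model_validate(
    {"name": "Ana", "contact": {"email": "ana@example.com", "social": {"twitter": "@ana"}}}
)
without_social = Person.model_validate(
    {"name": "Bo", "contact": {"email": "bo@example.com"}}
)

print(twitter_handle(with_social))
print(twitter_handle(without_social))
```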
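For more information on optional fields, see the Optional Fields guide.
Validating Nested Structures
You can add validation at any level of a nested structure: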
from pydantic import BaseModel, Field, field_validator, model_validator
import instructor
from openai import OpenAI
import re

client = instructor.from_openai(OpenAI())

class EmailContact(BaseModel):
    email: str

    @field_validator('email')
    @classmethod
    def validate_email(cls, v):
        pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
        if not re.match(pattern, v):
            raise ValueError("Invalid email format")
        return v

class Customer(BaseModel):
    name: str
    contact: EmailContact  # Nested structure with its own validation

    @model_validator(mode='after')
    def validate_name_email_match(self):
        name_part = self.name.lower().split()[0]
        if name_part not in self.contact.email.lower():
            print(f"Warning: Email {self.contact.email} may not match name {self.name}")
        return self
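When a validator on a nested model fails, Pydantic reports the error with a location path that includes the parent field, which makes it easy to see which level rejected the value. The sketch below triggers such a failure offline with a deliberately malformed email; the sample data is made up and the sketch assumes Pydantic v2.

```python
import re

from pydantic import BaseModel, ValidationError, field_validator


class EmailContact(BaseModel):
    email: str

    @field_validator("email")
    @classmethod
    def validate_email(cls, v: str) -> str:
        pattern = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
        if not re.match(pattern, v):
            raise ValueError("Invalid email format")
        return v


class Customer(BaseModel):
    name: str
    contact: EmailContact


error_locations = []
try:
    Customer.model_validate({"name": "Sam Lee", "contact": {"email": "not-an-email"}})
except ValidationError as exc:
    # Each error carries a "loc" tuple that spells out the path to the bad field.
    error_locations = [err["loc"] for err in exc.errors()]

print(error_locations)
```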
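Working with Recursive Structures
For more complex hierarchical data, you can use recursive structures, where a model refers to itself: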
from typing import List, Optional
from pydantic import BaseModel, Field
import instructor
from openai import OpenAI

client = instructor.from_openai(OpenAI())

class Comment(BaseModel):
    text: str
    author: str
    replies: List["Comment"] = []  # Recursive structure

# Update the Comment class reference for Pydantic
Comment.model_rebuild()

class Post(BaseModel):
    title: str
    content: str
    author: str
    comments: List[Comment] = []

# Extract recursive nested data
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": """
        Blog Post: "Python Tips and Tricks"
        Author: John Smith
        Content: Here are some helpful Python tips for beginners...
        Comments:
        1. Alice: "Great post! Very helpful."
           - Bob: "I agree, I learned a lot."
             - Alice: "Bob, did you try the last example?"
           - Charlie: "Thanks for sharing this."
        2. David: "Could you explain the second tip more?"
           - John: "Sure, I'll add more details."
        """}
    ],
    response_model=Post
)
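Recursive models are usually consumed with recursive functions. The sketch below counts every comment in a thread and measures its deepest reply chain; `count_comments` and `max_depth` are hypothetical helpers, the thread is hand-built rather than extracted, and `Field(default_factory=list)` is used here as one common way to give each instance its own empty list. It assumes Pydantic v2.

```python
from typing import List

from pydantic import BaseModel, Field


class Comment(BaseModel):
    text: str
    author: str
    replies: List["Comment"] = Field(default_factory=list)


def count_comments(comments: List[Comment]) -> int:
    """Count every comment in the tree, including nested replies."""
    return sum(1 + count_comments(c.replies) for c in comments)


def max_depth(comments: List[Comment]) -> int:
    """Depth of the deepest reply chain; an empty list has depth 0."""
    return max((1 + max_depth(c.replies) for c in comments), default=0)


thread = [
    Comment.model_validate(
        {
            "text": "Great post! Very helpful.",
            "author": "Alice",
            "replies": [
                {
                    "text": "I agree, I learned a lot.",
                    "author": "Bob",
                    "replies": [
                        {"text": "Bob, did you try the last example?", "author": "Alice"}
                    ],
                },
                {"text": "Thanks for sharing this.", "author": "Charlie"},
            ],
        }
    )
]

print(count_comments(thread))
print(max_depth(thread))
```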
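For more advanced recursive structures, see the Recursive Structures guide.
Real-World Example: Organization Structure
Here is a more complete example that extracts an organization structure: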
from typing import List, Optional
from pydantic import BaseModel, Field
import instructor
from openai import OpenAI

client = instructor.from_openai(OpenAI())

class Employee(BaseModel):
    name: str
    title: str

class Department(BaseModel):
    name: str
    head: Employee
    employees: List[Employee]
    sub_departments: List["Department"] = []

# Update for Pydantic's recursive model support
Department.model_rebuild()

class Organization(BaseModel):
    name: str
    ceo: Employee
    departments: List[Department]

# Extract organization structure
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": """
        Acme Corporation
        CEO: Jane Smith, Chief Executive Officer
        Departments:
        1. Engineering
           Head: Bob Johnson, CTO
           Employees:
           - Sarah Lee, Senior Engineer
           - Tom Brown, Software Developer
           Sub-departments:
           - Frontend Team
             Head: Lisa Wang, Frontend Lead
             Employees:
             - Mike Chen, UI Developer
             - Ana Garcia, UX Designer
           - Backend Team
             Head: David Kim, Backend Lead
             Employees:
             - James Wright, Database Engineer
             - Rachel Patel, API Developer
        2. Marketing
           Head: Michael Davis, CMO
           Employees:
           - Jennifer Miller, Marketing Specialist
           - Robert Chen, Content Creator
        """}
    ],
    response_model=Organization
)
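A nested organization is often easier to report on once it is flattened. The sketch below walks a department tree depth-first and yields each person together with the path of the department they belong to; `iter_staff` is a hypothetical helper, and the department is a hand-built excerpt of the sample text rather than a real extraction. It assumes Pydantic v2.

```python
from typing import Iterator, List, Tuple

from pydantic import BaseModel, Field


class Employee(BaseModel):
    name: str
    title: str


class Department(BaseModel):
    name: str
    head: Employee
    employees: List[Employee]
    sub_departments: List["Department"] = Field(default_factory=list)


def iter_staff(dept: Department, path: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (department path, person name) pairs, depth-first."""
    here = f"{path}/{dept.name}" if path else dept.name
    yield here, dept.head.name
    for employee in dept.employees:
        yield here, employee.name
    for sub in dept.sub_departments:
        yield from iter_staff(sub, here)


engineering = Department.model_validate(
    {
        "name": "Engineering",
        "head": {"name": "Bob Johnson", "title": "CTO"},
        "employees": [{"name": "Sarah Lee", "title": "Senior Engineer"}],
        "sub_departments": [
            {
                "name": "Frontend Team",
                "head": {"name": "Lisa Wang", "title": "Frontend Lead"},
                "employees": [{"name": "Mike Chen", "title": "UI Developer"}],
            }
        ],
    }
)

staff = list(iter_staff(engineering))
for dept_path, person in staff:
    print(f"{dept_path}: {person}")
```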
For more information on organization structures, see the Dependency Trees guide.