
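Optional Fields

This guide explains how to use optional fields in your data models. An optional field lets the model leave a value out when the information is unavailable or uncertain, instead of failing the whole extraction.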

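Why Use Optional Fields?

Optional fields are useful when:

  1. Some information is missing from the input text
  2. Some fields are relevant only in specific contexts
  3. The large language model (LLM) cannot extract every field with certainty
  4. You want to allow partial success rather than complete failure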

Basic Optional Fields

To make a field optional, annotate it with Python's Optional type and give it a default value:

from typing import Optional
from pydantic import BaseModel
import instructor
from openai import OpenAI

client = instructor.from_openai(OpenAI())

class Person(BaseModel):
    name: str  # Required field
    age: Optional[int] = None  # Optional field with None default
    occupation: Optional[str] = None  # Optional field with None default

Here, name is required, while age and occupation are optional. If the model cannot find them, they default to None.
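As a quick check of this behavior, the sketch below validates plain dictionaries against the same Person model using Pydantic alone, with no LLM call. The dictionaries are made-up stand-ins for what a model might return, and it assumes Pydantic v2 (for model_validate).

```python
from typing import Optional
from pydantic import BaseModel, ValidationError

class Person(BaseModel):
    name: str  # Required field
    age: Optional[int] = None  # Optional field with None default
    occupation: Optional[str] = None  # Optional field with None default

# Hypothetical extraction result in which only the name was found
partial = Person.model_validate({"name": "Ada"})
print(partial.age, partial.occupation)

# Omitting the required field is still a validation error
try:
    Person.model_validate({"age": 36})
    missing_name_rejected = False
except ValidationError:
    missing_name_rejected = True
print(missing_name_rejected)
```

The optional fields absorb the missing data, while the required field still guards against an unusable result.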

Using Default Values

A default does not have to be None. You can give an optional field a meaningful default value:

from typing import List
from pydantic import BaseModel
import instructor
from openai import OpenAI

client = instructor.from_openai(OpenAI())

class Product(BaseModel):
    name: str
    price: float
    currency: str = "USD"  # Default value
    in_stock: bool = True  # Default value
    tags: List[str] = []  # Default empty list
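To see the defaults in action, here is a small sketch that builds Product instances directly, without an LLM call. The product names and prices are invented for illustration. It also checks that the empty-list default is not shared between instances, since Pydantic copies mutable defaults for each new object.

```python
from typing import List
from pydantic import BaseModel

class Product(BaseModel):
    name: str
    price: float
    currency: str = "USD"  # Default value
    in_stock: bool = True  # Default value
    tags: List[str] = []  # Default empty list

# Only the required fields are supplied; the rest fall back to defaults
widget = Product(name="Widget", price=9.99)

# Explicit values override the defaults
gadget = Product(name="Gadget", price=19.5, currency="EUR", tags=["new"])

# Mutating one instance's list does not leak into later instances
widget.tags.append("sale")
other = Product(name="Other", price=1.0)
print(widget.tags, other.tags)
```

If you prefer to make the per-instance behavior explicit, Field(default_factory=list) is an equivalent way to declare the tags default.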

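Optional Fields with Validation

Use Pydantic's Field function for more control, such as constraints and a description. The constraints apply only when a value is present: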

from typing import Optional
from pydantic import BaseModel, Field
import instructor
from openai import OpenAI

client = instructor.from_openai(OpenAI())

class UserProfile(BaseModel):
    username: str
    email: str
    bio: Optional[str] = Field(
        None,  # Default value
        max_length=200,  # Validation applies if present
        description="User's biography, limited to 200 characters"
    )
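The following sketch exercises that model with Pydantic only, to illustrate that the length limit is skipped when bio is absent and enforced when it is present. The usernames, email addresses, and bio strings are placeholder values.

```python
from typing import Optional
from pydantic import BaseModel, Field, ValidationError

class UserProfile(BaseModel):
    username: str
    email: str
    bio: Optional[str] = Field(
        None,  # Default value
        max_length=200,  # Validation applies if present
        description="User's biography, limited to 200 characters",
    )

# No bio supplied: the field defaults to None and nothing is checked
no_bio = UserProfile(username="ada", email="ada@example.com")

# A short bio satisfies the constraint
short_bio = UserProfile(username="ada", email="ada@example.com", bio="Mathematician")

# A 201-character bio violates max_length and is rejected
try:
    UserProfile(username="ada", email="ada@example.com", bio="x" * 201)
    long_bio_rejected = False
except ValidationError:
    long_bio_rejected = True
print(long_bio_rejected)
```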

Optional Nested Structures

An entire nested structure can be optional:

from typing import Optional
from pydantic import BaseModel
import instructor
from openai import OpenAI

client = instructor.from_openai(OpenAI())

class Address(BaseModel):
    street: str
    city: str
    state: str
    zip_code: str

class Contact(BaseModel):
    email: str
    phone: Optional[str] = None
    address: Optional[Address] = None  # Optional nested structure

class Person(BaseModel):
    name: str
    contact: Contact

When you use optional nested structures, check that they exist before accessing their attributes:

# Access nested data safely
if person.contact.address:
    print(f"Address: {person.contact.address.city}")
else:
    print("No address information available")
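The access pattern above can be tried end to end with the sketch below. It rebuilds the same three models and validates two invented dictionaries, one with an address and one without. The helper describe_city is a hypothetical name introduced only for this illustration.

```python
from typing import Optional
from pydantic import BaseModel

class Address(BaseModel):
    street: str
    city: str
    state: str
    zip_code: str

class Contact(BaseModel):
    email: str
    phone: Optional[str] = None
    address: Optional[Address] = None  # Optional nested structure

class Person(BaseModel):
    name: str
    contact: Contact

def describe_city(person: Person) -> str:
    """Return the city if an address was extracted, else a fallback message."""
    if person.contact.address is not None:
        return person.contact.address.city
    return "No address information available"

with_address = Person.model_validate({
    "name": "Alice",
    "contact": {
        "email": "alice@example.com",
        "address": {
            "street": "1 Main St",
            "city": "Springfield",
            "state": "IL",
            "zip_code": "62701",
        },
    },
})
without_address = Person.model_validate(
    {"name": "Bob", "contact": {"email": "bob@example.com"}}
)

print(describe_city(with_address))
print(describe_city(without_address))
```

Note that once an address is present, all of its own required fields (street, city, state, zip_code) must be present too. Optionality applies to the nested structure as a whole, not to its parts.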

Using Maybe for Uncertain Fields

Instructor provides a Maybe type for fields whose value is uncertain or ambiguous:

from pydantic import BaseModel
import instructor
from openai import OpenAI
from instructor.types import Maybe

client = instructor.from_openai(OpenAI())

class PersonInfo(BaseModel):
    name: str
    age: Maybe[int] = None  # Maybe type for uncertain fields

Check whether a Maybe field holds uncertain information:

if person.age and person.age.is_uncertain:
    print(f"Uncertain age: approximately {person.age.value}")
elif person.age:
    print(f"Age: {person.age.value}")
else:
    print("Age: Unknown")

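For more about the Maybe type, see the Missing concepts page.

Handling Optional Values

Always account for the possibility of None when you use an optional value in your code: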

# Check for None before using
if person.age is not None:
    drinking_age = "Legal" if person.age >= 21 else "Underage"
else:
    drinking_age = "Unknown"

# Use conditional expressions
price_display = f"${product.price}" if product.price is not None else "Price unavailable"

# Provide defaults with 'or'
display_name = user.nickname or user.username
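One caveat with the 'or' pattern: it falls back on any falsy value, not only None. That is fine for strings where an empty value should count as missing, but it silently discards legitimate values such as 0. The sketch below contrasts the two checks. Both function names are hypothetical and exist only for this comparison.

```python
from typing import Optional

def describe_age_with_or(age: Optional[int]) -> str:
    # 'or' treats 0 as missing because 0 is falsy
    return str(age or "Unknown")

def describe_age_with_none_check(age: Optional[int]) -> str:
    # An explicit None check keeps 0 as a real value
    return str(age) if age is not None else "Unknown"

print(describe_age_with_or(0))
print(describe_age_with_none_check(0))
print(describe_age_with_none_check(None))
```

For numeric or boolean optional fields, an explicit 'is not None' check is usually the safer choice.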

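Validating Optional Fields

An optional field can still be validated when a value is present. The validator should let None pass through unchanged: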

from typing import Optional
from pydantic import BaseModel, field_validator
import instructor
from openai import OpenAI
import re

client = instructor.from_openai(OpenAI())

class ContactInfo(BaseModel):
    email: str
    phone: Optional[str] = None

    @field_validator('phone')
    @classmethod
    def validate_phone(cls, v):
        if v is not None and not re.match(r'^\+?[1-9]\d{1,14}$', v):
            raise ValueError("Invalid phone format")
        return v
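A short sketch of how this validator behaves, using Pydantic v2 only. The phone numbers and email address are made-up test inputs.

```python
import re
from typing import Optional
from pydantic import BaseModel, ValidationError, field_validator

class ContactInfo(BaseModel):
    email: str
    phone: Optional[str] = None

    @field_validator("phone")
    @classmethod
    def validate_phone(cls, v):
        # Skip the format check when no phone number was extracted
        if v is not None and not re.match(r"^\+?[1-9]\d{1,14}$", v):
            raise ValueError("Invalid phone format")
        return v

# A number in the expected format is accepted
valid = ContactInfo(email="a@example.com", phone="+14155550123")

# A missing phone number is accepted as well
absent = ContactInfo(email="a@example.com")

# A number containing a hyphen fails the pattern and is rejected
try:
    ContactInfo(email="a@example.com", phone="555-0123")
    bad_phone_rejected = False
except ValidationError:
    bad_phone_rejected = True
print(bad_phone_rejected)
```

When the model is used as a response model with Instructor, a failed validation like this one is what would trigger a retry if retries are configured.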

Next Steps