Streaming Lists

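This guide explains how to stream lists of structured data with Instructor. Streaming a list lets you process each item as soon as it is generated, which improves responsiveness when the output is large.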

Basic List Streaming

Here is how to stream a list of structured objects:

from typing import List
import instructor
from openai import OpenAI
from pydantic import BaseModel, Field

# Initialize the client
client = instructor.from_openai(OpenAI())

class Book(BaseModel):
    title: str = Field(..., description="Book title")
    author: str = Field(..., description="Book author")
    year: int = Field(..., description="Publication year")

# Stream a list of books
for book in client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "List 5 classic science fiction books"}
    ],
    response_model=List[Book],  # Note: Using List directly
    stream=True
):
    print(f"Received: {book.title} by {book.author} ({book.year})")

This example shows how to:

1. Define a Pydantic model for each list item
2. Specify a list using Python's type system
3. Process each item as it arrives in the stream
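To make the third point concrete without an API key, the following offline sketch replaces the streaming call with a plain generator. The `fake_book_stream` function, the `events` log, and the dataclass version of `Book` are hypothetical stand-ins introduced for illustration; they are not part of Instructor. The sketch only demonstrates the consumption pattern: each item is handled before the next one is produced.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Book:
    # Stand-in for the Pydantic model above, so the sketch has no dependencies
    title: str
    author: str
    year: int


def fake_book_stream(events: List[str]) -> Iterator[Book]:
    """Hypothetical stand-in for the streaming call; yields one book at a time."""
    sample = [
        Book("Dune", "Frank Herbert", 1965),
        Book("Neuromancer", "William Gibson", 1984),
        Book("Foundation", "Isaac Asimov", 1951),
    ]
    for book in sample:
        events.append(f"produced {book.title}")
        yield book


events: List[str] = []
for book in fake_book_stream(events):
    # Each item is handled here before the generator produces the next one
    events.append(f"handled {book.title}")
    print(f"Received: {book.title} by {book.author} ({book.year})")
```

After the loop, `events` alternates between "produced" and "handled" entries, which is the interleaving that makes streaming feel responsive compared with waiting for the full list.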

Real-World Example: Task Generation

Here is a practical example that streams a list of tasks and tracks progress:

from typing import List
import instructor
from openai import OpenAI
from pydantic import BaseModel, Field
import time

client = instructor.from_openai(OpenAI())

class Task(BaseModel):
    title: str = Field(..., description="Task title")
    description: str = Field(..., description="Detailed task description")
    priority: str = Field(..., description="Task priority (High/Medium/Low)")
    estimated_hours: float = Field(..., description="Estimated hours to complete")

print("Generating project tasks...")
start_time = time.time()
received_tasks = 0

for task in client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "Generate a list of 5 tasks for building a personal website"}
    ],
    response_model=List[Task],
    stream=True
):
    received_tasks += 1
    print(f"\nTask {received_tasks}: {task.title} (Priority: {task.priority})")
    print(f"Description: {task.description[:100]}...")
    print(f"Estimated time: {task.estimated_hours} hours")

    # Calculate progress percentage based on expected items
    progress = (received_tasks / 5) * 100
    print(f"Progress: {progress:.0f}%")

elapsed_time = time.time() - start_time
print(f"\nAll {received_tasks} tasks generated in {elapsed_time:.2f} seconds")
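The example above hard-codes the expected count of 5 in its progress calculation, so the percentage would exceed 100 if the model returned more items than requested. One possible way to isolate that logic is a small wrapper around any iterable. The `with_progress` helper below is a hypothetical name, not an Instructor API, and a plain list stands in for the streamed tasks so the sketch runs offline.

```python
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def with_progress(items: Iterable[T], expected: int) -> Iterator[Tuple[int, float, T]]:
    """Yield (count, percent, item); percent is capped at 100."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    for count, item in enumerate(items, start=1):
        # Cap the ratio at 1.0 in case more items arrive than expected
        percent = min(count / expected, 1.0) * 100
        yield count, percent, item


# Offline usage: a plain list stands in for the streamed tasks
sample_titles = ["Choose a domain", "Design layout", "Write content"]
results = list(with_progress(sample_titles, expected=2))
for count, percent, title in results:
    print(f"Task {count}: {title} ({percent:.0f}%)")
```

Because the helper accepts any iterable, the same loop body should work if the list is swapped for the streaming call shown earlier, assuming that call yields one item at a time.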

Next Steps