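Generating Consistent Stories with GPT-4o

Language models struggle to generate consistent graphs with many nodes. Usually this is because the graph is simply too large for the model to handle in a single pass, which leads to inconsistent output: invalid nodes, disconnected nodes, and similar defects.

In this article, we'll walk through a simple "Choose Your Own Adventure" story generator to show how a two-phase approach with gpt-4o can generate complex DAGs and get around this limitation.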

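Why Do DAGs Matter?

DAG stands for directed acyclic graph. A graph is a DAG when every connection between nodes is directed (it can be traversed in one direction only) and there are no cycles (no path leads back to a node that was already visited).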

graph TD
    A --> B
    A --> C
    B --> D
    C --> D
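
To make the definition concrete, here is a minimal sketch that checks the "no cycles" property with Kahn's algorithm (repeatedly removing nodes that have no incoming edges). The function name `is_dag` and the adjacency-dict representation are assumptions made for this illustration; they are not part of the original pipeline.

```python
from collections import deque


def is_dag(edges: dict[str, list[str]]) -> bool:
    """Return True if the directed graph has no cycles (Kahn's algorithm)."""
    nodes = set(edges) | {t for targets in edges.values() for t in targets}
    indegree = {n: 0 for n in nodes}
    for targets in edges.values():
        for t in targets:
            indegree[t] += 1

    # Start from nodes with no incoming edges and peel the graph layer by layer.
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for t in edges.get(node, []):
            indegree[t] -= 1
            if indegree[t] == 0:
                queue.append(t)

    # If a cycle exists, its nodes never reach indegree 0 and are never visited.
    return visited == len(nodes)


diamond = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}    # the graph shown above
with_cycle = {"A": ["B"], "B": ["C"], "C": ["A"]}      # C points back to A

print(is_dag(diamond))     # True
print(is_dag(with_cycle))  # False
```

A check like this could be run on a generated graph before it is shown to users, since a cycle would let a reader loop back to an earlier point in the story.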

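This closely resembles a "Choose Your Own Adventure" story, where the user has a fixed set of choices at each step and can only move forward through the story. The graph below shows this in practice: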

graph TD
    A[Story Root] --> B[Choice 1]
    A --> C[Choice 2]
    A --> D[Choice 3]
    B --> E[Choice 1.1]
    B --> F[Choice 1.2]
    C --> G[Choice 2.1]
    C --> H[Choice 2.2]
    D --> I[Choice 3.1]
    D --> J[Choice 3.2]

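The Challenge: Scaling Story Generation

When we try to generate a whole story in a single call, we quickly run into limits. Even with only 4 choices per step, two levels of choices already add up to 20 nodes (4 at the first level plus 16 at the second). And a story that ends after the user has made just 2 choices is not a very interesting one.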
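
The growth behind that 20-node figure can be checked with a few lines of arithmetic. This is only an illustrative sketch: the helper name `choice_nodes` is made up for this example, and it assumes every node offers the same number of choices.

```python
def choice_nodes(branching: int, depth: int) -> int:
    """Total choice nodes across `depth` levels, excluding the story root."""
    return sum(branching ** level for level in range(1, depth + 1))


# With 4 choices per step, each extra level multiplies the newest layer by 4.
for depth in range(1, 6):
    print(depth, choice_nodes(4, depth))
```

Under these assumptions, a story that allows five choices in a row already needs more than a thousand nodes, which is far more than a single response can reliably hold.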

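In other words, we quickly overflow the model's context window. To get around this, we can use a two-phase approach: first generate the initial story setting, then generate the options and follow-up choices in parallel.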

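Parallel Story Generation

Generating an Outline

First, we use gpt-4o to generate an outline of the story. This matters because it gives us the initial setting, a visual style, and an image description (for the banner image). We can reuse this information later so that the generated images stay as consistent as possible.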

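from typing import List

import instructor
from pydantic import BaseModel


# Assumed shape of the request payload, inferred from the prompt template
# below; the original definition is not shown in this post.
class RestateStoryInput(BaseModel):
    setting: str
    title: str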

class GeneratedStory(BaseModel):
    setting: str
    plot_summary: str
    choices: List[str]
    visual_style: str
    image_description: str

async def generate_story(
    client: instructor.AsyncInstructor,
    story_input: RestateStoryInput
):
    resp = await client.chat.completions.create(
        messages=[{
            "role": "user",
            "content": """
            Generate a story with:
            - Setting: {{ story_input.setting }}
            - Title: {{ story_input.title }}

            Rules:
            - Generate 2-4 initial choices that represent actions
            - Choices must move story forward
            - Include brief setting description
            - Generate a visual description for the story

            Required Elements:
            1. Plot Summary: A vivid description of the setting and plot
            2. Initial Choices: 2-4 distinct actions the user can take
            3. Visual Style: Description of art style, color palette
            4. Image Description: One-sentence scene description
            """
        }],
        model="gpt-4o",
        response_model=GeneratedStory,
        context={"story_input": story_input},
    )
    return resp

This returns a story with a setting, plot summary, choices, visual style, and image description.

# Example generated output
{
    "setting": "A neon-lit cyberpunk metropolis in 2150",
    "plot_summary": "In the sprawling city of Neo-Tokyo...",
    "choices": [
        "Investigate the mysterious signal in the abandoned district",
        "Meet your contact at the underground hacker hub",
        "Follow the corporate executive who seems suspicious"
    ],
    "visual_style": "Vibrant neon colors, detailed cyberpunk architecture",
    "image_description": "A towering cyberpunk cityscape at night with neon signs"
}

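Parallel Choice Expansion

One of the biggest challenges in generating deep story trees is maintaining consistency as the branches grow.

Here is how we address it with parallel generation and state tracking: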

graph TD
    %% Main nodes
    A[Find Door] --> B[Open Door]
    A --> C[Walk Away]

    B --> D[Read Book]
    B --> E[Leave Room]

    C --> F[Go Home]
    C --> G[Wait Outside]

    %% Styling for visual hierarchy
    classDef start fill:#ff9999,stroke:#333,stroke-width:2px
    classDef decision fill:#99ccff,stroke:#333,stroke-width:2px
    classDef outcome fill:#99ffff,stroke:#333,stroke-width:1px

    %% Apply styles
    class A start
    class B,C decision
    class D,E,F,G outcome

    %% Add tooltips for context
    click B "Door context" "Open Door Context"
    click C "Away context" "Walk Away Context"
    click D "Door and Book context" "Read Book Context"

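The key is that every path through the story tree carries its own independent state. We implement this with a simple accumulator that tracks the previous choices and the story context along the path.

Note that the model is also free to end the story at any point.

Here is our implementation: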

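import asyncio

import instructor
from pydantic import BaseModel


# Assumed response models, inferred from how their fields are used below;
# the original definitions are not shown in this post.
class RewrittenChoice(BaseModel):
    choice_description: str
    choice_consequences: str
    choices: list[str]


class FinalStoryChoice(BaseModel):
    choice_description: str
    choice_consequences: str
    choices: list["FinalStoryChoice"]


async def rewrite_choice(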
    client: instructor.AsyncInstructor,
    choice: str,
    story: GeneratedStory,
    prev_choices: list[dict],  # Accumulator for path state
    max_depth: int,
    sem: asyncio.Semaphore
) -> FinalStoryChoice:
    # Each choice knows its entire path history
    async with sem:
        rewritten_choice = await client.chat.completions.create(
            model="gpt-4o",
            response_model=RewrittenChoice,
            messages=[{
                "role": "user",
                "content": """
                Given this choice: {{ choice }}

                Story context:
                Setting: {{ story.setting }}
                Plot: {{ story.plot_summary }}

                Previous choices made in this path:
                {% for prev in prev_choices %}
                - {{ prev.choice_description }}
                  Result: {{ prev.choice_consequences }}
                {% endfor %}

                Generate the next story beat and 2-4 new choices.
                The story should end in {{ max_depth - (prev_choices | length) }} more turns.
                """
            }],
            context={
                "choice": choice,
                "story": story,
                "prev_choices": prev_choices,
                "max_depth": max_depth,  # referenced by the template above
            }
        )

    # For terminal nodes (at max depth)
    if len(prev_choices) == max_depth - 1:
        return FinalStoryChoice(
            choice_description=rewritten_choice.choice_description,
            choice_consequences=rewritten_choice.choice_consequences,
            choices=[]  # Terminal node
        )

    # Recursively expand child choices
    child_choices = await asyncio.gather(*[
        rewrite_choice(
            client=client,
            choice=new_choice,
            story=story,
            prev_choices=prev_choices + [{
                "choice_description": rewritten_choice.choice_description,
                "choice_consequences": rewritten_choice.choice_consequences
            }],
            max_depth=max_depth,
            sem=sem
        )
        for new_choice in rewritten_choice.choices
    ])

    return FinalStoryChoice(
        choice_description=rewritten_choice.choice_description,
        choice_consequences=rewritten_choice.choice_consequences,
        choices=child_choices
    )

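This approach has several key benefits:

Path-specific context: each node keeps the full history of choices that led to it, which keeps every branch internally consistent
Parallel generation: because each branch maintains its own state, different branches can be generated at the same time
Controlled growth: the max_depth parameter caps the depth of the tree, which bounds its otherwise exponential growth
Rate limiting: the semaphore caps the number of concurrent API calls while still allowing as much parallelism as that cap permits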

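The semaphore is not only about rate limiting: it also bounds the number of choices in flight at any moment, so the tree expands at a controlled pace. State stays consistent because every branch carries its own copy of the path history, not because of the semaphore itself.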
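
The interplay between the per-path accumulator and the semaphore can be seen in isolation with a small standard-library sketch. Everything here is an assumption made for illustration: `fake_model` stands in for the real LLM call, and `Node`, `expand`, `build`, and the `stats` dictionary are hypothetical names, not part of the implementation above.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Node:
    choice: str
    history: list[str]  # choices that led to this node (the path state)
    children: list["Node"] = field(default_factory=list)


async def fake_model(choice: str, history: list[str]) -> list[str]:
    """Stand-in for the LLM call: always proposes two follow-up choices."""
    await asyncio.sleep(0.01)
    return [f"{choice}.1", f"{choice}.2"]


async def expand(choice, history, max_depth, sem, stats) -> Node:
    async with sem:  # at most `limit` model calls run at the same time
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        next_choices = await fake_model(choice, history)
        stats["active"] -= 1

    node = Node(choice=choice, history=history)
    if len(history) == max_depth - 1:  # terminal node, stop expanding
        return node

    # `history + [choice]` builds a new list, so sibling branches never share state.
    node.children = list(await asyncio.gather(*[
        expand(c, history + [choice], max_depth, sem, stats)
        for c in next_choices
    ]))
    return node


async def build(root_choices, max_depth, limit):
    sem = asyncio.Semaphore(limit)
    stats = {"active": 0, "peak": 0}
    roots = await asyncio.gather(*[
        expand(c, [], max_depth, sem, stats) for c in root_choices
    ])
    return list(roots), stats


roots, stats = asyncio.run(build(["A", "B"], max_depth=3, limit=3))
leaf = roots[0].children[0].children[1]
print(leaf.choice, leaf.history)
print("peak concurrent calls:", stats["peak"])
```

Note that the semaphore is released before the recursive `gather`, so a parent never holds a slot while waiting for its children. Holding it across the recursion could deadlock once the tree is deeper than the concurrency limit.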

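Each path through the story tree becomes an independent narrative with access to its full history, which lets us generate coherent stories much faster, and in more detail, than a single call could.

We can also generate stories that are broader and deeper than a single call could produce.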

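Beyond Story Generation

The success of this approach comes down to three key principles:

State isolation: each node keeps only the context it needs, which prevents context window overflow
Parallel processing: generation can proceed on different branches at the same time, significantly reducing total generation time
Structured validation: using Pydantic models ensures that each generated component meets your requirements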
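
As a sketch of the structured-validation principle, the "2-4 choices" rule from the prompts can be expressed as a constraint on the model itself. The `StoryBeat` class is hypothetical and written for this illustration only; it assumes Pydantic v2, where `min_length` and `max_length` on `Field` also apply to lists.

```python
from typing import List

from pydantic import BaseModel, Field, ValidationError


class StoryBeat(BaseModel):
    """Hypothetical model for a non-terminal story beat."""
    choice_description: str = Field(min_length=1)
    # Enforce the "2-4 choices" rule from the prompt at the schema level.
    choices: List[str] = Field(min_length=2, max_length=4)


ok = StoryBeat(
    choice_description="Open the door",
    choices=["Read the book", "Leave the room"],
)

try:
    StoryBeat(choice_description="Open the door", choices=["Only one option"])
    rejected = False
except ValidationError:
    rejected = True

print(len(ok.choices), rejected)
```

Terminal nodes carry an empty list of choices, so a constraint like this would only suit beats that are expected to branch further.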

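For example, generating a 20-node story tree sequentially might take about 60 seconds (3 seconds per node). With parallel generation and 10 concurrent requests, wall-clock time depends on the tree depth and the concurrency cap rather than on the node count: 4 first-level nodes followed by 16 second-level nodes fit into roughly three rounds of requests, which is on the order of 10 seconds under the same per-node assumption.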

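This pattern is especially valuable when:

Your generation task naturally forms a tree or graph structure
Individual nodes need some, but not all, of the context from their ancestors
You need to generate content that exceeds a single context window
Generation speed matters

By combining structured outputs with parallel generation, you can reliably generate complex, interconnected content at scale while maintaining consistency and control.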

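instructor makes it easy to generate complex data structures with language models, whether they are open-source models served through ollama or proprietary models from providers such as OpenAI. Give it a try today!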