
Prompt Templating

With Instructor's Jinja templating support, you can

  • Dynamically adapt prompts to any context
  • Manage and version your prompts more easily
  • Integrate seamlessly with validation workflows
  • Handle sensitive information securely

Our solution provides

  • Separation of prompt structure and content
  • Complex logic inside prompts
  • Template reuse across different scenarios
  • Better prompt versioning and logging
  • Pydantic integration for validation and type safety

Context is available to the template engine

The context parameter is a dictionary passed to the template engine. Use it to hand the relevant variables to the template: this single context argument is forwarded to Jinja and used to render the final prompt.

import openai
import instructor
from pydantic import BaseModel

client = instructor.from_openai(openai.OpenAI())


class User(BaseModel):
    name: str
    age: int


resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": """Extract the information from the
        following text: `{{ data }}`""",  # (1)!
        },
    ],
    response_model=User,
    context={"data": "John Doe is thirty years old"},  # (2)!
)

print(resp)
#> name='John Doe' age=30
  1. Declare Jinja-style template variables inside the prompt itself (e.g. {{ data }})
  2. Pass the variables you want to use via the context parameter

Context is available to Pydantic validators

In this example we show how to use the context parameter together with Pydantic validators to strengthen validation and data processing. Because the context is also passed to the validators, you can implement dynamic validation rules and data transformations based on the input context. This enables flexible, context-aware validation, such as checking for banned words or applying redaction patterns to sensitive information (a banned-word variant is sketched after this example).

import openai
import instructor
from pydantic import BaseModel, ValidationInfo, field_validator
import re

client = instructor.from_openai(openai.OpenAI())


class Response(BaseModel):
    text: str

    @field_validator('text')
    @classmethod
    def redact_regex(cls, v: str, info: ValidationInfo):
        context = info.context
        if context:
            redact_patterns = context.get('redact_patterns', [])
            for pattern in redact_patterns:
                v = re.sub(pattern, '****', v)
        return v


response = client.create(
    model="gpt-4o",
    response_model=Response,
    messages=[
        {
            "role": "user",
            "content": """
                Write about a {{ topic }}

                {% if banned_words %}
                You must not use the following banned words:

                <banned_words>
                {% for word in banned_words %}
                * {{ word }}
                {% endfor %}
                </banned_words>
                {% endif %}
              """,
        },
    ],
    context={
        "topic": "jason and now his phone number is 123-456-7890",
        "redact_patterns": [
            r"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b",  # Phone number pattern
            r"\b\d{3}-\d{2}-\d{4}\b",  # SSN pattern
        ],
    },
    max_retries=3,
)

print(response.text)
"""
Jason is a vibrant and dynamic individual known for his charismatic personality and entrepreneurial spirit. He has always been passionate about technology and innovation, which led him to start his own tech company. Throughout his career, Jason has been dedicated to making a significant impact in the tech industry, always seeking out new opportunities to learn and grow.

In addition to his professional endeavors, Jason is an adventurous person who loves to travel and explore new places. He finds joy in experiencing different cultures and meeting new people, which has contributed to his broad worldview and understanding of global markets.

Jason’s journey is one of hard work, resilience, and determination, as he continuously pushes the boundaries to achieve his goals and inspire those around him.

(Note: Personal phone numbers should remain confidential and not be shared publicly to protect privacy.)
"""
  1. Access the variables passed into the context inside your Pydantic validator

  2. Pass the variables used for validation and/or rendering via the context parameter
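
As a minimal sketch of the banned-word case mentioned above (not part of the example), a context-aware check can be written in the same style; the Post model, the banned_words key, and the error message below are illustrative assumptions:

from pydantic import BaseModel, ValidationInfo, field_validator


class Post(BaseModel):
    text: str

    @field_validator('text')
    @classmethod
    def no_banned_words(cls, v: str, info: ValidationInfo):
        # The same dict passed via `context=` is exposed here as info.context.
        context = info.context
        if context:
            for word in context.get('banned_words', []):
                if word.lower() in v.lower():
                    # Raising here surfaces a validation error, which instructor
                    # can use to retry the request (up to max_retries).
                    raise ValueError(f'Response contains the banned word: {word}')
        return v

Passing a banned_words list in the same context dict then makes it available to both the Jinja template and the validator.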

Jinja Syntax

Jinja is used to render the prompts, so the familiar Jinja syntax is available. This makes it possible to render lists, conditionals, and more. It also lets you call functions and methods from within the template (see the short sketch after the example below).

This makes it very easy to format prompts and write rendering logic.

import openai
import instructor
from pydantic import BaseModel

client = instructor.from_openai(openai.OpenAI())


class Citation(BaseModel):
    source_ids: list[int]
    text: str


class Response(BaseModel):
    answer: list[Citation]


resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": """
                You are a {{ role }} tasked with the following question

                <question>
                {{ question }}
                </question>

                Use the following context to answer the question, make sure to return [id] for every citation:

                <context>
                {% for chunk in context %}
                  <context_chunk>
                    <id>{{ chunk.id }}</id>
                    <text>{{ chunk.text }}</text>
                  </context_chunk>
                {% endfor %}
                </context>

                {% if rules %}
                Make sure to follow these rules:

                {% for rule in rules %}
                  * {{ rule }}
                {% endfor %}
                {% endif %}
            """,
        },
    ],
    response_model=Response,
    context={
        "role": "professional educator",
        "question": "What is the capital of France?",
        "context": [
            {"id": 1, "text": "Paris is the capital of France."},
            {"id": 2, "text": "France is a country in Europe."},
        ],
        "rules": ["Use markdown."],
    },
)

print(resp)
#> answer=[Citation(source_ids=[1], text='The capital of France is Paris.')]
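
The example above covers loops and conditionals; as a brief sketch of the method and filter calls mentioned earlier, Jinja can also invoke methods and built-in filters on context values (the Summary model, prompt, and variables below are assumptions, not part of the original example):

from pydantic import BaseModel


class Summary(BaseModel):
    text: str


# Reuses the `client` created above.
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": """
                Summarize the report titled `{{ title.upper() }}`
                ({{ sections | length }} sections: {{ sections | join(', ') }}).
            """,
        },
    ],
    response_model=Summary,
    context={
        "title": "quarterly results",
        "sections": ["revenue", "costs", "outlook"],
    },
)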

Handling Sensitive Information

When sending prompts to a model provider, you may need to include sensitive user information. This is likely something you don't want to hard-code into the prompt or capture in your logs. An easy way around this is to use Pydantic's SecretStr type in your model definitions.

from pydantic import BaseModel, SecretStr
import instructor
import openai


class UserContext(BaseModel):
    name: str
    address: SecretStr


class Address(BaseModel):
    street: SecretStr
    city: str
    state: str
    zipcode: str


client = instructor.from_openai(openai.OpenAI())
context = UserContext(name="scolvin", address="secret address")

address = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": "{{ user.name }} is `{{ user.address.get_secret_value() }}`, normalize it to an address object",
        },
    ],
    context={"user": context},
    response_model=Address,
)
print(context)
#> name='scolvin' address=SecretStr('**********')
print(address)
#> street=SecretStr('**********') city='scolvin' state='NA' zipcode='00000'

This lets you use sensitive information in your prompts while protecting it from exposure.

Security

We use jinja2.sandbox.SandboxedEnvironment to guard against template-engine security issues, which means you cannot run arbitrary Python code inside a prompt. This does not mean you should pass untrusted input to the template engine, however, since it could still be abused, for example for denial-of-service attacks.

You should always sanitize any input passed to the template engine.
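
What counts as sanitization depends on your application; as a minimal, assumption-heavy sketch, you might cap the length of untrusted text and strip Jinja delimiters before placing it in the context (the sanitize helper below is hypothetical, not part of Instructor):

def sanitize(value: str, max_length: int = 2000) -> str:
    # Hypothetical helper: cap length to limit rendering cost and drop Jinja
    # delimiters as defense in depth before the value is used in a prompt.
    value = value[:max_length]
    for token in ("{{", "}}", "{%", "%}"):
        value = value.replace(token, "")
    return value


user_question = "text supplied by an end user"  # untrusted input
context = {"question": sanitize(user_question)}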