
Example: Answering Questions with Validated Citations

For the complete code example, see examples/citation_fuzzy_match.py

Overview

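This example shows how to combine Instructor with validators to do more than attach citations to generated answers: it guards against hallucinations by requiring that every statement the LLM makes is backed by a direct quote from the provided context, and that those quotes actually exist in that context!
Two Python classes, Fact and QuestionAnswer, are defined to hold the information for a single fact and for the entire answer, respectively.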

Data Structures

The Fact Class

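The Fact class encapsulates a single statement or fact. It contains two fields:

  • fact: a string holding the body of the fact or statement.
  • substring_quote: a list of strings. Each string is a direct quote from the context that supports the fact.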

Validation Method: validate_sources

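This method validates the sources (substring_quote) against the context. It escapes each quote with re.escape and uses a regex search (re.finditer) to find every span where the quote appears verbatim in the given context, then rebuilds substring_quote from the text at those spans. A quote for which no span is found is therefore removed from the list.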

import re
from pydantic import Field, BaseModel, model_validator, ValidationInfo
from typing import List


class Fact(BaseModel):
    fact: str = Field(...)
    substring_quote: List[str] = Field(...)

    @model_validator(mode="after")
    def validate_sources(self, info: ValidationInfo) -> "Fact":
        # The source text arrives through the validation context
        text_chunks = info.context.get("text_chunk", None)
        spans = list(self.get_spans(text_chunks))
        # Keep only text actually found in the context; unmatched quotes are dropped
        self.substring_quote = [text_chunks[span[0] : span[1]] for span in spans]
        return self

    def get_spans(self, context):
        # Yield the span of every occurrence of every quote
        for quote in self.substring_quote:
            yield from self._get_span(quote, context)

    def _get_span(self, quote, context):
        # re.escape makes the quote match literally rather than as a regex pattern
        for match in re.finditer(re.escape(quote), context):
            yield match.span()
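The span lookup performed by `_get_span` can be tried in isolation. The sketch below uses a hypothetical standalone helper, `find_spans`, that mirrors `_get_span`. It illustrates that `re.escape` makes a quote match literally (so `+` or `(` inside a quote are not treated as regex syntax), that a quote occurring twice yields two spans, and that a quote missing from the context yields none, which is why `Fact` ends up dropping it.

```python
import re
from typing import Iterator, Tuple


def find_spans(quote: str, context: str) -> Iterator[Tuple[int, int]]:
    """Hypothetical helper mirroring Fact._get_span: yield the (start, end)
    span of every literal occurrence of `quote` in `context`."""
    # re.escape turns regex metacharacters such as "+" and "(" into literals
    for match in re.finditer(re.escape(quote), context):
        yield match.span()


context = "I studied C++ (briefly). Later I studied physics and more physics."

# Metacharacters in the quote are matched literally thanks to re.escape
print(list(find_spans("C++ (briefly)", context)))  # → [(10, 23)]
# A quote that appears twice produces two spans
print(list(find_spans("physics", context)))  # → [(41, 48), (58, 65)]
# A quote that does not appear produces no spans, so Fact would drop it
print(list(find_spans("chemistry", context)))  # → []
```

Without `re.escape`, the `+`, `(` and `)` in a quote such as `C++ (briefly)` would be read as regex syntax, so the literal text would not be matched (and on some Python versions the pattern fails to compile).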

The QuestionAnswer Class

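This class encapsulates a question and its corresponding answer. It contains two fields:

  • question: the question being asked.
  • answer: a list of Fact objects that make up the answer.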

Validation Method: validate_sources

This method checks that every Fact object in the answer list has at least one valid source. Any Fact object without a valid source is removed from the answer list.

from pydantic import BaseModel, Field, model_validator
from typing import List

class QuestionAnswer(BaseModel):
    question: str = Field(...)
    answer: List[Fact] = Field(...)

    @model_validator(mode="after")
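    def validate_sources(self) -> "QuestionAnswer":
        # Nested Fact models are validated first, so a fact whose quotes were
        # all dropped by Fact.validate_sources has an empty list here
        self.answer = [fact for fact in self.answer if len(fact.substring_quote) > 0]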
        return self
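The filtering done by `QuestionAnswer.validate_sources` can be checked on its own. This sketch assumes a simplified `Fact` without the source validator (a stand-in used only to isolate the behavior) and feeds in one fact whose quote list is already empty, as it would be after `Fact.validate_sources` found none of its quotes in the context.

```python
from typing import List

from pydantic import BaseModel, Field, model_validator


class Fact(BaseModel):
    # Simplified stand-in for the Fact class above: no source validation,
    # so only the effect of QuestionAnswer.validate_sources is visible
    fact: str = Field(...)
    substring_quote: List[str] = Field(...)


class QuestionAnswer(BaseModel):
    question: str = Field(...)
    answer: List[Fact] = Field(...)

    @model_validator(mode="after")
    def validate_sources(self) -> "QuestionAnswer":
        # Drop every fact that has no supporting quote left
        self.answer = [fact for fact in self.answer if len(fact.substring_quote) > 0]
        return self


qa = QuestionAnswer.model_validate(
    {
        "question": "What did the author study?",
        "answer": [
            {"fact": "The author studied physics.", "substring_quote": ["physics"]},
            # As if Fact.validate_sources had removed a quote not found in the context
            {"fact": "The author studied chemistry.", "substring_quote": []},
        ],
    }
)
print([fact.fact for fact in qa.answer])  # → ['The author studied physics.']
```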

Asking the AI

The ask_ai Function

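This function takes a string question and a string context and returns a QuestionAnswer object. It uses the OpenAI API to get an answer, then validates the sources with the classes defined above; the context reaches the validators through the validation_context keyword.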

To learn how pydantic's validation context works, see the pydantic documentation.
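The mechanism can be seen without Instructor or an API call: in pydantic v2, `model_validate` accepts a `context` argument, and validators read it from `info.context`. In `ask_ai`, the `validation_context` keyword plays that role for `Fact`. The sketch below uses a hypothetical `Quote` model; unlike `Fact`, it raises an error for a missing quote instead of silently dropping it, which is a different policy chosen only to make the effect visible.

```python
from pydantic import BaseModel, ValidationError, ValidationInfo, field_validator


class Quote(BaseModel):
    # Hypothetical model used only to illustrate validation context
    text: str

    @field_validator("text")
    @classmethod
    def must_appear_in_source(cls, value: str, info: ValidationInfo) -> str:
        # info.context is whatever was passed as `context=` (or None)
        source = (info.context or {}).get("text_chunk", "")
        if value not in source:
            raise ValueError(f"{value!r} does not appear in the source text")
        return value


source = "I went to an arts high school."
print(Quote.model_validate({"text": "arts high school"}, context={"text_chunk": source}))

try:
    Quote.model_validate({"text": "arts highschool"}, context={"text_chunk": source})
except ValidationError as exc:
    print(exc.errors()[0]["msg"])
```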

from openai import OpenAI
import instructor

# Apply the patch to the OpenAI client
# enables response_model, validation_context keyword
client = instructor.from_openai(OpenAI())


def ask_ai(question: str, context: str) -> QuestionAnswer:
    return client.chat.completions.create(
        model="gpt-3.5-turbo-0613",
        temperature=0,
        response_model=QuestionAnswer,
        messages=[
            {
                "role": "system",
                "content": "You are a world class algorithm to answer questions with correct and exact citations.",
            },
            {"role": "user", "content": f"{context}"},
            {"role": "user", "content": f"Question: {question}"},
        ],
        validation_context={"text_chunk": context},
    )

Example

Here is an example of using these classes and the function to ask a question and validate the answer.

question = "What did the author do during college?"
context = """
My name is Jason Liu, and I grew up in Toronto Canada but I was born in China.
I went to an arts high school but in university I studied Computational Mathematics and physics.
As part of coop I worked at many companies including Stitchfix, Facebook.
I also started the Data Science club at the University of Waterloo and I was the president of the club for 2 years.
"""

qa = ask_ai(question, context)

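The output is a QuestionAnswer object containing the validated facts and their sources. For the question "where did he go to school?", for example, it has the following shape: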

{
    "question": "where did he go to school?",
    "answer": [
        {
            "fact": "Jason Liu went to an arts high school.",
            "substring_quote": ["arts high school"]
        },
        {
            "fact": "Jason Liu studied Computational Mathematics and physics in university.",
            "substring_quote": ["university"]
        }
    ]
}

This ensures that every fact remaining in the answer is backed by at least one quote that appears verbatim in the context.
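To check the validation end to end without calling the OpenAI API, a hand-written JSON payload can stand in for the model's response and be validated against the example context with pydantic's `model_validate_json`, passing the same `{"text_chunk": ...}` dictionary as the context. The payload is hypothetical (not real model output) and deliberately contains the near-miss quote `"arts highschool"`. This sketch assumes pydantic v2, and its `Fact` validator is a compacted variant of the one above that also treats a missing context as "no quote can be verified".

```python
import re
from typing import List

from pydantic import BaseModel, Field, ValidationInfo, model_validator


class Fact(BaseModel):
    fact: str = Field(...)
    substring_quote: List[str] = Field(...)

    @model_validator(mode="after")
    def validate_sources(self, info: ValidationInfo) -> "Fact":
        # Replace each quote with its literal matches in the context;
        # quotes that never occur are dropped (no context -> no matches)
        text_chunk = (info.context or {}).get("text_chunk", "")
        self.substring_quote = [
            match.group()
            for quote in self.substring_quote
            for match in re.finditer(re.escape(quote), text_chunk)
        ]
        return self


class QuestionAnswer(BaseModel):
    question: str = Field(...)
    answer: List[Fact] = Field(...)

    @model_validator(mode="after")
    def validate_sources(self) -> "QuestionAnswer":
        # Nested Fact models are validated first; empty lists mean "unsupported"
        self.answer = [fact for fact in self.answer if len(fact.substring_quote) > 0]
        return self


context = """
My name is Jason Liu, and I grew up in Toronto Canada but I was born in China.
I went to an arts high school but in university I studied Computational Mathematics and physics.
As part of coop I worked at many companies including Stitchfix, Facebook.
I also started the Data Science club at the University of Waterloo and I was the president of the club for 2 years.
"""

# Hypothetical stand-in for an LLM response (not real model output)
raw_response = """
{
  "question": "where did he go to school?",
  "answer": [
    {"fact": "Jason Liu went to an arts high school.",
     "substring_quote": ["arts highschool"]},
    {"fact": "Jason Liu studied Computational Mathematics and physics in university.",
     "substring_quote": ["in university I studied Computational Mathematics and physics"]}
  ]
}
"""

qa = QuestionAnswer.model_validate_json(raw_response, context={"text_chunk": context})
print(qa.model_dump_json(indent=2))
```

Only the second fact survives: matching is literal, so `"arts highschool"` never occurs in a context that says "arts high school", its quote list becomes empty, and `QuestionAnswer` removes that fact.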