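Example: Answering Questions with Validated Citations¶
For the complete code example, see examples/citation_fuzzy_match.py.
Overview¶
This example shows how to use Instructor with validators to not only attach citations to generated answers but also guard against hallucinations: every statement the LLM makes must be backed by a direct quote from the provided context, and each quote must actually exist in that context.
Two Python classes are defined, Fact and QuestionAnswer, which encapsulate a single fact and the full answer, respectively.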
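Data Structures¶
The Fact Class¶
The Fact class encapsulates a single statement or fact. It contains two fields:
fact: a string holding the body of the fact or statement.
substring_quote: a list of strings. Each string is a direct quote from the context that supports the fact.
Validation Method: validate_sources¶
This method validates the sources (substring_quote) against the context. It uses a regex to find the span of each quoted substring in the given context. If no span is found for a quote, that quote is removed from the list.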
import re
from typing import List

from pydantic import BaseModel, Field, ValidationInfo, model_validator


class Fact(BaseModel):
    fact: str = Field(...)
    substring_quote: List[str] = Field(...)

    @model_validator(mode="after")
    def validate_sources(self, info: ValidationInfo) -> "Fact":
        # The source text is supplied through pydantic's validation context.
        text_chunks = info.context.get("text_chunk", None)
        spans = list(self.get_spans(text_chunks))
        # Keep only the quotes that were actually found in the context.
        self.substring_quote = [text_chunks[span[0] : span[1]] for span in spans]
        return self

    def get_spans(self, context):
        for quote in self.substring_quote:
            yield from self._get_span(quote, context)

    def _get_span(self, quote, context):
        # Escape the quote so that it is matched literally, not as a pattern.
        for match in re.finditer(re.escape(quote), context):
            yield match.span()
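The span lookup can be tried on its own, outside the model. The following is a minimal sketch using only the standard library; `find_spans` and the sample strings are illustrative names chosen here, not part of Instructor.

```python
import re
from typing import Iterator, List, Tuple


def find_spans(quote: str, context: str) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) offsets of every literal occurrence of quote in context."""
    # re.escape makes characters such as "." or "(" match literally.
    for match in re.finditer(re.escape(quote), context):
        yield match.span()


context = "I studied math (and physics). I liked math."
quotes = ["math", "(and physics)", "chemistry"]

kept: List[str] = []
for quote in quotes:
    for start, end in find_spans(quote, context):
        kept.append(context[start:end])

print(kept)
```

A quote that occurs more than once yields one entry per occurrence, and a quote that never appears contributes nothing, which is how unsupported quotes drop out of the list.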
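The QuestionAnswer Class¶
This class encapsulates a question and its corresponding answer. It contains two fields:
question: the question that was asked.
answer: a list of Fact objects that make up the answer.
Validation Method: validate_sources¶
This method checks that each Fact object in the answer list has at least one valid source. A Fact object with no valid source is removed from the answer list.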
from typing import List

from pydantic import BaseModel, Field, model_validator


class QuestionAnswer(BaseModel):
    question: str = Field(...)
    answer: List[Fact] = Field(...)

    @model_validator(mode="after")
    def validate_sources(self) -> "QuestionAnswer":
        # Drop any fact whose quotes were all removed by Fact.validate_sources.
        self.answer = [fact for fact in self.answer if len(fact.substring_quote) > 0]
        return self
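The combined effect of the two validators can be sketched without pydantic. This is a simplified stand-in, assuming plain dictionaries and a literal substring check in place of the regex lookup; `prune_answer` is a hypothetical helper name, and the sample facts are made up for illustration.

```python
from typing import Dict, List


def prune_answer(answer: List[Dict], context: str) -> List[Dict]:
    """Keep only quotes found in context, then drop facts left without any quote."""
    pruned = []
    for item in answer:
        quotes = [q for q in item["substring_quote"] if q in context]
        if quotes:
            pruned.append({"fact": item["fact"], "substring_quote": quotes})
    return pruned


context = "I went to an arts high school but in university I studied physics."
answer = [
    {
        "fact": "The author attended an arts high school.",
        "substring_quote": ["arts high school"],
    },
    {
        "fact": "The author won a national award.",
        "substring_quote": ["national award"],
    },
]

result = prune_answer(answer, context)
print([item["fact"] for item in result])
```

The second fact is dropped because its only quote does not occur in the context, mirroring what happens when Fact empties substring_quote and QuestionAnswer then filters the fact out.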
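Function to Ask the AI¶
The ask_ai Function¶
This function takes a string question and a string context, and returns a QuestionAnswer object. It uses the OpenAI API to obtain an answer, then validates the sources with the classes defined above.
To learn how pydantic's validation context works, see the pydantic documentation.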
from openai import OpenAI
import instructor

# Apply the patch to the OpenAI client
# enables response_model, validation_context keyword
client = instructor.from_openai(OpenAI())


def ask_ai(question: str, context: str) -> QuestionAnswer:
    return client.chat.completions.create(
        model="gpt-3.5-turbo-0613",
        temperature=0,
        response_model=QuestionAnswer,
        messages=[
            {
                "role": "system",
                "content": "You are a world class algorithm to answer questions with correct and exact citations.",
            },
            {"role": "user", "content": f"{context}"},
            {"role": "user", "content": f"Question: {question}"},
        ],
        validation_context={"text_chunk": context},
    )
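The validation context passed by ask_ai is what the validators read as `info.context`. The same mechanism can be exercised directly through pydantic's `model_validate(..., context=...)`, with no API call. The following is a minimal sketch, assuming pydantic v2; the `Quote` model and its validator are hypothetical and are not part of this page's code.

```python
from pydantic import BaseModel, ValidationError, ValidationInfo, field_validator


class Quote(BaseModel):
    text: str

    @field_validator("text")
    @classmethod
    def must_appear_in_context(cls, value: str, info: ValidationInfo) -> str:
        # info.context is whatever was passed as context=...; it is None if omitted.
        source = (info.context or {}).get("text_chunk", "")
        if value not in source:
            raise ValueError("quote not found in the provided context")
        return value


chunk = "I went to an arts high school."

# A quote that occurs in the context validates normally.
ok = Quote.model_validate({"text": "arts high school"}, context={"text_chunk": chunk})
print(ok.text)

# A quote that does not occur verbatim is rejected.
try:
    Quote.model_validate({"text": "arts highschool"}, context={"text_chunk": chunk})
    rejected = False
except ValidationError:
    rejected = True
print(rejected)
```

Unlike the Fact validator above, which silently removes quotes it cannot find, this sketch raises an error; which behavior fits depends on whether unsupported quotes should be pruned or should fail validation.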
Example¶
The following example uses these classes and functions to ask a question and validate the answer.
question = "What did the author do during college?"
context = """
My name is Jason Liu, and I grew up in Toronto Canada but I was born in China.
I went to an arts high school but in university I studied Computational Mathematics and physics.
As part of coop I worked at many companies including Stitchfix, Facebook.
I also started the Data Science club at the University of Waterloo and I was the president of the club for 2 years.
"""
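The output is a QuestionAnswer object containing the validated facts and their sources. The sample below corresponds to the question "where did he go to school?":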
{
    "question": "where did he go to school?",
    "answer": [
        {
            "fact": "Jason Liu went to an arts high school.",
            "substring_quote": ["arts high school"]
        },
        {
            "fact": "Jason Liu studied Computational Mathematics and physics in university.",
            "substring_quote": ["university"]
        }
    ]
}
This ensures that every piece of information in the answer has been checked against the context.