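Implementing Anthropic's Contextual Retrieval with Async Processing

Anthropic's Contextual Retrieval technique improves RAG systems by preserving context that is otherwise lost when documents are chunked.

This article walks through the technique and demonstrates an efficient implementation using async processing. Building on the concepts from our async processing guide, we look at how to apply this approach to optimize your RAG application.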

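Background: The Context Problem in RAG

Anthropic identifies a key problem in traditional RAG systems: context is lost when documents are split into chunks. They give the following example:

"Imagine you had a collection of financial information (say, U.S. SEC filings) embedded in your knowledge base, and you received the following question: 'What was the revenue growth for ACME Corp in Q2 2023?'

A relevant chunk might contain the text: 'The company's revenue grew by 3% over the previous quarter.' However, this chunk on its own doesn't specify which company it's referring to or the relevant time period."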

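Anthropic's Solution: Contextual Retrieval

Contextual Retrieval addresses this problem by prepending chunk-specific explanatory context to each chunk before it is embedded. Anthropic's example: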

original_chunk = "The company's revenue grew by 3% over the previous quarter."

contextualized_chunk = "This chunk is from an SEC filing on ACME corp's performance in Q2 2023; the previous quarter's revenue was $314 million. The company's revenue grew by 3% over the previous quarter."
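As a minimal sketch of the prepending step, the snippet below joins a generated context and a chunk into the text that would be embedded. The helper name `contextualize` and the single-space separator are our own assumptions for illustration, not part of Anthropic's article.

```python
def contextualize(chunk: str, context: str) -> str:
    """Prepend the generated context to the chunk; the combined text is what gets embedded."""
    # The separator is an assumption; any consistent delimiter would do.
    return f"{context} {chunk}"


original_chunk = "The company's revenue grew by 3% over the previous quarter."
context = (
    "This chunk is from an SEC filing on ACME corp's performance in Q2 2023; "
    "the previous quarter's revenue was $314 million."
)

contextualized_chunk = contextualize(original_chunk, context)
print(contextualized_chunk)
```

The original chunk text stays intact at the end of the string, so the chunk can still be shown verbatim to the model at answer time.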

Implementing Contextual Retrieval

Anthropic uses Claude to generate the context. They provide this prompt:

<document>
{{WHOLE_DOCUMENT}}
</document>
Here is the chunk we want to situate within the whole document
<chunk>
{{CHUNK_CONTENT}}
</chunk>
Please give a short succinct context to situate this chunk within the overall document for the purposes of improving search retrieval of the chunk. Answer only with the succinct context and nothing else.

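Performance Improvements

Anthropic reports significant improvements, each measured against the same 5.7% baseline:

Contextual Embeddings reduced the top-20-chunk retrieval failure rate by 35% (5.7% → 3.7%).
Combining Contextual Embeddings and Contextual BM25 reduced the failure rate by 49% (5.7% → 2.9%).
Adding reranking on top of both reduced the failure rate by 67% (5.7% → 1.9%).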
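The percentages above are relative reductions against the 5.7% baseline, not absolute differences. A short check of the arithmetic, using only the rates quoted above (the helper name `relative_reduction` is ours):

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Percentage drop in failure rate relative to the baseline."""
    return (baseline - improved) / baseline * 100


baseline = 5.7  # top-20-chunk retrieval failure rate without contextual retrieval (%)
reported = [
    ("contextual embeddings", 3.7),
    ("contextual embeddings + contextual BM25", 2.9),
    ("both, plus reranking", 1.9),
]

for label, rate in reported:
    print(f"{label}: {relative_reduction(baseline, rate):.0f}% fewer failures")
```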

Implementing Contextual Retrieval with Instructor and Async Processing

We can implement Anthropic's technique with async processing to improve efficiency:

from instructor import AsyncInstructor, Mode, patch
from anthropic import AsyncAnthropic
from pydantic import BaseModel, Field
import asyncio
from typing import List, Dict


class SituatedContext(BaseModel):
    title: str = Field(..., description="The title of the document.")
    context: str = Field(
        ..., description="The context to situate the chunk within the document."
    )


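# Wrap the Anthropic prompt-caching endpoint so responses are parsed into Pydantic models.
# Note: `beta.prompt_caching` is the path used when this was written; newer versions of
# the Anthropic SDK may expose prompt caching through the regular messages API instead.
client = AsyncInstructor(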
    create=patch(
        create=AsyncAnthropic().beta.prompt_caching.messages.create,
        mode=Mode.ANTHROPIC_TOOLS,
    ),
    mode=Mode.ANTHROPIC_TOOLS,
)


async def situate_context(doc: str, chunk: str) -> str:
    """Ask Claude for a short context that situates `chunk` within `doc`."""
    response = await client.chat.completions.create(
        model="claude-3-haiku-20240307",
        max_tokens=1024,
        temperature=0.0,
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "<document>{{doc}}</document>",
                        "cache_control": {"type": "ephemeral"},
                    },
                    {
                        "type": "text",
                        "text": "Here is the chunk we want to situate within the whole document\n<chunk>{{chunk}}</chunk>\nPlease give a short succinct context to situate this chunk within the overall document for the purposes of improving search retrieval of the chunk.\nAnswer only with the succinct context and nothing else.",
                    },
                ],
            }
        ],
        response_model=SituatedContext,
        context={"doc": doc, "chunk": chunk},
    )
    return response.context


def chunking_function(doc: str) -> List[str]:
    """Split `doc` into 1000-character chunks that overlap by 200 characters."""
    chunk_size = 1000
    overlap = 200
    chunks = []
    start = 0
    while start < len(doc):
        end = start + chunk_size
        chunks.append(doc[start:end])
        start += chunk_size - overlap
    return chunks


async def process_chunk(doc: str, chunk: str) -> Dict[str, str]:
    context = await situate_context(doc, chunk)
    return {"chunk": chunk, "context": context}


async def process(doc: str) -> List[Dict[str, str]]:
    chunks = chunking_function(doc)
    # One coroutine per chunk; gather runs them concurrently and preserves input order.
    tasks = [process_chunk(doc, chunk) for chunk in chunks]
    results = await asyncio.gather(*tasks)
    return results


# Example usage
async def main():
    document = "Your full document text here..."
    processed_chunks = await process(document)
    for i, item in enumerate(processed_chunks):
        print(f"Chunk {i + 1}:")
        print(f"Text: {item['chunk'][:50]}...")
        print(f"Context: {item['context']}")
        print()


if __name__ == "__main__":
    asyncio.run(main())
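The implementation above launches one request per chunk all at once. For long documents that can run into rate limits, and our understanding of Anthropic's prompt caching is that a cache entry only becomes usable once the first response has started, so requests fired simultaneously may not benefit from it. The sketch below is one way to address both: it caps concurrency with a semaphore and processes the first chunk alone before fanning out. `process_bounded`, `max_concurrency`, and the stand-in `fake_situate` are hypothetical names introduced for illustration; in practice you would pass `situate_context` instead.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def process_bounded(
    doc: str,
    chunks: List[str],
    situate: Callable[[str, str], Awaitable[str]],
    max_concurrency: int = 5,
) -> List[Dict[str, str]]:
    """Contextualize chunks with at most `max_concurrency` requests in flight."""
    if not chunks:
        return []
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(chunk: str) -> Dict[str, str]:
        async with semaphore:
            context = await situate(doc, chunk)
        return {"chunk": chunk, "context": context}

    # Run the first chunk alone so the cached document prefix can be written
    # before the remaining requests are sent in parallel.
    first = await worker(chunks[0])
    rest = await asyncio.gather(*(worker(c) for c in chunks[1:]))
    return [first, *rest]


# Stand-in for situate_context so the sketch runs without an API key.
async def fake_situate(doc: str, chunk: str) -> str:
    await asyncio.sleep(0.01)
    return f"Context for a {len(chunk)}-character chunk"


results = asyncio.run(
    process_bounded("full document", ["aaa", "bb", "c"], fake_situate, max_concurrency=2)
)
print(results)
```

The right value for `max_concurrency` depends on your account's rate limits, so treat the default here as a placeholder.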

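Key Features of This Implementation

Async processing: uses asyncio to process chunks concurrently.
Structured output: uses a Pydantic model for type-safe responses.
Prompt caching: marks the document block with cache_control so Anthropic's prompt caching can reuse it across chunks.
Chunking: implements a basic fixed-size chunking strategy with overlap.
Jinja2 templating: injects variables into the prompt with Jinja2 templates, rendered from the context argument.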
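To make the overlap behavior concrete, here is a small sketch that uses the same sliding-window logic as `chunking_function` but with the sizes exposed as parameters and shrunk for readability. The name `chunk_with_overlap` and the guard against `overlap >= chunk_size` are our additions.

```python
from typing import List


def chunk_with_overlap(doc: str, chunk_size: int, overlap: int) -> List[str]:
    """Split doc into fixed-size character windows that share `overlap` characters."""
    if overlap >= chunk_size:
        # Without this guard the window would never advance.
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(doc):
        chunks.append(doc[start : start + chunk_size])
        start += chunk_size - overlap
    return chunks


chunks = chunk_with_overlap("abcdefghijklmnopqrstuvwxyz", chunk_size=10, overlap=4)
for c in chunks:
    print(c)
# abcdefghij
# ghijklmnop
# mnopqrstuv
# stuvwxyz
# yz
```

Note that the final window ("yz") is fully contained in the one before it. With this simple strategy the tail of a document can produce a redundant chunk, which costs an extra model call; stopping once a window reaches the end of the document is one possible refinement.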

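Considerations from Anthropic's Article

Anthropic mentions several implementation considerations:

Chunk boundaries: experiment with different chunk sizes, boundaries, and overlap.
Embedding model: they found Gemini and Voyage embeddings to work particularly well.
Custom contextualizer prompts: consider prompts tailored to your domain.
Number of chunks: they found that passing the top 20 chunks to the model was the most effective of the options they tried.
Evaluation: always run evals against your specific use case.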
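For the evaluation step, one simple metric is the top-k retrieval failure rate, computed here as 1 minus recall@k averaged over queries. This is a sketch under our own assumptions about the data layout (query IDs mapped to ranked chunk IDs and to sets of relevant chunk IDs); the function name and the toy data are hypothetical.

```python
from typing import Dict, List, Set


def failure_rate_at_k(
    retrieved: Dict[str, List[str]],
    relevant: Dict[str, Set[str]],
    k: int = 20,
) -> float:
    """Average share of relevant chunks missing from the top-k results (1 - recall@k)."""
    misses = []
    for query, gold in relevant.items():
        if not gold:
            continue  # skip queries with no labeled relevant chunks
        top_k = set(retrieved.get(query, [])[:k])
        misses.append(1 - len(gold & top_k) / len(gold))
    return sum(misses) / len(misses) if misses else 0.0


# Toy data: ranked chunk IDs per query, and the chunk IDs judged relevant.
retrieved = {"q1": ["c3", "c1", "c9"], "q2": ["c4", "c7", "c2"]}
relevant = {"q1": {"c1"}, "q2": {"c2", "c5"}}

print(failure_rate_at_k(retrieved, relevant, k=2))  # 0.5
print(failure_rate_at_k(retrieved, relevant, k=3))  # 0.25
```

Running the same metric with and without the generated context on your own queries shows whether the technique helps for your data.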

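Further Enhancements

Based on Anthropic's suggestions:

Implement dynamic chunk sizing based on content complexity.
Integrate with a vector database for efficient storage and retrieval.
Add error handling and retry mechanisms.
Experiment with different embedding models and prompts.
Implement a reranking step to further improve performance.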
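As one illustration of the error handling and retry item, the sketch below wraps an async call in exponential backoff with jitter. `with_retries` and the simulated `flaky` call are hypothetical; a real version would likely catch only the SDK's rate-limit and connection errors rather than every `Exception`.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_retries(
    call: Callable[[], Awaitable[T]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Await call(), retrying failed attempts with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return await call()
        except Exception:  # broad on purpose for the sketch; narrow this in practice
            if attempt == max_attempts:
                raise
            # Delay doubles each attempt; jitter spreads out concurrent retries.
            delay = base_delay * 2 ** (attempt - 1)
            await asyncio.sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError("unreachable")


attempts = 0


async def flaky() -> str:
    """Simulated request that fails twice before succeeding."""
    global attempts
    attempts += 1
    if attempts < 3:
        raise ConnectionError("simulated transient failure")
    return "context"


result = asyncio.run(with_retries(flaky, max_attempts=3, base_delay=0.01))
print(result, attempts)
```

With the implementation above, the wrapped call could be `lambda: situate_context(doc, chunk)`. Instructor's create call also accepts a `max_retries` argument, which may already cover retries on validation failures.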

This implementation offers a starting point for applying Anthropic's Contextual Retrieval technique, with async processing to keep it efficient.