Classifying Private Data with Local Models

In this post, we'll show you how to use llama-cpp-python together with instructor for classification. This is a perfect use case for anyone who wants to make sure confidential documents are processed securely and never leave their own infrastructure.

Setup

First, install the required libraries in your local Python environment. This can take some time, since llama-cpp needs to be built and compiled for your specific environment.

pip install instructor pydantic

Next, we'll install llama-cpp-python, a Python package that lets us use llama-cpp from our Python scripts.

In this tutorial, we'll use the TheBloke/Mistral-7B-Instruct-v0.2-GGUF model for function calling. This requires around 6GB of RAM and a GPU.

We can install the package by running:

CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python

No GPU?

If you don't have a GPU, we recommend using the Qwen2-0.5B-Instruct model instead and compiling llama-cpp-python to use OpenBLAS, which lets you run the program on your CPU.

You can compile llama-cpp-python with OpenBLAS support by running:

CMAKE_ARGS="-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python

Using llama-cpp-python

Here's an example of a confidential document query system that uses a local model:

from llama_cpp import Llama  # type: ignore
import instructor
from pydantic import BaseModel
from enum import Enum
from typing import Optional

llm = Llama.from_pretrained(  # type: ignore
    repo_id="TheBloke/Mistral-7B-Instruct-v0.2-GGUF",  # (1)!
    filename="*Q4_K_M.gguf",
    verbose=False,  # (2)!
    n_gpu_layers=-1,  # (3)!
)

create = instructor.patch(
    create=llm.create_chat_completion_openai_v1,  # type: ignore  # (4)!
)


# Define query types for document-related inquiries
class QueryType(str, Enum):
    DOCUMENT_CONTENT = "document_content"
    LAST_MODIFIED = "last_modified"
    ACCESS_PERMISSIONS = "access_permissions"
    RELATED_DOCUMENTS = "related_documents"


# Define the structure for query responses
class QueryResponse(BaseModel):
    query_type: QueryType
    response: str
    additional_info: Optional[str] = None


def process_confidential_query(query: str) -> QueryResponse:
    prompt = f"""Analyze the following confidential document query and provide an appropriate response:
    Query: {query}

    Determine the type of query (document content, last modified, access permissions, or related documents),
    provide a response, and include any additional relevant information.
    Remember, you're handling confidential data, so be cautious about specific details.
    """

    return create(
        response_model=QueryResponse,  # (5)!
        messages=[
            {
                "role": "system",
                "content": "You are a secure AI assistant trained to handle confidential document queries.",
            },
            {"role": "user", "content": prompt},
        ],
    )


# Sample confidential document queries
confidential_queries = [
    "What are the key findings in the Q4 financial report?",
    "Who last accessed the merger proposal document?",
    "What are the access permissions for the new product roadmap?",
    "Are there any documents related to Project X's budget forecast?",
    "When was the board meeting minutes document last updated?",
]

# Process each query and print the results
for query in confidential_queries:
    response: QueryResponse = process_confidential_query(query)
    print(f"{query} : {response.query_type}")
    """
    #> What are the key findings in the Q4 financial report? : document_content
    #> Who last accessed the merger proposal document? : access_permissions
    #> What are the access permissions for the new product roadmap? : access_permissions
    #> Are there any documents related to Project X's budget forecast? : document_content
    #> When was the board meeting minutes document last updated? : last_modified
    """
  1. We load the model from Hugging Face and cache it locally. This makes it quick and easy to experiment with different model configurations and types.

  2. You can set verbose to True to log all of llama.cpp's output, which helps when debugging specific issues.

  3. If your GPU memory is limited, set n_gpu_layers to a smaller number (e.g. 10). We set it to -1 here so that all of the model's layers are loaded onto the GPU by default.

  4. Make sure to patch the client with the OpenAI-compatible create_chat_completion_openai_v1 API.

  5. Pass in the response model as a parameter, just as you would with any other inference client we support.
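
Because every result is a validated Pydantic model, downstream code can consume the structured fields directly. As a small usage sketch, here is one way to turn a response into an audit-log record (the audit-log framing is just an illustration):

import json

# Reuse the query processor defined above; every field in the result
# is typed and validated, so it can be logged or routed safely.
response = process_confidential_query(
    "Who last accessed the merger proposal document?"
)

audit_record = {
    "query_type": response.query_type.value,
    "response": response.response,
    "additional_info": response.additional_info,
}
print(json.dumps(audit_record, indent=2))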

Conclusion

instructor offers a powerful solution for organizations that need to process confidential document queries locally. By handling these queries on your own hardware, you can take advantage of advanced AI capabilities while maintaining the highest standards of data privacy and security.

And it's not just about confidential documents: using local models opens up a whole world of interesting use cases, fine-tuned specialized models, and much more!