Classifying Private Data with Local Models¶
In this post, we'll show you how to use `llama-cpp-python` with `instructor` for classification. This is a perfect use case for anyone who needs confidential documents to be handled securely, without ever leaving their own infrastructure.
Setup¶
First, install the required libraries in your local Python environment. This may take some time, since `llama-cpp` needs to be built and compiled for your specific environment.
Next, we'll install `llama-cpp-python`, a Python package that lets us use `llama-cpp` from within our Python scripts.
For this tutorial, we'll be using TheBloke's Mistral-7B-Instruct-v0.2-GGUF model for function calling. This will require roughly 6GB of memory and a GPU.

We can install the package by running the following command.
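Assuming the standard PyPI install (consult the `llama-cpp-python` docs for platform-specific build flags); `instructor` is included here since the rest of the tutorial depends on it:

```bash
# Builds and compiles llama-cpp for this machine at install time,
# then pulls in instructor for the structured-output example below
pip install llama-cpp-python instructor
```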
Don't have a GPU?
If you don't have a GPU, we recommend using the Qwen2-0.5B-Instruct model instead and compiling `llama-cpp-python` to use OpenBLAS. This lets you run the program on your CPU.
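Swapping in the smaller model is just a matter of changing the arguments to `Llama.from_pretrained` in the example below; a minimal sketch, assuming Qwen's official GGUF repo id and quantization filename pattern (verify both on the model card):

```python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Qwen/Qwen2-0.5B-Instruct-GGUF",  # assumed Hugging Face repo id
    filename="*q8_0.gguf",  # assumed quantization filename pattern
    verbose=False,
)
```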
You can compile `llama-cpp-python` with OpenBLAS support by running the command below.
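The exact CMake flags vary between `llama-cpp-python` releases; the form below is the one documented for older releases (newer releases use `GGML_BLAS` in place of `LLAMA_BLAS`):

```bash
# Rebuild llama-cpp-python against OpenBLAS for faster CPU inference
CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python --force-reinstall --no-cache-dir
```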
Using Llama-cpp-python¶
Here's an example of a query system for confidential documents built on a local model:
```python
from llama_cpp import Llama  # type: ignore
import instructor
from pydantic import BaseModel
from enum import Enum
from typing import Optional

llm = Llama.from_pretrained(  # type: ignore
    repo_id="TheBloke/Mistral-7B-Instruct-v0.2-GGUF",  # (1)!
    filename="*Q4_K_M.gguf",
    verbose=False,  # (2)!
    n_gpu_layers=-1,  # (3)!
)

create = instructor.patch(
    create=llm.create_chat_completion_openai_v1,  # type: ignore # (4)!
)


# Define query types for document-related inquiries
class QueryType(str, Enum):
    DOCUMENT_CONTENT = "document_content"
    LAST_MODIFIED = "last_modified"
    ACCESS_PERMISSIONS = "access_permissions"
    RELATED_DOCUMENTS = "related_documents"


# Define the structure for query responses
class QueryResponse(BaseModel):
    query_type: QueryType
    response: str
    additional_info: Optional[str] = None


def process_confidential_query(query: str) -> QueryResponse:
    prompt = f"""Analyze the following confidential document query and provide an appropriate response:
    Query: {query}
    Determine the type of query (document content, last modified, access permissions, or related documents),
    provide a response, and include a confidence score and any additional relevant information.
    Remember, you're handling confidential data, so be cautious about specific details.
    """

    return create(
        response_model=QueryResponse,  # (5)!
        messages=[
            {
                "role": "system",
                "content": "You are a secure AI assistant trained to handle confidential document queries.",
            },
            {"role": "user", "content": prompt},
        ],
    )


# Sample confidential document queries
confidential_queries = [
    "What are the key findings in the Q4 financial report?",
    "Who last accessed the merger proposal document?",
    "What are the access permissions for the new product roadmap?",
    "Are there any documents related to Project X's budget forecast?",
    "When was the board meeting minutes document last updated?",
]

# Process each query and print the results
for query in confidential_queries:
    response: QueryResponse = process_confidential_query(query)
    print(f"{query} : {response.query_type}")

"""
#> What are the key findings in the Q4 financial report? : document_content
#> Who last accessed the merger proposal document? : access_permissions
#> What are the access permissions for the new product roadmap? : access_permissions
#> Are there any documents related to Project X's budget forecast? : document_content
#> When was the board meeting minutes document last updated? : last_modified
"""
```
1. We load the model from Hugging Face and cache it locally. This makes it quick and easy to experiment with different model configurations and types.
2. We can set `verbose` to `True` to log all of `llama.cpp`'s output. This is helpful when you're debugging a specific issue.
3. If your GPU has limited memory, set `n_gpu_layers` to a smaller number (e.g., 10). We set it to `-1` here so that all model layers are offloaded to the GPU by default.
4. Make sure to patch the client with the OpenAI-compatible `create_chat_completion_openai_v1` API.
5. Pass in the response model as a parameter, just as with any other inference client we support.
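Note that the prompt asks the model for a confidence score, but `QueryResponse` has no field to hold one, so it is silently dropped. A minimal sketch of how you might capture it, reusing the definitions above (the field name and bounds are illustrative, not part of the original example):

```python
from typing import Optional

from pydantic import BaseModel, Field


class ScoredQueryResponse(BaseModel):
    query_type: QueryType
    response: str
    # Hypothetical field: constrain the model's self-reported confidence to [0, 1]
    confidence: float = Field(ge=0.0, le=1.0)
    additional_info: Optional[str] = None


# Drop-in replacement for the original response model:
# result = create(response_model=ScoredQueryResponse, messages=[...])
```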
Conclusion¶
`instructor` offers a powerful solution for organizations that need to handle confidential document queries locally. By processing these queries on your own hardware, you can take advantage of advanced AI capabilities while maintaining the highest standards of data privacy and security.

And it's not just about confidential documents: working with local models opens up a whole new world of interesting use cases, fine-tuned specialist models, and much more!