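Structured outputs with DeepSeek: a complete guide with Instructor
DeepSeek is a Chinese company that provides AI models and services. Its best-known models are the deepseek coder and chat models, and it recently released the R1 reasoning model.
This guide covers everything you need to know to get type-safe, validated responses from DeepSeek with Instructor.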
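Quick start
Instructor supports the OpenAI client out of the box, so you do not need to install anything extra.
⚠️ Important: you must set your DeepSeek API key before using the client. You can do this in two ways:
- Set it as an environment variable
- Or pass it directly to the client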
import os
from openai import OpenAI
client = OpenAI(api_key=os.getenv('DEEPSEEK_API_KEY'), base_url="https://api.deepseek.com")
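The two options above can be wrapped in a small helper. This is a minimal sketch and not part of Instructor or the OpenAI SDK: resolve_api_key is a hypothetical name, and it assumes only the DEEPSEEK_API_KEY environment variable used in this guide. An explicitly passed key wins; otherwise the environment variable is read, and a missing key fails early with a clear error.

```python
import os


def resolve_api_key(explicit_key=None, env_var="DEEPSEEK_API_KEY"):
    """Return the API key from an explicit argument or the environment.

    Hypothetical helper for illustration; not part of any library.
    """
    key = explicit_key or os.getenv(env_var)
    if not key:
        raise RuntimeError(
            f"No API key found: pass one explicitly or set {env_var}."
        )
    return key
```

The result could then be passed as api_key=resolve_api_key() when constructing the OpenAI client.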
Simple user example (sync)
import os
from openai import OpenAI
from pydantic import BaseModel
import instructor

client = instructor.from_openai(
    OpenAI(api_key=os.getenv("DEEPSEEK_API_KEY"), base_url="https://api.deepseek.com")
)


class User(BaseModel):
    name: str
    age: int


# Create structured output
user = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "user", "content": "Extract: Jason is 25 years old"},
    ],
    response_model=User,
)
print(user)
# > name='Jason' age=25
Simple user example (async)
import os
import asyncio
from openai import AsyncOpenAI
from pydantic import BaseModel
import instructor

client = instructor.from_openai(
    AsyncOpenAI(
        api_key=os.getenv("DEEPSEEK_API_KEY"), base_url="https://api.deepseek.com"
    )
)


class User(BaseModel):
    name: str
    age: int


async def extract_user():
    user = await client.chat.completions.create(
        model="deepseek-chat",
        messages=[
            {"role": "user", "content": "Extract: Jason is 25 years old"},
        ],
        response_model=User,
    )
    return user


# Run async function
user = asyncio.run(extract_user())
print(user)
# > name='Jason' age=25
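One reason to prefer the async client is that several extractions can run concurrently. The sketch below is an illustration under stated assumptions rather than an Instructor API: extract_many and its limit parameter are hypothetical, and extract stands for any coroutine function that takes a text and returns a result, such as a variant of extract_user that accepts the text as an argument. A semaphore caps the number of requests in flight.

```python
import asyncio


async def extract_many(extract, texts, limit=5):
    """Run `extract(text)` for every text, at most `limit` at a time.

    `extract` is any coroutine function (hypothetical here); results are
    returned in the same order as `texts`.
    """
    semaphore = asyncio.Semaphore(limit)

    async def bounded(text):
        # Wait for a free slot before starting the call.
        async with semaphore:
            return await extract(text)

    return await asyncio.gather(*(bounded(t) for t in texts))
```

Because asyncio.gather preserves input order, each result lines up with the text it came from.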
Nested example
import os
from openai import OpenAI
import instructor
from pydantic import BaseModel


class Address(BaseModel):
    street: str
    city: str
    country: str


class User(BaseModel):
    name: str
    age: int
    addresses: list[Address]


# Initialize with API key
client = instructor.from_openai(
    OpenAI(api_key=os.getenv("DEEPSEEK_API_KEY"), base_url="https://api.deepseek.com")
)

# Create structured output with nested objects
user = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {
            "role": "user",
            "content": """
                Extract: Jason is 25 years old.
                He lives at 123 Main St, New York, USA
                and has a summer house at 456 Beach Rd, Miami, USA
            """,
        },
    ],
    response_model=User,
)
print(user)
# Illustrative output, laid out as a dict for readability
# (print(user) itself emits the one-line Pydantic repr):
#> {
#>     'name': 'Jason',
#>     'age': 25,
#>     'addresses': [
#>         {
#>             'street': '123 Main St',
#>             'city': 'New York',
#>             'country': 'USA'
#>         },
#>         {
#>             'street': '456 Beach Rd',
#>             'city': 'Miami',
#>             'country': 'USA'
#>         }
#>     ]
#> }
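Streaming support
Instructor offers two main ways to stream responses:
- Iterables: useful when you want to stream a list of objects of the same type (for example, extracting multiple users with structured outputs).
- Partial streaming: useful when you want to stream a single object and start processing it as soon as the response begins to arrive.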
Partials
import os
from openai import OpenAI
import instructor
from pydantic import BaseModel

# Initialize with API key
client = instructor.from_openai(
    OpenAI(api_key=os.getenv("DEEPSEEK_API_KEY"), base_url="https://api.deepseek.com")
)


class User(BaseModel):
    name: str
    age: int
    bio: str


user = client.chat.completions.create_partial(
    model="deepseek-chat",
    messages=[
        {
            "role": "user",
            "content": "Create a user profile for Jason and a one sentence bio, age 25",
        },
    ],
    response_model=User,
)

# Each iteration yields a progressively more complete User
for user_partial in user:
    print(user_partial)
    # > name='Jason' age=None bio='None'
    # > name='Jason' age=25 bio='A tech'
    # > name='Jason' age=25 bio='A tech enthusiast'
    # > name='Jason' age=25 bio='A tech enthusiast who loves coding, gaming, and exploring new'
    # > name='Jason' age=25 bio='A tech enthusiast who loves coding, gaming, and exploring new technologies'
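Each item yielded by the partial stream is a snapshot of the same object at a later stage, so only the last one is complete. A minimal sketch of a consumer follows, assuming nothing about Instructor beyond that iteration behavior; consume_partials and on_update are hypothetical names. It also skips consecutive identical snapshots, in case a chunk adds no new field data.

```python
def consume_partials(partials, on_update):
    """Call `on_update` for each changed snapshot and return the last one.

    `partials` is any iterable of snapshots, for example the object
    returned by `create_partial` above.
    """
    latest = None
    for snapshot in partials:
        if snapshot != latest:  # skip snapshots that carry no new data
            on_update(snapshot)
        latest = snapshot
    return latest
```

For instance, consume_partials(user, print) would print each changed snapshot and return the final object.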
Iterable example
import os
from openai import OpenAI
import instructor
from pydantic import BaseModel

# Initialize with API key
client = instructor.from_openai(
    OpenAI(api_key=os.getenv("DEEPSEEK_API_KEY"), base_url="https://api.deepseek.com")
)


class User(BaseModel):
    name: str
    age: int


# Extract multiple users from text
users = client.chat.completions.create_iterable(
    model="deepseek-chat",
    messages=[
        {
            "role": "user",
            "content": """
                Extract users:
                1. Jason is 25 years old
                2. Sarah is 30 years old
                3. Mike is 28 years old
            """,
        },
    ],
    response_model=User,
)

# Each User is yielded as soon as it has been fully parsed
for user in users:
    print(user)
    #> name='Jason' age=25
    #> name='Sarah' age=30
    #> name='Mike' age=28
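Reasoning models
Because Instructor is built on top of the OpenAI API, we can retrieve the reasoning trace from the deepseek-reasoner model. Make sure to configure MD_JSON mode here for the best experience.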
import os
from openai import OpenAI
from pydantic import BaseModel
import instructor
from rich import print  # pretty-printing; requires the rich package

client = instructor.from_openai(
    OpenAI(api_key=os.getenv("DEEPSEEK_API_KEY"), base_url="https://api.deepseek.com"),
    mode=instructor.Mode.MD_JSON,
)


class User(BaseModel):
    name: str
    age: int


# Create structured output and keep the raw completion alongside it
completion, raw_completion = client.chat.completions.create_with_completion(
    model="deepseek-reasoner",
    messages=[
        {"role": "user", "content": "Extract: Jason is 25 years old"},
    ],
    response_model=User,
)
print(completion)
# > User(name='Jason', age=25)
print(raw_completion.choices[0].message.reasoning_content)
# > Okay, let's see. The user wants me to extract information from the sentence "Jason is 25 years old" and format it into a JSON object that matches the given schema. The schema requires a "name" and an "age", both of which are required.
# >
# > First, I need to identify the name. The sentence starts with "Jason", so that's the name. Then the age is given as "25 years old". The age should be an integer, so I need to convert "25" from a string to a number.
# >
# > So putting that together, the JSON should have "name": "Jason" and "age": 25. Let me double-check the schema to make sure there are no other requirements. The properties are "name" (string) and "age" (integer), both required. Yep, that's all.
# >
# > I need to make sure the JSON is correctly formatted, with commas and braces. Also, the user specified to return it in a json codeblock, not the schema itself. So the final answer should be a JSON object with those key-value pairs.
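The reasoning_content field is returned by deepseek-reasoner and may be absent when the same code runs against another model, so it can be safer not to assume it exists. A defensive accessor might look like the following sketch. get_reasoning is a hypothetical helper, it assumes only the choices[0].message shape used above, and the SimpleNamespace objects are stand-ins for a real response.

```python
from types import SimpleNamespace


def get_reasoning(raw_completion):
    """Return the reasoning trace of the first choice, or None if absent."""
    if not raw_completion.choices:
        return None
    message = raw_completion.choices[0].message
    return getattr(message, "reasoning_content", None)


# Stand-in objects that mimic the response shape, for illustration only.
with_trace = SimpleNamespace(
    choices=[
        SimpleNamespace(message=SimpleNamespace(reasoning_content="Okay, let's see."))
    ]
)
without_trace = SimpleNamespace(choices=[SimpleNamespace(message=SimpleNamespace())])

print(get_reasoning(with_trace))     # Okay, let's see.
print(get_reasoning(without_trace))  # None
```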
Instructor modes
We recommend Mode.TOOLS for DeepSeek; it is the default mode of the from_openai method.
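The two mode choices in this guide (the default tools mode for deepseek-chat, MD_JSON for deepseek-reasoner) can be captured in a small lookup. This is only a sketch: mode_name_for is a hypothetical helper that returns the name of the mode, which could then be resolved with getattr(instructor.Mode, name) and passed as mode= to from_openai.

```python
def mode_name_for(model: str) -> str:
    """Return the Instructor mode name this guide suggests for a model."""
    # The reasoner example in this guide uses MD_JSON; everything else
    # falls back to TOOLS, the default mode of from_openai.
    return "MD_JSON" if model == "deepseek-reasoner" else "TOOLS"
```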
Related resources
Updates and compatibility
Instructor stays compatible with the latest OpenAI API versions and models. Check the changelog for updates.