
Structured Outputs with DeepSeek: A Complete Guide with Instructor

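DeepSeek is a Chinese company that provides AI models and services. Its best-known models are the deepseek coder and chat models, along with the recently released R1 reasoning model.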

This guide covers everything you need to know to get type-safe, validated responses from DeepSeek with Instructor.

Quick Start

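DeepSeek is reached through the OpenAI client, which Instructor supports out of the box, so you don't need to install anything beyond Instructor itself.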

pip install "instructor"

⚠️ Important: You must set your DeepSeek API key before using the client. You can do this in two ways:

  1. Set an environment variable
export DEEPSEEK_API_KEY='your-api-key-here'
  2. Or pass it directly to the client
import os
from openai import OpenAI

client = OpenAI(api_key=os.getenv('DEEPSEEK_API_KEY'), base_url="https://api.deepseek.com")
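
If the variable is missing, the problem may only show up later, when the client is constructed or the first request is sent. One way to fail early with a clearer message is sketched below. `require_api_key` is a hypothetical helper written for this guide, not part of Instructor or the OpenAI SDK.

```python
import os
from typing import Mapping, Optional


def require_api_key(
    name: str = "DEEPSEEK_API_KEY",
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; not part of Instructor.
    """
    # Allow a custom mapping to be injected, which keeps the helper easy to test.
    source = os.environ if env is None else env
    key = source.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before creating the client")
    return key
```

The returned value can then be passed as `api_key=require_api_key()` when constructing `OpenAI(...)`.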

Simple User Example (Sync)

import os
from openai import OpenAI
from pydantic import BaseModel
import instructor

client = instructor.from_openai(
    OpenAI(api_key=os.getenv("DEEPSEEK_API_KEY"), base_url="https://api.deepseek.com")
)


class User(BaseModel):
    name: str
    age: int


# Create structured output
user = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "user", "content": "Extract: Jason is 25 years old"},
    ],
    response_model=User,
)

print(user)
# > name='Jason' age=25
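
The `User` model above only enforces field types. To illustrate the "validated" part of the guide's promise, the sketch below adds a Pydantic field validator and exercises it locally, without any API call. The `ValidatedUser` name and the age bounds are assumptions chosen for illustration.

```python
from pydantic import BaseModel, ValidationError, field_validator


class ValidatedUser(BaseModel):
    name: str
    age: int

    @field_validator("age")
    @classmethod
    def age_must_be_plausible(cls, value: int) -> int:
        # Illustrative bounds; pick limits that suit your data.
        if not 0 <= value <= 150:
            raise ValueError("age must be between 0 and 150")
        return value


# Valid data parses into a typed object.
ok = ValidatedUser.model_validate({"name": "Jason", "age": 25})

# Invalid data raises ValidationError instead of passing through silently.
try:
    ValidatedUser.model_validate({"name": "Jason", "age": -3})
    rejected = False
except ValidationError:
    rejected = True
```

Passing a model like this as `response_model` should make Instructor apply the same checks to the model's output. The `max_retries` argument of `create` can be used to ask the model again when validation fails.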

Simple User Example (Async)

import os
import asyncio
from openai import AsyncOpenAI
from pydantic import BaseModel
import instructor

client = instructor.from_openai(
    AsyncOpenAI(
        api_key=os.getenv("DEEPSEEK_API_KEY"), base_url="https://api.deepseek.com"
    )
)


class User(BaseModel):
    name: str
    age: int


async def extract_user():
    user = await client.chat.completions.create(
        model="deepseek-chat",
        messages=[
            {"role": "user", "content": "Extract: Jason is 25 years old"},
        ],
        response_model=User,
    )
    return user


# Run async function
user = asyncio.run(extract_user())
print(user)
# > name='Jason' age=25
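
The async client pays off mainly when several extractions run at once. The sketch below shows one possible pattern using only `asyncio`. `extract_many` and its `max_concurrency` parameter are hypothetical names, and `extract_one` stands for any coroutine function that takes a text and returns a parsed object, for example a variant of `extract_user` that accepts the input text.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, TypeVar

T = TypeVar("T")


async def extract_many(
    texts: Sequence[str],
    extract_one: Callable[[str], Awaitable[T]],
    max_concurrency: int = 3,
) -> List[T]:
    """Run extract_one over all texts, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(text: str) -> T:
        # The semaphore caps how many requests are in flight at the same time.
        async with semaphore:
            return await extract_one(text)

    # gather returns results in the same order as the inputs.
    return list(await asyncio.gather(*(guarded(text) for text in texts)))
```

A suitable concurrency limit depends on your account's rate limits, so the default of 3 is only a placeholder.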

Nested Example

import os
from openai import OpenAI
from pydantic import BaseModel
import instructor


class Address(BaseModel):
    street: str
    city: str
    country: str


class User(BaseModel):
    name: str
    age: int
    addresses: list[Address]


# Initialize with API key
client = instructor.from_openai(
    OpenAI(api_key=os.getenv("DEEPSEEK_API_KEY"), base_url="https://api.deepseek.com")
)


# Create structured output with nested objects
user = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {
            "role": "user",
            "content": """
            Extract: Jason is 25 years old.
            He lives at 123 Main St, New York, USA
            and has a summer house at 456 Beach Rd, Miami, USA
        """,
        },
    ],
    response_model=User,
)

print(user)

#> {
#>     'name': 'Jason',
#>     'age': 25,
#>     'addresses': [
#>         {
#>             'street': '123 Main St',
#>             'city': 'New York',
#>             'country': 'USA'
#>         },
#>         {
#>             'street': '456 Beach Rd',
#>             'city': 'Miami',
#>             'country': 'USA'
#>         }
#>     ]
#> }

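Streaming Support

Instructor provides two main ways to stream responses:

  1. Iterables: useful when you want to stream a list of objects of the same type (for example, extracting multiple users with structured outputs)
  2. Partial streaming: useful when you want to stream a single object and start processing it as soon as the response begins to arrive.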

Partial Streaming Example

import os
from openai import OpenAI
from pydantic import BaseModel
import instructor


# Initialize with API key
client = instructor.from_openai(
    OpenAI(api_key=os.getenv("DEEPSEEK_API_KEY"), base_url="https://api.deepseek.com")
)


class User(BaseModel):
    name: str
    age: int
    bio: str


user = client.chat.completions.create_partial(
    model="deepseek-chat",
    messages=[
        {
            "role": "user",
            "content": "Create a user profile for Jason and a one sentence bio, age 25",
        },
    ],
    response_model=User,
)

for user_partial in user:
    print(user_partial)


# > name='Jason' age=None bio='None'
# > name='Jason' age=25 bio='A tech'
# > name='Jason' age=25 bio='A tech enthusiast'
# > name='Jason' age=25 bio='A tech enthusiast who loves coding, gaming, and exploring new'
# > name='Jason' age=25 bio='A tech enthusiast who loves coding, gaming, and exploring new technologies'
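
Printing every snapshot is handy for a demo, but application code usually wants to react to each update and then keep the final object. A minimal sketch of that pattern follows. `consume_partials` is a hypothetical helper that works on any iterable, including the one returned by `create_partial`.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def consume_partials(
    stream: Iterable[T],
    on_update: Optional[Callable[[T], None]] = None,
) -> Optional[T]:
    """Drain a stream of partial objects and return the last one seen."""
    latest: Optional[T] = None
    for snapshot in stream:
        if on_update is not None:
            on_update(snapshot)  # e.g. refresh a UI with the partial object
        latest = snapshot
    # None means the stream produced no snapshots at all.
    return latest
```

With the example above, `final_user = consume_partials(user, print)` would print each snapshot and leave the last one in `final_user`.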

Iterable Example

import os
from openai import OpenAI
from pydantic import BaseModel
import instructor


# Initialize with API key
client = instructor.from_openai(
    OpenAI(api_key=os.getenv("DEEPSEEK_API_KEY"), base_url="https://api.deepseek.com")
)


class User(BaseModel):
    name: str
    age: int


# Extract multiple users from text
users = client.chat.completions.create_iterable(
    model="deepseek-chat",
    messages=[
        {
            "role": "user",
            "content": """
            Extract users:
            1. Jason is 25 years old
            2. Sarah is 30 years old
            3. Mike is 28 years old
        """,
        },
    ],
    response_model=User,
)

for user in users:
    print(user)

# > name='Jason' age=25
# > name='Sarah' age=30
# > name='Mike' age=28

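Reasoning Models

Because Instructor is built on top of the OpenAI API, we can retrieve reasoning traces from the deepseek-reasoner model. Make sure to configure MD_JSON mode here for the best experience.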

import os
from openai import OpenAI
from pydantic import BaseModel
import instructor
from rich import print

client = instructor.from_openai(
    OpenAI(api_key=os.getenv("DEEPSEEK_API_KEY"), base_url="https://api.deepseek.com"),
    mode=instructor.Mode.MD_JSON,
)


class User(BaseModel):
    name: str
    age: int


# Create structured output
completion, raw_completion = client.chat.completions.create_with_completion(
    model="deepseek-reasoner",
    messages=[
        {"role": "user", "content": "Extract: Jason is 25 years old"},
    ],
    response_model=User,
)

print(completion)
# > User(name='Jason', age=25)
print(raw_completion.choices[0].message.reasoning_content)
# > Okay, let's see. The user wants me to extract information from the sentence "Jason is 25 years old" and format it into a JSON object that matches the given schema. The schema requires a "name" and an "age", both of which are required.
# >
# > First, I need to identify the name. The sentence starts with "Jason", so that's the name. Then the age is given as "25 years old". The age should be an integer, so I need to convert "25" from a string to a number.
# >
# > So putting that together, the JSON should have "name": "Jason" and "age": 25. Let me double-check the schema to make sure there are no other requirements. The properties are "name" (string) and "age" (integer), both required. Yep, that's all.
# >
# > I need to make sure the JSON is correctly formatted, with commas and braces. Also, the user specified to return it in a json codeblock, not the schema itself. So the final answer should be a JSON object with those key-value pairs.
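
The `reasoning_content` field is specific to the reasoning model, so code that also handles other models may want to read it defensively. The sketch below is one way to do so. `get_reasoning` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins that only mimic the response shape so the snippet runs offline.

```python
from types import SimpleNamespace
from typing import Any, Optional


def get_reasoning(raw_completion: Any) -> Optional[str]:
    """Return the reasoning trace of the first choice, or None if absent."""
    choices = getattr(raw_completion, "choices", None) or []
    if not choices:
        return None
    message = getattr(choices[0], "message", None)
    # getattr with a default avoids AttributeError when the field is missing.
    return getattr(message, "reasoning_content", None)


# Stand-in objects that mimic the response shape (not real API responses).
fake_with_trace = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(reasoning_content="thinking..."))]
)
fake_without_trace = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace())]
)
```

In the example above, `get_reasoning(raw_completion)` could replace the direct attribute access.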

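Instructor Modes

We recommend Mode.TOOLS for DeepSeek, which is the default mode of the from_openai method. The deepseek-reasoner example in this guide is the exception and uses Mode.MD_JSON.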
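
The two modes used in this guide can be chosen per model. The sketch below shows one way to wire that up, assuming the recommendations here: `MD_JSON` for `deepseek-reasoner` and `TOOLS` for everything else. `mode_name_for` and `build_client` are hypothetical helpers, and the mapping is an assumption rather than an official rule.

```python
def mode_name_for(model: str) -> str:
    """Map a DeepSeek model name to an Instructor mode name.

    Follows this guide's recommendations; the mapping itself is an assumption.
    """
    return "MD_JSON" if model == "deepseek-reasoner" else "TOOLS"


def build_client(model: str):
    """Create an Instructor client configured for the given model."""
    # Imported lazily so the mapping above can be used without these packages.
    import os

    import instructor
    from openai import OpenAI

    return instructor.from_openai(
        OpenAI(
            api_key=os.getenv("DEEPSEEK_API_KEY"),
            base_url="https://api.deepseek.com",
        ),
        # Look up the enum member by name, e.g. instructor.Mode.MD_JSON.
        mode=getattr(instructor.Mode, mode_name_for(model)),
    )
```

Calling `build_client("deepseek-reasoner")` would then yield a client equivalent to the one in the reasoning example.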

Updates and Compatibility

Instructor stays compatible with the latest OpenAI API versions and models. Check the changelog for updates.