
Instructor, the Most Popular Library for Simple Structured Outputs

Structured outputs powered by LLMs. Designed for simplicity, transparency, and control.



Instructor makes it easy to get structured data like JSON from LLMs such as GPT-3.5, GPT-4, and GPT-4-Vision, as well as open-source models including Mistral/Mixtral, Ollama, and llama-cpp-python.

It stands out for its simplicity, transparency, and user-centric design, and is built on top of Pydantic. Instructor helps you manage validation context, retry with Tenacity, and stream List and Partial responses.

Star the Repo  Cookbook  Prompting Guide

pip install instructor
uv pip install instructor
poetry add instructor

If you ever get stuck, you can always run instructor docs to open the documentation in your browser. It even supports searching for specific topics.

instructor docs [QUERY]

Newsletter

If you want to stay up to date on tips, new blog posts, and research, subscribe to our newsletter. Here's what you can expect:

  • Updates on Instructor features and releases
  • Blog posts on AI and structured outputs
  • Tips and tricks from our community
  • Research in the LLM and structured-output space
  • Information on AI development skills with Instructor

Subscribe to our newsletter for the latest on AI development. We provide content to keep you up to date and help you get more out of Instructor in your projects.

Getting Started

If you want to see all the integrations, check out the Integrations Guide.

pip install instructor

Using OpenAI's Structured Output Responses

You can now use OpenAI's Structured Outputs with Instructor. This feature combines Instructor's strengths with OpenAI's precise sampling capabilities.

import instructor
from pydantic import BaseModel
from openai import OpenAI

# Define your desired output structure
class ExtractUser(BaseModel):
    name: str
    age: int

# Patch the OpenAI client, opting into OpenAI's Structured Outputs
client = instructor.from_openai(OpenAI(), mode=instructor.Mode.TOOLS_STRICT)

# Extract structured data from natural language
res = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=ExtractUser,
    messages=[{"role": "user", "content": "John Doe is 30 years old."}],
)

assert res.name == "John Doe"
assert res.age == 30

See more

pip install "instructor[ollama]"
from openai import OpenAI
from pydantic import BaseModel, Field
from typing import List
import instructor

class ExtractUser(BaseModel):
    name: str
    age: int

client = instructor.from_openai(
    OpenAI(
        base_url="http://localhost:11434/v1",
        api_key="ollama",
    ),
    mode=instructor.Mode.JSON,
)

resp = client.chat.completions.create(
    model="llama3",
    messages=[
        {
            "role": "user",
            "content": "Extract Jason is 25 years old.",
        }
    ],
    response_model=ExtractUser,
)
assert resp.name == "Jason"
assert resp.age == 25

See more

pip install "instructor[llama-cpp-python]"
import llama_cpp
import instructor
from llama_cpp.llama_speculative import LlamaPromptLookupDecoding
from pydantic import BaseModel

llama = llama_cpp.Llama(
    model_path="../../models/OpenHermes-2.5-Mistral-7B-GGUF/openhermes-2.5-mistral-7b.Q4_K_M.gguf",
    n_gpu_layers=-1,
    chat_format="chatml",
    n_ctx=2048,
    draft_model=LlamaPromptLookupDecoding(num_pred_tokens=2),
    logits_all=True,
    verbose=False,
)

create = instructor.patch(
    create=llama.create_chat_completion_openai_v1,
    mode=instructor.Mode.JSON_SCHEMA,
)

class ExtractUser(BaseModel):
    name: str
    age: int

user = create(
    messages=[
        {
            "role": "user",
            "content": "Extract `Jason is 30 years old`",
        }
    ],
    response_model=ExtractUser,
)

assert user.name == "Jason"
assert user.age == 30

See more

pip install "instructor[anthropic]"
import instructor
from anthropic import Anthropic
from pydantic import BaseModel

class ExtractUser(BaseModel):
    name: str
    age: int

client = instructor.from_anthropic(Anthropic())

# note that client.chat.completions.create will also work
resp = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "Extract Jason is 25 years old.",
        }
    ],
    response_model=ExtractUser,
)

assert isinstance(resp, ExtractUser)
assert resp.name == "Jason"
assert resp.age == 25

See more

pip install "instructor[google-generativeai]"
import instructor
import google.generativeai as genai
from pydantic import BaseModel

class ExtractUser(BaseModel):
    name: str
    age: int

client = instructor.from_gemini(
    client=genai.GenerativeModel(
        model_name="models/gemini-1.5-flash-latest",
    ),
    mode=instructor.Mode.GEMINI_JSON,
)

# note that client.chat.completions.create will also work
resp = client.messages.create(
    messages=[
        {
            "role": "user",
            "content": "Extract Jason is 25 years old.",
        }
    ],
    response_model=ExtractUser,
)

assert isinstance(resp, ExtractUser)
assert resp.name == "Jason"
assert resp.age == 25

See more

pip install "instructor[vertexai]"
import instructor
import vertexai  # type: ignore
from vertexai.generative_models import GenerativeModel  # type: ignore
from pydantic import BaseModel

vertexai.init()

class ExtractUser(BaseModel):
    name: str
    age: int

client = instructor.from_vertexai(
    client=GenerativeModel("gemini-1.5-pro-preview-0409"),
    mode=instructor.Mode.VERTEXAI_TOOLS,
)

# note that client.chat.completions.create will also work
resp = client.create(
    messages=[
        {
            "role": "user",
            "content": "Extract Jason is 25 years old.",
        }
    ],
    response_model=ExtractUser,
)

assert isinstance(resp, ExtractUser)
assert resp.name == "Jason"
assert resp.age == 25

See more

pip install "instructor[groq]"
import instructor
from groq import Groq
from pydantic import BaseModel

client = instructor.from_groq(Groq())

class ExtractUser(BaseModel):
    name: str
    age: int

resp = client.chat.completions.create(
    model="llama3-70b-8192",
    response_model=ExtractUser,
    messages=[{"role": "user", "content": "Extract Jason is 25 years old."}],
)

assert resp.name == "Jason"
assert resp.age == 25

See more

pip install "instructor[litellm]"
import instructor
from litellm import completion
from pydantic import BaseModel

class ExtractUser(BaseModel):
    name: str
    age: int

client = instructor.from_litellm(completion)

resp = client.chat.completions.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "Extract Jason is 25 years old.",
        }
    ],
    response_model=ExtractUser,
)

assert isinstance(resp, ExtractUser)
assert resp.name == "Jason"
assert resp.age == 25

See more

pip install "instructor[cohere]"
import instructor
from pydantic import BaseModel
from cohere import Client

class ExtractUser(BaseModel):
    name: str
    age: int

client = instructor.from_cohere(Client())

resp = client.chat.completions.create(
    response_model=ExtractUser,
    messages=[
        {
            "role": "user",
            "content": "Extract Jason is 25 years old.",
        }
    ],
)

assert resp.name == "Jason"
assert resp.age == 25

See more

pip install "instructor[cerebras]"
from cerebras.cloud.sdk import Cerebras
import instructor
from pydantic import BaseModel
import os

client = Cerebras(
    api_key=os.environ.get("CEREBRAS_API_KEY"),
)
client = instructor.from_cerebras(client)

class ExtractUser(BaseModel):
    name: str
    age: int

resp = client.chat.completions.create(
    model="llama3.1-70b",
    response_model=ExtractUser,
    messages=[
        {
            "role": "user",
            "content": "Extract Jason is 25 years old.",
        }
    ],
)

assert resp.name == "Jason"
assert resp.age == 25

See more

pip install "instructor[fireworks]"
from fireworks.client import Fireworks
import instructor
from pydantic import BaseModel
import os

client = Fireworks(
    api_key=os.environ.get("FIREWORKS_API_KEY"),
)
client = instructor.from_fireworks(client)

class ExtractUser(BaseModel):
    name: str
    age: int

resp = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p2-1b-instruct",
    response_model=ExtractUser,
    messages=[
        {
            "role": "user",
            "content": "Extract Jason is 25 years old.",
        }
    ],
)

assert resp.name == "Jason"
assert resp.age == 25

See more

Citation

If you use Instructor in your research or project, please cite it as follows:

@software{liu2024instructor,
  author = {Jason Liu and Contributors},
  title = {Instructor: A library for structured outputs from large language models},
  url = {https://github.com/instructor-ai/instructor},
  year = {2024},
  month = {3}
}

Why Use Instructor?

Using Hooks

Instructor includes a hooks system that lets you manage events throughout the language-model interaction. Hooks allow you to intercept, log, and handle events at different stages, such as when completion arguments are provided or a response is received. The system is built on a Hooks class that handles event registration and emission. You can use hooks to add custom behavior like logging or error handling. Here is a simple example demonstrating how to use hooks:

import instructor
from openai import OpenAI
from pydantic import BaseModel


class UserInfo(BaseModel):
    name: str
    age: int


# Initialize the OpenAI client with Instructor
client = instructor.from_openai(OpenAI())


# Define hook functions
def log_kwargs(**kwargs):
    print(f"Function called with kwargs: {kwargs}")


def log_exception(exception: Exception):
    print(f"An exception occurred: {str(exception)}")


client.on("completion:kwargs", log_kwargs)
client.on("completion:error", log_exception)

user_info = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=UserInfo,
    messages=[
        {"role": "user", "content": "Extract the user name: 'John is 20 years old'"}
    ],
)

"""
{
        'args': (),
        'kwargs': {
            'messages': [
                {
                    'role': 'user',
                    'content': "Extract the user name: 'John is 20 years old'",
                }
            ],
            'model': 'gpt-4o-mini',
            'tools': [
                {
                    'type': 'function',
                    'function': {
                        'name': 'UserInfo',
                        'description': 'Correctly extracted `UserInfo` with all the required parameters with correct types',
                        'parameters': {
                            'properties': {
                                'name': {'title': 'Name', 'type': 'string'},
                                'age': {'title': 'Age', 'type': 'integer'},
                            },
                            'required': ['age', 'name'],
                            'type': 'object',
                        },
                    },
                }
            ],
            'tool_choice': {'type': 'function', 'function': {'name': 'UserInfo'}},
        },
    }
"""

print(f"Name: {user_info.name}, Age: {user_info.age}")
#> Name: John, Age: 20

This example demonstrates: 1. A pre-execution hook that logs all kwargs passed to the function. 2. An exception hook that logs any exception raised during execution.

These hooks provide valuable insight into the function's inputs and any errors, improving debugging and monitoring.

Learn more about hooks
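At its core, a hook system like this is a simple event registry: handlers are registered under event names and invoked when the event is emitted. The sketch below is illustrative only (it is not Instructor's actual implementation), reusing the "completion:kwargs" event name from the example above:

```python
from collections import defaultdict
from typing import Callable


class Hooks:
    """Minimal event registry: register handlers, then emit events to them."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable]] = defaultdict(list)

    def on(self, event: str, handler: Callable) -> None:
        # Register a handler for an event name
        self._handlers[event].append(handler)

    def emit(self, event: str, **kwargs) -> None:
        # Invoke every handler registered for this event
        for handler in self._handlers[event]:
            handler(**kwargs)


hooks = Hooks()
seen: list[dict] = []
hooks.on("completion:kwargs", lambda **kw: seen.append(kw))
hooks.emit("completion:kwargs", model="gpt-4o-mini")
print(seen)
#> [{'model': 'gpt-4o-mini'}]
```

Events with no registered handlers are simply no-ops, so adding hooks never changes the underlying call.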

Correct Type Inference

This was the original vision for Instructor, but because we patched openai, I couldn't get the type hints to work well. Now, with the new client, we can get type hints working properly! We've also added some create_* methods to make it easier to create iterables and partials, and to access the original completion.

Calling create

import openai
import instructor
from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int


client = instructor.from_openai(openai.OpenAI())

user = client.chat.completions.create(
    model="gpt-4-turbo-preview",
    messages=[
        {"role": "user", "content": "Create a user"},
    ],
    response_model=User,
)

Now, if you use an IDE, you can see that the type is correctly inferred.

Handling Async: await create

This also works correctly with the async client.

import openai
import instructor
from pydantic import BaseModel


client = instructor.from_openai(openai.AsyncOpenAI())


class User(BaseModel):
    name: str
    age: int


async def extract():
    return await client.chat.completions.create(
        model="gpt-4-turbo-preview",
        messages=[
            {"role": "user", "content": "Create a user"},
        ],
        response_model=User,
    )

Notice that even though we simply return the create call, the extract() function returns the correct User type.

Returning the Raw Completion: create_with_completion

You can also return the raw completion object.

import openai
import instructor
from pydantic import BaseModel


client = instructor.from_openai(openai.OpenAI())


class User(BaseModel):
    name: str
    age: int


user, completion = client.chat.completions.create_with_completion(
    model="gpt-4-turbo-preview",
    messages=[
        {"role": "user", "content": "Create a user"},
    ],
    response_model=User,
)


Streaming Partial Objects: create_partial

To handle streams, we still support Iterable[T] and Partial[T], but to simplify type inference we've also added create_iterable and create_partial methods!

import openai
import instructor
from pydantic import BaseModel


client = instructor.from_openai(openai.OpenAI())


class User(BaseModel):
    name: str
    age: int


user_stream = client.chat.completions.create_partial(
    model="gpt-4-turbo-preview",
    messages=[
        {"role": "user", "content": "Create a user"},
    ],
    response_model=User,
)

for user in user_stream:
    print(user)
    #> name=None age=None
    #> name=None age=25
    #> name='John Doe' age=25
Now notice that the inferred type is Generator[User, None].
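Conceptually, Partial[User] validates each intermediate chunk as if every field on User were optional, which is why the early prints show None values. A rough illustrative twin (for intuition only, not Instructor's actual implementation):

```python
from typing import Optional

from pydantic import BaseModel


# Hypothetical all-optional twin of User: Partial[User] accepts
# intermediate streaming states in roughly this way.
class PartialUser(BaseModel):
    name: Optional[str] = None
    age: Optional[int] = None


print(PartialUser())
#> name=None age=None
print(PartialUser(name="John Doe", age=25))
#> name='John Doe' age=25
```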


Streaming Iterables: create_iterable

When we want to extract multiple objects, we get back an iterable of objects.

import openai
import instructor
from pydantic import BaseModel


client = instructor.from_openai(openai.OpenAI())


class User(BaseModel):
    name: str
    age: int


users = client.chat.completions.create_iterable(
    model="gpt-4-turbo-preview",
    messages=[
        {"role": "user", "content": "Create 2 users"},
    ],
    response_model=User,
)

for user in users:
    print(user)
    #> name='John Doe' age=30
    #> name='Jane Doe' age=28


Templating

Instructor supports templating with Jinja, which lets you create dynamic prompts. This is useful when you want to fill parts of a prompt with data. Here is a simple example:

import openai
import instructor
from pydantic import BaseModel

client = instructor.from_openai(openai.OpenAI())

class User(BaseModel):
    name: str
    age: int

# Create a completion using a Jinja template in the message content
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": """Extract the information from the
            following text: {{ data }}""",
        },
    ],
    response_model=User,
    context={"data": "John Doe is thirty years old"},
)

print(response)
#> User(name='John Doe', age=30)
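What happens conceptually is variable substitution into the message content before it is sent to the model. Here is a tiny stdlib-only stand-in for that substitution step (illustrative only; real Jinja templating is far richer, with conditionals, loops, and filters):

```python
import re


def render(template: str, **context) -> str:
    """Tiny stand-in for Jinja's variable substitution ({{ name }} only)."""
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda m: str(context[m.group(1)]),  # look up each variable in context
        template,
    )


prompt = "Extract the information from the following text: {{ data }}"
print(render(prompt, data="John Doe is thirty years old"))
#> Extract the information from the following text: John Doe is thirty years old
```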

Learn more about templating

Validation

You can also use Pydantic to validate your outputs and have the LLM retry on failure. Check out our docs on retrying and validation context.

import instructor
from openai import OpenAI
from pydantic import BaseModel, ValidationError, BeforeValidator
from typing_extensions import Annotated
from instructor import llm_validator

# Apply the patch to the OpenAI client
client = instructor.from_openai(OpenAI())

class QuestionAnswer(BaseModel):
    question: str
    answer: Annotated[
        str,
        BeforeValidator(llm_validator("don't say objectionable things", client=client)),
    ]

try:
    qa = QuestionAnswer(
        question="What is the meaning of life?",
        answer="The meaning of life is to be evil and steal",
    )
except ValidationError as e:
    print(e)
    """
    1 validation error for QuestionAnswer
    answer
      Assertion failed, The statement promotes objectionable behavior by encouraging evil and stealing. [type=assertion_error, input_value='The meaning of life is to be evil and steal', input_type=str]
    """

Contributing

If you'd like to help, check out some of the issues labeled good-first-issue or help-wanted, which can be found here. They may include code improvements, guest blog posts, or new cookbook examples.

License

This project is licensed under the terms of the MIT License.