Visualizing Knowledge Graphs for Complex Topics

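In this guide, you'll learn how to visualize a detailed knowledge graph when working with a complex topic. We will then iteratively update that knowledge graph with new information through a series of sequential API calls, using only the Instructor library, Pydantic, and Graphviz to draw the graph.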

Motivation

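Knowledge graphs offer a visual and coherent way to understand complicated topics such as quantum mechanics. By generating these graphs automatically, you can speed up the learning process and make complex information easier to digest.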

Defining the Structures

Let's model a knowledge graph with **Node** and **Edge** objects. **Node** objects represent key concepts or entities, while **Edge** objects represent the relationships between them.

from pydantic import BaseModel, Field
from typing import List


class Node(BaseModel, frozen=True):
    id: int
    label: str
    color: str


class Edge(BaseModel, frozen=True):
    source: int
    target: int
    label: str
    color: str = "black"


class KnowledgeGraph(BaseModel):
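    # default_factory alone is enough; passing `...` alongside it is redundant
    # and is rejected by some Pydantic versions.
    nodes: List[Node] = Field(default_factory=list)
    edges: List[Edge] = Field(default_factory=list)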
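
To make the data model concrete, here is a minimal sketch that builds a tiny graph by hand. The node and edge values are illustrative only and are not model output. It also shows why the models are declared with frozen=True: frozen Pydantic models are hashable, so identical nodes collapse when placed in a set, which the iterative update later in this guide relies on.

```python
from typing import List

from pydantic import BaseModel, Field


class Node(BaseModel, frozen=True):
    id: int
    label: str
    color: str


class Edge(BaseModel, frozen=True):
    source: int
    target: int
    label: str
    color: str = "black"


class KnowledgeGraph(BaseModel):
    nodes: List[Node] = Field(default_factory=list)
    edges: List[Edge] = Field(default_factory=list)


# Hand-written example data (an assumption for illustration).
wave = Node(id=1, label="Wave function", color="blue")
equation = Node(id=2, label="Schrödinger equation", color="green")

kg = KnowledgeGraph(
    nodes=[wave, equation],
    edges=[Edge(source=2, target=1, label="governs the evolution of")],
)

# frozen=True makes instances hashable, so an identical node is
# treated as a duplicate when the nodes are collected into a set.
duplicate = Node(id=1, label="Wave function", color="blue")
unique_nodes = {wave, equation, duplicate}
```

Edges refer to nodes by id rather than by object, so a graph stays valid JSON that a model can emit directly.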

Generating Knowledge Graphs

The **generate_graph** function uses OpenAI's API to generate a knowledge graph from an input query.

from openai import OpenAI
import instructor


# Adds response_model to ChatCompletion
# Allows the return of Pydantic model rather than raw JSON
client = instructor.from_openai(OpenAI())


def generate_graph(input: str) -> KnowledgeGraph:
    """Ask the model to describe `input` as a KnowledgeGraph."""
    return client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "user",
                "content": f"Help me understand the following by describing it as a detailed knowledge graph: {input}",
            }
        ],
        response_model=KnowledgeGraph,
    )  # type: ignore
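
Because the graph is produced by a language model, nothing in the schema guarantees that every edge points at a node that actually exists. The following is a minimal sketch of a sanity check you might run on the returned graph. Note that find_dangling_edges is a hypothetical helper written for this example, not part of Instructor or Pydantic, and the models are repeated so the snippet runs on its own.

```python
from typing import List

from pydantic import BaseModel, Field


class Node(BaseModel, frozen=True):
    id: int
    label: str
    color: str


class Edge(BaseModel, frozen=True):
    source: int
    target: int
    label: str
    color: str = "black"


class KnowledgeGraph(BaseModel):
    nodes: List[Node] = Field(default_factory=list)
    edges: List[Edge] = Field(default_factory=list)


def find_dangling_edges(kg: KnowledgeGraph) -> List[Edge]:
    """Return edges whose source or target id matches no node (hypothetical helper)."""
    known_ids = {node.id for node in kg.nodes}
    return [
        edge
        for edge in kg.edges
        if edge.source not in known_ids or edge.target not in known_ids
    ]


# Hand-written stand-in for a model response; the second edge points at id 99,
# which is not among the nodes.
response = KnowledgeGraph(
    nodes=[
        Node(id=1, label="Quantum mechanics", color="blue"),
        Node(id=2, label="Superposition", color="green"),
    ],
    edges=[
        Edge(source=1, target=2, label="includes"),
        Edge(source=2, target=99, label="leads to"),
    ],
)

dangling = find_dangling_edges(response)
```

A check like this could be run before drawing, so that an inconsistent response is caught early rather than showing up as unlabeled nodes in the picture.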

Visualizing the Graph

The **visualize_knowledge_graph** function renders the generated knowledge graph using the Graphviz library.

from graphviz import Digraph



def visualize_knowledge_graph(kg: KnowledgeGraph):
    dot = Digraph(comment="Knowledge Graph")

    # Add nodes
    for node in kg.nodes:
        dot.node(str(node.id), node.label, color=node.color)

    # Add edges
    for edge in kg.edges:
        dot.edge(str(edge.source), str(edge.target), label=edge.label, color=edge.color)

    # Render the graph
    dot.render("knowledge_graph.gv", view=True)


graph = generate_graph("Teach me about quantum mechanics")
visualize_knowledge_graph(graph)

[Figure: Knowledge Graph]

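This produces a visual representation of the knowledge graph. The DOT source is saved as "knowledge_graph.gv" and the rendered output is written next to it (a PDF by default). Open the rendered file to explore the key concepts of quantum mechanics and the relationships between them.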
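
If it helps to see what is actually handed to Graphviz, the sketch below builds an equivalent DOT description as plain text, without the graphviz package or the dot binary. The to_dot function is a hypothetical helper written for this example and only approximates what Digraph generates. The models are repeated so the snippet runs on its own.

```python
from typing import List

from pydantic import BaseModel, Field


class Node(BaseModel, frozen=True):
    id: int
    label: str
    color: str


class Edge(BaseModel, frozen=True):
    source: int
    target: int
    label: str
    color: str = "black"


class KnowledgeGraph(BaseModel):
    nodes: List[Node] = Field(default_factory=list)
    edges: List[Edge] = Field(default_factory=list)


def to_dot(kg: KnowledgeGraph) -> str:
    """Render the graph as DOT text (hypothetical helper, not the graphviz API)."""

    def quote(text: str) -> str:
        # DOT quoted strings only need embedded double quotes escaped.
        return '"' + text.replace('"', '\\"') + '"'

    lines = ["digraph {"]
    for node in kg.nodes:
        lines.append(
            f"    {node.id} [label={quote(node.label)} color={quote(node.color)}]"
        )
    for edge in kg.edges:
        lines.append(
            f"    {edge.source} -> {edge.target} "
            f"[label={quote(edge.label)} color={quote(edge.color)}]"
        )
    lines.append("}")
    return "\n".join(lines)


# Hand-written example data (an assumption for illustration).
kg = KnowledgeGraph(
    nodes=[
        Node(id=1, label="Qubit", color="blue"),
        Node(id=2, label="Superposition", color="green"),
    ],
    edges=[Edge(source=1, target=2, label="can be in")],
)

dot_text = to_dot(kg)
print(dot_text)
```

Inspecting the text form can be handy on machines where the Graphviz binaries are not installed, since the rendering step itself requires them.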

Iterative Updates

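Now that we've seen how to generate a knowledge graph from a single input, let's look at how to update the graph iteratively with new information, which is also what to do when the information does not fit into a single prompt.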

Let's take a simple example in which we visualize the combined knowledge graph represented by the following sentences.

text_chunks = [
    "Jason knows a lot about quantum mechanics. He is a physicist. He is a professor",
    "Professors are smart.",
    "Sarah knows Jason and is a student of his.",
    "Sarah is a student at the University of Toronto. and UofT is in Canada",
]

Updating the Data Model

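To support the new iterative approach, we need to update the data model. We can do this by adding two helper methods, update and draw, to the Pydantic models. These methods simplify the code and make it easy to visualize the knowledge graph.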

In the KnowledgeGraph class, we have moved in the code from the visualize_knowledge_graph function and added new lists for nodes and edges.

from graphviz import Digraph  # needed by KnowledgeGraph.draw
from pydantic import BaseModel, Field
from typing import List, Optional


class Node(BaseModel, frozen=True):
    id: int
    label: str
    color: str


class Edge(BaseModel, frozen=True):
    source: int
    target: int
    label: str
    color: str = "black"


class KnowledgeGraph(BaseModel):
    nodes: Optional[List[Node]] = Field(default_factory=list)
    edges: Optional[List[Edge]] = Field(default_factory=list)

    def update(self, other: "KnowledgeGraph") -> "KnowledgeGraph":
        """Updates the current graph with the other graph, deduplicating nodes and edges."""
        # The fields are Optional, so treat None as an empty list before merging.
        return KnowledgeGraph(
            nodes=list(set((self.nodes or []) + (other.nodes or []))),
            edges=list(set((self.edges or []) + (other.edges or []))),
        )

    def draw(self, prefix: Optional[str] = None):
        dot = Digraph(comment="Knowledge Graph")

        for node in self.nodes or []:  # (1)!
            dot.node(str(node.id), node.label, color=node.color)

        for edge in self.edges or []:  # (2)!
            dot.edge(
                str(edge.source), str(edge.target), label=edge.label, color=edge.color
            )
        dot.render(prefix, format="png", view=True)

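(1) We iterate over all the nodes in the knowledge graph and add them to the Graphviz graph.
(2) We iterate over all the edges in the knowledge graph and add them to the Graphviz graph.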
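
The deduplication in update is worth seeing in isolation. Below is a minimal sketch using a simplified copy of the models (plain lists instead of optional ones, and no draw method) with hand-written data. It also illustrates a caveat: the set-based merge compares every field, so two nodes that share an id but differ in label are both kept.

```python
from typing import List

from pydantic import BaseModel, Field


class Node(BaseModel, frozen=True):
    id: int
    label: str
    color: str


class Edge(BaseModel, frozen=True):
    source: int
    target: int
    label: str
    color: str = "black"


class KnowledgeGraph(BaseModel):
    # Simplified copy for illustration: non-optional lists, update only.
    nodes: List[Node] = Field(default_factory=list)
    edges: List[Edge] = Field(default_factory=list)

    def update(self, other: "KnowledgeGraph") -> "KnowledgeGraph":
        return KnowledgeGraph(
            nodes=list(set(self.nodes + other.nodes)),
            edges=list(set(self.edges + other.edges)),
        )


first = KnowledgeGraph(
    nodes=[
        Node(id=1, label="Jason", color="blue"),
        Node(id=2, label="Physicist", color="green"),
    ],
    edges=[Edge(source=1, target=2, label="is a")],
)
second = KnowledgeGraph(
    nodes=[
        Node(id=1, label="Jason", color="blue"),  # exact repeat, will be dropped
        Node(id=3, label="Professor", color="green"),
    ],
    edges=[
        Edge(source=1, target=2, label="is a"),  # exact repeat, will be dropped
        Edge(source=1, target=3, label="is a"),
    ],
)

merged = first.update(second)
# Sets are unordered, so sort before inspecting the result.
merged_ids = sorted(node.id for node in merged.nodes)

# Same id, different label: the two nodes are not equal, so both survive.
relabeled = KnowledgeGraph(nodes=[Node(id=1, label="Dr. Jason", color="blue")])
conflict_ids = sorted(node.id for node in first.update(relabeled).nodes)
```

Since the merge goes through a set, the order of nodes and edges in the result is not guaranteed. If id collisions matter for your data, merging by id instead of by whole-object equality would be one alternative.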

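We can now modify the generate_graph function so that it takes a list of strings. At each step, it extracts the key information from the sentences as edges and nodes, just as before. We then merge these new edges and nodes into the existing knowledge graph, updating it iteratively until we arrive at the final result.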

from typing import List



def generate_graph(input: List[str]) -> KnowledgeGraph:
    cur_state = KnowledgeGraph()  # (1)!
    num_iterations = len(input)
    for i, inp in enumerate(input):
        new_updates = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {
                    "role": "system",
                    "content": """You are an iterative knowledge graph builder.
                    You are given the current state of the graph, and you must append the nodes and edges
                    to it. Do not provide any duplicates and try to reuse nodes as much as possible.""",
                },
                {
                    "role": "user",
                    "content": f"""Extract any new nodes and edges from the following:
                    # Part {i + 1}/{num_iterations} of the input:

                    {inp}""",
                },
                {
                    "role": "user",
                    "content": f"""Here is the current state of the graph:
                    {cur_state.model_dump_json(indent=2)}""",
                },  # (2)!
            ],
            response_model=KnowledgeGraph,
        )  # type: ignore

        # Update the current state
        cur_state = cur_state.update(new_updates)  # (3)!
        cur_state.draw(prefix=f"iteration_{i}")
    return cur_state

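(1) We first initialize an empty KnowledgeGraph. In this state it has no nodes and no edges.
(2) We then add the current state of the graph to the prompt so that the model knows what new information needs to be added.
(3) We then update the graph's nodes and edges with the information returned by the model, and visualize the new changes.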
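
The accumulation pattern can be tried without any API call. The sketch below swaps the model for a stub so the loop runs offline. Both build_graph and fake_extract are hypothetical names written for this example, and the stub simply creates one node per previously unseen chunk, which is far cruder than what the model does. The models are a simplified copy so the snippet runs on its own.

```python
from typing import Callable, List

from pydantic import BaseModel, Field


class Node(BaseModel, frozen=True):
    id: int
    label: str
    color: str


class Edge(BaseModel, frozen=True):
    source: int
    target: int
    label: str
    color: str = "black"


class KnowledgeGraph(BaseModel):
    nodes: List[Node] = Field(default_factory=list)
    edges: List[Edge] = Field(default_factory=list)

    def update(self, other: "KnowledgeGraph") -> "KnowledgeGraph":
        return KnowledgeGraph(
            nodes=list(set(self.nodes + other.nodes)),
            edges=list(set(self.edges + other.edges)),
        )


Extractor = Callable[[str, KnowledgeGraph], KnowledgeGraph]


def build_graph(chunks: List[str], extract: Extractor) -> KnowledgeGraph:
    """Fold the chunks into one graph, passing the current state to each step."""
    state = KnowledgeGraph()
    for chunk in chunks:
        state = state.update(extract(chunk, state))
    return state


def fake_extract(chunk: str, state: KnowledgeGraph) -> KnowledgeGraph:
    """Stub standing in for the model call: one node per unseen chunk."""
    label = chunk.rstrip(".")
    if label in {node.label for node in state.nodes}:
        return KnowledgeGraph()  # nothing new to add
    new_id = len(state.nodes) + 1
    edges = []
    if new_id > 1:
        # Link the new node to the previously added one.
        edges.append(Edge(source=new_id - 1, target=new_id, label="followed by"))
    return KnowledgeGraph(
        nodes=[Node(id=new_id, label=label, color="blue")], edges=edges
    )


graph = build_graph(
    ["Jason is a professor.", "Professors are smart.", "Jason is a professor."],
    fake_extract,
)
```

Passing the extraction step in as a function is a design choice made here for testability. It keeps the loop identical whether the updates come from a stub or from a real client call.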

With this in place, we can run the new generate_graph function using the last two lines of the snippet below.

text_chunks = [
    "Jason knows a lot about quantum mechanics. He is a physicist. He is a professor",
    "Professors are smart.",
    "Sarah knows Jason and is a student of his.",
    "Sarah is a student at the University of Toronto. and UofT is in Canada",
]
graph: KnowledgeGraph = generate_graph(text_chunks)
graph.draw(prefix="final")

Conclusion

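We've seen how to use Instructor to get structured outputs from the OpenAI LLM API, and you can use the same approach with any of the other open-source models the library is compatible with. If you enjoyed this content or want to try Instructor, visit the project on GitHub and don't forget to give us a star!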