Your FAQ bot doesn't need a PhD: LLM query routing with Elastic Workflows

Originally published 2026-06-16 08:24:12

License: CC 4.0 BY-SA

This is an original article by the blogger and may not be reproduced without the blogger's permission.

Author: Jeffrey Rengifo, Elastic

Route LLM queries by complexity using Elasticsearch search metadata: Mistral Small for FAQ questions, Claude Sonnet for multi-source synthesis.

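Elasticsearch has native integrations with industry-leading generative AI tools and providers. Check out our webinar on going "Beyond RAG Basics", or build production-ready applications with the Elastic vector database.

To build the best search solution for your use case, start a free cloud trial or try Elastic on your local machine now.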

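Sending every customer support query to a large model means your simple FAQ answers are as slow and as expensive as your hardest questions. This article shows how to build a two-model routing system in Elastic Workflows: Mistral Small answers simple questions directly from a single FAQ article, and Claude Sonnet synthesizes answers across multiple knowledge base sources when the query calls for it. The routing decision relies only on search metadata, which keeps classification cheap and fast for every query.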

Prerequisites

An Elastic Cloud deployment running Elasticsearch 9.3+, or start a free trial
Workflows enabled (Advanced Settings)
Python 3.9+
A Mistral API key

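How LLM query routing works in this system

We will build a two-stage system: a router that decides how to answer, and an answering model that generates the response.

The router looks at the query and at the metadata of the top search results, such as scores, categories, and complexity labels. From these it picks one of two strategies: answer directly from the top FAQ article, or synthesize a cited answer across multiple articles. Because the decision depends only on structured signals, a small, fast model can handle it.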
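
To make the two strategies concrete, here is a minimal rule-based stand-in for the router. It is not what the workflow does (the workflow asks Mistral Small to decide); it only illustrates the kind of structured signals involved. The 0.7 score threshold, the "low" complexity label, and the simplified hit shape are all assumptions made for this sketch.

```python
from typing import Dict, List


def choose_strategy(hits: List[Dict], min_score: float = 0.7) -> str:
    """Return "simple" or "complex" from hit metadata alone (illustrative rules)."""
    if not hits:
        # Nothing matched: leave it to the more capable model.
        return "complex"
    top = hits[0]
    # Categories of the top three hits; more than one suggests a multi-topic query.
    categories = {hit["category"] for hit in hits[:3]}
    if (
        top["score"] >= min_score           # hypothetical threshold
        and top["complexity"] == "low"      # hypothetical label value
        and len(categories) == 1
    ):
        return "simple"
    return "complex"


example_hits = [
    {"score": 0.92, "category": "Order", "complexity": "low"},
    {"score": 0.80, "category": "Order", "complexity": "low"},
]
print(choose_strategy(example_hits))
```

An LLM classifier can weigh the same signals more flexibly than fixed thresholds, which is one reason to prefer it, but the inputs are the same small set of fields.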

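The answering stage is different. Answering from a single article is a bounded task that a small model handles well and returns quickly. Multi-source synthesis with citations suits a more capable model, and the extra time is worth it. Matching each query to the right model makes simple answers faster and complex answers better.

Why use a small model for routing instead of a large one?

Because the router runs on every query, including the simple ones. A slow router slows down every answer, even the ones a small model could otherwise have produced quickly.

The key design choice is that the routing step looks only at metadata, not at full documents. For a query such as "my OTG isn't heating evenly", the router only needs to know that the top hit belongs to the "Product Troubleshooting - Appliances" category with issue_complexity: medium; it does not need the full conversation transcript. This keeps the classification prompt small (a few hundred tokens) and cheap. Full article content is loaded only once, in the response stage.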
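
The metadata-only idea can be sketched in a few lines of Python. The helper below, metadata_lines, is hypothetical and not part of the workflow; it assumes hits shaped like an Elasticsearch search response (_score plus _source) and simply drops everything except the routing fields.

```python
from typing import Dict, List

# Fields the router is allowed to see; full text fields are left out on purpose.
METADATA_FIELDS = (
    "issue_category_sub_category",
    "issue_complexity",
    "product_category",
)


def metadata_lines(hits: List[Dict]) -> List[str]:
    """Build one compact line per hit containing only score and metadata."""
    lines = []
    for rank, hit in enumerate(hits, start=1):
        source = hit.get("_source", {})
        parts = [f"score={hit.get('_score')}"]
        parts += [f"{field}={source.get(field, 'unknown')}" for field in METADATA_FIELDS]
        lines.append(f"{rank}. " + ", ".join(parts))
    return lines


sample_hits = [
    {
        "_score": 0.91,
        "_source": {
            "issue_category_sub_category": "Product Troubleshooting - Appliances",
            "issue_complexity": "medium",
            "product_category": "Appliances",
            "conversation": "A long transcript that never reaches the router...",
        },
    }
]
print("\n".join(metadata_lines(sample_hits)))
```

The transcript in the conversation field stays out of the prompt, so the prompt size depends on the number of hits rather than on document length.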

Setting up the AI connectors

We use two AI connectors in this workflow:

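Connector	Model	Type	Role
Mistral Small	mistral-small-latest	Custom (OpenAI-compatible)	Classifies query complexity from metadata; answers simple FAQ-style questions
Anthropic Claude Sonnet 4.6	Claude Sonnet	Elastic managed LLM	Synthesizes complex answers from multiple articles, with citations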

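Both connectors are billed per million tokens, and the smaller model costs noticeably less. Routing simple queries to it lowers cost and also improves latency. To learn more about Elastic managed LLMs, see the Elastic documentation.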
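
A back-of-the-envelope calculation shows how per-million-token billing turns into a per-query cost. Every number below (prices, token counts, and the 70% share of simple queries) is a hypothetical placeholder chosen for illustration, not a published price or a measurement from this workflow.

```python
def call_cost(tokens_in: int, tokens_out: int, price_in: float, price_out: float) -> float:
    """Dollar cost of one call, with prices quoted per million tokens."""
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000


# Hypothetical prices in USD per million tokens.
SMALL = {"price_in": 0.10, "price_out": 0.30}
LARGE = {"price_in": 3.00, "price_out": 15.00}

# Hypothetical token counts for each kind of call.
router = call_cost(300, 50, **SMALL)           # metadata-only classification
simple_small = call_cost(800, 200, **SMALL)    # one FAQ article, small model
simple_large = call_cost(800, 200, **LARGE)    # same prompt on the large model
complex_large = call_cost(4000, 600, **LARGE)  # five articles, large model

simple_share = 0.7  # assumed fraction of queries that are simple

# With routing: every query pays the router, then the matching model.
routed = router + simple_share * simple_small + (1 - simple_share) * complex_large
# Without routing: everything goes to the large model.
baseline = simple_share * simple_large + (1 - simple_share) * complex_large

print(f"routed:   ${routed:.6f} per query")
print(f"baseline: ${baseline:.6f} per query")
```

Under these assumptions the router's own cost is small next to the saving on simple queries; with a different traffic mix or different prices the balance shifts, so it is worth plugging in your own numbers.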

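The Claude Sonnet connector is already available as an Elastic managed large language model (LLM). We only need to create a custom connector for Mistral using the .gen-ai connector type, which supports any OpenAI-compatible API. You can also create it through the Kibana UI.

All of the setup code in this article is available in the companion notebook, where you can run it section by section as you read.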

import requests

# KIBANA_URL, ELASTICSEARCH_API_KEY, and MISTRAL_API_KEY are assumed to be
# defined earlier (for example, in the companion notebook).
SMALL_LLM_CONNECTOR = "Mistral Small"

headers = {
    "Authorization": f"ApiKey {ELASTICSEARCH_API_KEY}",
    "kbn-xsrf": "true",
    "Content-Type": "application/json",
}

# Custom .gen-ai connector pointing at Mistral's OpenAI-compatible endpoint
mistral_connector_payload = {
    "connector_type_id": ".gen-ai",
    "name": SMALL_LLM_CONNECTOR,
    "config": {
        "apiProvider": "Other",
        "apiUrl": "https://api.mistral.ai/v1/chat/completions",
        "defaultModel": "mistral-small-latest",
    },
    "secrets": {
        "apiKey": MISTRAL_API_KEY,
    },
}

response = requests.post(
    f"{KIBANA_URL}/api/actions/connector",
    headers=headers,
    json=mistral_connector_payload,
)
response.raise_for_status()  # fail early if Kibana rejected the request
result = response.json()
MISTRAL_CONNECTOR_ID = result.get("id")

The connector ID is generated automatically by Kibana. We let the platform handle this rather than setting it manually.
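
Because the ID is generated, it can be handy to look it up again later by name. The helper below is a hypothetical sketch: it assumes a list of connector objects with "id" and "name" keys, which is the shape Kibana's connector listing endpoint (GET /api/actions/connectors) returns. The sample ID is made up.

```python
from typing import Dict, List, Optional


def find_connector_id(connectors: List[Dict], name: str) -> Optional[str]:
    """Return the ID of the first connector with the given name, or None."""
    for connector in connectors:
        if connector.get("name") == name:
            return connector.get("id")
    return None


# Made-up sample in the shape of a connector listing.
sample_connectors = [
    {"id": "hypothetical-id-1", "name": "Mistral Small", "connector_type_id": ".gen-ai"},
]
print(find_connector_id(sample_connectors, "Mistral Small"))
```

In practice the list would come from something like requests.get(f"{KIBANA_URL}/api/actions/connectors", headers=headers).json(), reusing the headers defined above.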

Once created, the connector appears on the Connectors page.

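Loading and indexing the dataset

We use the e-commerce-customer-support-qa dataset from Hugging Face. It contains 1,000 real customer support interactions from an e-commerce platform (BrownBox), including customer questions, agent resolutions, issue categories, complexity levels, and customer sentiment.

The index mapping uses the semantic_text field type with the Jina Embeddings v5 text model from the Elastic Inference Service. This field handles semantic search end to end: embedding generation, chunking, and querying. We use copy_to to aggregate the conversation and the Q&A summary into a single searchable field: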

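# es_client is assumed to be an elasticsearch.Elasticsearch client created
# earlier (for example, in the companion notebook).
es_client.indices.create(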
    index="support-knowledge-base",
    mappings={
        "properties": {
            "conversation": {
                "type": "text",
                "copy_to": "semantic_content",
            },
            "qa": {
                "type": "text",
                "copy_to": "semantic_content",
            },
            "issue_area": {"type": "keyword"},
            "issue_category": {"type": "keyword"},
            "issue_complexity": {"type": "keyword"},
            "product_category": {"type": "keyword"},
            "semantic_content": {
                "type": "semantic_text",
                "inference_id": ".jina-embeddings-v5-text-small",
            },
        }
    },
)
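
With the mapping in place, the dataset rows still need to be indexed. The generator below is a minimal sketch of that step, not the notebook's actual loading code: build_actions is a hypothetical helper, and the sample row is invented. Each row is passed through whole so that fields without an explicit mapping, such as issue_category_sub_category used later in the classification prompt, remain available in _source.

```python
from typing import Dict, Iterable, Iterator

INDEX_NAME = "support-knowledge-base"


def build_actions(rows: Iterable[Dict], index_name: str = INDEX_NAME) -> Iterator[Dict]:
    """Yield one bulk-index action per dataset row, skipping rows with no conversation."""
    for row in rows:
        if not row.get("conversation"):
            continue  # nothing to embed for this row
        yield {"_index": index_name, "_source": dict(row)}


# Invented sample row, for illustration only.
sample_rows = [
    {
        "conversation": "Customer: my OTG isn't heating evenly...",
        "issue_complexity": "medium",
        "product_category": "Appliances",
    },
    {"conversation": "", "issue_complexity": "medium"},
]
actions = list(build_actions(sample_rows))
print(len(actions))
```

The actions can then be sent with elasticsearch.helpers.bulk(es_client, build_actions(rows)). Embeddings for semantic_content are generated at index time by the inference endpoint, so large batches may take noticeably longer than plain text indexing.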

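Defining the query routing workflow in Elastic Workflows YAML

The routing workflow has four steps: semantic search, metadata-only classification, a conditional branch, and a response step matched to the model.

We use Elastic Workflows to encapsulate this routing logic. Workflows let us:

Expose the triage logic as a tool to Elastic Agent Builder so that a conversational agent can call it
Trigger it directly, through a manual run, a schedule, or an alert

This flexibility lets the same logic serve both programmatic and conversational interfaces without duplicating code.

Workflows are defined in YAML and configured directly in the Workflow UI (Elasticsearch > Workflows > Create a New Workflow). Each step can query Elasticsearch, call Kibana APIs, or prompt an LLM.

Here is the complete workflow definition: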

name: support_query_router
description: >
  Routes customer queries to the appropriate LLM based on complexity.
  Searches the KB, classifies using only result metadata (cheap),
  then routes to a small or large model depending on complexity.
enabled: true

inputs:
  - name: query
    type: string
    description: The customer support query
    required: true

consts:
  indexName: support-knowledge-base

triggers:
  - type: manual

steps:
  # Step 1: Search the knowledge base using semantic search
  - name: search_es
    type: elasticsearch.search
    with:
      index: "{{ consts.indexName }}"
      query:
        semantic:
          field: semantic_content
          query: "{{ inputs.query }}"
      size: 5

  # Step 2: Classify using only METADATA (Mistral Small - cheap)
  # We deliberately do NOT pass the full documents here. The routing
  # decision only needs to know the shape of the results: which
  # categories they hit, their complexity labels, and their scores.
  - name: classify_query
    type: ai.prompt
    with:
      connectorId: Mistral Small
      prompt: >
        You are a support query classifier. Based on the customer query
        and the metadata of the top knowledge base hits below, decide
        how this query should be handled.

        Return ONLY a JSON object with:
        - "complexity": "simple" if the top hit clearly matches a single
          FAQ (high score, low-complexity category, single product area),
          or "complex" if the query spans multiple categories, the top
          hits have medium/high complexity labels, or the results are
          weakly matched.
        - "reasoning": one-line explanation.

        Customer query: {{ inputs.query }}

        Top 5 results (metadata only):
        1. score={{ steps.search_es.output.hits.hits[0]._score }}, category={{ steps.search_es.output.hits.hits[0]._source.issue_category_sub_category }}, complexity={{ steps.search_es.output.hits.hits[0]._source.issue_complexity }}, product={{ steps.search_es.output.hits.hits[0]._source.product_category }}
        2. score={{ steps.search_es.output.hits.hits[1]._score }}, category={{ steps.search_es.output.hits.hits[1]._source.issue_category_sub_category }}, complexity={{ steps.search_es.output.hits.hits[1]._source.issue_complexity }}, product={{ steps.search_es.output.hits.hits[1]._source.product_category }}
        3. score={{ steps.search_es.output.hits.hits[2]._score }}, category={{ steps.search_es.output.hits.hits[2]._source.issue_category_sub_category }}, complexity={{ steps.search_es.output.hits.hits[2]._source.issue_complexity }}, product={{ steps.search_es.output.hits.hits[2]._source.product_category }}
        4. score={{ steps.search_es.output.hits.hits[3]._score }}, category={{ steps.search_es.output.hits.hits[3]._source.issue_category_sub_category }}, complexity={{ steps.search_es.output.hits.hits[3]._source.issue_complexity }}, product={{ steps.search_es.output.hits.hits[3]._source.product_category }}
        5. score={{ steps.search_es.output.hits.hits[4]._score }}, category={{ steps.search_es.output.hits.hits[4]._source.issue_category_sub_category }}, complexity={{ steps.search_es.output.hits.hits[4]._source.issue_complexity }}, product={{ steps.search_es.output.hits.hits[4]._source.product_category }}

  # Step 3: Route based on complexity
  - name: route_by_complexity
    type: if
    condition: "${{ steps.classify_query.output.complexity == 'simple' }}"
    steps:
      # Simple: answer directly from FAQ snippet (Mistral Small)
      - name: answer_from_faq
        type: ai.prompt
        with:
          connectorId: Mistral Small
          prompt: >
            You are a customer support agent. Answer the customer's question
            using ONLY the FAQ article below. Be concise, friendly, and
            include specific steps if applicable.

            Customer query: {{ inputs.query }}

            FAQ article:
            {{ steps.search_es.output.hits.hits[0]._source | json }}
    else:
      # Complex: synthesize from multiple articles (Claude Sonnet)
      - name: synthesize_answer
        type: ai.prompt
        with:
          connectorId: Anthropic Claude Sonnet 4.6
          prompt: >
            You are a senior customer support specialist. The customer's query
            requires careful analysis across multiple knowledge base articles.

            Provide a detailed, empathetic response that:
            1. Addresses all aspects of the customer's question
            2. Cites specific articles from the knowledge base (reference them
               by their question/title)
            3. Provides clear resolution steps
            4. Notes if any part of the query isn't covered by the KB

            Customer query: {{ inputs.query }}

            Knowledge base articles:
            {{ steps.search_es.output.hits.hits | json }}

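The workflow has four key parts:

search_es uses elasticsearch.search with a semantic query to find the five most relevant articles.
classify_query sends the customer query and only the metadata of the search results to Mistral Small. The prompt contains scores, categories, complexity labels, and product categories. This keeps classification cheap and avoids spending a large number of tokens.
route_by_complexity uses an if step to branch on the classifier's output.
The response step depends on the route. For simple queries, Mistral Small receives the top-ranked FAQ article and rephrases it. For complex queries, Claude Sonnet receives all five articles and synthesizes a detailed answer with citations. This is the only step that loads full document content.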
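
If you reproduce the classification step outside Workflows, for example while testing prompts in the notebook, the model's reply needs defensive parsing, since small models sometimes wrap JSON in extra text. The function below is a hypothetical client-side sketch and says nothing about how the workflow engine itself parses step output. It falls back to "complex" whenever the reply is unusable, on the assumption that an unnecessary call to the larger model is the safer failure.

```python
import json
import re


def parse_classification(raw: str) -> dict:
    """Extract {"complexity", "reasoning"} from a model reply, defaulting to "complex"."""
    data = {}
    # Grab the outermost {...} span so code fences or preambles are ignored.
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match:
        try:
            data = json.loads(match.group(0))
        except json.JSONDecodeError:
            data = {}
    label = str(data.get("complexity", "")).strip().lower()
    if label not in ("simple", "complex"):
        label = "complex"  # unusable reply: route to the more capable model
    return {"complexity": label, "reasoning": str(data.get("reasoning", ""))}


reply = 'Sure! {"complexity": "simple", "reasoning": "single FAQ match"}'
print(parse_classification(reply)["complexity"])
```

Defaulting to the expensive path trades a little cost for answer quality when the classifier misbehaves; the opposite default would be cheaper but risks thin answers to hard questions.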

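Using the workflow as a tool in Agent Builder

Besides the default triggers (manual, scheduled, alert), a workflow can also be exposed as a tool in Elastic Agent Builder. This adds a conversational layer: users interact through a chat interface, and the agent decides when to call the workflow.

We use the Agent Builder API to create the tool and the agent. After creating the workflow in the Kibana UI, copy its ID and use it to register the workflow as a tool: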

WORKFLOW_ID = "workflow-aaf77e41-37cf-48a8-973b-c853f71e4fae"

# Create the workflow tool
workflow_tool_payload = {
    "id": "run_support_query_router",
    "type": "workflow",
    "description": (
        "Routes a customer support query through the triage workflow. "
        "Searches the knowledge base, classifies query complexity, and "
        "generates a response using the appropriate model. Use this tool "
        "whenever a customer asks a support question."
    ),
    "tags": ["support", "triage", "workflow"],
    "configuration": {
        "workflow_id": WORKFLOW_ID,
    },
}

response = requests.post(
    f"{KIBANA_URL}/api/agent_builder/tools",
    headers=headers,
    json=workflow_tool_payload,
)

Then create an agent that uses the tool:

agent_payload = {
    "id": "support-query-agent",
    "name": "Support Query Agent",
    "description": "Customer support agent that routes queries through a multi-model workflow.",
    "labels": ["support", "e-commerce"],
    "configuration": {
        "instructions": (
            "You are a customer support assistant for BrownBox, an e-commerce platform. "
            "When a customer asks a support question, use the `run_support_query_router` tool "
            "to process it. The tool will search the knowledge base, classify the query, "
            "and generate an appropriate response.\n\n"
            "Present the response to the customer in a friendly, professional tone."
        ),
        "tools": [{"tool_ids": ["run_support_query_router"]}],
    },
}

response = requests.post(
    f"{KIBANA_URL}/api/agent_builder/agents",
    headers=headers,
    json=agent_payload,
)

The agent is now available in the Elastic Agent Builder UI in Kibana. You can also create the agent and its tool directly through the Agent Builder UI.
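
The agent can also be called programmatically. The sketch below only builds the request; it assumes the Agent Builder converse endpoint (POST /api/agent_builder/converse) accepts a JSON body with agent_id and input. That endpoint and body shape are not shown in this article, so treat them as assumptions to verify against the Agent Builder API reference for your version. build_converse_request is a hypothetical helper, and the Kibana URL is a placeholder.

```python
from typing import Dict, Tuple


def build_converse_request(kibana_url: str, agent_id: str, message: str) -> Tuple[str, Dict]:
    """Return the URL and JSON body for a single-turn message to an agent."""
    url = f"{kibana_url.rstrip('/')}/api/agent_builder/converse"  # assumed endpoint
    body = {"agent_id": agent_id, "input": message}               # assumed body shape
    return url, body


url, body = build_converse_request(
    "https://kibana.example.com/",  # placeholder URL
    "support-query-agent",
    "How do I track my order?",
)
print(url)
```

Sending it would look like requests.post(url, headers=headers, json=body), reusing the headers defined for the connector call.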

Once created, the agent appears in the Agent Builder UI with the workflow tool assigned.

Testing simple and complex query routing

Simple query

"How do I track my order?"

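Complex query

A complex query triggers a different path. Instead of relying on a single FAQ, the system retrieves several related documents and sends them, together with their category labels, scores, and issue types, to Claude Sonnet for synthesis.

Unlike the simple path, the content of all five articles is loaded here to produce a more complete answer, with citations and an explanation that ties the documents together.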

"I ordered an OTG last week and it arrived damaged. I also noticed I was
charged twice on my credit card. I want a replacement for the OTG and a
refund for the duplicate charge. Also, my account shows the wrong delivery
address - can you update it?"

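This query covers three separate issues (damaged product, duplicate charge, address update) that fall under different support categories. The workflow classifies it as complex and routes it to Claude Sonnet, which draws on multiple knowledge base articles, handles each issue separately, cites the relevant articles, and gives clear resolution steps for each.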

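Conclusion

Routing LLM queries by complexity in Elasticsearch lowers latency and cost for simple queries without sacrificing quality.

The small model answers FAQ-style questions quickly, while the large model is reserved for complex questions that genuinely need its capabilities. There is a cost saving too: simple queries routed to the small model are cheaper in themselves.

The key to this pattern is searching the knowledge base before routing. Without that step, an upstream router can only guess from surface features. With the structure of the search results (scores, categories, complexity labels), the system can tell whether the answer comes from a single article or has to be synthesized across several. That is the real signal for deciding how to handle a query.

Elastic Workflows makes this possible without writing orchestration code. The entire routing logic lives in Kibana as YAML and uses native steps for search, LLM prompts, and conditional branching.

Combined with Elastic Agent Builder, the same workflow can serve both programmatic triggers and conversational interfaces.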

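Next steps

Try the notebook with the full implementation
Add LLM monitoring (OpenRouter) to track the cost of each routing tier
Explore other automation patterns with Elastic Workflows
Learn about Agent Builder and how to expose workflows as conversational tools
Read about building agentic AI workflows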

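Frequently asked questions

What is LLM query routing, and why does it matter for customer support?

LLM query routing means sending requests to different AI models depending on query complexity. In a customer support system, simple FAQ questions can be answered by a small, fast model such as Mistral Small, while complex multi-issue queries are better served by the more capable Claude Sonnet. Routing lowers both latency and cost: simple queries handled by the small model return faster and at a lower token cost.

How does Elasticsearch metadata support LLM query routing?

After a semantic search in Elasticsearch, each result carries a relevance score, issue category, complexity label, and product category. The classifier model reads only these metadata fields (not the full document content) to decide whether the query maps to a single FAQ or needs multi-source synthesis. This keeps the classification prompt small (typically a few hundred tokens), which lowers the per-query cost.

Which models does this Elasticsearch query routing pattern use?

This example uses Mistral Small for classification and simple FAQ answers, and Claude Sonnet 4.6 (an Elastic managed LLM) for multi-source synthesis. Mistral Small is registered as a custom connector through the .gen-ai connector type, which supports any OpenAI-compatible API. The split is based on cost and capability: simple tasks go to the small model, and complex generation goes to the large one.

Can I use different models in the Elastic Workflows routing pattern?

Yes. The routing logic in Elastic Workflows is built on connectors: any model with an OpenAI-compatible API can be registered as a custom connector through the .gen-ai connector type. The pattern is model-agnostic. The core requirement is a cheap, fast model for the classification step and a more capable model for the generation step.

What are the limitations of metadata-based routing in Elasticsearch?

Routing on search result metadata depends on the quality of the structured fields in the index, such as issue_complexity and issue_category. If those fields are missing or labeled inconsistently, the classification signal degrades and routing accuracy drops. Classifying on full documents is more robust but significantly increases token cost and latency.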
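
One way to gauge this risk before relying on the router is to measure how often each routing field is actually populated. The function below is a hypothetical sketch that works on a list of plain dictionaries, for example the _source of a sample of indexed documents; the sample documents are invented.

```python
from typing import Dict, Iterable, List


def metadata_coverage(docs: List[Dict], fields: Iterable[str]) -> Dict[str, float]:
    """Return, per field, the fraction of documents with a non-empty value."""
    total = len(docs)
    coverage = {}
    for field in fields:
        filled = sum(1 for doc in docs if doc.get(field) not in (None, ""))
        coverage[field] = filled / total if total else 0.0
    return coverage


# Invented sample documents, for illustration only.
sample_docs = [
    {"issue_complexity": "medium", "issue_category": "Order"},
    {"issue_complexity": "", "issue_category": "Order"},
]
print(metadata_coverage(sample_docs, ["issue_complexity", "issue_category"]))
```

Low coverage on a field is a hint to drop it from the classification prompt or to fix the labels upstream before trusting routing decisions that depend on it.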

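How does Elastic Workflows simplify LLM query routing compared with custom code?

Elastic Workflows defines the routing logic as YAML in Kibana and uses native steps for Elasticsearch semantic search, LLM prompts (through AI connectors), and conditional branching. No orchestration framework or custom code is required. The same workflow supports both programmatic triggers (manual, scheduled, alert) and conversational invocation through Agent Builder, without reimplementing the logic.

Original article: LLM query routing in Elasticsearch with Elastic Workflows - Elasticsearch Labs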