What is the point of using LangChain?

Original post, published 2026-06-30 08:37:56 · 205 views

This content is licensed under CC 4.0 BY-SA.

Background

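Before the concept of RAG (retrieval-augmented generation) emerged, large and mid-sized companies already relied on retrieval to build KBQA (knowledge-base question answering) systems (see the CSDN blog post "基于ElasticSearch+文本相似度模型的检索式智能对话方案", a retrieval-based dialogue solution built on ElasticSearch plus a text-similarity model). The difference is that what they retrieved was the "Q" of hand-configured "QA pairs": the user's question was matched against the stored questions, and the answer paired with the best match was returned.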

See the screenshot of the WeChat conversation open platform (微信对话开放平台) below:

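The biggest advantage of this approach is that the chatbot's answers are highly controllable. The biggest drawback is the large amount of manual effort needed to configure the QA pairs.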
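The QA-pair approach can be sketched in a few lines. This is only an illustration: the `QA_PAIRS` entries, the `answer` function, and the 0.6 threshold are made up for the example, and `difflib.SequenceMatcher` stands in for the trained text-similarity model (and the ElasticSearch recall step) that a production system would use.

```python
from difflib import SequenceMatcher

# Hypothetical hand-configured QA pairs; a real deployment would hold many more.
QA_PAIRS = [
    ("How do I reset my password?", "Open Settings > Account > Reset password."),
    ("What are your business hours?", "We are open 9:00 to 18:00, Monday to Friday."),
    ("How can I contact support?", "Email support@example.com."),
]

FALLBACK = "Sorry, I don't have an answer for that."


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for a real similarity model."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def answer(user_question: str, threshold: float = 0.6) -> str:
    """Match the user's question against the stored Qs and return the paired A."""
    best_q, best_a = max(QA_PAIRS, key=lambda pair: similarity(user_question, pair[0]))
    if similarity(user_question, best_q) < threshold:
        return FALLBACK  # nothing configured is close enough
    return best_a


print(answer("how do i reset my password"))
```

Every reply is one of the configured answers or the fallback, which is why the output is so controllable, and also why each new topic requires someone to write another pair.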

Using LangChain for KBQA

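With LangChain, you usually do not need to curate QA pairs as in the approach above. Instead, the documents that hold the "knowledge" are split directly into text chunks, and those chunks are later compared with the user's question by vector similarity. The flow is roughly as follows: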

(This flow can also be implemented without LangChain. LangChain simply ships ready-made wrappers for steps such as text chunking and vectorization, so using it is more convenient.)
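To make the three steps concrete (chunking, vectorization, similarity ranking), here is a rough standard-library sketch of the same flow without LangChain. The function names are invented for this example, and the bag-of-words `embed` is a toy assumption; a real system would call an embedding model at that point.

```python
import math
import re
from collections import Counter


def split_text(text: str, chunk_size: int = 80, overlap: int = 20) -> list[str]:
    """Fixed-size character windows; consecutive chunks share `overlap` characters."""
    if not text:
        return []
    step = chunk_size - overlap
    last_start = max(len(text) - overlap, 1)  # skip a tail already covered by the previous chunk
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts. Assumption: replace with a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q_vec = embed(question)
    return sorted(chunks, key=lambda c: cosine(q_vec, embed(c)), reverse=True)[:k]


knowledge = (
    "LangChain is a framework for building LLM-powered applications. "
    "PyTorch is an open-source deep learning framework developed by Meta AI. "
    "Python was created by Guido van Rossum and first released in 1991."
)
chunks = split_text(knowledge)
top = retrieve("Who created Python?", chunks, k=1)
print(top[0])
```

The retrieved chunks would then be pasted into a prompt for the LLM. LangChain's splitters, embedding classes, and vector stores play the roles of `split_text`, `embed`, and `retrieve` here.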

Code example

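    # Requires: pip install langchain-core langchain-text-splitters langchain-huggingface langchain-openai
    from langchain_core.documents import Document
    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_core.runnables import RunnablePassthrough
    from langchain_core.vectorstores import InMemoryVectorStore
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    # ASSUMPTION: the original post does not show how `embeddings` and `model`
    # were created. The two objects below are one possible choice (a local
    # sentence-transformers embedding model and an OpenAI chat model);
    # swap in any LangChain embedding class and chat model.
    from langchain_huggingface import HuggingFaceEmbeddings
    from langchain_openai import ChatOpenAI

    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
    model = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # reads OPENAI_API_KEY from the environment

    # Simulated knowledge base
    documents = [
        Document(page_content="Python is a dynamically-typed, interpreted programming language created by Guido van Rossum and first released in 1991."),
        Document(page_content="LangChain is a framework for building LLM-powered applications, supporting chains, agents, and RAG patterns."),
        Document(page_content="Deep learning is a subset of machine learning that uses multi-layer neural networks to learn representations from data."),
        Document(page_content="PyTorch is an open-source deep learning framework developed by Meta AI, known for its dynamic computation graph."),
        Document(page_content="The Transformer architecture was proposed by Vaswani et al. in 2017 and is the foundation of modern LLMs."),
    ]

    # Split into chunks of at most 200 characters, with 20 characters of overlap
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=20)
    splits = text_splitter.split_documents(documents)

    # Build vector store with local embeddings; retrieve the 2 most similar chunks
    vectorstore = InMemoryVectorStore.from_documents(splits, embeddings)
    retriever = vectorstore.as_retriever(search_kwargs={"k": 2})

    # RAG prompt template
    rag_prompt = ChatPromptTemplate.from_template("""\
    You are a knowledgeable assistant. Answer the question based on the provided documents only.
    If the documents don't contain the answer, say "I cannot answer based on the provided information."

    Documents:
    {context}

    Question: {question}

    Answer:""")

    def format_docs(docs):
        """Join the retrieved chunks into one string for the {context} slot."""
        return "\n\n".join(doc.page_content for doc in docs)

    # The dict runs both branches on the input question: one retrieves and
    # formats the context, the other passes the question through unchanged.
    rag_chain = (
        {"context": retriever | format_docs, "question": RunnablePassthrough()}
        | rag_prompt
        | model
        | StrOutputParser()
    )

    questions = [
        "What is LangChain?",
        "Who created Python?",
        "What's the weather like today?",  # not covered by the knowledge base
    ]

    for q in questions:
        result = rag_chain.invoke(q)
        print(f"\nQ: {q}")
        print(f"A: {result}")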
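It can help to see the text the model actually receives once the retrieved chunks have been joined and placed into the template. The sketch below reproduces that step with plain string formatting and no LangChain. `build_prompt` and the hand-picked `retrieved` chunks are illustrative assumptions, standing in for whatever the retriever would return.

```python
# Same template text as rag_prompt in the example, as a plain Python string.
TEMPLATE = """\
You are a knowledgeable assistant. Answer the question based on the provided documents only.
If the documents don't contain the answer, say "I cannot answer based on the provided information."

Documents:
{context}

Question: {question}

Answer:"""


def format_docs(chunks: list[str]) -> str:
    """Join chunk texts with blank lines, like format_docs in the example."""
    return "\n\n".join(chunks)


def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Fill the {context} and {question} slots of the template."""
    return TEMPLATE.format(context=format_docs(retrieved_chunks), question=question)


# Assumed retrieval result for an off-topic question (k=2 chunks, chosen by hand here).
retrieved = [
    "LangChain is a framework for building LLM-powered applications, supporting chains, agents, and RAG patterns.",
    "PyTorch is an open-source deep learning framework developed by Meta AI, known for its dynamic computation graph.",
]
prompt = build_prompt("What's the weather like today?", retrieved)
print(prompt)
```

A similarity-search retriever configured with k=2 returns its two closest chunks even when the question is unrelated to the knowledge base, as with the weather question. It is therefore the instruction in the template, rather than the retrieval step, that is expected to produce the "I cannot answer" reply.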

Tags

#langchain