
Chat Over Documents with Vectara

This notebook is based on the chat_vector_db notebook, but uses Vectara as the vector database.
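Vectara indexes and embeds documents on its own servers, so the integration needs account credentials rather than a local embedding model. The LangChain Vectara wrapper can read them from environment variables; a minimal sketch, with placeholder values standing in for your own account details:

import os

# Placeholder credentials — substitute your own Vectara account details.
# The Vectara vectorstore picks these environment variables up automatically
# when no explicit arguments are passed to its constructor.
os.environ["VECTARA_CUSTOMER_ID"] = "<YOUR_CUSTOMER_ID>"
os.environ["VECTARA_CORPUS_ID"] = "<YOUR_CORPUS_ID>"
os.environ["VECTARA_API_KEY"] = "<YOUR_API_KEY>"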

import os
from langchain.vectorstores import Vectara
from langchain.vectorstores.vectara import VectaraRetriever
from langchain.llms import OpenAI
from langchain.chains import ConversationalRetrievalChain

Load the documents. You can replace this with a loader for whatever type of data you want.

from langchain.document_loaders import TextLoader

loader = TextLoader("../../modules/state_of_the_union.txt")
documents = loader.load()

Now we load the documents into the vectorstore, which lets us do semantic search over them. Note that we pass embedding=None: Vectara chunks and embeds documents on its own servers, so no local text splitter or embedding model is needed.

vectorstore = Vectara.from_documents(documents, embedding=None)

We can now create a memory object, which is necessary for tracking the inputs/outputs and holding a conversation.

from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

We now initialize the ConversationalRetrievalChain:

openai_api_key = os.environ["OPENAI_API_KEY"]
llm = OpenAI(openai_api_key=openai_api_key, temperature=0)
retriever = vectorstore.as_retriever(lambda_val=0.025, k=5, filter=None)
# lambda_val tunes Vectara's hybrid (keyword vs. neural) matching; k is the
# number of documents to retrieve. Fetch once as a sanity check:
d = retriever.get_relevant_documents(
    "What did the president say about Ketanji Brown Jackson"
)

qa = ConversationalRetrievalChain.from_llm(llm, retriever, memory=memory)
query = "What did the president say about Ketanji Brown Jackson"
result = qa({"question": query})
result["answer"]
    " 总统说Ketanji Brown Jackson是全国顶尖的法律专家之一,她将继续Justice Breyer的卓越传统。"
query = "Did he mention who she suceeded"
result = qa({"question": query})
result["answer"]
    ' Justice Stephen Breyer'
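Because the chain was built with a Memory object, the full exchange is now stored on it. A quick way to inspect what has accumulated, using ConversationBufferMemory's standard API:

# Inspect the accumulated conversation; with return_messages=True this is a
# list of HumanMessage/AIMessage objects under the "chat_history" key.
print(memory.load_memory_variables({})["chat_history"])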

Pass in Chat History

In the above example, we used a Memory object to track the chat history. We can also just pass it in explicitly. To do this, we need to initialize a chain without any Memory object.

qa = ConversationalRetrievalChain.from_llm(
    OpenAI(temperature=0), vectorstore.as_retriever()
)

Here's an example of asking a question with no chat history:

chat_history = []
query = "What did the president say about Ketanji Brown Jackson"
result = qa({"question": query, "chat_history": chat_history})
result["answer"]
    " 总统说Ketanji Brown Jackson是全国顶尖的法律专家之一,她将继续Justice Breyer的卓越传统。"

Here's an example of asking a question with some chat history:

chat_history = [(query, result["answer"])]
query = "Did he mention who she suceeded"
result = qa({"question": query, "chat_history": chat_history})
result["answer"]
    ' Justice Stephen Breyer'

Return Source Documents

You can also easily return source documents from the ConversationalRetrievalChain. This is useful when you want to inspect which documents were returned.

qa = ConversationalRetrievalChain.from_llm(
    llm, vectorstore.as_retriever(), return_source_documents=True
)
chat_history = []
query = "What did the president say about Ketanji Brown Jackson"
result = qa({"question": query, "chat_history": chat_history})
result["source_documents"][0]
    Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': '../../../state_of_the_union.txt'})
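The chain returns every retrieved document, not just the first. A small sketch to preview them all:

# Print the source and a short preview of each retrieved document.
for doc in result["source_documents"]:
    print(doc.metadata["source"], "->", doc.page_content[:80], "...")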

ConversationalRetrievalChain with search_distance

If you are using a vectorstore that supports filtering by search distance, you can add a threshold parameter.

vectordbkwargs = {"search_distance": 0.9}
qa = ConversationalRetrievalChain.from_llm(
    OpenAI(temperature=0), vectorstore.as_retriever(), return_source_documents=True
)
chat_history = []
query = "What did the president say about Ketanji Brown Jackson"
result = qa(
    {"question": query, "chat_history": chat_history, "vectordbkwargs": vectordbkwargs}
)
print(result["answer"])
     The president said that Ketanji Brown Jackson is one of the nation's top legal minds and that she will continue Justice Breyer's legacy of excellence.

ConversationalRetrievalChain with map_reduce

We can also use different types of combine documents chains with the ConversationalRetrievalChain.

from langchain.chains import LLMChain
from langchain.chains.question_answering import load_qa_chain
from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT
question_generator = LLMChain(llm=llm, prompt=CONDENSE_QUESTION_PROMPT)
doc_chain = load_qa_chain(llm, chain_type="map_reduce")

chain = ConversationalRetrievalChain(
    retriever=vectorstore.as_retriever(),
    question_generator=question_generator,
    combine_docs_chain=doc_chain,
)
chat_history = []
query = "What did the president say about Ketanji Brown Jackson"
result = chain({"question": query, "chat_history": chat_history})
result["answer"]
    " 总统说他提名了Circuit Court of Appeals法官Ketanji Brown Jackson,他形容她是全国顶尖的法律专家之一,将继续Justice Breyer的卓越传统。"

ConversationalRetrievalChain with Question Answering with sources

You can also use this chain with a question-answering-with-sources chain.

from langchain.chains.qa_with_sources import load_qa_with_sources_chain
question_generator = LLMChain(llm=llm, prompt=CONDENSE_QUESTION_PROMPT)
doc_chain = load_qa_with_sources_chain(llm, chain_type="map_reduce")

chain = ConversationalRetrievalChain(
    retriever=vectorstore.as_retriever(),
    question_generator=question_generator,
    combine_docs_chain=doc_chain,
)
chat_history = []
query = "What did the president say about Ketanji Brown Jackson"
result = chain({"question": query, "chat_history": chat_history})
result["answer"]
    " 总统说他提名了Circuit Court of Appeals法官Ketanji Brown Jackson,他形容她是全国顶尖的法律专家之一,并且她将继续Justice Breyer的卓越传统。\nSOURCES: ../../../state_of_the_union.txt"

ConversationalRetrievalChain with streaming to stdout

In this example, the output of the chain will be streamed to stdout, token by token.

from langchain.chains.llm import LLMChain
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chains.conversational_retrieval.prompts import (
    CONDENSE_QUESTION_PROMPT,
    QA_PROMPT,
)
from langchain.chains.question_answering import load_qa_chain

# Construct a ConversationalRetrievalChain with a streaming llm for combine docs
# and a separate, non-streaming llm for question generation
llm = OpenAI(temperature=0, openai_api_key=openai_api_key)
streaming_llm = OpenAI(
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()],
    temperature=0,
    openai_api_key=openai_api_key,
)

question_generator = LLMChain(llm=llm, prompt=CONDENSE_QUESTION_PROMPT)
doc_chain = load_qa_chain(streaming_llm, chain_type="stuff", prompt=QA_PROMPT)

qa = ConversationalRetrievalChain(
    retriever=vectorstore.as_retriever(),
    combine_docs_chain=doc_chain,
    question_generator=question_generator,
)
chat_history = []
query = "What did the president say about Ketanji Brown Jackson"
result = qa({"question": query, "chat_history": chat_history})
     The president said that Ketanji Brown Jackson is one of the nation's top legal minds and that she will continue Justice Breyer's legacy of excellence.
chat_history = [(query, result["answer"])]
query = "Did he mention who she suceeded"
result = qa({"question": query, "chat_history": chat_history})
     Justice Stephen Breyer

The get_chat_history Function

You can also specify a get_chat_history function, which can be used to format the chat_history string.

def get_chat_history(inputs) -> str:
    res = []
    for human, ai in inputs:
        res.append(f"Human:{human}\nAI:{ai}")
    return "\n".join(res)


qa = ConversationalRetrievalChain.from_llm(
    llm, vectorstore.as_retriever(), get_chat_history=get_chat_history
)
chat_history = []
query = "What did the president say about Ketanji Brown Jackson"
result = qa({"question": query, "chat_history": chat_history})
result["answer"]
    " 总统说Ketanji Brown Jackson是全国顶尖的法律专家之一,她将继续Justice Breyer的卓越传统。"