Multiple retrieval sources

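You will often want to retrieve from more than one source. The sources can be different vector stores (one holding information about topic X, another holding information about topic Y). They can also be entirely different kinds of databases!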

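The key is to run the retrievals in parallel wherever possible, which keeps latency as low as possible. Fortunately, LangChain Expression Language supports parallel execution out of the box.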
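To see why parallel retrieval matters for latency, here is a minimal plain-Python sketch. It is not how LangChain implements parallelism internally; it only illustrates the idea with the standard library. The functions `fetch_from_sql` and `fetch_from_vectors` are hypothetical stand-ins that simulate slow lookups with `time.sleep`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_from_sql(question: str) -> str:
    """Hypothetical stand-in for a slow SQL lookup (simulated with sleep)."""
    time.sleep(0.3)
    return f"sql result for: {question}"


def fetch_from_vectors(question: str) -> str:
    """Hypothetical stand-in for a slow vector store lookup (simulated with sleep)."""
    time.sleep(0.3)
    return f"vector result for: {question}"


def retrieve_in_parallel(question: str) -> dict:
    """Submit every lookup to a thread pool, then collect the results by name."""
    sources = {"source1": fetch_from_sql, "source2": fetch_from_vectors}
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        # All lookups are submitted before any result is awaited,
        # so they overlap instead of running back to back.
        futures = {name: pool.submit(fn, question) for name, fn in sources.items()}
        return {name: future.result() for name, future in futures.items()}


start = time.perf_counter()
results = retrieve_in_parallel("How many employees are there?")
elapsed = time.perf_counter() - start
print(results)
print(f"elapsed: {elapsed:.2f}s")
```

Because the two simulated lookups overlap, the total wait should be close to the slower of the two rather than their sum.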

Let's look at how to retrieve from both a SQL database and a vector store.

from langchain.chat_models import ChatOpenAI

Set up the SQL query

from langchain.utilities import SQLDatabase
from langchain.chains import create_sql_query_chain

db = SQLDatabase.from_uri("sqlite:///../../../../../notebooks/Chinook.db")
query_chain = create_sql_query_chain(ChatOpenAI(temperature=0), db)

Set up the vector store

from langchain.indexes import VectorstoreIndexCreator
from langchain.schema.document import Document

index_creator = VectorstoreIndexCreator()
index = index_creator.from_documents([Document(page_content="Foo")])
retriever = index.vectorstore.as_retriever()

Combine

from langchain.prompts import ChatPromptTemplate

system_message = """Use the information from the below two sources to answer any questions.

Source 1: a SQL database about employee data
<source1>
{source1}
</source1>

Source 2: a text database of random information
<source2>
{source2}
</source2>
"""

prompt = ChatPromptTemplate.from_messages([("system", system_message), ("human", "{question}")])

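full_chain = {
    # Generate a SQL query from the question, then run it against the database.
    "source1": {"question": lambda x: x["question"]} | query_chain | db.run,
    # Pass the question text to the vector store retriever.
    "source2": (lambda x: x["question"]) | retriever,
    # Forward the original question to the prompt unchanged.
    "question": lambda x: x["question"],
} | prompt | ChatOpenAI()

response = full_chain.invoke({"question": "How many Employees are there"})
print(response)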
Number of requested results 4 is greater than number of elements in index 1, updating n_results = 1

content='There are 8 employees.' additional_kwargs={} example=False
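The data flow through `full_chain` can be traced with a plain-Python sketch that needs no LangChain install. Everything here is a hypothetical stand-in: `fake_sql_branch` and `fake_vector_branch` return placeholder strings instead of querying anything, and `run_mapping` calls the branches one after another, whereas the text above says LangChain Expression Language runs them in parallel.

```python
from typing import Any, Callable, Dict, List, Tuple

# Shortened copy of the system message template from the section above.
SYSTEM_TEMPLATE = (
    "Source 1: a SQL database about employee data\n"
    "<source1>\n{source1}\n</source1>\n\n"
    "Source 2: a text database of random information\n"
    "<source2>\n{source2}\n</source2>\n"
)


def fake_sql_branch(inputs: Dict[str, Any]) -> str:
    """Hypothetical placeholder for the SQL branch; returns a made-up row string."""
    return "[(8,)]"


def fake_vector_branch(inputs: Dict[str, Any]) -> str:
    """Hypothetical placeholder for the retriever branch; returns the lone document text."""
    return "Foo"


def run_mapping(
    steps: Dict[str, Callable[[Dict[str, Any]], Any]], inputs: Dict[str, Any]
) -> Dict[str, Any]:
    """Call every branch with the same input and collect the outputs under the branch keys."""
    return {key: step(inputs) for key, step in steps.items()}


def build_prompt(inputs: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Fill the template with the branch outputs, mirroring the dict -> prompt step."""
    steps = {
        "source1": fake_sql_branch,
        "source2": fake_vector_branch,
        "question": lambda x: x["question"],
    }
    values = run_mapping(steps, inputs)
    system = SYSTEM_TEMPLATE.format(
        source1=values["source1"], source2=values["source2"]
    )
    return [("system", system), ("human", values["question"])]


messages = build_prompt({"question": "How many Employees are there"})
for role, text in messages:
    print(f"[{role}]\n{text}")
```

Each key in the mapping becomes a variable that the prompt template can reference, which is why the branch names must match the `{source1}`, `{source2}`, and `{question}` placeholders.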