Marqo
这个笔记本展示了如何使用与Marqo向量存储相关的功能。
Marqo是一个开源的向量搜索引擎。Marqo允许您存储和查询文本和图像等多模态数据。Marqo使用大量的开源模型为您创建向量,您还可以提供自己的微调模型,Marqo将为您处理加载和推理。
要使用我们的docker镜像运行此笔记本,请先运行以下命令获取Marqo:
docker pull marqoai/marqo:latest
docker rm -f marqo
docker run --name marqo -it --privileged -p 8882:8882 --add-host host.docker.internal:host-gateway marqoai/marqo:latest
pip install marqo
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Marqo
from langchain.document_loaders import TextLoader
from langchain.document_loaders import TextLoader
loader = TextLoader("../../../state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
import marqo
# initialize marqo
marqo_url = "http://localhost:8882" # if using marqo cloud replace with your endpoint (console.marqo.ai)
marqo_api_key = "" # if using marqo cloud replace with your api key (console.marqo.ai)
client = marqo.Client(url=marqo_url, api_key=marqo_api_key)
index_name = "langchain-demo"
docsearch = Marqo.from_documents(docs, index_name=index_name)
query = "What did the president say about Ketanji Brown Jackson"
result_docs = docsearch.similarity_search(query)
Index langchain-demo exists.
print(result_docs[0].page_content)
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections.
Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.
And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
result_docs = docsearch.similarity_search_with_score(query)
print(result_docs[0][0].page_content, result_docs[0][1], sep="\n")
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections.
Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.
And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
0.68647254
附加功能
作为向量存储的强大功能之一,Marqo可以使用外部创建的索引。例如:
如果您有另一个应用程序中的图像和文本对的数据库,您可以简单地在langchain中使用Marqo向量存储。请注意,自己带来的多模态索引将禁用
add_texts
方法。如果您有一个文本文档数据库,您可以将其带入langchain框架,并通过
add_texts
添加更多文本。
返回的文档可以通过在搜索方法中传递自己的函数到page_content_builder
回调来自定义。
多模态示例
# 使用一个新的索引
index_name = "langchain-multimodal-demo"
# 如果重新运行演示,则删除索引
try:
client.delete_index(index_name)
except Exception:
print(f"Creating {index_name}")
# 这个索引可能是由另一个系统创建的
settings = {"treat_urls_and_pointers_as_images": True, "model": "ViT-L/14"}
client.create_index(index_name, **settings)
client.index(index_name).add_documents(
[
# 图片:公共汽车
{
"caption": "Bus",
"image": "https://raw.githubusercontent.com/marqo-ai/marqo/mainline/examples/ImageSearchGuide/data/image4.jpg",
},
# 图片:飞机
{
"caption": "Plane",
"image": "https://raw.githubusercontent.com/marqo-ai/marqo/mainline/examples/ImageSearchGuide/data/image2.jpg",
},
],
)
{'errors': False,
'processingTimeMs': 2090.2822139996715,
'index_name': 'langchain-multimodal-demo',
'items': [{'_id': 'aa92fc1c-1fb2-4d86-b027-feb507c419f7',
'result': 'created',
'status': 201},
{'_id': '5142c258-ef9f-4bf2-a1a6-2307280173a0',
'result': 'created',
'status': 201}]}
def get_content(res):
"""将Marqo的文档格式化为用作page_content的文本的辅助函数"""
return f"{res['caption']}: {res['image']}"
docsearch = Marqo(client, index_name, page_content_builder=get_content)
query = "vehicles that fly"
doc_results = docsearch.similarity_search(query)
for doc in doc_results:
print(doc.page_content)
Plane: https://raw.githubusercontent.com/marqo-ai/marqo/mainline/examples/ImageSearchGuide/data/image2.jpg
Bus: https://raw.githubusercontent.com/marqo-ai/marqo/mainline/examples/ImageSearchGuide/data/image4.jpg
仅文本示例
# 使用一个新的索引
index_name = "langchain-byo-index-demo"
# 如果重新运行演示,则删除索引
try:
client.delete_index(index_name)
except Exception:
print(f"Creating {index_name}")
# 这个索引可能是由另一个系统创建的
client.create_index(index_name)
client.index(index_name).add_documents(
[
{
"Title": "Smartphone",
"Description": "A smartphone is a portable computer device that combines mobile telephone "
"functions and computing functions into one unit.",
},
{
"Title": "Telephone",
"Description": "A telephone is a telecommunications device that permits two or more users to"
"conduct a conversation when they are too far apart to be easily heard directly.",
},
],
)
{'errors': False,
'processingTimeMs': 139.2144540004665,
'index_name': 'langchain-byo-index-demo',
'items': [{'_id': '27c05a1c-b8a9-49a5-ae73-fbf1eb51dc3f',
'result': 'created',
'status': 201},
{'_id': '6889afe0-e600-43c1-aa3b-1d91bf6db274',
'result': 'created',
'status': 201}]}
# 注意,文本索引保留了使用add_texts的能力,尽管文档中的字段名称不同,这是因为page_content_builder回调允许您根据需要处理这些文档字段
def get_content(res):
"""将Marqo的文档格式化为用作page_content的文本的辅助函数"""
if "text" in res:
return res["text"]
return res["Description"]
docsearch = Marqo(client, index_name, page_content_builder=get_content)
docsearch.add_texts(["This is a document that is about elephants"])
['9986cc72-adcd-4080-9d74-265c173a9ec3']
query = "modern communications devices"
doc_results = docsearch.similarity_search(query)
print(doc_results[0].page_content)
A smartphone is a portable computer device that combines mobile telephone functions and computing functions into one unit.
query = "elephants"
doc_results = docsearch.similarity_search(query, page_content_builder=get_content)
print(doc_results[0].page_content)
This is a document that is about elephants
加权查询
我们还公开了marqos加权查询,这是一种组合复杂语义搜索的强大方法。
query = {"communications devices": 1.0}
doc_results = docsearch.similarity_search(query)
print(doc_results[0].page_content)
A smartphone is a portable computer device that combines mobile telephone functions and computing functions into one unit.
query = {"communications devices": 1.0, "technology post 2000": -1.0}
doc_results = docsearch.similarity_search(query)
print(doc_results[0].page_content)
A telephone is a telecommunications device that permits two or more users toconduct a conversation when they are too far apart to be easily heard directly.
使用来源进行问答
本节展示了如何使用Marqo作为RetrievalQAWithSourcesChain
的一部分。Marqo将在来源中执行信息搜索。
from langchain.chains import RetrievalQAWithSourcesChain
from langchain import OpenAI
import os
import getpass
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
OpenAI API Key:········
with open("../../../state_of_the_union.txt") as f:
state_of_the_union = f.read()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_text(state_of_the_union)
index_name = "langchain-qa-with-retrieval"
docsearch = Marqo.from_documents(docs, index_name=index_name)
Index langchain-qa-with-retrieval exists.
chain = RetrievalQAWithSourcesChain.from_chain_type(
OpenAI(temperature=0), chain_type="stuff", retriever=docsearch.as_retriever()
)
chain(
{"question": "What did the president say about Justice Breyer"},
return_only_outputs=True,
)
{'answer': ' The president honored Justice Breyer, thanking him for his service and noting that he is a retiring Justice of the United States Supreme Court.\n',
'sources': '../../../state_of_the_union.txt'}