
ScaNN (Scalable Nearest Neighbors)

ScaNN (Scalable Nearest Neighbors) is a method for efficient vector similarity search.

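ScaNN combines search space pruning and quantization for Maximum Inner Product Search, and it also supports other distance functions such as Euclidean distance. The implementation is optimized for x86 processors with AVX2 support. For more details, see its Google Research GitHub repository.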
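To make the two search objectives concrete, here is a minimal sketch of exact (brute-force) maximum inner product search next to Euclidean nearest-neighbor search. It is the exhaustive baseline that pruning and quantization approximate, not ScaNN's own algorithm. The helper names and the toy vectors are made up for illustration.

```python
import math

def inner_product(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_k(query, vectors, k, score, largest):
    """Return the indices of the k best vectors under `score`.

    `largest=True` ranks by highest score (inner product),
    `largest=False` ranks by lowest score (distance).
    """
    order = sorted(
        range(len(vectors)),
        key=lambda i: score(query, vectors[i]),
        reverse=largest,
    )
    return order[:k]

# Toy database (hypothetical values chosen so the two metrics disagree).
database = [
    [1.0, 0.0],
    [3.0, 3.0],
    [0.9, 0.1],
]
query = [1.0, 0.0]

mips_best = top_k(query, database, 1, inner_product, largest=True)
l2_best = top_k(query, database, 1, euclidean, largest=False)
print("max inner product:", mips_best, "nearest by L2:", l2_best)
```

The two metrics can pick different neighbors because a long vector can have a large inner product with the query while still being far away in Euclidean distance, so the choice of distance function matters when building an index.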

Installation

Install ScaNN with pip. Alternatively, you can follow the instructions on the ScaNN website to install it from source.

pip install scann

Retrieval Demo

Below we show how to use ScaNN together with Huggingface Embeddings.

from langchain.document_loaders import TextLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import ScaNN

# Load the source text and split it into chunks of up to 1000 characters.
loader = TextLoader("state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

# Embed the chunks with the default HuggingFace embedding model.
embeddings = HuggingFaceEmbeddings()

# Build the ScaNN index from the chunks, then run a similarity search.
db = ScaNN.from_documents(docs, embeddings)
query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query)

docs[0]
    Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': 'state_of_the_union.txt'})
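The demo above splits the text with `chunk_size=1000` and `chunk_overlap=0` before indexing. As a rough illustration of what that means, the following sketch greedily packs separator-delimited pieces into chunks of at most `chunk_size` characters with no overlap. It is a simplified stand-in, not LangChain's `CharacterTextSplitter` implementation; the function name `split_text` and the `"\n\n"` separator are assumptions made for this example.

```python
def split_text(text, chunk_size, separator="\n\n"):
    """Greedily pack separator-delimited pieces into chunks.

    Each chunk is at most `chunk_size` characters long, except when a
    single piece is already longer than that, in which case the piece is
    kept whole as its own oversized chunk.
    """
    chunks = []
    current = ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if not current or len(candidate) <= chunk_size:
            current = candidate
        else:
            # Adding this piece would overflow: close the chunk, start a new one.
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks

sample = "aaaa\n\nbbbb\n\ncc"
chunks = split_text(sample, chunk_size=10)
print(chunks)
```

Each resulting chunk becomes one document in the index, so `chunk_size` controls how much text a single search hit returns.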

RetrievalQA Demo

Next, we demonstrate how to use ScaNN together with the Google PaLM API.

You can obtain an API key from https://developers.generativeai.google/tutorials/setup.

from langchain.chains import RetrievalQA
from langchain.chat_models import google_palm

palm_client = google_palm.ChatGooglePalm(google_api_key='YOUR_GOOGLE_PALM_API_KEY')

qa = RetrievalQA.from_chain_type(
    llm=palm_client,
    chain_type="stuff",
    retriever=db.as_retriever(search_kwargs={'k': 10}),
)
print(qa.run('What did the president say about Ketanji Brown Jackson?'))
    The president said that Ketanji Brown Jackson is one of our nation's top legal minds, who will continue Justice Breyer's legacy of excellence.
print(qa.run('What did the president say about Michael Phelps?'))
    The president did not mention Michael Phelps in his speech.
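The `"stuff"` chain type used above places all retrieved chunks into a single prompt for the model. The sketch below shows the general idea under stated assumptions: the function name `build_stuff_prompt` and the prompt wording are hypothetical and do not reproduce LangChain's actual prompt template.

```python
def build_stuff_prompt(question, retrieved_texts):
    """Concatenate every retrieved chunk into one prompt (hypothetical wording)."""
    context = "\n\n".join(retrieved_texts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Stand-ins for the chunks a retriever would return.
retrieved = [
    "Chunk one about the Supreme Court nomination.",
    "Chunk two about voting rights.",
]
prompt = build_stuff_prompt("Who was nominated?", retrieved)
print(prompt)
```

With `k` set to 10 and chunks of up to about 1000 characters, the combined context can reach roughly 10,000 characters, so `k` and the chunk size need to fit within the model's input limit.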

Saving and loading a local retrieval index

db.save_local('/tmp/db', 'state_of_union')
restored_db = ScaNN.load_local('/tmp/db', embeddings, index_name='state_of_union')