Caching integrations

This notebook covers how to cache results of individual LLM calls.

import langchain
from langchain.llms import OpenAI

# To make the caching really obvious, let's use a slower model.
llm = OpenAI(model_name="text-davinci-002", n=2, best_of=2)

In Memory Cache

from langchain.cache import InMemoryCache

langchain.llm_cache = InMemoryCache()
# The first time, it is not yet in cache, so it should take longer
llm("Tell me a joke")
    CPU times: user 35.9 ms, sys: 28.6 ms, total: 64.6 ms
    Wall time: 4.83 s

"\n\nWhy couldn't the bicycle stand up by itself? Because it was too tired!"

# The second time it is, so it goes faster
llm("Tell me a joke")
    CPU times: user 238 µs, sys: 143 µs, total: 381 µs
    Wall time: 1.76 ms

"\n\nWhy couldn't the bicycle stand up by itself? Because it was too tired!"

SQLite Cache

!rm .langchain.db
# We can do the same thing with a SQLite cache
from langchain.cache import SQLiteCache

langchain.llm_cache = SQLiteCache(database_path=".langchain.db")
# The first time, it is not yet in cache, so it should take longer
llm("Tell me a joke")
    CPU times: user 17 ms, sys: 9.76 ms, total: 26.7 ms
    Wall time: 825 ms

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'

# The second time it is, so it goes faster
llm("Tell me a joke")
    CPU times: user 2.46 ms, sys: 1.23 ms, total: 3.7 ms
    Wall time: 2.67 ms

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'

Redis Cache

Standard Cache

Use Redis to cache prompts and responses.

# We can do the same thing with a Redis cache
# (make sure your local Redis instance is running before running this example)
from redis import Redis
from langchain.cache import RedisCache

langchain.llm_cache = RedisCache(redis_=Redis())
# The first time, it is not yet in cache, so it should take longer
llm("Tell me a joke")
    CPU times: user 6.88 ms, sys: 8.75 ms, total: 15.6 ms
    Wall time: 1.04 s

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'

# The second time it is, so it goes faster
llm("Tell me a joke")
    CPU times: user 1.59 ms, sys: 610 µs, total: 2.2 ms
    Wall time: 5.58 ms

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'

Semantic Cache

Use Redis to cache prompts and responses, and evaluate hits based on semantic similarity.

from langchain.embeddings import OpenAIEmbeddings
from langchain.cache import RedisSemanticCache


langchain.llm_cache = RedisSemanticCache(
    redis_url="redis://localhost:6379", embedding=OpenAIEmbeddings()
)
# The first time, it is not yet in cache, so it should take longer
llm("Tell me a joke")
    CPU times: user 351 ms, sys: 156 ms, total: 507 ms
    Wall time: 3.37 s

"\n\nWhy don't scientists trust atoms?\nBecause they make up everything."

# The second time, while not a direct hit, the question is semantically similar to the original question, so it uses the cached result!
llm("Tell me one joke")
    CPU times: user 6.25 ms, sys: 2.72 ms, total: 8.97 ms
    Wall time: 262 ms

"\n\nWhy don't scientists trust atoms?\nBecause they make up everything."

GPTCache

We can use GPTCache for exact match caching, or to cache results based on semantic similarity.

Let's first start with an example of exact match.

from gptcache import Cache
from gptcache.manager.factory import manager_factory
from gptcache.processor.pre import get_prompt
from langchain.cache import GPTCache
import hashlib


def get_hashed_name(name):
    return hashlib.sha256(name.encode()).hexdigest()


def init_gptcache(cache_obj: Cache, llm: str):
    hashed_llm = get_hashed_name(llm)
    cache_obj.init(
        pre_embedding_func=get_prompt,
        data_manager=manager_factory(manager="map", data_dir=f"map_cache_{hashed_llm}"),
    )


langchain.llm_cache = GPTCache(init_gptcache)
# The first time, it is not yet in cache, so it should take longer
llm("Tell me a joke")
    CPU times: user 21.5 ms, sys: 21.3 ms, total: 42.8 ms
    Wall time: 6.2 s

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'

# The second time it is, so it goes faster
llm("Tell me a joke")
    CPU times: user 571 µs, sys: 43 µs, total: 614 µs
    Wall time: 635 µs

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'

Let's now show an example of similarity caching.

from gptcache import Cache
from gptcache.adapter.api import init_similar_cache
from langchain.cache import GPTCache
import hashlib


def get_hashed_name(name):
    return hashlib.sha256(name.encode()).hexdigest()


def init_gptcache(cache_obj: Cache, llm: str):
    hashed_llm = get_hashed_name(llm)
    init_similar_cache(cache_obj=cache_obj, data_dir=f"similar_cache_{hashed_llm}")


langchain.llm_cache = GPTCache(init_gptcache)
# The first time, it is not yet in cache, so it should take longer
llm("Tell me a joke")
    CPU times: user 1.42 s, sys: 279 ms, total: 1.7 s
    Wall time: 8.44 s

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'

# This is an exact match, so it finds it in the cache
llm("Tell me a joke")
    CPU times: user 866 ms, sys: 20 ms, total: 886 ms
    Wall time: 226 ms

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'

# This is not an exact match, but semantically within distance, so it hits!
llm("Tell me joke")
    CPU times: user 853 ms, sys: 14.8 ms, total: 868 ms
    Wall time: 224 ms

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'

Momento Cache

Use Momento to cache prompts and responses.

Requires momento to use; uncomment below to install:

# !pip install momento

You'll need a Momento auth token to use this class. It can be passed to momento.CacheClient if you'd like to instantiate that directly, passed as the named parameter auth_token to MomentoCache.from_client_params, or simply set as the environment variable MOMENTO_AUTH_TOKEN.
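
A minimal sketch of the environment-variable option (the token value below is just a placeholder, not a real credential):

import os

# Placeholder: substitute your own Momento auth token here, or export
# MOMENTO_AUTH_TOKEN in your shell before starting Python.
os.environ["MOMENTO_AUTH_TOKEN"] = "<your-momento-auth-token>"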

from datetime import timedelta

from langchain.cache import MomentoCache


cache_name = "langchain"
ttl = timedelta(days=1)
langchain.llm_cache = MomentoCache.from_client_params(cache_name, ttl)
# The first time, it is not yet in cache, so it should take longer
llm("Tell me a joke")
    CPU times: user 40.7 ms, sys: 16.5 ms, total: 57.2 ms
    Wall time: 1.73 s

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'

# The second time it is, so it goes faster
# When run in the same region as the cache, latencies are in the single-digit milliseconds
llm("Tell me a joke")
    CPU times: user 3.16 ms, sys: 2.98 ms, total: 6.14 ms
    Wall time: 57.9 ms

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'

SQLAlchemy Cache

# You can use SQLAlchemyCache to cache with any SQL database supported by SQLAlchemy.

# from langchain.cache import SQLAlchemyCache
# from sqlalchemy import create_engine

# engine = create_engine("postgresql://postgres:postgres@localhost:5432/postgres")
# langchain.llm_cache = SQLAlchemyCache(engine)

Custom SQLAlchemy Schemas

# You can define your own declarative SQLAlchemyCache child class to customize the schema used for caching. For example, to support high-speed fulltext prompt indexing with Postgres, use:

from sqlalchemy import Column, Integer, String, Computed, Index, Sequence
from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy_utils import TSVectorType
from langchain.cache import SQLAlchemyCache

Base = declarative_base()


class FulltextLLMCache(Base):  # type: ignore
    """Postgres table for fulltext-indexed LLM Cache"""

    __tablename__ = "llm_cache_fulltext"
    id = Column(Integer, Sequence("cache_id"), primary_key=True)
    prompt = Column(String, nullable=False)
    llm = Column(String, nullable=False)
    idx = Column(Integer)
    response = Column(String)
    prompt_tsv = Column(
        TSVectorType(),
        Computed("to_tsvector('english', llm || ' ' || prompt)", persisted=True),
    )
    __table_args__ = (
        Index("idx_fulltext_prompt_tsv", prompt_tsv, postgresql_using="gin"),
    )


engine = create_engine("postgresql://postgres:postgres@localhost:5432/postgres")
langchain.llm_cache = SQLAlchemyCache(engine, FulltextLLMCache)

Optional Caching

You can also turn off caching for particular LLMs should you choose. In the example below, even though global caching is enabled, we turn it off for a specific LLM.

llm = OpenAI(model_name="text-davinci-002", n=2, best_of=2, cache=False)
llm("Tell me a joke")
    CPU times: user 5.8 ms, sys: 2.71 ms, total: 8.51 ms
    Wall time: 745 ms

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'

llm("Tell me a joke")
    CPU times: user 4.91 ms, sys: 2.64 ms, total: 7.55 ms
    Wall time: 623 ms

'\n\nTwo guys stole a calendar. They got six months each.'

Optional Caching in Chains

You can also turn off caching for particular nodes in chains. Note that because of certain interfaces, it's often easier to construct the chain first, and then edit the LLM afterwards.

As an example, we will load a summarizer map-reduce chain. We will cache results for the map-step, but not freeze it for the combine step.

llm = OpenAI(model_name="text-davinci-002")
no_cache_llm = OpenAI(model_name="text-davinci-002", cache=False)
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains.mapreduce import MapReduceChain

text_splitter = CharacterTextSplitter()
with open("../../../state_of_the_union.txt") as f:
    state_of_the_union = f.read()
texts = text_splitter.split_text(state_of_the_union)
from langchain.docstore.document import Document

docs = [Document(page_content=t) for t in texts[:3]]
from langchain.chains.summarize import load_summarize_chain
chain = load_summarize_chain(llm, chain_type="map_reduce", reduce_llm=no_cache_llm)
chain.run(docs)
    CPU times: user 452 ms, sys: 60.3 ms, total: 512 ms
    Wall time: 5.09 s

'\n\nPresident Biden is discussing the American Rescue Plan and the Bipartisan Infrastructure Law, which will create jobs and help Americans. He also talks about his vision for America, which includes investing in education and infrastructure. In response to Russian aggression in Ukraine, the United States is joining with European allies to impose sanctions and isolate Russia. American forces are being mobilized to protect NATO countries in the event that Putin decides to keep moving west. The Ukrainians are bravely fighting back, but the next few weeks will be hard for them. Putin will pay a high price for his actions in the long run. Americans should not be alarmed, as the United States is taking action to protect its interests and allies.'

When we run it again, we see that it runs substantially faster but the final answer is different. This is caused by caching at the map steps, but not at the reduce step.

chain.run(docs)
    CPU times: user 11.5 ms, sys: 4.33 ms, total: 15.8 ms
    Wall time: 1.04 s

'\n\nPresident Biden is discussing the American Rescue Plan and the Bipartisan Infrastructure Law, which will create jobs and help Americans. He also talks about his vision for America, which includes investing in education and infrastructure.'

!rm .langchain.db sqlite.db