Streaming
Some LLMs provide a streaming response. This means you can start processing it before the full response has been returned, instead of having to wait. This is useful if you want to display the response to the user as it is being generated, or if you want to process the response while it is being generated.
Currently, we support streaming for a broad range of LLM implementations, including but not limited to OpenAI, ChatOpenAI, ChatAnthropic, Hugging Face Text Generation Inference, and Replicate; this support covers most models. To use streaming, use a CallbackHandler that implements on_llm_new_token. In this example, we use StreamingStdOutCallbackHandler.
from langchain.llms import OpenAI
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# streaming=True enables token-by-token output; the callback prints each token to stdout as it arrives.
llm = OpenAI(streaming=True, callbacks=[StreamingStdOutCallbackHandler()], temperature=0)
resp = llm("Write me a song about sparkling water.")
Verse 1
I'm sippin' on sparkling water,
It's so refreshing and light,
It's the perfect way to quench my thirst
On a hot summer night.
Chorus
Sparkling water, sparkling water,
It's the best way to stay hydrated,
It's so crisp and so clean,
It's the perfect way to stay refreshed.
Verse 2
I'm sippin' on sparkling water,
It's so bubbly and bright,
It's the perfect way to cool me down
On a hot summer night.
Chorus
Sparkling water, sparkling water,
It's the best way to stay hydrated,
It's so crisp and so clean,
It's the perfect way to stay refreshed.
Verse 3
I'm sippin' on sparkling water,
It's so light and so clear,
It's the perfect way to keep me cool
On a hot summer night.
Chorus
Sparkling water, sparkling water,
It's the best way to stay hydrated,
It's so crisp and so clean,
It's the perfect way to stay refreshed.
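Any handler that implements on_llm_new_token receives tokens as they are generated, so you can do more than print to stdout. The sketch below is illustrative rather than part of the library: CollectTokensHandler and collector_llm are hypothetical names, and the handler simply gathers the streamed tokens into a list.

from langchain.callbacks.base import BaseCallbackHandler
from langchain.llms import OpenAI

class CollectTokensHandler(BaseCallbackHandler):
    """Hypothetical handler that appends each streamed token to a list."""

    def __init__(self):
        self.tokens = []

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        # Called once per token while the LLM streams its output.
        self.tokens.append(token)

handler = CollectTokensHandler()
collector_llm = OpenAI(streaming=True, callbacks=[handler], temperature=0)
collector_llm("Write me a haiku about sparkling water.")
print("".join(handler.tokens))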
If we use generate, we still have access to the final LLMResult. However, token_usage is not currently supported for streaming.
llm.generate(["Tell me a joke."])
Q: What did the fish say when it hit the wall?
A: Dam!
LLMResult(generations=[[Generation(text='\n\nQ: What did the fish say when it hit the wall?\nA: Dam!', generation_info={'finish_reason': 'stop', 'logprobs': None})]], llm_output={'token_usage': {}, 'model_name': 'text-davinci-003'})
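Even though token_usage comes back empty when streaming, the generated text is still available on the LLMResult returned by generate. A minimal sketch of reading it (the variable name result is illustrative):

result = llm.generate(["Tell me a joke."])
print(result.generations[0][0].text)      # the full generated text
print(result.llm_output["token_usage"])   # empty dict when streaming is enabled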