Streaming
Some LLMs provide a streaming response. This means you can start processing the response as it is generated instead of waiting for the entire response to be returned. This is useful if you want to display the response to the user as it is being generated, or if you want to process the response while it is being generated.
Currently, we support streaming for a broad range of LLM implementations, including but not limited to OpenAI, ChatOpenAI, ChatAnthropic, Hugging Face Text Generation Inference, and Replicate; this feature has been extended to most models. To use streaming, use a CallbackHandler that implements on_llm_new_token. In this example, we use StreamingStdOutCallbackHandler.
from langchain.llms import OpenAI
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# streaming=True enables token-by-token output; the callback prints each token to stdout as it arrives
llm = OpenAI(streaming=True, callbacks=[StreamingStdOutCallbackHandler()], temperature=0)
resp = llm("Write me a song about sparkling water.")
Verse 1
I'm sippin' on sparkling water,
It's so refreshing and light,
It's the perfect way to quench my thirst
On a hot summer night.
Chorus
Sparkling water, sparkling water,
It's the best way to stay hydrated,
It's so crisp and so clean,
It's the perfect way to stay refreshed.
Verse 2
I'm sippin' on sparkling water,
It's so bubbly and bright,
It's the perfect way to cool me down
On a hot summer night.
Chorus
Sparkling water, sparkling water,
It's the best way to stay hydrated,
It's so crisp and so clean,
It's the perfect way to stay refreshed.
Verse 3
I'm sippin' on sparkling water,
It's so light and so clear,
It's the perfect way to keep me cool
On a hot summer night.
Chorus
Sparkling water, sparkling water,
It's the best way to stay hydrated,
It's so crisp and so clean,
It's the perfect way to stay refreshed.
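If you want to do something other than print tokens to stdout, you can write your own callback handler. The following is a minimal sketch, not part of the example above: the class name, the token list, and the haiku prompt are illustrative, and it assumes a langchain version where BaseCallbackHandler can be subclassed by overriding only on_llm_new_token.

from langchain.callbacks.base import BaseCallbackHandler
from langchain.llms import OpenAI

# Hypothetical handler: collects streamed tokens into a list instead of printing them
class CollectTokensHandler(BaseCallbackHandler):
    def __init__(self):
        self.tokens = []

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        # Called once for every new token produced by the streaming LLM
        self.tokens.append(token)

handler = CollectTokensHandler()
llm = OpenAI(streaming=True, callbacks=[handler], temperature=0)
llm("Write me a haiku about sparkling water.")
print("".join(handler.tokens))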
If we use generate, we still have access to the final LLMResult. However, token_usage is not currently supported for streaming.
llm.generate(["Tell me a joke."])
Q: What did the fish say when it hit the wall?
A: Dam!
LLMResult(generations=[[Generation(text='\n\nQ: What did the fish say when it hit the wall?\nA: Dam!', generation_info={'finish_reason': 'stop', 'logprobs': None})]], llm_output={'token_usage': {}, 'model_name': 'text-davinci-003'})
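As a rough sketch of working with that result (the variable name is illustrative; generations and llm_output are the fields shown in the output above), you might inspect it like this:

result = llm.generate(["Tell me a joke."])

# The generated text for the first prompt's first generation
print(result.generations[0][0].text)

# llm_output carries provider metadata; with streaming enabled, token_usage comes back empty
print(result.llm_output["token_usage"])  # {}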
