Streaming
Some LLMs provide a streaming response. This means you can start processing it before the full response has been returned, instead of having to wait. This is useful if you want to display the response to the user as it is being generated, or if you want to process the response while it is being generated.
Currently, we support streaming for a broad range of LLM implementations, including but not limited to OpenAI, ChatOpenAI, ChatAnthropic, Hugging Face Text Generation Inference, and Replicate; this support covers most models. To use streaming, use a CallbackHandler that implements on_llm_new_token. In this example, we use StreamingStdOutCallbackHandler.
from langchain.llms import OpenAI
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# streaming=True enables token-by-token output; the callback prints each token to stdout as it arrives.
llm = OpenAI(streaming=True, callbacks=[StreamingStdOutCallbackHandler()], temperature=0)
resp = llm("Write me a song about sparkling water.")
Verse 1
I'm sippin' on sparkling water,
It's so refreshing and light,
It's the perfect way to quench my thirst
On a hot summer night.
Chorus
Sparkling water, sparkling water,
It's the best way to stay hydrated,
It's so crisp and so clean,
It's the perfect way to stay refreshed.
Verse 2
I'm sippin' on sparkling water,
It's so bubbly and bright,
It's the perfect way to cool me down
On a hot summer night.
Chorus
Sparkling water, sparkling water,
It's the best way to stay hydrated,
It's so crisp and so clean,
It's the perfect way to stay refreshed.
Verse 3
I'm sippin' on sparkling water,
It's so light and so clear,
It's the perfect way to keep me cool
On a hot summer night.
Chorus
Sparkling water, sparkling water,
It's the best way to stay hydrated,
It's so crisp and so clean,
It's the perfect way to stay refreshed.
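Any handler that implements on_llm_new_token receives tokens as they are generated, so you can do more than print to stdout. The sketch below is illustrative rather than part of the library: CollectTokensHandler and collector_llm are hypothetical names, and the handler simply gathers the streamed tokens into a list.

from langchain.callbacks.base import BaseCallbackHandler
from langchain.llms import OpenAI

class CollectTokensHandler(BaseCallbackHandler):
    """Hypothetical handler that appends each streamed token to a list."""

    def __init__(self):
        self.tokens = []

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        # Called once per token while the LLM streams its output.
        self.tokens.append(token)

handler = CollectTokensHandler()
collector_llm = OpenAI(streaming=True, callbacks=[handler], temperature=0)
collector_llm("Write me a haiku about sparkling water.")
print("".join(handler.tokens))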
If we use generate, we still have access to the final LLMResult. However, token_usage is not currently supported for streaming.
llm.generate(["Tell me a joke."])
Q: What did the fish say when it hit the wall?
A: Dam!
LLMResult(generations=[[Generation(text='\n\nQ: What did the fish say when it hit the wall?\nA: Dam!', generation_info={'finish_reason': 'stop', 'logprobs': None})]], llm_output={'token_usage': {}, 'model_name': 'text-davinci-003'})
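Even though token_usage comes back empty when streaming, the generated text is still available on the LLMResult returned by generate. A minimal sketch of reading it (the variable name result is illustrative):

result = llm.generate(["Tell me a joke."])
print(result.generations[0][0].text)      # the full generated text
print(result.llm_output["token_usage"])   # empty dict when streaming is enabled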