
Streaming

Some LLMs provide a streaming response. This means you can start processing the response as it is generated instead of waiting for it to be returned in full. This is useful if you want to display the response to the user while it is being generated, or if you want to process it as it is being generated.

Currently, we support streaming for a broad range of LLM implementations, including but not limited to OpenAI, ChatOpenAI, ChatAnthropic, Hugging Face Text Generation Inference, and Replicate; this feature has been extended to most models. To use streaming, use a CallbackHandler that implements on_llm_new_token. In this example, we use StreamingStdOutCallbackHandler.

from langchain.llms import OpenAI
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# Pass streaming=True and a streaming callback so each token is printed to stdout as it arrives.
llm = OpenAI(streaming=True, callbacks=[StreamingStdOutCallbackHandler()], temperature=0)
resp = llm("Write me a song about sparkling water.")
Verse 1  
I'm sippin' on sparkling water,
It's so refreshing and light,
It's the perfect way to quench my thirst
On a hot summer night.

Chorus
Sparkling water, sparkling water,
It's the best way to stay hydrated,
It's so crisp and so clean,
It's the perfect way to stay refreshed.

Verse 2
I'm sippin' on sparkling water,
It's so bubbly and bright,
It's the perfect way to cool me down
On a hot summer night.

Chorus
Sparkling water, sparkling water,
It's the best way to stay hydrated,
It's so crisp and so clean,
It's the perfect way to stay refreshed.

Verse 3
I'm sippin' on sparkling water,
It's so light and so clear,
It's the perfect way to keep me cool
On a hot summer night.

Chorus
Sparkling water, sparkling water,
It's the best way to stay hydrated,
It's so crisp and so clean,
It's the perfect way to stay refreshed.
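
StreamingStdOutCallbackHandler simply prints each token to stdout. If you want to handle tokens yourself (for example, to forward them to a UI or collect them in memory), you can subclass BaseCallbackHandler and override on_llm_new_token. The following is a minimal sketch under that assumption; MyTokenCollector is a hypothetical name, not part of LangChain.

from langchain.llms import OpenAI
from langchain.callbacks.base import BaseCallbackHandler

class MyTokenCollector(BaseCallbackHandler):
    """Hypothetical handler that collects streamed tokens in a list."""

    def __init__(self):
        self.tokens = []

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        # Called once for each new token as it is generated.
        self.tokens.append(token)

handler = MyTokenCollector()
llm = OpenAI(streaming=True, callbacks=[handler], temperature=0)
llm("Write me a song about sparkling water.")
print("".join(handler.tokens))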

We still have access to the final LLMResult if we use generate. However, token_usage is not currently supported for streaming.

llm.generate(["Tell me a joke."])  
Q: What did the fish say when it hit the wall?  
A: Dam!

LLMResult(generations=[[Generation(text='\n\nQ: What did the fish say when it hit the wall?\nA: Dam!', generation_info={'finish_reason': 'stop', 'logprobs': None})]], llm_output={'token_usage': {}, 'model_name': 'text-davinci-003'})
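
The returned LLMResult can also be inspected programmatically: generations is a nested list of Generation objects (one inner list per prompt), and llm_output carries provider metadata. A minimal sketch, continuing from the streaming llm defined above:

result = llm.generate(["Tell me a joke."])
# One inner list per prompt; each Generation holds the generated text.
print(result.generations[0][0].text)
# With streaming enabled, the token_usage dict comes back empty.
print(result.llm_output["token_usage"])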