Huggingface TextGen

Text Generation Inference is a Rust, Python, and gRPC server for text generation inference. It is used in production at HuggingFace to power the LLM api-inference widgets.

This notebook shows how to use a self-hosted LLM served by Text Generation Inference.

Before you start, make sure the text_generation Python package is installed.

# !pip3 install text_generation

from langchain.llms import HuggingFaceTextGenInference

llm = HuggingFaceTextGenInference(
    inference_server_url="http://localhost:8010/",
    max_new_tokens=512,
    top_k=10,
    top_p=0.95,
    typical_p=0.95,
    temperature=0.01,
    repetition_penalty=1.03,
)
llm("What did foo say about bar?")
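
Because the LLM above talks to a self-hosted server, it can help to confirm the server is reachable before sending prompts. The following is a minimal sketch using only the standard library. It assumes the server exposes a `/health` route that answers with HTTP 200 when ready (check the documentation for your Text Generation Inference version), and `tgi_is_reachable` is a hypothetical helper name, not part of LangChain or text_generation.

```python
import urllib.error
import urllib.request


def tgi_is_reachable(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200.

    Assumption: the server exposes a /health route. Any connection
    error, timeout, or non-2xx response is treated as "not reachable".
    """
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # URLError covers refused connections and HTTP error statuses;
        # OSError covers socket-level timeouts.
        return False
```

For example, `tgi_is_reachable("http://localhost:8010/")` could be called before constructing `HuggingFaceTextGenInference`, so a missing server is reported early rather than as a failed generation request.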

Streaming

from langchain.llms import HuggingFaceTextGenInference
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

llm = HuggingFaceTextGenInference(
    inference_server_url="http://localhost:8010/",
    max_new_tokens=512,
    top_k=10,
    top_p=0.95,
    typical_p=0.95,
    temperature=0.01,
    repetition_penalty=1.03,
    stream=True,  # pass tokens to the callbacks as they are generated
)
llm("What did foo say about bar?", callbacks=[StreamingStdOutCallbackHandler()])
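
Both examples above pass the same sampling parameters, so one option is to keep them in a single place and override only what differs. This is a sketch, not part of the LangChain API: `BASE_SAMPLING` and `textgen_kwargs` are hypothetical names, and the values are copied from the examples above.

```python
from typing import Any, Dict

# Sampling settings copied from the examples above.
BASE_SAMPLING: Dict[str, Any] = {
    "max_new_tokens": 512,
    "top_k": 10,
    "top_p": 0.95,
    "typical_p": 0.95,
    "temperature": 0.01,
    "repetition_penalty": 1.03,
}


def textgen_kwargs(server_url: str, **overrides: Any) -> Dict[str, Any]:
    """Build keyword arguments for the LLM constructor.

    Starts from BASE_SAMPLING, sets the server URL, then applies any
    per-call overrides (for example stream=True).
    """
    kwargs: Dict[str, Any] = {"inference_server_url": server_url}
    kwargs.update(BASE_SAMPLING)
    kwargs.update(overrides)
    return kwargs
```

With this helper, the streaming variant could be written as `HuggingFaceTextGenInference(**textgen_kwargs("http://localhost:8010/", stream=True))`, and the non-streaming one by omitting the override.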
