Hugging Face Local Pipelines
Hugging Face models can be run locally through the HuggingFacePipeline class.
The Hugging Face Model Hub hosts over 120,000 models, 20,000 datasets, and 50,000 demo apps (Spaces), all open source and publicly available, on an online platform where people can easily collaborate and build ML together.
These models can be called from LangChain either through this local pipeline wrapper or by calling their hosted inference endpoints through the HuggingFaceHub class. For more information on the hosted pipelines, see the HuggingFaceHub notebook.
To use this, you should have the transformers Python package installed.
pip install transformers > /dev/null
Load the model
from langchain import HuggingFacePipeline
llm = HuggingFacePipeline.from_model_id(
model_id="bigscience/bloom-1b7",
task="text-generation",
model_kwargs={"temperature": 0, "max_length": 64},
)
Integrate the model in an LLMChain
from langchain import PromptTemplate, LLMChain
template = """问题:{question}
回答:让我们逐步思考。"""
prompt = PromptTemplate(template=template, input_variables=["question"])
llm_chain = LLMChain(prompt=prompt, llm=llm)
question = "What is electroencephalography?"
print(llm_chain.run(question))
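Conceptually, LLMChain.run fills the placeholders in the prompt template and passes the rendered prompt to the model. A minimal sketch of that flow in plain Python, using a hypothetical `fake_llm` stand-in for the HuggingFacePipeline above (the template and question are the ones from this example):

```python
# Sketch of what LLMChain.run does under the hood:
# 1) fill the prompt template, 2) send the rendered prompt to the LLM.
template = """Question: {question}
Answer: Let's think step by step."""

def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in: a real pipeline would generate a
    # continuation of `prompt` with the local model.
    return "First, we need to understand what an EEG is."

def run_chain(question: str) -> str:
    rendered = template.format(question=question)  # prompt formatting step
    return fake_llm(rendered)                      # model call step

print(run_chain("What is electroencephalography?"))
```

The real chain behaves the same way, except the model call goes through the locally loaded bigscience/bloom-1b7 pipeline.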
First, we need to understand what an electroencephalogram (EEG) is. An EEG is a method of recording brain activity. It records brain activity by placing electrodes on the scalp. The electrodes are placed on