JSONFormer

JSONFormer 是一个库，它包装了本地 HuggingFace pipeline 模型，用于对 JSON Schema 的子集进行结构化解码。

它的工作原理是填充结构标记，然后从模型中采样内容标记。

警告 - 这个模块仍然是实验性的

pip install --upgrade jsonformer > /dev/null

HuggingFace 基准

首先，让我们通过检查没有结构化解码的模型输出来建立一个定性基准。

import logging

logging.basicConfig(level=logging.ERROR)

from typing import Optional
from langchain.tools import tool
import os
import json
import requests

HF_TOKEN = os.environ.get("HUGGINGFACE_API_KEY")


@tool
def ask_star_coder(query: str, temperature: float = 1.0, max_new_tokens: float = 250):
    """Query the BigCode StarCoder model about coding questions."""
    url = "https://api-inference.huggingface.co/models/bigcode/starcoder"
    headers = {
        "Authorization": f"Bearer {HF_TOKEN}",
        "content-type": "application/json",
    }
    payload = {
        "inputs": f"{query}\n\nAnswer:",
        "temperature": temperature,
        "max_new_tokens": int(max_new_tokens),
    }
    response = requests.post(url, headers=headers, data=json.dumps(payload))
    response.raise_for_status()
    return json.loads(response.content.decode("utf-8"))

prompt = """You must respond using JSON format, with a single action and single action input.
You may 'ask_star_coder' for help on coding problems.

{arg_schema}

EXAMPLES
----
Human: "So what's all this about a GIL?"
AI Assistant:{{
  "action": "ask_star_coder",
  "action_input": {{"query": "What is a GIL?", "temperature": 0.0, "max_new_tokens": 100}}"
}}
Observation: "The GIL is python's Global Interpreter Lock"
Human: "Could you please write a calculator program in LISP?"
AI Assistant:{{
  "action": "ask_star_coder",
  "action_input": {{"query": "Write a calculator program in LISP", "temperature": 0.0, "max_new_tokens": 250}}
}}
Observation: "(defun add (x y) (+ x y))\n(defun sub (x y) (- x y ))"
Human: "What's the difference between an SVM and an LLM?"
AI Assistant:{{
  "action": "ask_star_coder",
  "action_input": {{"query": "What's the difference between SGD and an SVM?", "temperature": 1.0, "max_new_tokens": 250}}
}}
Observation: "SGD stands for stochastic gradient descent, while an SVM is a Support Vector Machine."

BEGIN! Answer the Human's question as best as you are able.
------
Human: 'What's the difference between an iterator and an iterable?'
AI Assistant:""".format(
    arg_schema=ask_star_coder.args
)

from transformers import pipeline
from langchain.llms import HuggingFacePipeline

hf_model = pipeline(
    "text-generation", model="cerebras/Cerebras-GPT-590M", max_new_tokens=200
)

original_model = HuggingFacePipeline(pipeline=hf_model)

generated = original_model.predict(prompt, stop=["Observation:", "Human:"])
print(generated)

    Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

     'What's the difference between an iterator and an iterable?'

这并不那么令人印象深刻，是吗？它根本没有遵循 JSON 格式！让我们尝试使用结构化解码。

JSONFormer LLM 包装器

让我们再试一次，现在将 Action 输入的 JSON Schema 提供给模型。

decoder_schema = {
    "title": "Decoding Schema",
    "type": "object",
    "properties": {
        "action": {"type": "string", "default": ask_star_coder.name},
        "action_input": {
            "type": "object",
            "properties": ask_star_coder.args,
        },
    },
}

from langchain_experimental.llms import JsonFormer

json_former = JsonFormer(json_schema=decoder_schema, pipeline=hf_model)

results = json_former.predict(prompt, stop=["Observation:", "Human:"])
print(results)

    {"action": "ask_star_coder", "action_input": {"query": "What's the difference between an iterator and an iter", "temperature": 0.0, "max_new_tokens": 50.0}}

哇！没有解析错误。

JSONFormer

HuggingFace 基准​

JSONFormer LLM 包装器​

HuggingFace 基准

JSONFormer LLM 包装器