Custom Pairwise Evaluator

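You can create your own pairwise string evaluator by subclassing PairwiseStringEvaluator and overriding the _evaluate_string_pairs method (and also the _aevaluate_string_pairs method if you want to use the evaluator asynchronously).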

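In this example, you will create a simple custom evaluator that reports whether the first prediction contains more whitespace-separated "words" than the second. It returns a score of 1 when it does and 0 otherwise.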

See the reference documentation for the PairwiseStringEvaluator interface for more information.

from typing import Optional, Any  
from langchain.evaluation import PairwiseStringEvaluator  

class LengthComparisonPairwiseEvalutor(PairwiseStringEvaluator):  
    """  
    Custom evaluator that compares the word counts of two strings.  
    """  

    def _evaluate_string_pairs(  
        self,  
        *,  
        prediction: str,  
        prediction_b: str,  
        reference: Optional[str] = None,  
        input: Optional[str] = None,  
        **kwargs: Any,  
    ) -> dict:  
        score = int(len(prediction.split()) > len(prediction_b.split()))  
        return {"score": score}  

API Reference:

PairwiseStringEvaluator from langchain.evaluation

evaluator = LengthComparisonPairwiseEvalutor()  

evaluator.evaluate_string_pairs(  
    prediction="The quick brown fox jumped over the lazy dog.",  
    prediction_b="The quick brown fox jumped over the dog.",  
)  

Output:

{'score': 1}
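
Note that the scoring rule above gives a tie the same score as a loss (0). As a minimal sketch, the comparison can be isolated into a plain function and checked without the evaluator class. The function name `compare_word_counts`, the extra `word_counts` key, and the 0.5 tie value are assumptions made for illustration; they are not part of LangChain.

```python
def compare_word_counts(prediction: str, prediction_b: str) -> dict:
    """Hypothetical helper: 1.0 if the first string has more words,
    0.0 if it has fewer, and 0.5 on a tie (an assumed convention)."""
    # str.split() with no arguments splits on any run of whitespace.
    count_a = len(prediction.split())
    count_b = len(prediction_b.split())
    if count_a > count_b:
        score = 1.0
    elif count_a < count_b:
        score = 0.0
    else:
        score = 0.5
    return {"score": score, "word_counts": (count_a, count_b)}


result = compare_word_counts(
    "The quick brown fox jumped over the lazy dog.",
    "The quick brown fox jumped over the dog.",
)
print(result)
```

The same logic could be placed inside `_evaluate_string_pairs` if a distinct tie score is wanted.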

LLM-Based Example

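The example above exists only to illustrate the API and is not very useful in practice. Below, an LLM with custom instructions is used to build a simple preference scorer similar to the built-in PairwiseStringEvalChain. We will use ChatAnthropic for the evaluator chain.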

# %pip install anthropic  
# %env ANTHROPIC_API_KEY=YOUR_API_KEY  

from typing import Optional, Any  
from langchain.evaluation import PairwiseStringEvaluator  
from langchain.chat_models import ChatAnthropic  
from langchain.chains import LLMChain  

class CustomPreferenceEvaluator(PairwiseStringEvaluator):  
    """  
    Custom evaluator that compares two strings using a custom LLMChain.  
    """  

    def __init__(self) -> None:  
        llm = ChatAnthropic(model="claude-2", temperature=0)  
        self.eval_chain = LLMChain.from_string(  
            llm,  
            """Which option is preferred? Do not take order into account. Evaluate based on accuracy and helpfulness. If neither is preferred, respond with C. Provide your reasoning, then finish with Preference: A/B/C  

Input: How do I get the path of the parent directory in python 3.8?  
Option A: You can use the following code:  
  
import os  
os.path.dirname(os.path.dirname(os.path.abspath(__file__)))  
    
Option B: You can use the following code:
   
from pathlib import Path  
Path(__file__).absolute().parent

Reasoning: Both options return the same result. However, since option B is more concise and easily understood, it is preferred.
Preference: B

Which option is preferred? Do not take order into account. Evaluate based on accuracy and helpfulness. If neither is preferred, respond with C. Provide your reasoning, then finish with Preference: A/B/C
Input: {input}
Option A: {prediction}
Option B: {prediction_b}
Reasoning:""",
        )

    @property  
    def requires_input(self) -> bool:  
        return True  

    @property  
    def requires_reference(self) -> bool:  
        return False  

    def _evaluate_string_pairs(  
        self,  
        *,  
        prediction: str,  
        prediction_b: str,  
        reference: Optional[str] = None,  
        input: Optional[str] = None,  
        **kwargs: Any,  
    ) -> dict:  
        result = self.eval_chain(  
            {  
                "input": input,  
                "prediction": prediction,  
                "prediction_b": prediction_b,  
                "stop": ["Which option is preferred?"],  
            },  
            **kwargs,  
        )  

        response_text = result["text"]  
        reasoning, preference = response_text.split("Preference:", maxsplit=1)  
        preference = preference.strip()  
        score = 1.0 if preference == "A" else (0.0 if preference == "B" else None)  
        return {"reasoning": reasoning.strip(), "value": preference, "score": score}  

API Reference:

PairwiseStringEvaluator from langchain.evaluation
ChatAnthropic from langchain.chat_models
LLMChain from langchain.chains
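
The parsing step in `_evaluate_string_pairs` assumes the model always ends its answer with `Preference:` followed by exactly `A`, `B`, or `C`. If the marker is missing, unpacking the result of `split` raises a `ValueError`, and trailing punctuation such as `B.` yields a score of `None`. A more defensive variant might look like the sketch below. The helper name `parse_preference` and its fallback behavior are assumptions for illustration and are not part of LangChain; it also splits on the last occurrence of the marker rather than the first.

```python
import re


def parse_preference(response_text: str) -> dict:
    """Hypothetical helper: extract reasoning, preference, and score
    from an LLM response that should end with 'Preference: A/B/C'."""
    # rpartition splits on the LAST marker; if the marker is absent,
    # it returns ('', '', response_text).
    reasoning, marker, tail = response_text.rpartition("Preference:")
    if not marker:
        # No marker found: keep the text as reasoning, leave the rest unset.
        return {"reasoning": response_text.strip(), "value": None, "score": None}
    # Accept a single letter A, B, or C, tolerating whitespace and
    # trailing punctuation such as "B." or "A, because ...".
    match = re.match(r"\s*([ABC])\b", tail)
    value = match.group(1) if match else None
    # Same convention as the evaluator above: A -> 1.0, B -> 0.0, else None.
    score = {"A": 1.0, "B": 0.0}.get(value)
    return {"reasoning": reasoning.strip(), "value": value, "score": score}


print(parse_preference("Option B is more concise.\nPreference: B"))
```

Returning `None` for the score, rather than raising, keeps a batch evaluation running when a single response is malformed; whether that is preferable to failing loudly depends on the use case.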

evaluator = CustomPreferenceEvaluator()  

evaluator.evaluate_string_pairs(  
    input="How do I import from a relative directory?",  
    prediction="use importlib! importlib.import_module('.my_package', '.')",  
    prediction_b="from .sibling import foo",  
)  

Output:

{'reasoning': 'Option B is preferred over option A for importing from a relative directory, because it is more straightforward and concise.\n\nOption A uses the importlib module, which allows importing a module by specifying the full name as a string. While this works, it is less clear compared to option B.\n\nOption B directly imports from the relative path using dot notation, which clearly shows that it is a relative import. This is the recommended way to do relative imports in Python.\n\nIn summary, option B is more accurate and helpful as it uses the standard Python relative import syntax.',  
 'value': 'B',  
 'score': 0.0}  

# Setting requires_input to return True adds extra validation, so that no score is returned when the chain is given insufficient data.

try:  
    evaluator.evaluate_string_pairs(  
        prediction="use importlib! importlib.import_module('.my_package', '.')",  
        prediction_b="from .sibling import foo",  
    )  
except ValueError as e:  
    print(e)  

Output:

CustomPreferenceEvaluator requires an input string.
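
Conceptually, this check amounts to rejecting a call when a required field is missing. The sketch below is a simplified stand-in written for illustration, not LangChain's actual implementation; `check_required_input` is a hypothetical name, and only the error message is taken from the output above.

```python
from typing import Optional


def check_required_input(
    evaluator_name: str, requires_input: bool, input: Optional[str]
) -> None:
    """Hypothetical stand-in for the validation triggered by requires_input."""
    # Only complain when the evaluator declares that it needs an input.
    if requires_input and input is None:
        raise ValueError(f"{evaluator_name} requires an input string.")


try:
    check_required_input("CustomPreferenceEvaluator", True, None)
except ValueError as e:
    print(e)
```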
