Feature stores

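Feature stores are a concept from traditional machine learning that makes sure the data fed into models is up-to-date and relevant.

This concept is extremely relevant when putting LLM (large language model) applications into production. To personalize an LLM application, you may want to combine the LLM with up-to-date information about a specific user. Feature stores can be a great way to keep that data fresh, and LangChain provides an easy way to combine that data with LLMs.

In this section, we show how to connect prompt templates to feature stores. The basic idea is to call a feature store from inside a prompt template to retrieve values, which are then formatted into the prompt.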
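The idea can be sketched without any feature store library. The example below is a minimal illustration, not LangChain or Feast code: `FAKE_STORE`, `FeatureBackedPrompt`, and the feature values are hypothetical stand-ins for a real online lookup.

```python
# Hypothetical in-memory stand-in for a feature store's online lookup.
FAKE_STORE = {
    1001: {"conv_rate": 0.47, "acc_rate": 0.06, "avg_daily_trips": 936},
}

TEMPLATE = (
    "conv_rate: {conv_rate}\n"
    "acc_rate: {acc_rate}\n"
    "avg_daily_trips: {avg_daily_trips}"
)


class FeatureBackedPrompt:
    """Looks up features for an entity, then fills a template with them."""

    def __init__(self, template, lookup):
        self.template = template
        self.lookup = lookup  # callable: entity id -> dict of feature values

    def format(self, **kwargs):
        # The caller supplies only the entity key; everything else is fetched.
        entity_id = kwargs.pop("driver_id")
        kwargs.update(self.lookup(entity_id))
        return self.template.format(**kwargs)


prompt = FeatureBackedPrompt(TEMPLATE, FAKE_STORE.__getitem__)
rendered = prompt.format(driver_id=1001)
print(rendered)
```

Because the lookup happens inside `format`, every call reads the values that are current at that moment, which is the property the feature store is there to provide.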

1. Feast

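To start, we will use the popular open-source feature store framework Feast.

This assumes you have already run the getting-started steps in the Feast README. We will build on that getting-started example and create an LLMChain that writes a note to a specific driver about their up-to-date statistics.

1.1. Load the Feast store

Again, this should be set up according to the instructions in the Feast README.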

from feast import FeatureStore

# You may need to update the path depending on where you stored it
feast_repo_path = "../../../../../my_feature_repo/feature_repo/"
store = FeatureStore(repo_path=feast_repo_path)

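1.2. Prompts

Here we set up a custom FeastPromptTemplate. This prompt template takes a driver ID, looks up that driver's stats, and formats those stats into a prompt.

Note that the only input to this prompt template is driver_id, since that is the only user-defined piece (all other variables are looked up inside the prompt template).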

from langchain.prompts import PromptTemplate, StringPromptTemplate

template = """Given the driver's up to date stats, write them note relaying those stats to them.
If they have a conversation rate above .5, give them a compliment. Otherwise, make a silly joke about chickens at the end to make them feel better

Here are the drivers stats:
Conversation rate: {conv_rate}
Acceptance rate: {acc_rate}
Average Daily Trips: {avg_daily_trips}

Your response:"""
prompt = PromptTemplate.from_template(template)


class FeastPromptTemplate(StringPromptTemplate):
    def format(self, **kwargs) -> str:
        driver_id = kwargs.pop("driver_id")
        feature_vector = store.get_online_features(
            features=[
                "driver_hourly_stats:conv_rate",
                "driver_hourly_stats:acc_rate",
                "driver_hourly_stats:avg_daily_trips",
            ],
            entity_rows=[{"driver_id": driver_id}],
        ).to_dict()
        kwargs["conv_rate"] = feature_vector["conv_rate"][0]
        kwargs["acc_rate"] = feature_vector["acc_rate"][0]
        kwargs["avg_daily_trips"] = feature_vector["avg_daily_trips"][0]
        return prompt.format(**kwargs)


prompt_template = FeastPromptTemplate(input_variables=["driver_id"])

print(prompt_template.format(driver_id=1001))

Result:

    Given the driver's up to date stats, write them note relaying those stats to them.
If they have a conversation rate above .5, give them a compliment. Otherwise, make a silly joke about chickens at the end to make them feel better

Here are the drivers stats:
Conversation rate: 0.4745151400566101
Acceptance rate: 0.055561766028404236
Average Daily Trips: 936

Your response:
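The `[0]` index in `format` is needed because the dictionary returned by `to_dict()` is column-oriented: one list per feature, with one element per entity row. Since only one driver is requested, element 0 is that driver's value. The sketch below illustrates the shape with made-up values for two drivers; `rows_from_columns` is a hypothetical helper, not part of Feast.

```python
# Shape assumed for illustration: {feature name: [value for row 0, value for row 1, ...]}.
feature_vector = {
    "driver_id": [1001, 1002],
    "conv_rate": [0.47, 0.81],
    "acc_rate": [0.06, 0.93],
}


def rows_from_columns(columns):
    """Turn {name: [v0, v1, ...]} into [{name: v0, ...}, {name: v1, ...}]."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


rows = rows_from_columns(feature_vector)
first = rows[0]
print(first)
```

A row-oriented view like this can be convenient if a prompt template ever needs to format several entities in one call.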

1.3. Use in a chain

We can now use this in a chain, creating a personalized chain backed by a feature store.

from langchain.chat_models import ChatOpenAI
from langchain.chains import LLMChain

chain = LLMChain(llm=ChatOpenAI(), prompt=prompt_template)
chain.run(1001)

Result:

    "Hi there! I wanted to update you on your current stats. Your acceptance rate is 0.055561766028404236 and your average daily trips are 936. While your conversation rate is currently 0.4745151400566101, I have no doubt that with a little extra effort, you'll be able to exceed that .5 mark! Keep up the great work! And remember, even chickens can't always cross the road, but they still give it their best shot."

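Conceptually, the chain formats the prompt from its input and passes the resulting string to the model. The sketch below mimics that flow with stand-ins only: `TinyChain`, `LookupPrompt`, and `echo_llm` are hypothetical names and are not LangChain classes.

```python
class TinyChain:
    """Minimal stand-in for a prompt-plus-model chain (not LangChain's LLMChain)."""

    def __init__(self, llm, prompt, input_key):
        self.llm = llm
        self.prompt = prompt
        self.input_key = input_key

    def run(self, value):
        # A single positional input is mapped onto the prompt's only input variable.
        text = self.prompt.format(**{self.input_key: value})
        return self.llm(text)


class LookupPrompt:
    """Hypothetical prompt whose format() would consult a feature store."""

    def format(self, driver_id):
        return f"Write a note for driver {driver_id}."


def echo_llm(text):
    # Stand-in for a chat model: returns the prompt it received.
    return f"MODEL SAW: {text}"


chain = TinyChain(llm=echo_llm, prompt=LookupPrompt(), input_key="driver_id")
reply = chain.run(1001)
print(reply)
```

The feature lookup stays hidden inside the prompt, so the caller only ever passes the entity key.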
2. Tecton

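Above, we showed how to use Feast, a popular open-source, self-managed feature store, with LangChain. The examples below show a similar integration using Tecton. Tecton is a fully managed feature platform built to orchestrate the complete ML feature lifecycle, from transformation to online serving, with enterprise-grade SLAs.

2.1 Prerequisites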

  • Tecton Deployment (sign up at https://tecton.ai)
  • TECTON_API_KEY environment variable set to a valid Service Account key
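Before running the Tecton examples, it can help to fail early when the key is missing. The helper below is a small sketch using only the standard library; `require_env` is a hypothetical name, and the example passes a plain dictionary instead of the real environment so that it runs anywhere.

```python
import os


def require_env(name, environ=os.environ):
    """Return the value of an environment variable, or raise if it is unset or blank."""
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before connecting to Tecton")
    return value


# Demonstration with a fake environment; in real use, call require_env("TECTON_API_KEY").
api_key = require_env("TECTON_API_KEY", {"TECTON_API_KEY": "example-key"})
print(api_key)
```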

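2.2 Define and load features

We will use the user_transaction_counts feature view from the Tecton tutorial as part of a Feature Service. For simplicity, we use only a single feature view; more sophisticated applications may need more feature views to retrieve the features their prompts require.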

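from tecton import FeatureService

# user_transaction_counts is the feature view defined in the Tecton tutorial
user_transaction_metrics = FeatureService(
    name="user_transaction_metrics",
    features=[user_transaction_counts],
)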

The Feature Service above is expected to be applied to a live workspace. For this example, we use the "prod" workspace.

import tecton

workspace = tecton.get_workspace("prod")
feature_service = workspace.get_feature_service("user_transaction_metrics")

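2.3 Prompts

Here we set up a custom TectonPromptTemplate. This prompt template takes a user_id, looks up that user's stats, and formats those stats into a prompt.

Note that the only input to this prompt template is user_id, since that is the only user-defined piece (all other variables are looked up inside the prompt template).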

from langchain.prompts import PromptTemplate, StringPromptTemplate

template = """Given the vendor's up to date transaction stats, write them a note based on the following rules:

1. If they had a transaction in the last day, write a short congratulations message on their recent sales
2. If no transaction in the last day, but they had a transaction in the last 30 days, playfully encourage them to sell more.
3. Always add a silly joke about chickens at the end

Here are the vendor's stats:
Number of Transactions Last Day: {transaction_count_1d}
Number of Transactions Last 30 Days: {transaction_count_30d}

Your response:"""
prompt = PromptTemplate.from_template(template)


class TectonPromptTemplate(StringPromptTemplate):
    def format(self, **kwargs) -> str:
        user_id = kwargs.pop("user_id")
        feature_vector = feature_service.get_online_features(
            join_keys={"user_id": user_id}
        ).to_dict()
        kwargs["transaction_count_1d"] = feature_vector[
            "user_transaction_counts.transaction_count_1d_1d"
        ]
        kwargs["transaction_count_30d"] = feature_vector[
            "user_transaction_counts.transaction_count_30d_1d"
        ]
        return prompt.format(**kwargs)


prompt_template = TectonPromptTemplate(input_variables=["user_id"])
print(prompt_template.format(user_id="user_469998441571"))

    Given the vendor's up to date transaction stats, write them a note based on the following rules:

1. If they had a transaction in the last day, write a short congratulations message on their recent sales
2. If no transaction in the last day, but they had a transaction in the last 30 days, playfully encourage them to sell more.
3. Always add a silly joke about chickens at the end

Here are the vendor's stats:
Number of Transactions Last Day: 657
Number of Transactions Last 30 Days: 20326

Your response:
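In the code above, the feature dictionary is flat and keyed by namespaced names of the form `feature_view.feature`, while the template uses shorter variable names. One way to keep that mapping in a single place is sketched below; `KEY_MAP` and `to_prompt_vars` are hypothetical helpers, and the values are the ones shown in the output above.

```python
# Flat dictionary as used in the example above: namespaced key -> single value.
feature_vector = {
    "user_transaction_counts.transaction_count_1d_1d": 657,
    "user_transaction_counts.transaction_count_30d_1d": 20326,
}

# Template variable name -> namespaced feature key.
KEY_MAP = {
    "transaction_count_1d": "user_transaction_counts.transaction_count_1d_1d",
    "transaction_count_30d": "user_transaction_counts.transaction_count_30d_1d",
}


def to_prompt_vars(vector, key_map):
    """Rename namespaced feature keys to template variables, failing loudly on gaps."""
    missing = [key for key in key_map.values() if key not in vector]
    if missing:
        raise KeyError(f"features missing from response: {missing}")
    return {var: vector[key] for var, key in key_map.items()}


prompt_vars = to_prompt_vars(feature_vector, KEY_MAP)
print(prompt_vars)
```

Checking for missing keys up front gives a clearer error than a failed template substitution later.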

2.4 Use in a chain

We can now use this in a chain, creating a personalized chain backed by the Tecton feature platform.

from langchain.chat_models import ChatOpenAI
from langchain.chains import LLMChain

chain = LLMChain(llm=ChatOpenAI(), prompt=prompt_template)
chain.run("user_469998441571")

    'Wow, congratulations on your recent sales! Your business is really soaring like a chicken on a hot air balloon! Keep up the great work!'

3. Featureform

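Finally, we will use Featureform, an open-source, enterprise-grade feature store, to run the same example. Featureform lets you work with your own infrastructure, such as Spark or a local setup, to define your feature transformations.

3.1 Initialize Featureform

You can follow the instructions in the Featureform README to initialize your transformations and features in Featureform.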

import featureform as ff

client = ff.Client(host="demo.featureform.com")

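3.2 Prompts

Here we set up a custom FeatureformPromptTemplate. This prompt template takes a user_id, looks up the average amount that user pays per transaction, and formats it into a prompt.

Note that the only input to this prompt template is user_id, since that is the only user-defined piece (all other variables are looked up inside the prompt template).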

from langchain.prompts import PromptTemplate, StringPromptTemplate

template = """
Given the amount a user spends on average per transaction, let them know if they are a high roller. Otherwise, make a silly joke about chickens at the end to make them feel better

Here are the user's stats:
Average Amount per Transaction: ${avg_transaction}

Your response:
"""

prompt = PromptTemplate.from_template(template)


class FeatureformPromptTemplate(StringPromptTemplate):
    def format(self, **kwargs) -> str:
        user_id = kwargs.pop("user_id")
        fpf = client.features([("avg_transactions", "quickstart")], {"user": user_id})
        # Assumption: features() returns the requested values in order,
        # so the first element is avg_transactions for this user.
        kwargs["avg_transaction"] = fpf[0]
        return prompt.format(**kwargs)


prompt_template = FeatureformPromptTemplate(input_variables=["user_id"])
print(prompt_template.format(user_id="C1410926"))

3.3 Use in a chain

We can now use this in a chain, creating a personalized chain backed by the Featureform feature platform.

from langchain.chat_models import ChatOpenAI
from langchain.chains import LLMChain
chain = LLMChain(llm=ChatOpenAI(), prompt=prompt_template)
chain.run("C1410926")

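4. AzureML Managed Feature Store

We will use an AzureML Managed Feature Store to run the example below.

4.1 Prerequisites

  • Create a feature store with online materialization by following the instructions in the AzureML managed feature store documentation.

  • A feature store created by following those instructions should contain an accounts feature set at version 1. It has accountID as the index column and the features accountAge, accountCountry, and numPaymentRejects1dPerUser.

4.2 Prompts

  • Here we set up a custom AzureMLFeatureStorePromptTemplate. This prompt template takes an account_id and an optional query as input. It then fetches feature values from the feature store and formats those features into the output prompt. Note that the only required input to this prompt template is account_id, since that is the only user-defined piece (all other variables are looked up inside the prompt template).

  • Also note that this is a bootstrap example showing how an LLM application can use the AzureML managed feature store. Developers are welcome to improve the prompt template further to suit their needs.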
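The split between the required account_id and the optional query can be sketched on its own, independent of AzureML. `split_inputs` below is a hypothetical helper written for illustration; it is not part of any library.

```python
def split_inputs(kwargs):
    """Separate the required account_id and the optional query from other inputs."""
    if "account_id" not in kwargs:
        raise ValueError("account_id is needed to fetch details from the feature store")
    remaining = dict(kwargs)  # copy so the caller's dictionary is left untouched
    account_id = remaining.pop("account_id")
    query = remaining.pop("query", "")  # an absent query becomes an empty string
    return account_id, query, remaining


account_id, query, rest = split_inputs({"account_id": "A1829581630230790"})
print(repr(account_id), repr(query), rest)
```

Raising an exception type such as ValueError, instead of a bare string, is what Python 3 requires for `raise`.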

import os
os.environ['AZURE_ML_CLI_PRIVATE_FEATURES_ENABLED'] = 'True'

import pandas

from pydantic import Extra
from langchain.prompts import PromptTemplate, StringPromptTemplate
from azure.identity import AzureCliCredential
from azureml.featurestore import FeatureStoreClient, init_online_lookup, get_online_features

class AzureMLFeatureStorePromptTemplate(StringPromptTemplate, extra=Extra.allow):

    def __init__(self, subscription_id: str, resource_group: str, feature_store_name: str, **kwargs):
        # this is an example template for proof of concept and can be changed to suit the developer's needs
        template = """
            {query}
            ###
            account id = {account_id}
            account age = {account_age}
            account country = {account_country}
            payment rejects 1d per user = {payment_rejects_1d_per_user}
            ###
            """
        prompt_template = PromptTemplate.from_template(template)
        super().__init__(prompt=prompt_template, input_variables=["account_id", "query"])

        # use AzureMLOnBehalfOfCredential() in spark context
        credential = AzureCliCredential()

        self._fs_client = FeatureStoreClient(
            credential=credential,
            subscription_id=subscription_id,
            resource_group_name=resource_group,
            name=feature_store_name)

        self._feature_set = self._fs_client.feature_sets.get(name="accounts", version=1)

        init_online_lookup(self._feature_set.features, credential, force=True)

    def format(self, **kwargs) -> str:
        if "account_id" not in kwargs:
            # raise an exception instance; raising a bare string is a TypeError in Python 3
            raise ValueError("account_id needed to fetch details from feature store")
        account_id = kwargs.pop("account_id")

        query = ""
        if "query" in kwargs:
            query = kwargs.pop("query")

        # feature set is registered with accountID as entity index column.
        obs = pandas.DataFrame({'accountID': [account_id]})

        # get the feature details for the input entity from feature store.
        df = get_online_features(self._feature_set.features, obs)

        # populate prompt template output using the fetched feature values.
        kwargs["query"] = query
        kwargs["account_id"] = account_id
        kwargs["account_age"] = df["accountAge"][0]
        kwargs["account_country"] = df["accountCountry"][0]
        kwargs["payment_rejects_1d_per_user"] = df["numPaymentRejects1dPerUser"][0]

        return self.prompt.format(**kwargs)

4.3 Test

# Replace the placeholders below with the actual details of the feature store created in the previous steps

prompt_template = AzureMLFeatureStorePromptTemplate(
    subscription_id="",
    resource_group="",
    feature_store_name="",
)
print(prompt_template.format(account_id="A1829581630230790"))

                ###
account id = A1829581630230790
account age = 563.0
account country = GB
payment rejects 1d per user = 15.0
###

4.4 Use in a chain

We can now use this in a chain, creating a personalized chain backed by the AzureML Managed Feature Store.

os.environ["OPENAI_API_KEY"]="" # Fill the open ai key here

from langchain.chat_models import ChatOpenAI
from langchain import LLMChain

chain = LLMChain(llm=ChatOpenAI(), prompt=prompt_template)

# NOTE: developers can further fine-tune AzureMLFeatureStorePromptTemplate
# to get more accurate results for the input query
chain.predict(account_id="A1829581630230790", query="write a small thank you note within 20 words if account age > 10 using the account stats")

    'Thank you for being a valued member for over 10 years! We appreciate your continued support.'