Azure Cognitive Services Toolkit

This toolkit interacts with the Azure Cognitive Services APIs to provide some multimodal capabilities.

Currently, this toolkit bundles four tools:

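AzureCogsImageAnalysisTool: extracts captions, objects, tags, and text from images. (Note: this tool does not yet support macOS, because it depends on the azure-ai-vision package, which currently supports only Windows and Linux.)
AzureCogsFormRecognizerTool: extracts text, tables, and key-value pairs from documents.
AzureCogsSpeech2TextTool: transcribes speech to text.
AzureCogsText2SpeechTool: synthesizes speech from text.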

First, set up an Azure account and create a Cognitive Services resource. You can follow the instructions here to create the resource.

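Then, get the endpoint, key, and region of your resource and set them as environment variables. You can find all three on the resource's "Keys and Endpoint" page.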

# !pip install --upgrade azure-ai-formrecognizer > /dev/null
# !pip install --upgrade azure-cognitiveservices-speech > /dev/null

# Windows/Linux only (azure-ai-vision does not support macOS)
# !pip install --upgrade azure-ai-vision > /dev/null

import os

# Replace the placeholders below with your own credentials
os.environ["OPENAI_API_KEY"] = "sk-"
os.environ["AZURE_COGS_KEY"] = ""
os.environ["AZURE_COGS_ENDPOINT"] = ""
os.environ["AZURE_COGS_REGION"] = ""
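Because the values above start out as empty strings, a quick check before creating the toolkit can catch variables you forgot to fill in. The sketch below only looks at the three `AZURE_COGS_*` names used in this notebook; `missing_azure_cogs_vars` is a hypothetical helper, not part of LangChain.

```python
import os

# The Azure variable names set in the cell above.
REQUIRED_VARS = ("AZURE_COGS_KEY", "AZURE_COGS_ENDPOINT", "AZURE_COGS_REGION")


def missing_azure_cogs_vars(environ=os.environ):
    """Return the required variables that are unset or empty (hypothetical helper)."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]


missing = missing_azure_cogs_vars()
if missing:
    print("Please set:", ", ".join(missing))
```

Passing a plain dict as `environ` makes the helper easy to test without touching the real process environment.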

Create the Toolkit

from langchain.agents.agent_toolkits import AzureCognitiveServicesToolkit

toolkit = AzureCognitiveServicesToolkit()

[tool.name for tool in toolkit.get_tools()]

    ['Azure Cognitive Services Image Analysis',
     'Azure Cognitive Services Form Recognizer',
     'Azure Cognitive Services Speech2Text',
     'Azure Cognitive Services Text2Speech']
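If an agent only needs some of these tools, you can narrow the list by name before handing it to the agent. The sketch below assumes only that each tool exposes a `.name` attribute, which the list comprehension above already relies on. `select_tools` and `StubTool` are hypothetical stand-ins so the snippet runs without Azure credentials.

```python
from dataclasses import dataclass


def select_tools(tools, keywords):
    """Keep only tools whose name contains one of the keywords (hypothetical helper)."""
    return [tool for tool in tools if any(k in tool.name for k in keywords)]


# Stand-in for LangChain tool objects; only the `.name` attribute is used here.
@dataclass
class StubTool:
    name: str


tools = [
    StubTool("Azure Cognitive Services Image Analysis"),
    StubTool("Azure Cognitive Services Form Recognizer"),
    StubTool("Azure Cognitive Services Speech2Text"),
    StubTool("Azure Cognitive Services Text2Speech"),
]
speech_only = select_tools(tools, ["Speech2Text", "Text2Speech"])
print([t.name for t in speech_only])
```

With the real toolkit, you would pass `toolkit.get_tools()` in place of the stub list and give the result to `initialize_agent(tools=...)`.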

Use within an Agent

from langchain import OpenAI
from langchain.agents import initialize_agent, AgentType

llm = OpenAI(temperature=0)
agent = initialize_agent(
    tools=toolkit.get_tools(),
    llm=llm,
    agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
)

agent.run(
    "What can I make with these ingredients? "
    "https://images.openai.com/blob/9ad5a2ab-041f-475f-ad6a-b51899c50182/ingredients.png"
)

    
    
    > Entering new AgentExecutor chain...
    
    Action:
    ```
    {
      "action": "Azure Cognitive Services Image Analysis",
      "action_input": "https://images.openai.com/blob/9ad5a2ab-041f-475f-ad6a-b51899c50182/ingredients.png"
    }
    ```
    
    
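    Observation: Caption: a group of eggs and flour in bowls
    Objects: Egg, Egg, Food
    Tags: dairy, ingredient, indoor, thickening agent, food, mixing bowl, powder, flour, egg, bowl
    Thought: I can use the objects and tags to suggest recipes
    Action: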
    ```
    {
      "action": "Final Answer",
      "action_input": "You can make pancakes, omelettes, or quiches with these ingredients!"
    }
    ```
    
    > Finished chain.





    'You can make pancakes, omelettes, or quiches with these ingredients!'
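With `AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION`, each step in the log above is a fenced JSON blob with an `action` key (a tool name, or `"Final Answer"`) and an `action_input` key. The sketch below parses such a blob to make the format concrete. It is an illustration only, not LangChain's own output parser; `parse_action` is a hypothetical helper.

```python
import json
import re

# Three backticks, built programmatically so this snippet can sit inside a fenced block.
FENCE = "`" * 3

# Matches a JSON object wrapped in triple-backtick fences, as in the agent log.
ACTION_RE = re.compile(r"`{3}(?:json)?\s*(\{.*?\})\s*`{3}", re.DOTALL)


def parse_action(text):
    """Return (action, action_input) from the first fenced JSON blob (hypothetical helper)."""
    match = ACTION_RE.search(text)
    if match is None:
        raise ValueError("no fenced action blob found")
    blob = json.loads(match.group(1))
    return blob["action"], blob["action_input"]


log = (
    "Action:\n" + FENCE + "\n"
    '{\n  "action": "Final Answer",\n'
    '  "action_input": "You can make pancakes, omelettes, or quiches with these ingredients!"\n}\n'
    + FENCE
)
action, action_input = parse_action(log)
print(action)  # Final Answer
```

The non-greedy `\{.*?\}` keeps extending until it reaches a closing brace that is followed by the closing fence, so nested JSON objects inside `action_input` would still be captured whole.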

audio_file = agent.run("Tell me a joke and read it out for me.")

    
    
    > Entering new AgentExecutor chain...
    Action:
    ```
    {
      "action": "Azure Cognitive Services Text2Speech",
      "action_input": "Why did the chicken cross the playground? To get to the other slide!"
    }
    ```
    
    
    Observation: /tmp/tmpa3uu_j6b.wav
    Thought: I have the audio file for the joke
    Action:
    ```
    {
      "action": "Final Answer",
      "action_input": "/tmp/tmpa3uu_j6b.wav"
    }
    ```
    
    > Finished chain.





    '/tmp/tmpa3uu_j6b.wav'

from IPython import display

audio = display.Audio(audio_file)
display.display(audio)
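The Text2Speech tool returns a path to a temporary `.wav` file, as the observation in the log shows. Before playing it, a quick sanity check with Python's standard `wave` module can confirm the file is readable and report its length. This is a sketch that assumes the file is uncompressed PCM WAV, which the `wave` module requires; `wav_duration_seconds` is a hypothetical helper, and the demo writes its own silent clip so it runs without Azure.

```python
import os
import tempfile
import wave


def wav_duration_seconds(path):
    """Return the length of a PCM WAV file in seconds (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


# Demo on a generated 0.5 s silent clip; with the agent, pass `audio_file` instead.
demo_path = os.path.join(tempfile.mkdtemp(), "silence.wav")
with wave.open(demo_path, "wb") as out:
    out.setnchannels(1)                   # mono
    out.setsampwidth(2)                   # 16-bit samples
    out.setframerate(16000)               # 16 kHz
    out.writeframes(b"\x00\x00" * 8000)   # 8000 frames = 0.5 s
print(wav_duration_seconds(demo_path))  # 0.5
```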
