Skip to main content

SQL数据库代理

This notebook展示了一个与SQL数据库交互的代理。该代理基于SQLDatabaseChain构建,并设计用于回答关于数据库的更一般性问题,以及从错误中恢复。

请注意,由于该代理正在积极开发中,所有答案可能不正确。此外,不能保证代理在某些问题上不对您的数据库执行DML语句。在运行时请小心处理敏感数据!

这个示例使用了Chinook数据库。要设置它,请按照https://database.guide/2-sample-databases-sqlite/上的说明进行操作,并将.db文件放置在此存储库根目录下的notebooks文件夹中。

初始化 (Initialization)

from langchain.agents import create_sql_agent
from langchain.agents.agent_toolkits import SQLDatabaseToolkit
from langchain.sql_database import SQLDatabase
from langchain.llms.openai import OpenAI
from langchain.agents import AgentExecutor
from langchain.agents.agent_types import AgentType
from langchain.chat_models import ChatOpenAI
db = SQLDatabase.from_uri("sqlite:///../../../../../notebooks/Chinook.db")
toolkit = SQLDatabaseToolkit(db=db, llm=OpenAI(temperature=0))

使用ZERO_SHOT_REACT_DESCRIPTION

这展示了如何使用ZERO_SHOT_REACT_DESCRIPTION代理类型来初始化代理。

agent_executor = create_sql_agent(
llm=OpenAI(temperature=0),
toolkit=toolkit,
verbose=True,
agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
)

使用OpenAI函数

这展示了如何使用OPENAI_FUNCTIONS代理类型来初始化代理。请注意,这是与上述方法的另一种选择。

# agent_executor = create_sql_agent(
# llm=ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613"),
# toolkit=toolkit,
# verbose=True,
# agent_type=AgentType.OPENAI_FUNCTIONS
# )

免责声明 ⚠️ (Disclaimer)

查询链可能会生成插入/更新/删除查询。当不希望出现这种情况时,请使用自定义提示或创建一个没有写权限的SQL用户。

最终用户可能会通过提出一个简单的问题,如“运行最大的查询”,来使您的SQL数据库超负荷运行。生成的查询可能如下所示:

SELECT * FROM "public"."users"
JOIN "public"."user_permissions" ON "public"."users".id = "public"."user_permissions".user_id
JOIN "public"."projects" ON "public"."users".id = "public"."projects".user_id
JOIN "public"."events" ON "public"."projects".id = "public"."events".project_id;

对于事务性SQL数据库,如果上述任一表包含数百万行数据,该查询可能会对使用同一数据库的其他应用程序造成麻烦。

大多数面向数据仓库的数据库支持用户级别的配额,用于限制资源使用。

示例:描述一个表

agent_executor.run("描述playlisttrack表")
    

> 进入新的链...

调用: `list_tables_sql_db` with `{}`


Album, Artist, Track, PlaylistTrack, InvoiceLine, sales_table, Playlist, Genre, Employee, Customer, Invoice, MediaType
调用: `schema_sql_db` with `PlaylistTrack`



CREATE TABLE "PlaylistTrack" (
"PlaylistId" INTEGER NOT NULL,
"TrackId" INTEGER NOT NULL,
PRIMARY KEY ("PlaylistId", "TrackId"),
FOREIGN KEY("TrackId") REFERENCES "Track" ("TrackId"),
FOREIGN KEY("PlaylistId") REFERENCES "Playlist" ("PlaylistId")
)

/*
3 rows from PlaylistTrack table:
PlaylistId TrackId
1 3402
1 3389
1 3390
*/`PlaylistTrack`表有两列:`PlaylistId`和`TrackId`。它是一个连接表,表示播放列表和曲目之间的关系。

这是`PlaylistTrack`表的模式:

CREATE TABLE "PlaylistTrack" ( "PlaylistId" INTEGER NOT NULL, "TrackId" INTEGER NOT NULL, PRIMARY KEY ("PlaylistId", "TrackId"), FOREIGN KEY("TrackId") REFERENCES "Track" ("TrackId"), FOREIGN KEY("PlaylistId") REFERENCES "Playlist" ("PlaylistId") )


这是`PlaylistTrack`表的三个示例行:

PlaylistId TrackId 1 3402 1 3389 1 3390


如果还有其他问题需要帮助,请告诉我。

> 链结束。



'The `PlaylistTrack` table has two columns: `PlaylistId` and `TrackId`. It is a junction table that represents the relationship between playlists and tracks. \n\nHere is the schema of the `PlaylistTrack` table:\n\n```\nCREATE TABLE "PlaylistTrack" (\n\t"PlaylistId" INTEGER NOT NULL, \n\t"TrackId" INTEGER NOT NULL, \n\tPRIMARY KEY ("PlaylistId", "TrackId"), \n\tFOREIGN KEY("TrackId") REFERENCES "Track" ("TrackId"), \n\tFOREIGN KEY("PlaylistId") REFERENCES "Playlist" ("PlaylistId")\n)\n```\n\nHere are three sample rows from the `PlaylistTrack` table:\n\n```\nPlaylistId TrackId\n1 3402\n1 3389\n1 3390\n```\n\nPlease let me know if there is anything else I can help you with.'

示例:描述一个表,从错误中恢复

在这个示例中,代理尝试搜索一个不存在的表,但找到了最接近的结果

agent_executor.run("描述playlistsong表")
    

> 进入新的AgentExecutor链...
动作:list_tables_sql_db
动作输入:""
观察结果:Genre, PlaylistTrack, MediaType, Invoice, InvoiceLine, Track, Playlist, Customer, Album, Employee, Artist
思考:我应该查看PlaylistSong表的模式
动作:schema_sql_db
动作输入:"PlaylistSong"
观察结果:错误:数据库中找不到表名为{'PlaylistSong'}的表
思考:我应该检查表名的拼写
动作:list_tables_sql_db
动作输入:""
观察结果:Genre, PlaylistTrack, MediaType, Invoice, InvoiceLine, Track, Playlist, Customer, Album, Employee, Artist
思考:该表名为PlaylistTrack
动作:schema_sql_db
动作输入:"PlaylistTrack"
观察结果:
CREATE TABLE "PlaylistTrack" (
"PlaylistId" INTEGER NOT NULL,
"TrackId" INTEGER NOT NULL,
PRIMARY KEY ("PlaylistId", "TrackId"),
FOREIGN KEY("TrackId") REFERENCES "Track" ("TrackId"),
FOREIGN KEY("PlaylistId") REFERENCES "Playlist" ("PlaylistId")
)

SELECT * FROM 'PlaylistTrack' LIMIT 3;
PlaylistId TrackId
1 3402
1 3389
1 3390
思考:我现在知道最终答案了
最终答案:PlaylistTrack表包含两列,PlaylistId和TrackId,都是整数类型,用于链接Playlist和Track表。

> 链结束。





'PlaylistTrack表包含两列,PlaylistId和TrackId,都是整数类型,用于链接Playlist和Track表。'

示例:运行查询

agent_executor.run(
"List the total sales per country. Which country's customers spent the most?"
)
    

> 进入新的AgentExecutor链...
动作:list_tables_sql_db
动作输入:""
观察结果:Invoice, MediaType, Artist, InvoiceLine, Genre, Playlist, Employee, Album, PlaylistTrack, Track, Customer
思考:我应该查看相关表的模式,看看可以使用哪些列。
动作:schema_sql_db
动作输入:"Invoice, Customer"
观察结果:
CREATE TABLE "Customer" (
"CustomerId" INTEGER NOT NULL,
"FirstName" NVARCHAR(40) NOT NULL,
"LastName" NVARCHAR(20) NOT NULL,
"Company" NVARCHAR(80),
"Address" NVARCHAR(70),
"City" NVARCHAR(40),
"State" NVARCHAR(40),
"Country" NVARCHAR(40),
"PostalCode" NVARCHAR(10),
"Phone" NVARCHAR(24),
"Fax" NVARCHAR(24),
"Email" NVARCHAR(60) NOT NULL,
"SupportRepId" INTEGER,
PRIMARY KEY ("CustomerId"),
FOREIGN KEY("SupportRepId") REFERENCES "Employee" ("EmployeeId")
)

SELECT * FROM 'Customer' LIMIT 3;
CustomerId FirstName LastName Company Address City State Country PostalCode Phone Fax Email SupportRepId
1 Luís Gonçalves Embraer - Empresa Brasileira de Aeronáutica S.A. Av. Brigadeiro Faria Lima, 2170 São José dos Campos SP Brazil 12227-000 +55 (12) 3923-5555 +55 (12) 3923-5566 luisg@embraer.com.br 3
2 Leonie Köhler None Theodor-Heuss-Straße 34 Stuttgart None Germany 70174 +49 0711 2842222 None leonekohler@surfeu.de 5
3 François Tremblay None 1498 rue Bélanger Montréal QC Canada H2G 1A7 +1 (514) 721-4711 None ftremblay@gmail.com 3


CREATE TABLE "Invoice" (
"InvoiceId" INTEGER NOT NULL,
"CustomerId" INTEGER NOT NULL,
"InvoiceDate" DATETIME NOT NULL,
"BillingAddress" NVARCHAR(70),
"BillingCity" NVARCHAR(40),
"BillingState" NVARCHAR(40),
"BillingCountry" NVARCHAR(40),
"BillingPostalCode" NVARCHAR(10),
"Total" NUMERIC(10, 2) NOT NULL,
PRIMARY KEY ("InvoiceId"),
FOREIGN KEY("CustomerId") REFERENCES "Customer" ("CustomerId")
)

SELECT * FROM 'Invoice' LIMIT 3;
InvoiceId CustomerId InvoiceDate BillingAddress BillingCity BillingState BillingCountry BillingPostalCode Total
1 2 2009-01-01 00:00:00 Theodor-Heuss-Straße 34 Stuttgart None Germany 70174 1.98
2 4 2009-01-02 00:00:00 Ullevålsveien 14 Oslo None Norway 0171 3.96
3 8 2009-01-03 00:00:00 Grétrystraat 63 Brussels None Belgium 1000 5.94
思考:我应该查询Invoice和Customer表,以获取每个国家的总销售额。
动作:query_sql_db
动作输入:SELECT c.Country, SUM(i.Total) AS TotalSales FROM Invoice i INNER JOIN Customer c ON i.CustomerId = c.CustomerId GROUP BY c.Country ORDER BY TotalSales DESC LIMIT 10
观察结果:[('USA', 523.0600000000003), ('Canada', 303.9599999999999), ('France', 195.09999999999994), ('Brazil', 190.09999999999997), ('Germany', 156.48), ('United Kingdom', 112.85999999999999), ('Czech Republic', 90.24000000000001), ('Portugal', 77.23999999999998), ('India', 75.25999999999999), ('Chile', 46.62)]
思考:我现在知道最终答案了。
最终答案:美国的客户花费最多,总计523.06美元。

> 链结束。





'美国的客户花费最多,总计523.06美元。'
agent_executor.run(
"Show the total number of tracks in each playlist. The Playlist name should be included in the result."
)
    

> 进入新的AgentExecutor链...
动作:list_tables_sql_db
动作输入:""
观察结果:Invoice, MediaType, Artist, InvoiceLine, Genre, Playlist, Employee, Album, PlaylistTrack, Track, Customer
思考:我应该查看Playlist和PlaylistTrack表的模式,看看可以使用哪些列。
动作:schema_sql_db
动作输入:"Playlist, PlaylistTrack"
观察结果:
CREATE TABLE "Playlist" (
"PlaylistId" INTEGER NOT NULL,
"Name" NVARCHAR(120),
PRIMARY KEY ("PlaylistId")
)

SELECT * FROM 'Playlist' LIMIT 3;
PlaylistId Name
1 Music
2 Movies
3 TV Shows


CREATE TABLE "PlaylistTrack" (
"PlaylistId" INTEGER NOT NULL,
"TrackId" INTEGER NOT NULL,
PRIMARY KEY ("PlaylistId", "TrackId"),
FOREIGN KEY("TrackId") REFERENCES "Track" ("TrackId"),
FOREIGN KEY("PlaylistId") REFERENCES "Playlist" ("PlaylistId")
)

SELECT * FROM 'PlaylistTrack' LIMIT 3;
PlaylistId TrackId
1 3402
1 3389
1 3390
思考:我可以使用SELECT语句获取每个播放列表中的曲目总数。
动作:query_checker_sql_db
动作输入:SELECT Playlist.Name, COUNT(PlaylistTrack.TrackId) AS TotalTracks FROM Playlist INNER JOIN PlaylistTrack ON Playlist.PlaylistId = PlaylistTrack.PlaylistId GROUP BY Playlist.Name
观察结果:

SELECT Playlist.Name, COUNT(PlaylistTrack.TrackId) AS TotalTracks FROM Playlist INNER JOIN PlaylistTrack ON Playlist.PlaylistId = PlaylistTrack.PlaylistId GROUP BY Playlist.Name
思考:查询看起来正确,我现在可以执行它了。
动作:query_sql_db
动作输入:SELECT Playlist.Name, COUNT(PlaylistTrack.TrackId) AS TotalTracks FROM Playlist INNER JOIN PlaylistTrack ON Playlist.PlaylistId = PlaylistTrack.PlaylistId GROUP BY Playlist.Name LIMIT 10
观察结果:[('90’s Music', 1477), ('Brazilian Music', 39), ('Classical', 75), ('Classical 101 - Deep Cuts', 25), ('Classical 101 - Next Steps', 25), ('Classical 101 - The Basics', 25), ('Grunge', 15), ('Heavy Metal Classic', 26), ('Music', 6580), ('Music Videos', 1)]
思考:我现在知道最终答案了。
最终答案:每个播放列表中的曲目总数为:'90’s Music' (1477), 'Brazilian Music' (39), 'Classical' (75), 'Classical 101 - Deep Cuts' (25), 'Classical 101 - Next Steps' (25), 'Classical 101 - The Basics' (25), 'Grunge' (15), 'Heavy Metal Classic' (26), 'Music' (6580), 'Music Videos' (1)。

> 链结束。





"每个播放列表中的曲目总数为:'90’s Music' (1477), 'Brazilian Music' (39), 'Classical' (75), 'Classical 101 - Deep Cuts' (25), 'Classical 101 - Next Steps' (25), 'Classical 101 - The Basics' (25), 'Grunge' (15), 'Heavy Metal Classic' (26), 'Music' (6580), 'Music Videos' (1)。"

从错误中恢复

在这个例子中,代理在最初尝试访问一个不存在的属性(Track.ArtistId)后能够从错误中恢复。

agent_executor.run("谁是前三名最畅销的艺术家?")
    

> 进入新的AgentExecutor链...
动作:list_tables_sql_db
动作输入:""
观察结果:MediaType, Track, Invoice, Album, Playlist, Customer, Employee, InvoiceLine, PlaylistTrack, Genre, Artist
思考:我应该查看Artist、InvoiceLine和Track表的模式,以了解可以使用的列。
动作:schema_sql_db
动作输入:"Artist, InvoiceLine, Track"
观察结果:
CREATE TABLE "Artist" (
"ArtistId" INTEGER NOT NULL,
"Name" NVARCHAR(120),
PRIMARY KEY ("ArtistId")
)

SELECT * FROM 'Artist' LIMIT 3;
ArtistId Name
1 AC/DC
2 Accept
3 Aerosmith


CREATE TABLE "Track" (
"TrackId" INTEGER NOT NULL,
"Name" NVARCHAR(200) NOT NULL,
"AlbumId" INTEGER,
"MediaTypeId" INTEGER NOT NULL,
"GenreId" INTEGER,
"Composer" NVARCHAR(220),
"Milliseconds" INTEGER NOT NULL,
"Bytes" INTEGER,
"UnitPrice" NUMERIC(10, 2) NOT NULL,
PRIMARY KEY ("TrackId"),
FOREIGN KEY("MediaTypeId") REFERENCES "MediaType" ("MediaTypeId"),
FOREIGN KEY("GenreId") REFERENCES "Genre" ("GenreId"),
FOREIGN KEY("AlbumId") REFERENCES "Album" ("AlbumId")
)

SELECT * FROM 'Track' LIMIT 3;
TrackId Name AlbumId MediaTypeId GenreId Composer Milliseconds Bytes UnitPrice
1 For Those About To Rock (We Salute You) 1 1 1 Angus Young, Malcolm Young, Brian Johnson 343719 11170334 0.99
2 Balls to the Wall 2 2 1 None 342562 5510424 0.99
3 Fast As a Shark 3 2 1 F. Baltes, S. Kaufman, U. Dirkscneider & W. Hoffman 230619 3990994 0.99


CREATE TABLE "InvoiceLine" (
"InvoiceLineId" INTEGER NOT NULL,
"InvoiceId" INTEGER NOT NULL,
"TrackId" INTEGER NOT NULL,
"UnitPrice" NUMERIC(10, 2) NOT NULL,
"Quantity" INTEGER NOT NULL,
PRIMARY KEY ("InvoiceLineId"),
FOREIGN KEY("TrackId") REFERENCES "Track" ("TrackId"),
FOREIGN KEY("InvoiceId") REFERENCES "Invoice" ("InvoiceId")
)

SELECT * FROM 'InvoiceLine' LIMIT 3;
InvoiceLineId InvoiceId TrackId UnitPrice Quantity
1 1 2 0.99 1
2 1 4 0.99 1
3 2 6 0.99 1
思考:我应该查询数据库以获取前三名最畅销的艺术家。
动作:query_sql_db
动作输入:SELECT Artist.Name, SUM(InvoiceLine.Quantity) AS TotalQuantity FROM Artist INNER JOIN Track ON Artist.ArtistId = Track.ArtistId INNER JOIN InvoiceLine ON Track.TrackId = InvoiceLine.TrackId GROUP BY Artist.Name ORDER BY TotalQuantity DESC LIMIT 3
观察结果:错误:(sqlite3.OperationalError)没有这样的列:Track.ArtistId
[SQL: SELECT Artist.Name, SUM(InvoiceLine.Quantity) AS TotalQuantity FROM Artist INNER JOIN Track ON Artist.ArtistId = Track.ArtistId INNER JOIN InvoiceLine ON Track.TrackId = InvoiceLine.TrackId GROUP BY Artist.Name ORDER BY TotalQuantity DESC LIMIT 3]
(有关此错误的背景信息,请访问:https://sqlalche.me/e/14/e3q8)
思考:在执行之前,我应该再次检查我的查询。
动作:query_checker_sql_db
动作输入:SELECT Artist.Name, SUM(InvoiceLine.Quantity) AS TotalQuantity FROM Artist INNER JOIN Track ON Artist.ArtistId = Track.ArtistId INNER JOIN InvoiceLine ON Track.TrackId = InvoiceLine.TrackId GROUP BY Artist.Name ORDER BY TotalQuantity DESC LIMIT 3
观察结果:

SELECT Artist.Name, SUM(InvoiceLine.Quantity) AS TotalQuantity
FROM Artist
INNER JOIN Track ON Artist.ArtistId = Track.ArtistId
INNER JOIN InvoiceLine ON Track.TrackId = InvoiceLine.TrackId
GROUP BY Artist.Name
ORDER BY TotalQuantity DESC
LIMIT 3;
思考:我现在知道最终答案了。
动作:query_sql_db
动作输入:SELECT Artist.Name, SUM(InvoiceLine.Quantity) AS TotalQuantity FROM Artist INNER JOIN Album ON Artist.ArtistId = Album.ArtistId INNER JOIN Track ON Album.AlbumId = Track.AlbumId INNER JOIN InvoiceLine ON Track.TrackId = InvoiceLine.TrackId GROUP BY Artist.Name ORDER BY TotalQuantity DESC LIMIT 3
观察结果:[('Iron Maiden', 140), ('U2', 107), ('Metallica', 91)]
思考:我现在知道最终答案了。
最终答案:前三名最畅销的艺术家是Iron Maiden、U2和Metallica。

> 链结束。





'前三名最畅销的艺术家是Iron Maiden、U2和Metallica。'