GitBook
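GitBook is a modern documentation platform where teams can document everything from products to internal knowledge bases and APIs.
This notebook shows how to pull page data from any GitBook.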
from langchain.document_loaders import GitbookLoader
Load from a single GitBook page
loader = GitbookLoader("https://docs.gitbook.com")
page_data = loader.load()
page_data
[Document(page_content='Introduction to GitBook\nGitBook is a modern documentation platform where teams can document everything from products to internal knowledge bases and APIs.\nWe want to help \nteams to work more efficiently\n by creating a simple yet powerful platform for them to \nshare their knowledge\n.\nOur mission is to make a \nuser-friendly\n and \ncollaborative\n product for everyone to create, edit and share knowledge through documentation.\nPublish your documentation in 5 easy steps\nImport\n\nMove your existing content to GitBook with ease.\nGit Sync\n\nBenefit from our bi-directional synchronisation with GitHub and GitLab.\nOrganise your content\n\nCreate pages and spaces and organize them into collections\nCollaborate\n\nInvite other users and collaborate asynchronously with ease.\nPublish your docs\n\nShare your documentation with selected users or with everyone.\nNext\n - Getting started\nOverview\nLast modified \n3mo ago', lookup_str='', metadata={'source': 'https://docs.gitbook.com', 'title': 'Introduction to GitBook'}, lookup_index=0)]
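As the output above shows, each item returned by load() carries its text in page_content and its origin in metadata. The sketch below reads those two fields to build a one-line summary. To stay runnable without network access it uses a small stand-in class, PageDoc, which is hypothetical and only mirrors the two attributes visible in the output; summarize is likewise an illustrative helper, not part of the loader.

```python
from dataclasses import dataclass, field


@dataclass
class PageDoc:
    """Hypothetical stand-in mirroring two attributes of the Document shown above."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def summarize(doc, width=40):
    """Return a one-line summary: title, source URL, and a short text preview."""
    title = doc.metadata.get("title", "(untitled)")
    source = doc.metadata.get("source", "(unknown)")
    # Collapse newlines and repeated spaces before truncating the preview.
    preview = " ".join(doc.page_content.split())[:width]
    return f"{title} <{source}>: {preview}"


sample = PageDoc(
    page_content="Introduction to GitBook\nGitBook is a modern documentation platform",
    metadata={"source": "https://docs.gitbook.com", "title": "Introduction to GitBook"},
)
print(summarize(sample))
```

Because summarize only reads page_content and metadata, calling summarize(page_data[0]) on the loader's real output should behave the same way, assuming those attributes are present as in the output above.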
Load from all paths in a given GitBook
For this to work, the GitbookLoader needs to be initialized with the root path (https://docs.gitbook.com in this example) and have load_all_paths set to True.
loader = GitbookLoader("https://docs.gitbook.com", load_all_paths=True)
all_pages_data = loader.load()
Fetching text from https://docs.gitbook.com/
Fetching text from https://docs.gitbook.com/getting-started/overview
Fetching text from https://docs.gitbook.com/getting-started/import
Fetching text from https://docs.gitbook.com/getting-started/git-sync
Fetching text from https://docs.gitbook.com/getting-started/content-structure
Fetching text from https://docs.gitbook.com/getting-started/collaboration
Fetching text from https://docs.gitbook.com/getting-started/publishing
Fetching text from https://docs.gitbook.com/tour/quick-find
Fetching text from https://docs.gitbook.com/tour/editor
Fetching text from https://docs.gitbook.com/tour/customization
Fetching text from https://docs.gitbook.com/tour/member-management
Fetching text from https://docs.gitbook.com/tour/pdf-export
Fetching text from https://docs.gitbook.com/tour/activity-history
Fetching text from https://docs.gitbook.com/tour/insights
Fetching text from https://docs.gitbook.com/tour/notifications
Fetching text from https://docs.gitbook.com/tour/internationalization
Fetching text from https://docs.gitbook.com/tour/keyboard-shortcuts
Fetching text from https://docs.gitbook.com/tour/seo
Fetching text from https://docs.gitbook.com/advanced-guides/custom-domain
Fetching text from https://docs.gitbook.com/advanced-guides/advanced-sharing-and-security
Fetching text from https://docs.gitbook.com/advanced-guides/integrations
Fetching text from https://docs.gitbook.com/billing-and-admin/account-settings
Fetching text from https://docs.gitbook.com/billing-and-admin/plans
Fetching text from https://docs.gitbook.com/troubleshooting/faqs
Fetching text from https://docs.gitbook.com/troubleshooting/hard-refresh
Fetching text from https://docs.gitbook.com/troubleshooting/report-bugs
Fetching text from https://docs.gitbook.com/troubleshooting/connectivity-issues
Fetching text from https://docs.gitbook.com/troubleshooting/support
print(f"fetched {len(all_pages_data)} documents.")
# show the third document (index 2)
all_pages_data[2]
fetched 28 documents.
Document(page_content="Import\nFind out how to easily migrate your existing documentation and which formats are supported.\nThe import function allows you to migrate and unify existing documentation in GitBook. You can choose to import single or multiple pages although limits apply. \nPermissions\nAll members with editor permission or above can use the import feature.\nSupported formats\nGitBook supports imports from websites or files that are:\nMarkdown (.md or .markdown)\nHTML (.html)\nMicrosoft Word (.docx).\nWe also support import from:\nConfluence\nNotion\nGitHub Wiki\nQuip\nDropbox Paper\nGoogle Docs\nYou can also upload a ZIP\n \ncontaining HTML or Markdown files when \nimporting multiple pages.\nNote: this feature is in beta.\nFeel free to suggest import sources we don't support yet and \nlet us know\n if you have any issues.\nImport panel\nWhen you create a new space, you'll have the option to import content straight away:\nThe new page menu\nImport a page or subpage by selecting \nImport Page\n from the New Page menu, or \nImport Subpage\n in the page action menu, found in the table of contents:\nImport from the page action menu\nWhen you choose your input source, instructions will explain how to proceed.\nAlthough GitBook supports importing content from different kinds of sources, the end result might be different from your source due to differences in product features and document format.\nLimits\nGitBook currently has the following limits for imported content:\nThe maximum number of pages that can be uploaded in a single import is \n20.\nThe maximum number of files (images etc.) that can be uploaded in a single import is \n20.\nGetting started - \nPrevious\nOverview\nNext\n - Getting started\nGit Sync\nLast modified \n4mo ago", lookup_str='', metadata={'source': 'https://docs.gitbook.com/getting-started/import', 'title': 'Import'}, lookup_index=0)
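Once every page is loaded, the source URL in each document's metadata can be used to organize the results. The sketch below groups pages by their top-level section (the first path segment of the URL). The helpers section_of and group_by_section are illustrative names, not part of GitbookLoader, and the sample list reuses a few URLs from the fetch log above so the block runs without network access.

```python
from collections import defaultdict
from urllib.parse import urlparse


def section_of(source_url):
    """Return the first path segment of a page URL, or "" for the root page."""
    path = urlparse(source_url).path.strip("/")
    return path.split("/")[0] if path else ""


def group_by_section(source_urls):
    """Map each top-level section name to the list of page URLs under it."""
    groups = defaultdict(list)
    for url in source_urls:
        groups[section_of(url)].append(url)
    return dict(groups)


# A few of the source URLs from the fetch log above.
sources = [
    "https://docs.gitbook.com/",
    "https://docs.gitbook.com/getting-started/overview",
    "https://docs.gitbook.com/getting-started/import",
    "https://docs.gitbook.com/tour/editor",
    "https://docs.gitbook.com/troubleshooting/faqs",
]
for section, urls in group_by_section(sources).items():
    print(section or "(root)", len(urls))
```

With the real results, the same grouping could be applied as group_by_section(d.metadata["source"] for d in all_pages_data), assuming each document's metadata has a source key as in the output above.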