
Commit e08b5c2

committed
Created separate component page for each tool, created provider page, and updated packages.yml
1 parent d31aeaa commit e08b5c2

File tree

5 files changed: +583 -91 lines

Lines changed: 178 additions & 0 deletions
@@ -0,0 +1,178 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# ScrapingBee\n",
"*The Best Web Scraping API to Avoid Getting Blocked*\n",
"\n",
"## Overview\n",
"The ScrapingBee web scraping API handles headless browsers, rotates proxies for you, and offers AI-powered data extraction.\n",
"\n",
"## Installation\n",
"\n",
"```bash\n",
"pip install -U langchain-scrapingbee\n",
"```\n",
"\n",
"Configure your credentials by setting the following environment variable:\n",
"\n",
"* SCRAPINGBEE_API_KEY\n",
"\n",
"You can get your API key and 1000 free credits by signing up [here](https://app.scrapingbee.com/account/register).\n",
"\n",
"## Tools\n",
"\n",
"The ScrapingBee integration provides access to the following tools:\n",
"\n",
"* [ScrapeUrlTool](../../../../docs/docs/integrations/tools/scrapingbee_scrapeurl.ipynb) - Scrape the contents of any public website.\n",
"* [GoogleSearchTool](../../../../docs/docs/integrations/tools/scrapingbee_googlesearch.ipynb) - Search Google for regular (classic), news, maps, and image results.\n",
"* [CheckUsageTool](../../../../docs/docs/integrations/tools/scrapingbee_checkusage.ipynb) - Monitor your ScrapingBee credit and concurrency usage.\n",
"\n",
"## Example"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "y8ku6X96sebl"
},
"outputs": [],
"source": [
"import os\n",
"import getpass\n",
"from langchain_scrapingbee import (\n",
"    ScrapeUrlTool,\n",
"    GoogleSearchTool,\n",
"    CheckUsageTool,\n",
")\n",
"\n",
"api_key = os.environ.get(\"SCRAPINGBEE_API_KEY\")\n",
"if not api_key:\n",
"    print(\n",
"        \"SCRAPINGBEE_API_KEY environment variable is not set. Please enter the API key here:\"\n",
"    )\n",
"    os.environ[\"SCRAPINGBEE_API_KEY\"] = getpass.getpass()\n",
"\n",
"scrape_tool = ScrapeUrlTool(api_key=os.environ.get(\"SCRAPINGBEE_API_KEY\"))\n",
"search_tool = GoogleSearchTool(api_key=os.environ.get(\"SCRAPINGBEE_API_KEY\"))\n",
"usage_tool = CheckUsageTool(api_key=os.environ.get(\"SCRAPINGBEE_API_KEY\"))\n",
"\n",
"# --- Test Case 1: Scrape a standard HTML page ---\n",
"print(\"--- 1. Testing ScrapeUrlTool (HTML) ---\")\n",
"html_result = scrape_tool.invoke({\"url\": \"http://httpbin.org/html\"})\n",
"print(html_result)\n",
"\n",
"\n",
"# --- Test Case 2: Scrape a PDF file ---\n",
"print(\"--- 2. Testing ScrapeUrlTool (PDF) ---\")\n",
"pdf_result = scrape_tool.invoke(\n",
"    {\n",
"        \"url\": \"https://treaties.un.org/doc/publication/ctc/uncharter.pdf\",\n",
"        \"params\": {\"render_js\": False},\n",
"    }\n",
")\n",
"print(pdf_result)\n",
"\n",
"\n",
"# --- Test Case 3: Google Search ---\n",
"print(\"--- 3. Testing GoogleSearchTool ---\")\n",
"search_result = search_tool.invoke({\"search\": \"What is LangChain?\"})\n",
"print(search_result)\n",
"\n",
"\n",
"# --- Test Case 4: Check Usage ---\n",
"print(\"--- 4. Testing CheckUsageTool ---\")\n",
"usage_result = usage_tool.invoke({})  # No arguments needed\n",
"print(usage_result)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Example Using Agent"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from langchain_scrapingbee import (\n",
"    ScrapeUrlTool,\n",
"    GoogleSearchTool,\n",
"    CheckUsageTool,\n",
")\n",
"from langchain_google_genai import ChatGoogleGenerativeAI\n",
"from langgraph.prebuilt import create_react_agent\n",
"\n",
"if not os.environ.get(\"GOOGLE_API_KEY\") or not os.environ.get(\"SCRAPINGBEE_API_KEY\"):\n",
"    raise ValueError(\n",
"        \"Google and ScrapingBee API keys must be set in environment variables.\"\n",
"    )\n",
"\n",
"llm = ChatGoogleGenerativeAI(temperature=0, model=\"gemini-2.5-flash\")\n",
"scrapingbee_api_key = os.environ.get(\"SCRAPINGBEE_API_KEY\")\n",
"\n",
"tools = [\n",
"    ScrapeUrlTool(api_key=scrapingbee_api_key),\n",
"    GoogleSearchTool(api_key=scrapingbee_api_key),\n",
"    CheckUsageTool(api_key=scrapingbee_api_key),\n",
"]\n",
"\n",
"agent = create_react_agent(llm, tools)\n",
"\n",
"user_input = (\n",
"    \"If I have enough API credits, search for PDFs about LangChain and return links to 3 of them.\"\n",
")\n",
"\n",
"# Stream the agent's output step-by-step\n",
"for step in agent.stream(\n",
"    {\"messages\": user_input},\n",
"    stream_mode=\"values\",\n",
"):\n",
"    step[\"messages\"][-1].pretty_print()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Documentation\n",
"* [HTML API](https://www.scrapingbee.com/documentation/)\n",
"* [Google Search API](https://www.scrapingbee.com/documentation/google/)\n",
"* [Data Extraction](https://www.scrapingbee.com/documentation/data-extraction/)\n",
"* [JavaScript Scenario](https://www.scrapingbee.com/documentation/js-scenario/)"
]
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
}
},
"nbformat": 4,
"nbformat_minor": 1
}
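The agent prompt in this provider page gates scraping on having enough API credits. That gating logic can be sketched in plain Python; the payload shape and helper names below are assumptions modeled on the fields the CheckUsageTool page lists (`max_api_credit`, `used_api_credit`, ...), not the library's actual return type:

```python
import json

# Hypothetical usage payload, shaped like the fields CheckUsageTool reports;
# not real API output.
usage_response = json.loads(
    '{"max_api_credit": 1000, "used_api_credit": 742,'
    ' "max_concurrency": 5, "current_concurrency": 1}'
)


def remaining_credits(usage: dict) -> int:
    """Credits still available in the current billing period."""
    return usage["max_api_credit"] - usage["used_api_credit"]


def can_scrape(usage: dict, cost_per_call: int = 5, calls: int = 3) -> bool:
    """Gate a scraping run on having enough credits for all planned calls."""
    return remaining_credits(usage) >= cost_per_call * calls


print(remaining_credits(usage_response))  # → 258
print(can_scrape(usage_response))  # → True
```

An agent performs the same check implicitly when prompted to verify credits first; here the budget arithmetic is just made explicit.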
Lines changed: 188 additions & 0 deletions
@@ -0,0 +1,188 @@
{
"cells": [
{
"cell_type": "raw",
"id": "10238e62-3465-4973-9279-606cbb7ccf16",
"metadata": {},
"source": [
"---\n",
"sidebar_label: ScrapingBee\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "a6f91f20",
"metadata": {},
"source": [
"# ScrapingBee CheckUsageTool\n",
"\n",
"This tool lets you keep track of your credit and concurrency usage while you scrape the web.\n",
"\n",
"## Overview\n",
"\n",
"### Integration details\n",
"\n",
"\n",
"| Class | Package | Serializable | JS support | Package latest |\n",
"| :--- | :--- | :---: | :---: | :---: |\n",
"| [CheckUsageTool](https://pypi.org/project/langchain-scrapingbee/) | [langchain-scrapingbee](https://pypi.org/project/langchain-scrapingbee/) | ✅ | ❌ | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-scrapingbee?style=flat-square&label=%20) |\n",
"\n",
"## Setup\n",
"\n",
"```bash\n",
"pip install -U langchain-scrapingbee\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "b15e9266",
"metadata": {},
"source": [
"### Credentials\n",
"\n",
"You should configure credentials by setting the following environment variable:\n",
"\n",
"* SCRAPINGBEE_API_KEY"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e0b178a2-8816-40ca-b57c-ccdd86dde9c9",
"metadata": {},
"outputs": [],
"source": [
"import getpass\n",
"import os\n",
"\n",
"if not os.environ.get(\"SCRAPINGBEE_API_KEY\"):\n",
"    os.environ[\"SCRAPINGBEE_API_KEY\"] = getpass.getpass(\"SCRAPINGBEE API key:\\n\")"
]
},
{
"cell_type": "markdown",
"id": "1c97218f-f366-479d-8bf7-fe9f2f6df73f",
"metadata": {},
"source": [
"## Instantiation\n",
"\n",
"The `CheckUsageTool` only requires the API key during instantiation. If the key is not set as an environment variable, you can provide it directly here.\n",
"\n",
"Here we show how to instantiate an instance of the `CheckUsageTool`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8b3ddfe9-ca79-494c-a7ab-1f56d9407a64",
"metadata": {},
"outputs": [],
"source": [
"from langchain_scrapingbee import CheckUsageTool\n",
"\n",
"usage_tool = CheckUsageTool(api_key=os.environ.get(\"SCRAPINGBEE_API_KEY\"))"
]
},
{
"cell_type": "markdown",
"id": "74147a1a",
"metadata": {},
"source": [
"## Invocation\n",
"\n",
"This tool doesn't require any arguments. Invoking it checks your ScrapingBee API usage and returns the following information:\n",
"\n",
"* max_api_credit\n",
"* used_api_credit\n",
"* max_concurrency\n",
"* current_concurrency\n",
"* renewal_subscription_date"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "65310a8b-eb0c-4d9e-a618-4f4abe2414fc",
"metadata": {},
"outputs": [],
"source": [
"usage_tool.invoke({})"
]
},
{
"cell_type": "markdown",
"id": "d6e73897",
"metadata": {},
"source": [
"### Example Using Agent"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f90e33a7",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from langchain_scrapingbee import CheckUsageTool\n",
"from langchain_google_genai import ChatGoogleGenerativeAI\n",
"from langgraph.prebuilt import create_react_agent\n",
"\n",
"if not os.environ.get(\"GOOGLE_API_KEY\") or not os.environ.get(\"SCRAPINGBEE_API_KEY\"):\n",
"    raise ValueError(\n",
"        \"Google and ScrapingBee API keys must be set in environment variables.\"\n",
"    )\n",
"\n",
"llm = ChatGoogleGenerativeAI(temperature=0, model=\"gemini-2.5-flash\")\n",
"scrapingbee_api_key = os.environ.get(\"SCRAPINGBEE_API_KEY\")\n",
"\n",
"usage_tool = CheckUsageTool(api_key=scrapingbee_api_key)\n",
"\n",
"agent = create_react_agent(llm, [usage_tool])\n",
"\n",
"user_input = \"How many API credits do I have available in my account?\"\n",
"\n",
"# Stream the agent's output step-by-step\n",
"for step in agent.stream(\n",
"    {\"messages\": user_input},\n",
"    stream_mode=\"values\",\n",
"):\n",
"    step[\"messages\"][-1].pretty_print()"
]
},
{
"cell_type": "markdown",
"id": "4ac8146c",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"For more details on the `usage` endpoint, please check out this [link](https://www.scrapingbee.com/documentation/#usage-endpoint)."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "poetry-venv-311",
"language": "python",
"name": "poetry-venv-311"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
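Under the hood, a usage check like this notebook's boils down to one authenticated GET against the `usage` endpoint linked in the API reference cell. A minimal sketch of composing that request URL is below; the exact path is an assumption to confirm against the ScrapingBee documentation, and no network request is made here:

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Assumed usage-endpoint path, derived from ScrapingBee's documented
# app.scrapingbee.com/api/v1 base; verify against the official docs.
USAGE_ENDPOINT = "https://app.scrapingbee.com/api/v1/usage"


def build_usage_url(api_key: str) -> str:
    """Compose the GET URL a usage check would request, with the key
    passed as a URL-encoded query parameter."""
    return f"{USAGE_ENDPOINT}?{urlencode({'api_key': api_key})}"


url = build_usage_url("MY_KEY")
print(url)  # → https://app.scrapingbee.com/api/v1/usage?api_key=MY_KEY

# Round-trip the query string to show the key survives encoding intact.
print(parse_qs(urlparse(url).query)["api_key"])  # → ['MY_KEY']
```

`CheckUsageTool` wraps this request plus JSON parsing of the response, so agents never need to handle raw HTTP themselves.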
