Commit bf1af7b

docs: fix imports of cheerio in examples (#664)
Related: #661
1 parent: 845b9d7
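
Every change in this commit is the same mechanical swap: the examples stop importing cheerio through a default import (or the occasional `import { load } from 'cheerio'`) and use a namespace import instead. A minimal before/after sketch, assuming a current cheerio release (recent versions no longer provide an ESM default export, so the old form fails to resolve):

```js
// Before: relies on a default export that newer cheerio releases no longer ship.
// import cheerio from 'cheerio';

// After: namespace import; `load` is accessed as a property of the module object.
import * as cheerio from 'cheerio';

const $ = cheerio.load('<h2 class="title">Hello world</h2>');
console.log($('h2.title').text()); // => 'Hello world'
```

Standardizing on the namespace form also keeps the examples uniform and avoids interop differences between bundlers and Node's native ESM loader, which is presumably why it was preferred over the named `{ load }` import as well.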

12 files changed, +34 -37 lines changed


sources/academy/tutorials/node_js/dealing_with_dynamic_pages.md

Lines changed: 0 additions & 1 deletion

````diff
@@ -144,7 +144,6 @@ So, we've gotta scroll down the page to load these images. Luckily, because we'r
 
 ```js
 import { PuppeteerCrawler, utils, Dataset } from 'crawlee';
-import cheerio from 'cheerio';
 
 const BASE_URL = 'https://demo-webstore.apify.org';
 
````

sources/academy/webscraping/puppeteer_playwright/common_use_cases/paginating_through_results.md

Lines changed: 16 additions & 18 deletions

````diff
@@ -16,7 +16,7 @@ import TabItem from '@theme/TabItem';
 
 If you're trying to [collect data](../executing_scripts/extracting_data.md) on a website that has millions, thousands, or even just hundreds of results, it is very likely that they are paginating their results to reduce strain on their backend as well as on the users loading and rendering the content.
 
-![Amazon pagination](https://apify-docs.s3.amazonaws.com/master/docs/assets/tutorials/images/pagination.jpg)
+![Amazon pagination](../../advanced_web_scraping/images/pagination.png)
 
 Attempting to scrape thousands to tens of thousands of results using a headless browser on a website that only shows 30 results at a time might be daunting at first, but be rest assured that by the end of this lesson you'll feel confident when faced with this use case.
 
@@ -53,7 +53,6 @@ Let's grab this number now with a little bit of code:
 
 ```javascript
 import { chromium } from 'playwright';
-import { load } from 'cheerio';
 
 const repositories = [];
 
@@ -79,7 +78,6 @@ await browser.close();
 
 ```javascript
 import puppeteer from 'puppeteer';
-import { load } from 'cheerio';
 
 const repositories = [];
 
@@ -118,7 +116,7 @@ And since we're already on the first page, we'll go ahead and scrape the repos f
 
 ```javascript
 import { chromium } from 'playwright';
-import { load } from 'cheerio';
+import * as cheerio from 'cheerio';
 
 const repositories = [];
 
@@ -127,7 +125,7 @@ const REPOSITORIES_URL = `${BASE_URL}/orgs/facebook/repositories`;
 
 // Create a function which grabs all repos from a page
 const scrapeRepos = async (page) => {
-    const $ = load(await page.content());
+    const $ = cheerio.load(await page.content());
 
     return [...$('li.Box-row')].map((item) => {
         const elem = $(item);
@@ -163,7 +161,7 @@ await browser.close();
 
 ```javascript
 import puppeteer from 'puppeteer';
-import { load } from 'cheerio';
+import * as cheerio from 'cheerio';
 
 const repositories = [];
 
@@ -172,7 +170,7 @@ const REPOSITORIES_URL = `${BASE_URL}/orgs/facebook/repositories`;
 
 // Create a function which grabs all repos from a page
 const scrapeRepos = async (page) => {
-    const $ = load(await page.content());
+    const $ = cheerio.load(await page.content());
 
     return [...$('li.Box-row')].map((item) => {
         const elem = $(item);
@@ -260,15 +258,15 @@ After all is said and done, here's what our final code looks like:
 
 ```javascript
 import { chromium } from 'playwright';
-import { load } from 'cheerio';
+import * as cheerio from 'cheerio';
 
 const repositories = [];
 
 const BASE_URL = 'https://github.com';
 const REPOSITORIES_URL = `${BASE_URL}/orgs/facebook/repositories`;
 
 const scrapeRepos = async (page) => {
-    const $ = load(await page.content());
+    const $ = cheerio.load(await page.content());
 
     return [...$('li.Box-row')].map((item) => {
         const elem = $(item);
@@ -321,7 +319,7 @@ await browser.close();
 
 ```javascript
 import puppeteer from 'puppeteer';
-import { load } from 'cheerio';
+import * as cheerio from 'cheerio';
 
 const repositories = [];
 
@@ -330,7 +328,7 @@ const REPOSITORIES_URL = `${BASE_URL}/orgs/facebook/repositories`;
 
 // Create a function which grabs all repos from a page
 const scrapeRepos = async (page) => {
-    const $ = load(await page.content());
+    const $ = cheerio.load(await page.content());
 
     return [...$('li.Box-row')].map((item) => {
         const elem = $(item);
@@ -402,7 +400,6 @@ We're going to scrape the brand and price from the first 75 results on the **Abo
 
 ```javascript
 import { chromium } from 'playwright';
-import { load } from 'cheerio';
 
 // Create an array where all scraped products will
 // be pushed to
@@ -421,7 +418,6 @@ await browser.close();
 
 ```javascript
 import puppeteer from 'puppeteer';
-import { load } from 'cheerio';
 
 // Create an array where all scraped products will
 // be pushed to
@@ -543,7 +539,9 @@ Now, the `while` loop will exit out if we've reached the bottom of the page.
 Within the loop, we can grab hold of the total number of items on the page. To avoid extracting and pushing duplicate items to the **products** array, we can use the `.slice()` method to cut out the items we've already scraped.
 
 ```js
-const $ = load(await page.content());
+import * as cheerio from 'cheerio';
+
+const $ = cheerio.load(await page.content());
 
 // Grab the newly loaded items
 const items = [...$('a[data-testid*="productTile"]')].slice(products.length);
@@ -569,7 +567,7 @@ With everything completed, this is what we're left with:
 
 ```javascript
 import { chromium } from 'playwright';
-import { load } from 'cheerio';
+import * as cheerio from 'cheerio';
 
 const products = [];
 
@@ -592,7 +590,7 @@ while (products.length < 75) {
     // Allow the products 1 second to load
     await page.waitForTimeout(1000);
 
-    const $ = load(await page.content());
+    const $ = cheerio.load(await page.content());
 
     // Grab the newly loaded items
     const items = [...$('a[data-testid*="productTile"]')].slice(products.length);
@@ -628,7 +626,7 @@ await browser.close();
 
 ```javascript
 import puppeteer from 'puppeteer';
-import { load } from 'cheerio';
+import * as cheerio from 'cheerio';
 
 const products = [];
 
@@ -651,7 +649,7 @@ while (products.length < 75) {
     // Allow the products 1 second to load
    await page.waitForTimeout(1000);
 
-    const $ = load(await page.content());
+    const $ = cheerio.load(await page.content());
 
     // Grab the newly loaded items
    const items = [...$('a[data-testid*="productTile"]')].slice(products.length);
````
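
A recurring pattern in this file's examples is `cheerio.load(await page.content())`: the headless browser renders the page, and cheerio then parses the resulting HTML snapshot for fast, jQuery-style extraction. A condensed sketch of that pattern under the new import style (the URL and selector are borrowed from the examples above; run it as an ES module so top-level `await` works):

```js
import { chromium } from 'playwright';
import * as cheerio from 'cheerio';

const browser = await chromium.launch();
const page = await browser.newPage();
await page.goto('https://github.com/orgs/facebook/repositories');

// page.content() returns the rendered DOM as an HTML string, and
// cheerio.load() wraps it in a jQuery-like querying function.
const $ = cheerio.load(await page.content());
console.log($('li.Box-row').length); // number of repository rows found

await browser.close();
```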

sources/academy/webscraping/web_scraping_for_beginners/crawling/finding_links.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -53,7 +53,7 @@ We'll start from a boilerplate that's very similar to the scraper we built in [B
 
 ```js title=crawler.js
 import { gotScraping } from 'got-scraping';
-import cheerio from 'cheerio';
+import * as cheerio from 'cheerio';
 
 const storeUrl = 'https://warehouse-theme-metal.myshopify.com/collections/sales';
 
````

sources/academy/webscraping/web_scraping_for_beginners/crawling/first_crawl.md

Lines changed: 2 additions & 2 deletions

````diff
@@ -21,7 +21,7 @@ In the previous lessons, we collected and filtered all the URLs pointing to indi
 
 ```js title=crawler.js
 import { gotScraping } from 'got-scraping';
-import cheerio from 'cheerio';
+import * as cheerio from 'cheerio';
 
 const WEBSITE_URL = 'https://warehouse-theme-metal.myshopify.com';
 const storeUrl = `${WEBSITE_URL}/collections/sales`;
@@ -75,7 +75,7 @@ In programming, you handle errors by catching and handling them. Typically by pr
 
 ```js title=crawler.js
 import { gotScraping } from 'got-scraping';
-import cheerio from 'cheerio';
+import * as cheerio from 'cheerio';
 
 const WEBSITE_URL = 'https://warehouse-theme-metal.myshopify.com';
 const storeUrl = `${WEBSITE_URL}/collections/sales`;
````

sources/academy/webscraping/web_scraping_for_beginners/crawling/recap_extraction_basics.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -18,7 +18,7 @@ We finished off the [first section](../data_extraction/index.md) of the _Web Scr
 // download, extract, and convert the data we wanted
 import { writeFileSync } from 'fs';
 import { gotScraping } from 'got-scraping';
-import cheerio from 'cheerio';
+import * as cheerio from 'cheerio';
 import { parse } from 'json2csv';
 
 // Here, we fetched the website's HTML and saved it to a new variable.
````

sources/academy/webscraping/web_scraping_for_beginners/crawling/relative_urls.md

Lines changed: 2 additions & 2 deletions

````diff
@@ -35,7 +35,7 @@ Let's update the Node.js code from the [Finding links lesson](./finding_links.md
 
 ```js title=crawler.js
 import { gotScraping } from 'got-scraping';
-import cheerio from 'cheerio';
+import * as cheerio from 'cheerio';
 
 const storeUrl = 'https://warehouse-theme-metal.myshopify.com/collections/sales';
 
@@ -72,7 +72,7 @@ When we plug this into our crawler code, we will get the correct - absolute - UR
 
 ```js title=crawler.js
 import { gotScraping } from 'got-scraping';
-import cheerio from 'cheerio';
+import * as cheerio from 'cheerio';
 
 // Split the base URL from the category to use it later.
 const WEBSITE_URL = 'https://warehouse-theme-metal.myshopify.com';
````

sources/academy/webscraping/web_scraping_for_beginners/crawling/scraping_the_data.md

Lines changed: 3 additions & 3 deletions

````diff
@@ -21,7 +21,7 @@ Let's start writing a script that extracts data from this single PDP. We can use
 
 ```js title=product.js
 import { gotScraping } from 'got-scraping';
-import cheerio from 'cheerio';
+import * as cheerio from 'cheerio';
 
 const productUrl = 'https://warehouse-theme-metal.myshopify.com/products/denon-ah-c720-in-ear-headphones';
 const response = await gotScraping(productUrl);
@@ -123,7 +123,7 @@ Let's compare the above data extraction example with the crawling code we wrote
 
 ```js title=crawler.js
 import { gotScraping } from 'got-scraping';
-import cheerio from 'cheerio';
+import * as cheerio from 'cheerio';
 
 const WEBSITE_URL = 'https://warehouse-theme-metal.myshopify.com';
 const storeUrl = `${WEBSITE_URL}/collections/sales`;
@@ -171,7 +171,7 @@ We'll start by adding our imports and constants at the top of the file, no chang
 
 ```js title=final.js
 import { gotScraping } from 'got-scraping';
-import cheerio from 'cheerio';
+import * as cheerio from 'cheerio';
 
 const WEBSITE_URL = 'https://warehouse-theme-metal.myshopify.com';
 ```
````

sources/academy/webscraping/web_scraping_for_beginners/data_extraction/node_continued.md

Lines changed: 2 additions & 2 deletions

````diff
@@ -38,7 +38,7 @@ Replace the code in your **main.js** with the following, and run it with `node m
 ```js
 // main.js
 import { gotScraping } from 'got-scraping';
-import cheerio from 'cheerio';
+import * as cheerio from 'cheerio';
 
 const storeUrl = 'https://warehouse-theme-metal.myshopify.com/collections/sales';
 
@@ -110,7 +110,7 @@ The final scraper code looks like this. Replace the code in your **main.js** fil
 ```js
 // main.js
 import { gotScraping } from 'got-scraping';
-import cheerio from 'cheerio';
+import * as cheerio from 'cheerio';
 
 const storeUrl = 'https://warehouse-theme-metal.myshopify.com/collections/sales';
 
````

sources/academy/webscraping/web_scraping_for_beginners/data_extraction/node_js_scraper.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -43,7 +43,7 @@ To parse the HTML with the `cheerio` library. Replace the code in your **main.js
 ```js
 // main.js
 import { gotScraping } from 'got-scraping';
-import cheerio from 'cheerio';
+import * as cheerio from 'cheerio';
 
 const storeUrl = 'https://warehouse-theme-metal.myshopify.com/collections/sales';
 
````

sources/academy/webscraping/web_scraping_for_beginners/data_extraction/project_setup.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -57,7 +57,7 @@ With the libraries installed, create a new file in the project's folder called *
 
 ```js
 import gotScraping from 'got-scraping';
-import cheerio from 'cheerio';
+import * as cheerio from 'cheerio';
 
 console.log('it works!');
 ```
````
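
One caveat for readers applying the same fix outside these docs: the namespace import is ESM syntax. In a CommonJS project the equivalent is a plain `require`, assuming a cheerio release that still ships a CJS build (v1.x does):

```js
// CommonJS counterpart of `import * as cheerio from 'cheerio'`.
// Assumes a cheerio build that exposes a CommonJS entry point.
const cheerio = require('cheerio');

const $ = cheerio.load('<ul><li>one</li><li>two</li></ul>');
console.log($('li').length); // => 2
```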
