urlFilter terminates recursive scraping #460

@jkanel

Description

See this discussion thread for more detail.

ISSUE: When the rootUrl does not match the urlFilter criteria, recursive scraping terminates at the root page.

RECOMMENDED SOLUTION: Do not apply the urlFilter to the rootUrl; or add an option to exempt the rootUrl from filtering.

DESIRED: I'm specifying a rootUrl and would like the scraper to recurse through all hyperlinks. The rootUrl will not be downloaded in this scenario. When the scraper finds a hyperlink ending in .abc it should download the file.

ACTUAL: The rootUrl (see code below) does not meet the urlFilter criteria, so the scraper stops at the root page with no recursion. The scraper should find a hyperlink to http://trillian.mit.edu/~jc/music/book/SCD/Book45.abc in the rootUrl, among other .abc URLs, but it does not. Note that when I set the rootUrl to an .abc URL directly, e.g. the example above, the file downloads as expected.

const scrape = require('website-scraper');

const rootUrl = "http://trillian.mit.edu/~jc/music/book/SCD";

scrape({
  urls: [rootUrl],
  recursive: true,
  maxRecursiveDepth: 5,
  urlFilter: function(url) {
    // Download only URLs ending in .abc
    return /\.abc$/.test(url);
  },
  directory: savePath // savePath defined elsewhere
});
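Until the recommended solution lands, one possible workaround (a sketch, not library behavior) is a urlFilter that always admits the rootUrl so recursion can start, and applies the .abc check everywhere else. Note this only helps when the .abc links are reachable directly from the root page; intermediate non-matching pages would still be filtered out.

```javascript
const rootUrl = "http://trillian.mit.edu/~jc/music/book/SCD";

// Hypothetical workaround filter: never reject the entry point itself,
// otherwise keep only URLs ending in .abc.
function urlFilter(url) {
  if (url === rootUrl) return true; // admit the root so recursion begins
  return /\.abc$/.test(url);       // elsewhere, download only .abc files
}
```

This function would be passed as the `urlFilter` option in the `scrape` call above.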
