Add release notes for version 2.1.0 #205

Gallaecio · 2022-11-27T12:25:22Z

No description provided.

codecov · 2022-11-27T12:26:50Z

Codecov Report

Merging #205 (1f0ac03) into master (be369f1) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #205   +/-   ##
=======================================
  Coverage   96.14%   96.14%           
=======================================
  Files           8        8           
  Lines         493      493           
  Branches       92       92           
=======================================
  Hits          474      474           
  Misses          9        9           
  Partials       10       10

Impacted Files	Coverage Δ
w3lib/html.py	`95.49% <0.00%> (ø)`

Gallaecio · 2022-11-27T17:26:55Z

Let’s get #206 into the release.

wRAR · 2022-11-28T13:21:15Z

Done!

kmike · 2022-11-28T16:38:20Z

NEWS

+    .. _byte order mark: https://en.wikipedia.org/wiki/Byte_order_mark
+
+-   :func:`~w3lib.url.canonicalize_url` now strips spaces from the input URL,
+    to be more in line with the `URL living standard`_. (#132, #136)


Hey @Gallaecio @wRAR! I think this change is wrong on its own: we should either roll it back or change all the other functions to match. See #136 (comment).

Shall I propose moving this change to safe_url_string, as described in #136 (comment), so we can release a patch version with it?

Maybe! Actually, I'm not even sure my original comment is correct :) It may be a false alarm as well. safe_url_string docstring contains this:

ASCII tabs and newlines are removed as per https://url.spec.whatwg.org/#url-parsing.

And indeed, there is a regex to remove tabs and newlines.

The issue may still stand that whitespace handling is inconsistent between these functions, but I'm no longer sure.

We would also need to decide which functions in w3lib.url expect an already stripped URL (and may fail otherwise), and which can accept more "raw" URLs with extra whitespace.

#207 is ready for when we want to address this.

The Python docs say that, without arguments, str.strip removes "whitespace". From local tests, that seems to include more than the 3 characters safe_url_string removes (it also removes e.g. \f), and I suspect it even includes non-ASCII whitespace. If so, the change in canonicalize_url indeed goes too far, stripping characters that should not be stripped according to the URL living standard.
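The claim above can be checked locally with a small sketch like the one below. The helper name `stripped_by_default_strip` and the example URL are made up for illustration, and nothing here calls w3lib. As far as I can tell, the URL living standard does strip leading and trailing C0 controls and spaces (U+0000 to U+0020), which covers `\f`, so the divergence would be in the non-ASCII whitespace that `str.strip()` removes and in C0 controls such as `\x00` that it leaves in place.

```python
# Hypothetical helper, for illustration only; not part of w3lib.
def stripped_by_default_strip(ch: str) -> bool:
    """Return True if str.strip() with no arguments removes ch from both ends."""
    url = "http://example.com/"
    return f"{ch}{url}{ch}".strip() == url


# A few characters to probe; the labels are only for printing.
samples = {
    "tab": "\t",
    "form feed": "\f",
    "no-break space (U+00A0)": "\u00a0",
    "em space (U+2003)": "\u2003",
    "NUL (U+0000)": "\x00",
}

for label, ch in samples.items():
    print(f"{label}: {stripped_by_default_strip(ch)}")
```

If the non-ASCII entries print `True`, `str.strip()` is removing characters that the standard's "C0 control or space" range does not cover.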

I am not sure that reverting the change or otherwise addressing the issue is urgent, but I do think there is an issue.

And I think, going forward, that both canonicalize_url and safe_url_string should aim to follow the living standard for URL parsing, which is the one that indicates the rules for stripping and removing characters from URLs before parsing. canonicalize_url should probably do the same as safe_url_string as a base, and then go further (e.g. query parameter sorting).
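A minimal sketch of what such a shared base step could look like, assuming my reading of the URL living standard's parser preamble is right: strip leading and trailing C0 controls and spaces, then remove ASCII tabs and newlines anywhere in the string. The name `preprocess_url` is hypothetical and is not an existing w3lib function; both canonicalize_url and safe_url_string could, in principle, call something like it before doing their own work.

```python
import re

# "C0 control or space" in the URL living standard: U+0000 through U+0020.
_C0_CONTROL_OR_SPACE = "".join(chr(i) for i in range(0x21))

# "ASCII tab or newline" in the URL living standard: U+0009, U+000A, U+000D.
_ASCII_TAB_OR_NEWLINE_RE = re.compile(r"[\t\n\r]")


def preprocess_url(url: str) -> str:
    """Hypothetical shared step: apply the standard's stripping rules only."""
    # Strip only the C0-control-or-space range from both ends,
    # instead of everything str.strip() treats as whitespace.
    stripped = url.strip(_C0_CONTROL_OR_SPACE)
    # Remove tabs and newlines wherever they appear in the URL.
    return _ASCII_TAB_OR_NEWLINE_RE.sub("", stripped)


print(repr(preprocess_url(" \t http://exa\nmple.com/ \x00")))
print(repr(preprocess_url("\u00a0http://example.com/")))
```

Unlike a bare `str.strip()`, this leaves a leading no-break space untouched, so any further handling of it is left to the parsing or escaping that follows.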

Add release notes for version 2.1.0

6afff36

wRAR approved these changes Nov 27, 2022

Gallaecio added 2 commits November 28, 2022 15:43

Cover scrapy#206 in the release notes

19d9cb3

Set a release date

1f0ac03

Gallaecio merged commit ec131bb into scrapy:master Nov 28, 2022

kmike reviewed Nov 28, 2022
