-
Notifications
You must be signed in to change notification settings - Fork 106
Add release notes for version 2.1.0 #205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #205 +/- ##
=======================================
Coverage 96.14% 96.14%
=======================================
Files 8 8
Lines 493 493
Branches 92 92
=======================================
Hits 474 474
Misses 9 9
Partials 10 10
|
|
Let’s get #206 into the release. |
|
Done! |
| .. _byte order mark: https://en.wikipedia.org/wiki/Byte_order_mark | ||
|
|
||
| - :func:`~w3lib.url.canonicalize_url` now strips spaces from the input URL, | ||
| to be more in line with the `URL living standard`_. (#132, #136) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hey @Gallaecio @wRAR! I think this is a wrong change without other changes, and we should either roll back, or change all other functions. See #136 (comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall I propose a change to move this change to safe_url_string as described in #136 (comment), so we can release a patch version with it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe! Actually, I'm not even sure my original comment is correct :) It may be a false alarm as well. safe_url_string docstring contains this:
ASCII tabs and newlines are removed as per https://url.spec.whatwg.org/#url-parsing.
And indeed, there is a regex to remove tabs and newlines.
The issue still may stand that the whitespace handling is not compatible between these functions, but I'm not sure anymore.
We also would need to decide which functions in w3lib.url expect an already stripped URL (and may fail to work otherwise), and which can support more "raw" urls, with extra whitespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#207 is ready for when we want to address this.
The Python docs say that, without arguments, str.strip removes "whitespace". From local tests, that seems to include more than the 3 characters safe_url_string escapes (it also removes e.g. \f), and I suspect it even includes non-ASCII space. If so, the change in canonicalize_url indeed goes too far, stripping characters that should not be stripped according to the URL living standard.
I am not sure reverting addressing the issue is urgent, but I do think there is an issue.
And I think, going forward, that both canonicalize_url and safe_url_string should aim to follow the living standard for URL parsing, which is the one that indicates the rules for stripping and removing characters from URLs before parsing. canonicalize_url should probably do the same as safe_url_string as a base, and then go further (e.g. query parameter sorting).
No description provided.