URL.host returns Punycode instead of Unicode for some URLs #3333
Unanswered
loic-bellinger
asked this question in
General
Replies: 1 comment 1 reply
I don't think the documentation currently says very much at all about the subtleties of the URL parameters, or if the user should expect the (Yes it should - there's quite an involved set of documentation work around the details here)
Description
The URL.host property does not decode IDNA hostnames into Unicode for some URLs, which contradicts the documented behavior. According to the httpx documentation, the host should always be returned as a string, normalized to lowercase, with IDNA hosts decoded into Unicode.
Steps to reproduce
Expected behavior
The URL.host property should return the Unicode version of the host, in this case: www.égalité-femmes-hommes.gouv.fr.
Actual behavior
The URL.host property returns the Punycode-encoded version of the host: www.xn--galit-femmes-hommes-9ybf.gouv.fr.
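To make the two forms concrete, here is a small sketch that uses only Python's built-in idna codec (IDNA 2003), not httpx itself. The variable names are just for illustration. It shows that the two hostnames above are the same name in different encodings, and that the xn-- prefix sits on the second label rather than at the start of the host.

```python
# Hostnames taken from the report above.
unicode_host = "www.égalité-femmes-hommes.gouv.fr"
punycode_host = "www.xn--galit-femmes-hommes-9ybf.gouv.fr"

# Unicode -> Punycode using the standard library's "idna" codec.
encoded = unicode_host.encode("idna").decode("ascii")
print(encoded)  # www.xn--galit-femmes-hommes-9ybf.gouv.fr

# Punycode -> Unicode using the same codec.
decoded = punycode_host.encode("ascii").decode("idna")
print(decoded)  # www.égalité-femmes-hommes.gouv.fr

# The "xn--" prefix is applied per label, so a check on the whole host
# misses it when the first label (here "www") is plain ASCII.
host_has_prefix = punycode_host.startswith("xn--")
labels_with_prefix = [
    label for label in punycode_host.split(".") if label.startswith("xn--")
]
print(host_has_prefix)     # False
print(labels_with_prefix)  # ['xn--galit-femmes-hommes-9ybf']
```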
Potential fix
It seems the issue arises in this part of the httpx code:
The use of startswith("xn--") checks only for Punycode-encoded hosts that begin with this prefix. In the example above the host begins with www., so the encoded label is never detected and the host is returned undecoded. The check should handle IDNA-encoded hosts more comprehensively. Replacing host.startswith("xn--") with something like if "xn--" in host might handle a broader set of cases?
Environment
httpx version: 0.27.2
Python version: 3.12.x
OS: Linux/Windows
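Following up on the potential fix above, one alternative to a substring check would be to look at each dot-separated label on its own. This is only a rough sketch: decode_idna_host is a hypothetical helper name, and it uses Python's built-in idna codec (IDNA 2003) as a stand-in, which is an assumption and may not match how httpx decodes hosts internally.

```python
def decode_idna_host(host: str) -> str:
    """Hypothetical helper: decode every Punycode label of `host` to Unicode.

    Labels that do not start with "xn--" are passed through unchanged.
    """
    decoded_labels = []
    for label in host.lower().split("."):
        if label.startswith("xn--"):
            # Assumption: the stdlib "idna" codec is good enough for a sketch.
            # Malformed Punycode raises UnicodeError here.
            label = label.encode("ascii").decode("idna")
        decoded_labels.append(label)
    return ".".join(decoded_labels)


print(decode_idna_host("www.xn--galit-femmes-hommes-9ybf.gouv.fr"))
print(decode_idna_host("www.example.com"))
```

With a per-label check, hosts with a plain ASCII first label are decoded too, and hosts without any encoded label come back unchanged.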