I don't think this is a bug.
I see two parts to this question:
Why is it only unescaping the hash and not the space?
Why is it only doing the unescaping in the href and not in the visible linked text?
Here are my thoughts on the first:
A hash is a perfectly legal URL character. It is most often used to point to an anchor within an HTML page (the following is both an example and a link to the docs):
http://www.w3.org/TR/html4/struct/links.html#h-12.2
urlize realizes this, so it unescapes the hash in the href. The same happens with any character that is legal in a URL. Here is an example with the letter f:
>>> from django.utils.html import urlize
>>> urlize('https://example.com/%66')
u'<a href="https://example.com/f">https://example.com/%66</a>'
A space, on the other hand, is not a legal URL character (although it is often tolerated). Therefore, it remains encoded as %20 both in the href and in the visible link text.
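The difference between the hash and the space can be sketched with the standard library alone. This is only an illustration of the decode-then-re-encode idea, not urlize's actual code; the helper name normalize_href and the SAFE character set are assumptions made for this example.

```python
from urllib.parse import quote, unquote

# Hypothetical set of characters to leave unescaped when re-encoding.
# It includes '#' but, deliberately, not the space character.
SAFE = "!*'();:@&=+$,/?#[]~"


def normalize_href(url):
    """Decode every percent-escape, then re-encode only what is not legal."""
    return quote(unquote(url), safe=SAFE)


# %66 is the letter f: legal, so it stays decoded.
print(normalize_href("https://example.com/%66"))    # https://example.com/f
# %23 is the hash: legal, so it stays decoded.
print(normalize_href("https://example.com/a%23b"))  # https://example.com/a#b
# %20 is the space: not legal, so it is encoded again.
print(normalize_href("https://example.com/a%20b"))  # https://example.com/a%20b
```

Under these assumptions, the space makes a round trip and ends up as %20 again, which matches what you are seeing.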
The second part of the question is why it only unescapes in the href but not in the visible link text. That also makes sense. In the href, it does not matter whether you pass in https://example.com/%66 or https://example.com/f. The effect is the same, and the href is "under the hood." So urlize uses the simplest form, without the unnecessary encoding. The visible part, on the other hand, is presented to the user. Therefore, urlize tries to preserve the exact text that was passed in originally, as that is the least surprising thing to do.
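The split between the two parts of the link can be sketched like this. Again, this is a rough illustration under my own assumptions rather than what urlize really does internally; make_link and its safe character set are hypothetical.

```python
from html import escape
from urllib.parse import quote, unquote


def make_link(raw_url):
    """Build an anchor whose href is simplified but whose text is untouched."""
    # href: decode, then re-encode only what needs it (hypothetical safe set).
    href = quote(unquote(raw_url), safe="/:#?&=~")
    # Visible text: exactly what was passed in, only HTML-escaped.
    return '<a href="{}">{}</a>'.format(escape(href), escape(raw_url))


print(make_link("https://example.com/%66"))
# <a href="https://example.com/f">https://example.com/%66</a>
```

The user sees the same string they typed, while the browser gets the simplest equivalent URL.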