YouAreAPIRate

5y

I'm back from the dead to rant again. This time it's punycode.

My job involves processing the Common Crawl web archives, and for some reason one in 20,000,000 archived webpages crashed my program. After some debugging I found this issue, which seems to be the reason my code crashes: https://github.com/servo/rust-url/...

To summarize the issue: thanks to Punycode, Unicode characters can be encoded into domain names. But not every character is allowed. Not only do these invalid domains get registered anyway, I also need in-depth knowledge of Unicode to understand what is wrong with them.
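To make the encoding step concrete, here is a minimal sketch in Python rather than Rust. It assumes Python's built-in "punycode" and "idna" codecs, which follow the older IDNA 2003 rules, so they will not accept or reject exactly the same domains as rust-url. The helper name to_ascii_or_none is made up for this example.

```python
from typing import Optional


def to_ascii_or_none(domain: str) -> Optional[str]:
    """Convert a Unicode domain to its ASCII (xn--) form.

    Hypothetical helper: returns None instead of raising, so one
    odd domain in millions cannot crash the whole job.
    """
    try:
        # The "idna" codec applies nameprep and Punycode label by label.
        return domain.encode("idna").decode("ascii")
    except UnicodeError:
        # Raised for empty or overlong labels and for characters
        # that the codec's rule set prohibits.
        return None


# Raw Punycode for a single label, without the "xn--" prefix.
raw_label = "münchen".encode("punycode").decode("ascii")

# Full domain: every non-ASCII label gets the "xn--" prefix.
ascii_domain = to_ascii_or_none("münchen.de")

# Decoding turns the ASCII form back into Unicode.
round_trip = ascii_domain.encode("ascii").decode("idna")

# An empty label is rejected by the codec, so the helper yields None.
rejected = to_ascii_or_none("a..b")
```

The point of the wrapper is that invalid input becomes a value you can count and skip, not an exception that kills the run.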

How did we turn domain names into something so complicated?

rant

pure bullshit

domains

unicode

punycode

Ranter

Comments

3

SortOfTested

19558

5y

Language is complicated, and there are plenty of people who, for whatever reason, want non-English domains.
1

IntrusionCM

13947

5y

Oh. Complicated?

Easy.

This is a very bad idea, sir...

But I want my Poop Emoji!!!! in Unicode!!!! every where!!!!!!!!!
1

YouAreAPIRate

3766

5y

@SortOfTested I agree, but I wonder whether we allowed too many Unicode characters at once. Now we have homoglyph attacks and a far more complex definition of a domain name.
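A rough sketch of what a homoglyph check can look like, in Python. It is only a heuristic: it takes the first word of each character's Unicode name as its script, which is an assumption made for this example and not how browsers or registries decide. The function names scripts_in_label and looks_mixed_script are made up.

```python
import unicodedata


def scripts_in_label(label: str) -> set:
    """Collect a crude script tag for every letter in a label.

    Heuristic: the first word of the Unicode character name,
    e.g. "LATIN SMALL LETTER A" gives "LATIN".
    """
    found = set()
    for ch in label:
        if ch.isalpha():
            name = unicodedata.name(ch, "UNKNOWN")
            found.add(name.split()[0])
    return found


def looks_mixed_script(label: str) -> bool:
    """Flag labels whose letters come from more than one script."""
    return len(scripts_in_label(label)) > 1


# All Latin letters.
plain = scripts_in_label("paypal")

# Same look, but the second letter is CYRILLIC SMALL LETTER A (U+0430).
spoofed = scripts_in_label("p\u0430ypal")
```

A label that mixes scripts is not automatically malicious, so a check like this is better used to flag domains for a closer look than to reject them outright.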

devRant © 2021 Hexical Labs LLC