Join devRant
Do all the things like
++ or -- rants, post your own rants, comment on others' rants and build your customized dev avatar
Sign Up
Pipeless API
From the creators of devRant, Pipeless lets you power real-time personalized recommendations and activity feeds using a simple API
Learn More
Search - "byte order mark"
-
*wrestling commentator voice*
"In this weeks episode of encoding hell:
The iiiinnnfamous UTF-8 Byte Order Mark veeeersus PHP!"
For an online shop we developed, there is currently a CSV upload feature in review by our client. Before we developed this feature, we created together with the client a very precise specification, including the file format and encoding (UTF-8).
After the first test day, the client informed us, that there were invalid characters after processing the uploaded file.
We checked the code and compared the customer's file with our template.
The file was encoded in ISO-8859-1 and NOT as specified UTF-8.
But what ever, we had to add an encoding check, thus allowing both encodings from now on.
Well well well welly welly fucking well...
Test day 2: We receive an email from said client, that the CSV is not working, again.
This time: UTF-8 encoding, but some fields had more colums with different values than specified.
Fucking hell.
We tell the customer that.
(I was about to write a nice death threat novel to them, but my boss held me back)
Testing day 3, today:
"The uploading feature is not working with our file, please fix it."
I tried to debug it, but only got misleading errors. After about 30 minutes, at 20 stacks of hatered, I finally had an idea to check the file in a hex editor:
God fucking what!?!!?!11?!1!!!?2!!
The encoding was valid UTF-8, all columns and fields were correct, but this time the file contained somthing different.
Something the world does not need.
Something nearly as wasteful as driving a monster truck in first gear from NYC to LA.
It was the UTF-8 Byte Order Mark.
3 bytes of pure hell.
Fucking 0xEFBBBF.
The archenemy of PHP and sane people.
If the devil had sex with the ethernet port of a rusty Mac OS X Server, then 9 microseconds later a UTF-8 BOM would have been born.
OK, maybe if PHP would actually cope with these bytes of death without crashing, that would be great.3 -
UTF-8 without BOM.
{ ä, ö, ü } are not going to be decoded correctly.
Why the fuck does BOMless UTF-8 still exist? Such a pain in the ass.7 -
It took me five hours to find a Byte Order Mark added at the beginning of a configuration file of a php application.
Everything was working but every downloaded file was corrupted (the bom sequence was prepended to the content). -
When you debug and notice that "myString".equals("myString") evaluates to false...and after an hour realize it's a 'Byte Order Mark', which of course Java doesn't handle properly. Ended up using BomInputReader from Apache commons, which made my life simpler.
If I ever want to inflict pain on other developers, I shall write files to disk with a byte order mark. 😂