devRant - A fun community for developers to connect over code, tech & life as a programmer

*wrestling commentator voice*
"In this weeks episode of encoding hell:
The iiiinnnfamous UTF-8 Byte Order Mark veeeersus PHP!"

For an online shop we developed, there is currently a CSV upload feature in review by our client. Before we developed this feature, we created together with the client a very precise specification, including the file format and encoding (UTF-8).

After the first test day, the client informed us, that there were invalid characters after processing the uploaded file.
We checked the code and compared the customer's file with our template.
The file was encoded in ISO-8859-1 and NOT as specified UTF-8.
But what ever, we had to add an encoding check, thus allowing both encodings from now on.

Well well well welly welly fucking well...

Test day 2: We receive an email from said client, that the CSV is not working, again.
This time: UTF-8 encoding, but some fields had more colums with different values than specified.
Fucking hell.
We tell the customer that.
(I was about to write a nice death threat novel to them, but my boss held me back)

Testing day 3, today:
"The uploading feature is not working with our file, please fix it."
I tried to debug it, but only got misleading errors. After about 30 minutes, at 20 stacks of hatered, I finally had an idea to check the file in a hex editor:

God fucking what!?!!?!11?!1!!!?2!!

The encoding was valid UTF-8, all columns and fields were correct, but this time the file contained somthing different.
Something the world does not need.
Something nearly as wasteful as driving a monster truck in first gear from NYC to LA.

It was the UTF-8 Byte Order Mark.
3 bytes of pure hell.
Fucking 0xEFBBBF.
The archenemy of PHP and sane people.

If the devil had sex with the ethernet port of a rusty Mac OS X Server, then 9 microseconds later a UTF-8 BOM would have been born.

OK, maybe if PHP would actually cope with these bytes of death without crashing, that would be great.