JsonBoa
2y

Fuck Apache TIKA.
It's supposed to be a "universal file reader" or some shit. I'm trying to use it as a PDF/image parser that does OCR when needed and yields a full-file string. It does so, but the text ends up being IN THE WRONG FUCKING ORDER.
WTF would I want to parse the text out of a PDF in any order other than the one the text is supposed to be read in?!?!
"It is more efficient to work in random ordering", say the docs. No shit, really? Wouldn't it be even more efficient to just spit out random strings? Just as useful and 100% CPU-bound.
"You can add a property to forcefully put the text in the right order". THEN WHY THE FUCK IS IT NOT THE DEFAULT SETTING?
Srsly, what's the use case for a parser that yields scrambled text?!?

Comments
  • 3
    I would expect nothing less from Apache...

    Remember how they simply CBA to make a more efficient web server until some random sysadmin decided that "fuck it, I'll make my own web server with blackjack and hookers"?

    Remember how the log4j CVE came to be? The problematic string evaluation was there for a long time, but those who enabled it generally knew the consequences. Then some day Apache decided that "Let's just enable it for everyone and make it impossible to turn off".