So on my new position I get to work on Spark jobs. Never had to work with the infamous big data technologies. I never thought this would get SO frustrating for all the wrong reasons.

I'm currently trying to introduce integration tests for some Spark job I wrote. This isn't trivial though, as the data comes from several HBase tables. Mocking everything simply isn't feasible. So why not use the integrated HBaseTestingUtility? With it you can start a mini cluster that runs all nessecary services in the scope of your test.

Sounds great, eh? WRONG. Firstly the used mapr dependencies get in the way. The baked in configuration tries to automatically authenticate with your local cluster through Kerberos. Of course this doesn't work. And of course there is no way to reconfigure this as it happens IN A FUCKING STATIC BLOCK. AHHHH.

Ok. So after calming down I "simply" had to exclude all mapr dependencies and replace them with vanilla ones. After two days of dependency hell it FINALLY works!

...or does it? Well now we need test data. For that we got a map reduce algorithm that can import dumps. Sounds again, great, eh? WROOOONNNG.

The fucking map reduce mini cluster can't start, as it tries to write a symlink. Now take a wild guess what the sys admin here blocked. Yepp. TWO DAYS OF WORK RENDERED USELESS, BECAUSE OF SOME FUCKING SECURITY SETTING.

This is fine.

Add Comment