
Ok friends, let's try to compile Flownet2 with Torch. It's made by NVIDIA themselves, so there won't be any problem at all with dependencies, right?????? /s

Let's use the Deep Learning AMI with a K80 on AWS: totally updated, ready to go, super great, always works with everything else.

> CUDA error
> CuDNN version mismatch (a version-check sketch follows this list)
> CUDA versions overwrite each other
> Library paths never get updated
> Torch 0.4.1 doesn't work, so have to go back to Torch 0.4
> Flownet2 doesn't compile, just get a bunch of CUDA errors, piece of shit code
> Online forums have lots of questions and 0 answers
> Decide to skip straight to vid2vid
> More CUDA errors
> Can't compile the fucking 2D kernel
> Through some act of God involving reinstalling CUDA and CuDNN, manage to finally compile Flownet2
> Try running
> "Kernel image" error (more on that after this list)
> excusemewhatthefuck.jpg
> Try without a label map because fuck it the instructions and flags they gave are basically guaranteed not to work, it's fucking Nvidia amirite
> Enormous fucking CUDA error and Torch error, makes no sense; online nobody agrees, and 0 answers again
> Try again but this time on a clean machine
> Still no go
> Last resort: use the Docker image of Flownet2 they themselves provided
> Same fucking error
> While debugging, realize my training image set is also bound to give bad results: "directly concatenating" images together, like the paper claims you can, actually produces horrible results, and the network doesn't accept 6-channel input no matter what, so the only way around it is to feed 2 separate images (3 * 2 = 6, quick maths); see the shape sketch after this list
> Fix my training data, fuck the Nvidia dude who gave me wrong info
> Try again
> Same fucking errors
> Doesn't give any helpful information, just spits out a bunch of fucking memory addresses and long function names from the CUDA core
> Try reinstalling and then building a basic torch network (the kind of sanity check sketched after this list), works perfectly fine
> FINALLY.png
> Set up vid2vid and Flownet2 again
> SAME FUCKING ERROR

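For anyone retracing this mess, step zero is finding out which CUDA and cuDNN your framework thinks it has versus what the AMI actually ships. This is a minimal sketch using standard torch calls, not the exact script I ran:

```python
# Minimal version / device sanity check using standard torch calls.
import torch

print("torch version:      ", torch.__version__)
print("built against CUDA: ", torch.version.cuda)               # CUDA toolkit torch was compiled with
print("cuDNN version:      ", torch.backends.cudnn.version())   # None means no cuDNN at all
print("CUDA available:     ", torch.cuda.is_available())

if torch.cuda.is_available():
    dev = torch.cuda.current_device()
    print("device:             ", torch.cuda.get_device_name(dev))
    # Compute capability matters for the "kernel image" error; a K80 reports (3, 7).
    print("compute capability:", torch.cuda.get_device_capability(dev))
```
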
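About that "kernel image" error: the textbook cause, and my best guess here rather than a confirmed fix, is that the custom CUDA layers (correlation / resample2d / channelnorm) get built without the K80's compute capability (sm_37), so the binary simply contains no kernel for that card. Checkouts that build through torch.utils.cpp_extension respect TORCH_CUDA_ARCH_LIST; the older build scripts hardcode their -gencode flags, and there you have to edit them by hand. The script name and path below are placeholders for whatever your checkout actually uses:

```python
# Hypothetical rebuild of the custom CUDA layers with the K80's architecture included.
# "3.7" is the K80; append newer architectures if the build must also run elsewhere.
import os
import subprocess

os.environ["TORCH_CUDA_ARCH_LIST"] = "3.7"  # honoured by torch.utils.cpp_extension builds

# Placeholder: re-run whatever build step the repo ships. For older cffi-era scripts,
# add arch=compute_37,code=sm_37 to the nvcc -gencode flags by hand instead.
subprocess.check_call(["bash", "install.sh"], cwd="flownet2-pytorch")
```
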
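And the input-shape thing, in case it saves someone else a day: as far as I can tell the flownet2-pytorch model doesn't want one 6-channel image at all, it wants the two RGB frames stacked along an extra dimension, i.e. a (batch, 3, 2, H, W) tensor. Shapes only; loading and normalisation are up to you:

```python
# Sketch of packing a frame pair for flownet2-pytorch: two (3, H, W) frames
# stacked into a single (1, 3, 2, H, W) tensor, not concatenated into 6 channels.
import torch

def make_pair_input(frame_a, frame_b):
    """frame_a, frame_b: float tensors of shape (3, H, W), already scaled."""
    pair = torch.stack([frame_a, frame_b], dim=1)  # (3, 2, H, W)
    return pair.unsqueeze(0)                       # (1, 3, 2, H, W)

# Dummy frames, just to show the shapes.
a = torch.rand(3, 256, 448)
b = torch.rand(3, 256, 448)
x = make_pair_input(a, b)
print(x.shape)  # torch.Size([1, 3, 2, 256, 448])
# flow = model(x.cuda())  # "model" being an initialised FlowNet2, not shown here
```
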
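For reference, the "basic torch network" test is nothing fancier than this kind of thing; if it runs clean, the driver/toolkit/torch combo is fine and the problem lives in the custom kernels:

```python
# Bare-minimum GPU sanity check: a tiny conv net, one forward and one backward pass on CUDA.
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 3, kernel_size=3, padding=1),
).cuda()

x = torch.rand(2, 3, 64, 64, device="cuda")
out = net(x)
out.mean().backward()
torch.cuda.synchronize()
print("forward/backward on GPU: OK")
```
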
> Try to build the entire network in tensorflow
> CUDA error
> CuDNN version mismatch
> Doesn't work with TF
> HAVE TO FUCKING DOWNGRADE DRIVERS TOO
> TF doesn't support the latest CUDA because no one in the ML community can be bothered to support anything other than their own machine (quick visibility check sketched after this list)
> After setting up everything again, realize I have no space left on the 75 GB machine
> Try torch again, hoping that the entire change will fix things
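
If you do go down the TensorFlow road, at least check what it can actually see before burning a day. Each TF release only supports a specific CUDA/cuDNN combo (the build table on tensorflow.org is the real source of truth), so treat this as a quick visibility check and nothing more:

```python
# Quick check of whether this TensorFlow build has CUDA support and can see a GPU.
import tensorflow as tf

print("TF version:     ", tf.__version__)
print("built with CUDA:", tf.test.is_built_with_cuda())
print("GPU device:     ", tf.test.gpu_device_name() or "none visible")
```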

At this point I'll leave a space so you can try to guess what happened next before seeing the result.

Ready?

3

2

1

> SAME FUCKING ERROR

In conclusion, NVIDIA is a fucking piece of shit that can't make their own libraries compatible with each other, and can't be fucked to write instructions that actually work.

If anyone has vid2vid working, or has gotten around the kernel image error on AWS K80s, please throw me a lifeline; in exchange you can have my soul, or what little is left of it.

Comments
  • 8
    Hahahaha, I totally feel you. CUDA is so fucking badly documented that their own engineers don't even know how to use it. The only thing to do is trial and error, and while that shit works, don't you go touching it.
  • 2
    That stack is an uckers.
  • 1
    Have you tried turning it off and on again?
  • 1
    So, Nvidia, fuck you! - Linus Torvalds
  • 1
    I never have issues with CUDA... mainly because I just use OpenCL instead :^)