3 · rajkd · 6y

I have one question to everyone:

I am basically a full-stack developer who works with cloud technologies for platform development. For the past 5-6 months I have been working on a product that uses machine learning algorithms to generate metadata from video.

One of the algorithms uses TensorFlow to predict the locale from an image. It takes an image of ~500 KB and needs around 15 seconds to predict the 5 most likely locales from a pre-trained model. When I send more than 100 requests to the code concurrently, it stops working and TensorFlow throws an error. I am using a machine with 32 vCPUs and 120 GB of RAM. When I ask the decision scientists on my team, they say the processing load is high, a lot of calculation is happening behind the scenes, and it requires a GPU.

As far as I understand, GPUs make sense for training, but for prediction (inference) I don't think we need such heavy infrastructure. Please help me understand if I am wrong.
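For what it's worth, the concurrency failure may be separate from the GPU question: letting 100+ requests each invoke the model in parallel can exhaust memory regardless of hardware. A minimal sketch of capping concurrent predictions with a worker pool (the `predict_locale` stub and `MAX_WORKERS` value are assumptions standing in for the real TensorFlow call, not the actual product code):

```python
# Sketch: funnel concurrent requests through a small worker pool so the
# model is never invoked 100+ times in parallel. predict_locale() is a
# placeholder for the real session/model call on the pre-trained graph.
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 4  # assumption: tune to what the box actually sustains

def predict_locale(image_bytes):
    # Stand-in for the real TensorFlow inference call; returns the
    # 5 candidate locales (hypothetical values for illustration).
    return ["en-US", "en-GB", "fr-FR", "de-DE", "es-ES"]

executor = ThreadPoolExecutor(max_workers=MAX_WORKERS)

def handle_request(image_bytes):
    # Each incoming request submits work and blocks on the future; at
    # most MAX_WORKERS predictions run at once, the rest wait in queue.
    return executor.submit(predict_locale, image_bytes).result()
```

With this shape, a burst of 100 requests queues up instead of spawning 100 simultaneous inference runs, which is often enough to stop the crashes on CPU-only hardware.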

PS: All the decision scientists on the team are basically dumb fucks, and they always have one answer: use a GPU.

Comments
  • 2
    That one answer these dumb fucks give is the right one as far as I know.

    Both inference and training run well on GPUs, so that's what you should be using. As far as I know, an Nvidia Titan X is your best bet (don't know much about the Titan V yet, apart from that it's expensive 😄)

    But I'm no expert on the field...
  • 1
    Well @froot, I appreciate the time you took to answer the question.
  • 2
    @rajkd No probs. Just my 2 cents 😄
  • 0
    @Froot as far as I know the Titan V should work very well for this stuff as this is what Volta is designed for
  • 0
    @RealKC It will probably kick ass. The thing I'm not sure about is the price. Like 1 Titan V vs like 3 Titan X-es. Bang for your buck sort of considerations.
  • 0
    I may be wrong, but I am on your side.
    Can you recheck their ML solution architecture in TensorFlow? Or post the overall design here if you want me to look into it too (only if that doesn't violate any NDA).

    Predicting shouldn't take long, given that the model is ready.
    And for reuse, you can save and restore the same model.

    Asynchronously, or at a later point, the model can keep training on new data points, be saved at a particular checkpoint, and then be loaded for use.
  • 0
    @github I can't share the code because of the problems it could cause me.
    But if you've done something like this and have some open-source code I can refer to for understanding, please share it with me.
  • 1
    @rajkd Not with TensorFlow in particular, but that's how every ML model is used.
    Imagine how much time Google would take if it had to rebuild the model every time before predicting something.

    I don't have the model-saving code to hand, since mostly I just keep the model in memory and use it to predict. But there is a way to save it too, then load it and use it just for prediction.

    I am outside right now. Will share some examples later.
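    The save-once / load-and-predict pattern described above can be sketched like this. A toy model and `pickle` are used here so the sketch runs anywhere; a real TensorFlow model would be exported with TensorFlow's own saver (e.g. a SavedModel), not pickle.

    ```python
    # Sketch: train once, persist the model, then restore it later and
    # only call predict — no retraining on the serving path.
    import os
    import pickle
    import tempfile

    class ToyModel:
        # Hypothetical stand-in for a trained model.
        def __init__(self, weights):
            self.weights = weights

        def predict(self, x):
            return x * self.weights

    # "Training" happens once; the result is saved to disk.
    model = ToyModel(weights=3)
    path = os.path.join(tempfile.gettempdir(), "toy_model.pkl")
    with open(path, "wb") as f:
        pickle.dump(model, f)

    # Later (or in a separate serving process): load and predict only.
    with open(path, "rb") as f:
        restored = pickle.load(f)
    result = restored.predict(2)  # -> 6
    ```

    The point is the split: the expensive work (training) is done ahead of time, and the serving process only ever loads the saved artifact and runs cheap forward passes.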