Hi fellow devRanters, I need some advice on how to detect web traffic coming from bad/malicious bots and block them.

I have an ELK (Elastic) stack set up to capture the logs from the sites, and I have already blocked the obviously bad ones (bad user agents, IP addresses known for spamming, etc.). I know you can tell by looking at how fast/frequently they crawl the site, but how would I know that the one I block is actually the one causing the malicious, non-human traffic? I am not sure whether I should block access from other countries, because I think the bots are local.
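Since rate limiting on the sites is off the table, one option is to run the "how fast/how often" check offline against the logs instead. Below is a minimal sketch. It assumes the raw access logs use the Apache/Nginx combined format. The sample line, the function names and the 60 requests/minute cut-off are all hypothetical. In an ELK setup, the same counts could also come from the already-parsed fields in Elasticsearch.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

# Assumption: access logs use the Apache/Nginx "combined" format, e.g.
# 203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def peak_requests_per_minute(lines):
    """Return {ip: highest number of requests seen from that IP in any single minute}."""
    per_minute = defaultdict(Counter)  # ip -> Counter keyed by minute bucket
    for line in lines:
        match = LOG_PATTERN.match(line)
        if not match:
            continue  # skip lines that are not in combined format
        ts = datetime.strptime(match["time"], TIME_FORMAT)
        minute = ts.replace(second=0)  # truncate the timestamp to its minute
        per_minute[match["ip"]][minute] += 1
    return {ip: max(buckets.values()) for ip, buckets in per_minute.items()}


def flag_fast_clients(lines, threshold=60):
    """Return IPs whose peak rate exceeds `threshold` requests/minute (hypothetical cut-off)."""
    peaks = peak_requests_per_minute(lines)
    return sorted(ip for ip, peak in peaks.items() if peak > threshold)
```

Because it iterates line by line (e.g. `with open("access.log") as f: flag_fast_clients(f)`), it does not need to load a whole multi-gigabyte file at once. One caveat: behind Cloudflare, the origin logs record Cloudflare's IPs unless the web server is set up to log the visitor IP that Cloudflare passes in the `CF-Connecting-IP` header.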

I am lost and don't know what else I can do: I can't use rate limiting on the sites, and I can't sign up for a paid service because management wants everything for the price of peanuts.


Someone asked why I can't just read through the logs (from several mid- to large-scale websites) and pick out the baddies.

*facepalm* Here are the gigabytes of log files.
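Reading gigabytes of logs by hand is not realistic, but simple per-client ratios can narrow the search. One heuristic (an assumption, not a rule) is that a real browser usually fetches CSS, JS and images along with each page, while many scrapers request only the HTML. Below is a minimal sketch. It assumes combined-format logs, and the asset extension list and both thresholds are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: combined log format; only the client IP and request path are used here.
REQUEST_PATTERN = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[A-Z]+ (?P<path>\S+)')
# Hypothetical list of static-asset extensions; adjust to what the sites actually serve.
ASSET_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".svg", ".woff", ".woff2", ".ico")


def asset_ratio_by_ip(lines):
    """Return {ip: (total_requests, fraction_of_requests_for_static_assets)}."""
    totals = defaultdict(int)
    assets = defaultdict(int)
    for line in lines:
        match = REQUEST_PATTERN.match(line)
        if not match:
            continue  # skip lines that don't look like a combined-format request
        ip = match["ip"]
        path = match["path"].split("?", 1)[0].lower()  # drop the query string
        totals[ip] += 1
        if path.endswith(ASSET_EXTENSIONS):
            assets[ip] += 1
    return {ip: (n, assets[ip] / n) for ip, n in totals.items()}


def suspicious_clients(lines, min_requests=50, max_asset_ratio=0.05):
    """IPs with many requests but (almost) no asset fetches; thresholds are hypothetical."""
    return sorted(
        ip
        for ip, (n, ratio) in asset_ratio_by_ip(lines).items()
        if n >= min_requests and ratio <= max_asset_ratio
    )
```

If a CDN such as Cloudflare caches the static assets, those requests may never reach the origin logs, so real visitors could look asset-free too. Check that before trusting the thresholds.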

  • 0
    @heyheni We do use Cloudflare, for all the sites actually, but it doesn't filter out all the bad bots.
    This gives me quite a headache, because I don't know how or why we are still getting malicious or non-human traffic.
  • 1
    @beefo-11 Train a classifier. Or have you tried that and it wasn't good enough for you?
  • 1
    Picking out just the baddies would not help, because malware isn't stupid. There are algorithms out there that randomly change the IP address and DNS name of a malicious server.
    So maybe train a classifier, or put some money into it. Security is not free.
  • 0
    @KartikShetty That would be my ultimate goal: get the machine to do the job instead of a human.

    Would scikit-learn or TensorFlow be a good place to start if I have to build our own classifier? Or would plain Python do the job?

    Sorry if I am asking too many questions; this is pretty much my first machine-learning-related work in a commercial environment.
  • 0
    @Naptic I have spent hours trying to explain this to the management but they just listen and that's it. We have spoken to different service providers and every time we try to get the budget for it, the answer is "It's too expensive, this is not an option".

    I don't think they even care about security. The last time I talked to them about getting SSL and why we should have the sites secured, the answer was: "Why do I have to pay so much just for a certificate?"

    Yeah, I still haven't got my budget for SSL.

    I guess my only option is to build and train a classifier *sigh*
  • 1
    @beefo-11 As long as they don't see some negative red numbers with a $ in front, they won't move their a** or change their mind. As a last resort, you could try to spell out what could happen in an incident: data loss, customer information leaks, etc.
  • 0
    @beefo-11 scikit-learn and TensorFlow are machine learning frameworks that come with various models; Python is the language you use to build your project on top of them.

    For a basic classifier, scikit-learn is enough. TensorFlow is generally used when you want to build deep learning applications.
  • 0
    @beefo-11 Besides what @Naptic suggested about telling them what could happen, make it sound dramatic and have them sign it, arguing that you don't want to be held responsible if it happens. Scare them; that usually does the trick.
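As the thread suggests, scikit-learn is enough for a basic classifier. Below is a minimal sketch of what that could look like, assuming each client (for example, one IP per day) has already been turned into a few numeric features extracted from the ELK logs. The four features, the training rows and the labels are all made up for illustration. Real labels would have to come from clients you have already verified by hand as bot or human.

```python
from sklearn.ensemble import RandomForestClassifier

# Hypothetical per-client features:
# [peak requests/minute, fraction of asset requests, fraction of 4xx responses, distinct paths / requests]
# Labels are illustrative only: 1 = bot, 0 = human.
X_train = [
    [2, 0.60, 0.00, 0.30],
    [5, 0.55, 0.02, 0.25],
    [3, 0.70, 0.01, 0.20],
    [4, 0.50, 0.00, 0.35],
    [120, 0.00, 0.30, 0.95],
    [90, 0.02, 0.25, 0.90],
    [200, 0.00, 0.40, 0.99],
    [75, 0.01, 0.20, 0.85],
]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]

# A random forest copes with features on different scales without normalisation.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Score two new (hypothetical) clients.
new_clients = [[3, 0.65, 0.01, 0.28], [150, 0.00, 0.35, 0.97]]
print(model.predict(new_clients))  # predicted label per client
print(model.predict_proba(new_clients)[:, 1])  # estimated probability of "bot"
```

Using the probability instead of the hard label lets you pick a conservative cut-off, so that only clients the model is very sure about get blocked and real visitors are not shut out by mistake.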