Search - "site downloader"
-
Based on a true story that happened right now.
Dad: "how do i download youtube videos?"
Me: "just google youtube downloader and download them from some site, that's how I do it"
Dad: "WHAT!!??? You want me to fucking google it? I don't know how to fucking google for those things, you're the IT guy and you should know how to do this, if I wanted to google it I wouldn't ask you for help. You know what, get the fuck out of my face, I don't need your help, get out"
-
Dilbert no longer lets me leech their comics from their site... It was a good... 10 years?
Seems I need to rebuild my downloader... using Selenium.
Now... where is that project where I used Selenium...
-
Haven't been reading Dilbert for over a month but finally got around to it today. I read it through an app I made that downloads all the comics from the site.
Well apparently the downloader downloaded too much/too fast. It seems my IP is now blocked....
Wonder if it's temporary. Oh well... I've got VPNs...
-
Strangely enough, there's a lack of decent site downloaders, so I've written one myself.
It's battle tested by downloading the WHOLE of devRant and a big part of molodetz. Both big sites. It makes the downloaded sites portable by rewriting absolute URLs as relative ones.
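To make the "absolute URLs to relative" idea concrete, here is a minimal sketch in Python. It is not the code from the project above; the function name `to_relative` and the example.com URLs are made up for illustration, and it only handles same-host links.

```python
import posixpath
from urllib.parse import urlsplit


def to_relative(link: str, page_url: str) -> str:
    """Rewrite an absolute same-host link as a path relative to page_url.

    Hypothetical helper for illustration only. Links to other hosts, and
    links that are not absolute URLs, are returned unchanged.
    """
    link_parts = urlsplit(link)
    page_parts = urlsplit(page_url)

    # Only rewrite absolute links that point at the same host as the page.
    if not link_parts.netloc or link_parts.netloc != page_parts.netloc:
        return link

    # Directory that contains the page; relative links resolve against it.
    page_dir = posixpath.dirname(page_parts.path) or "/"
    target = link_parts.path or "/"
    rel = posixpath.relpath(target, start=page_dir)

    # Keep the query string and fragment, if any.
    if link_parts.query:
        rel += "?" + link_parts.query
    if link_parts.fragment:
        rel += "#" + link_parts.fragment
    return rel
```

With links rewritten like this, the saved pages keep pointing at each other no matter which folder or host the copy ends up on, which is what makes the download portable.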
It downloads with high concurrency.
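For anyone wondering what concurrent downloading can look like, here is a rough sketch using a bounded thread pool from the Python standard library. This is an assumption about one way to do it, not how the project above does it; `fetch`, `download_all` and the `max_workers` default are hypothetical names and values.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable
from urllib.request import urlopen


def fetch(url: str) -> bytes:
    """Download one URL and return the raw response body."""
    with urlopen(url, timeout=10) as resp:
        return resp.read()


def download_all(
    urls: Iterable[str],
    fetch_one: Callable[[str], bytes] = fetch,
    max_workers: int = 16,
) -> Dict[str, bytes]:
    """Fetch many URLs at once, with at most max_workers requests in flight.

    fetch_one is injectable so the pool logic can be tried without a network.
    Any exception raised by fetch_one propagates to the caller.
    """
    url_list = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as url_list.
        bodies = pool.map(fetch_one, url_list)
        return dict(zip(url_list, bodies))
```

Capping `max_workers` keeps the number of simultaneous requests bounded, which matters if the target site blocks clients that download too much too fast.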
The reason I made this is that I want all this data so I have a lot of spam examples to train a model on.
Project page and features here: https://retoor.molodetz.nl/retoor/.... Source code at bottom as always.
I hope someone will give it a try :)
And yes, the docs took almost as much time as the code. The code doesn't contain unit tests, it's production tested instead. I applied many optimizations suggested by my review tool. When I was done I was too tired for unit tests.
Tags: random concurrent https absolute portable molodetz downie relative site downloader devrant crawler battle tested