Comments
-
retoor45078d I've put the whole of devRant in a full-text search database, so I can see exactly how many times someone talked about a certain subject, for example, and also who mentions whom. For some reason, devRant is a fun dataset for me to play with, since I know a lot of it, but by far not all. So I can run tests where I expect a certain outcome, but surprises are still possible.
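The "who mentions whom" part can be sketched in a few lines. This is only an illustration with plain regular expressions, not the full-text search database described above; the `(author, text)` input shape and the function name `mention_counts` are assumptions made for the example.

```python
import re
from collections import Counter

# Matches "@username" style mentions. Note this would also match the domain
# part of an e-mail address, which is good enough for a rough count.
MENTION = re.compile(r"@(\w+)")

def mention_counts(comments):
    """Count mentions per (author, mentioned user) pair.

    `comments` is an iterable of (author, text) tuples; this shape is a
    hypothetical stand-in for whatever the real dataset looks like.
    """
    counts = Counter()
    for author, text in comments:
        for name in MENTION.findall(text):
            counts[(author, name)] += 1
    return counts
```

Calling `mention_counts(comments).most_common(10)` would then list the ten most frequent author and mention pairs.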
-
If you're already doing that, make devRant archives in case the site goes down, since it's not being maintained anymore.
-
retoor45078d @SoldierOfCode that's also the plan. I'm going to convert the local HTML data to a database structured like the API objects.
-
Why !. I'll give the downie /* Cool name, BTW. */ a spin in a day or two.
I remember needing that sort of software a few years back. I used `HTTrack` back then.
...probably did the job, but can't remember.
Related Rants
Since there is, strangely enough, a lack of decent site downloaders, I've written one myself.
It's been battle tested by downloading the WHOLE of devRant and a big part of molodetz, both big sites. It makes the downloaded sites portable by rewriting absolute URLs as relative ones.
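For anyone wondering what rewriting absolute URLs as relative ones looks like, here is a minimal sketch using only the Python standard library. It is not taken from the downloader's source; the function name `to_relative` and the choice to drop query strings are assumptions made for the example.

```python
import posixpath
from urllib.parse import urlsplit

def to_relative(page_url: str, link_url: str) -> str:
    """Rewrite link_url as a path relative to the directory of page_url.

    Hypothetical helper: links to other hosts are returned unchanged, and
    query strings are dropped because local files cannot be looked up by query.
    """
    page = urlsplit(page_url)
    link = urlsplit(link_url)
    if link.netloc and link.netloc != page.netloc:
        return link_url  # external link, leave it absolute
    page_dir = posixpath.dirname(page.path) or "/"
    link_path = link.path or "/"
    relative = posixpath.relpath(link_path, start=page_dir)
    if link.fragment:
        relative += "#" + link.fragment
    return relative
```

A stylesheet at `https://example.com/static/app.css` referenced from `https://example.com/rants/123/index.html` becomes `../../static/app.css`, so the saved copy keeps working from any folder.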
It downloads with high concurrency.
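One common way to get that kind of concurrency without hammering a server is a semaphore around each request. The sketch below assumes `asyncio`; whether the downloader works this way is not stated here. `crawl_all`, `fake_fetch` and `max_concurrency` are hypothetical names, and `fake_fetch` only simulates a request so the example runs offline.

```python
import asyncio

async def crawl_all(urls, fetch, max_concurrency=10):
    """Fetch all urls concurrently, with at most max_concurrency in flight.

    `fetch` is any coroutine function taking a URL and returning its body.
    Returns a dict mapping each URL to the fetched body.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(url):
        # The semaphore blocks here once max_concurrency fetches are running.
        async with semaphore:
            return url, await fetch(url)

    results = await asyncio.gather(*(bounded_fetch(u) for u in urls))
    return dict(results)

async def fake_fetch(url):
    """Stand-in for a real HTTP request (assumption for the example)."""
    await asyncio.sleep(0.01)
    return f"<html>{url}</html>"
```

It can be run with `asyncio.run(crawl_all(urls, fake_fetch, max_concurrency=5))`; swapping `fake_fetch` for a real HTTP client call is left to the reader.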
The reason I've made this is that I want to have all this data, so that I have a lot of spam examples to train a model on.
Project page and features here: https://retoor.molodetz.nl/retoor/.... Source code at bottom as always.
I hope someone will give it a try :)
And yes, the docs cost almost the same amount of time as the code. The code doesn't contain unit tests; it's production tested instead. I applied many of the optimizations suggested by my review tool. When I was done, I was too tired for unit tests.
random
concurrent
https
absolute
portable
molodetz
downie
relative
site downloader
devrant
crawler
battle tested