Search - "scraper"
-
me: "ah, my scraper is nearly done - just need some final tweaks"
coworker: "JuSt FrOm LoOkInG aT yOuR ScReEn A fEw TiMeS tOdAy, I cAn TeLl YoU iT wOnT wOrK"
me, infuriated by his idiot mentality but not trying to start anything: "ah, it's fine, I've already scraped 3000+ entities"
coworker: "but it won't work."
me: "but... it's working..."
coworker: "but it won't work."
me: "ok."
sometimes it's just better to affirm the narcissistic assholes. make sure they are right.
-
Turns out my 4chan image scraper has been running for 6 months without interruptions. I now have 106k pictures and webms of highly questionable content on my hard drive. This is how Oppenheimer must have felt.
-
Got a report from a customer saying that our scraper does not correctly scrape the content of one of their news articles. After two seconds of investigation it turned out that the "article" is just one huge JPG file with text, photos and even something that looks like links.
-
Never update the firmware of your delta-fan driven server when your girlfriend is sleeping. Got thrown out of my own room!
Fml
-
What do you guys tell your friends when they ask what you're doing on the computer? My wife asks all the time and I usually give a generic answer like "writing code", but lately that's not good enough. Today I had browser dev tools open along with vim because I was building a web scraper in Python and I needed the structure of a certain site. I tried actually explaining it but got nowhere, so I ended up saying I was just downloading content from a site. Do you just give generic answers to people or try to get more technical? She seems unhappy with both approaches, but maybe I'm just bad at explaining.
-
So our class had this assignment in Python where we had to code up a simple web scraper that extracts data on the best-seller books on Amazon. My code was ~100 lines long (for a complete newbie in Python, guess the amount of sweat it took) and was able to handle most error scenarios, like random HTTP 503 errors, and had different methods to extract the same piece of data from divs with different IDs. The code was decently fast.
All was fine until I came to know the average number of lines it took for the rest of the class was ~60. None of the others had implemented the things I had, like error handling and extracting from different places in the DOM. Now I'm confused: have I overcomplicated my code, or have I made it kind of "fail-proof"?
Thoughts?
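For what it's worth, those extra ~40 lines sound like the valuable ones. A minimal sketch of both ideas, retrying on HTTP 503 and falling back across several possible element locations, assuming requests + BeautifulSoup; the selectors are invented:

```python
import time

import requests
from bs4 import BeautifulSoup

def fetch(url, retries=3):
    """GET a page, backing off and retrying on transient 503s."""
    for attempt in range(retries):
        resp = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})
        if resp.status_code == 503:
            time.sleep(2 ** attempt)  # wait 1s, 2s, 4s...
            continue
        resp.raise_for_status()
        return resp.text
    raise RuntimeError(f"still 503 after {retries} tries: {url}")

def extract_title(html):
    """Try several known locations for the same piece of data."""
    soup = BeautifulSoup(html, "html.parser")
    for selector in ("#productTitle", "#ebooksProductTitle", "h1.title"):
        node = soup.select_one(selector)
        if node:
            return node.get_text(strip=True)
    return None  # none of the known layouts matched
```

That's not overcomplication; that's the difference between a demo and a scraper that survives a weekend.
-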
2 things: never symlink the root directory, and don't try to remove a symlink with rm -rf.
Nearly shit my pants today.
-
I just made my own implementation of a scraper for NASA's Pic of the Day. It should have automagically changed my wallpaper at 4 AM to the new pic of the day.
Today's "pic of the day" is a video. It broke my shit.... So much for the excitement of seeing my automation work.4 -
When you're a hardcore web developer, the only 'action' you .get() is when you're writing a login form scraper for your three-legged OAuth flow in Python.
-
I took a web scraper that took several hours to run, made it multithreaded, and got it down to about 15 minutes. Pretty satisfying.
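That speedup usually looks something like the sketch below: scraping is I/O-bound, so threads spend their time overlapping network waits. fetch and urls are placeholders:

```python
from concurrent.futures import ThreadPoolExecutor

import requests

def fetch(url):
    return requests.get(url, timeout=30).text

urls = [...]  # the pages that used to be fetched one by one

# 16 downloads in flight at once instead of one:
with ThreadPoolExecutor(max_workers=16) as pool:
    pages = list(pool.map(fetch, urls))
```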
-
Hi! I'm new to freelancing. I've created a program that scrapes data from a website, parses it, runs DB queries, and emails the prepared data to the customer for whom I've created this program. The whole program is written in PHP and uses a MySQL table. There's almost no front end; it's just an automated background process that runs with a cron job. I've bought and set up a domain and hosting for them (my customer paid for it all). I got the core part of the program running after ~2 days, and it took me about a week to complete the project, including adding features and the testing phase. Now, I'd like to know: how much does this kind of project cost? The business operates in Silicon Valley.
-
types of programmers on social media from my school
#1: that bitch with a mac who barely knows enough python to write a keylogger or web scraper and implies they could make full on applications
#2: that other bitch who has windows and learned a little bit of html and flexes screenshots of their website
#3: the one who runs a club and promotes it on social media, they know java and run windows
#4: the knowledgeable, friendly linux user who's got a bomb-ass personal website (or a plain one, to their liking) and retweets cool news on twitter
#4 is the best.
-
The company that I work for has recently recruited a team for Web Development, so they don't have to pay a monthly fee to the previous team who designed their website.
They have 3000+ products on the old website, and no sane way to import them into the new one. The old team was asking for $300 to give us an API which would return the product details in XML format.
Obviously, paying that amount of money wasn't logical for a dying website, so the manager decided to hire someone to manually copy the content from the old admin panel to the new one. That is, until I stopped him.
My solution? Write a simple web scraper to log in to the old panel and collect the data. Boom! $300 saved from going to waste.
Now, the old team found out about this, and as happy as my manager was, they were quite angry. So they added a Google reCAPTCHA to prevent my bot from scraping the old panel.
I spent about 20 minutes and found out that once you're logged in to the old panel, the session is saved in a cookie and you are no longer greeted by a captcha.
So I rewrote a small portion of my bot, and Boom! Instant karma from the manager. We finished publishing the new site and notified the old team, only to see the precious look on their faces. Poor guy, he thought I was a wizard or something 😂😂
That's what you get for overcharging people!
TL;DR: The company's old website team wanted to overcharge us for an API to fetch 3000+ records.
Wrote a basic web scraper to do the same job in less than an hour.
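The cookie trick, sketched in Python with requests; URLs, form fields and the captcha token are all made up. A Session object carries the cookie along automatically, so only the login request ever faces the captcha:

```python
import requests

session = requests.Session()

# The only request that ever sees the captcha: solve it once by hand
# and paste the token in.
token = "...solved once by hand..."
session.post("https://old-panel.example.com/login",
             data={"user": "admin", "pass": "hunter2",
                   "g-recaptcha-response": token})

# The session cookie now rides along automatically; no captcha here.
for page in range(1, 301):
    html = session.get(f"https://old-panel.example.com/products?page={page}").text
    # ... parse the product rows out of `html` ...
```
-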
I was really tired 2hrs ago.... but then I found some motivation.
So basically I wrote an image viewer app (also has a scraper and a downloader).
The Viewer component loads each image 1 at a time... but these are like 4MB each so there's like 5s lag...
It's now 11:30PM and I have just finished implementing a cache so it pre-loads N pictures before and after the current one.
Now moving between images is so fast and smooth...
TLDR.... girls can actually motivate you to code some amazing stuff... even when you're tired.
I clipped just the top as well... the rest is NSFW....
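The pre-load idea, as a minimal sketch; load_image stands in for whatever decodes a file into a displayable image:

```python
class PreloadCache:
    """Keep the N images before and after the current one decoded in memory."""

    def __init__(self, paths, load_image, n=5):
        self.paths = paths
        self.load_image = load_image  # e.g. wraps PIL.Image.open
        self.n = n
        self.cache = {}

    def get(self, index):
        window = range(max(0, index - self.n),
                       min(len(self.paths), index + self.n + 1))
        # Evict images that fell out of the window, decode the new ones.
        self.cache = {i: self.cache.get(i) or self.load_image(self.paths[i])
                      for i in window}
        return self.cache[index]
```

(A real viewer would decode the neighbours on a background thread so get() never blocks, but the windowing is the part that kills the 5s lag.)
-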
So, in Germany apprentices at companies need to file a "Berichtsheft".
It's a log where you have to record, for each day, what you did at work or at vocational school and how long it took.
Basically every company keeps records of their employees activities in their CRM or other management system and all schools use services for keeping timetables that include lesson duration and activity.
So why the fuck do we apprentices have to write that shit ourselves when we could literally just access the databases and SELECT THE SHIT FROM FILED_ACTIVITIES, I thought.
So I'm writing scripts to access our CRM database, and a puppeteer script that scrapes the Untis (online timetable service for schools) timetables to extract everything, group it by date and format it nicely as CSV.
I'm sick of this: Digital system & Digital system = write it yourself bullshit.
Once I'm done I'll make a github repo for the Untis scraper.
Also, I'll be making the tools usable for the other apprentices at my company to spare them the suffering.
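The scraping half is puppeteer, but the group-by-date-and-write-CSV half is language-agnostic. A minimal sketch of that step in Python, with invented field names:

```python
import csv
from itertools import groupby
from operator import itemgetter

def write_berichtsheft(activities, path="berichtsheft.csv"):
    """activities: merged rows from the CRM and the Untis scrape, e.g.
    {"date": "2023-09-04", "activity": "Refactored import job", "hours": 3.5}
    """
    activities.sort(key=itemgetter("date"))
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Date", "Activities", "Hours"])
        for date, rows in groupby(activities, key=itemgetter("date")):
            rows = list(rows)
            writer.writerow([date,
                             "; ".join(r["activity"] for r in rows),
                             sum(r["hours"] for r in rows)])
```
-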
i wrote a website, a server in go, a small os in c, a game in js, a game and server and web scraper and other desktop apps in java, mobile apps with flutter, a website with php also, implemented aes in go, wrote a parser in java. did sysadmin stuff on my vps and set up pihole/openvpn/nextcloud on my rpi. learned about c vulnerabilities and used metasploit. attempted to write an interpreted language. did some led displays with arduino. currently learning tensorflow.
i have never...
- written a driver
- made a game with a game engine
- created a file encoding
- implemented an oauth2 server
- made an api
- worked with vr
what am i missing? i want to be a very well-rounded dev.
-
Does anybody have an idea what to code when you have too much free time? I am done with school and waiting for my university acceptance. No websites.
TL;DR
Project ideas?
-
The best surprise is when I start using one of my scraper apps again after a long time... and it still works.
My Dilbert one I haven't updated in years, which implies they haven't made any changes to the site or added any more protection for at least 5 years.
-
Apache why???
Your projects page lets me view all the projects sorted by category, language, etc.
https://projects.apache.org/project...
But I can't view a description... I have to open the link...
Starts writing a scraper and realizes the project list is not static, it's loaded from a JSON document using JS... The document has all the descriptions and other info...
WHY THE FUCK DO YOU NOT SHOW THIS ON THE PAGE, BUT MAKE EVERYONE OPEN ANOTHER PAGE TO SEE THEM...
Spends an hour writing an app in C# to parse the JSON, because a simple flattener isn't good enough given the structure...
Probably going to end up creating a GUI so I can browse it more easily and star the ones I may be interested in...
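For anyone tempted to do the same: the data the page loads appears to be plain public JSON, so no scraping is needed at all. A sketch in Python; the endpoint path is a guess from the page's network traffic, and the field lookups are deliberately defensive:

```python
import json
from urllib.request import urlopen

# Endpoint guessed from the page's XHR traffic; treat it as an assumption.
URL = "https://projects.apache.org/json/foundation/projects.json"

with urlopen(URL) as resp:
    projects = json.load(resp)  # dict keyed by project id

for pid, meta in sorted(projects.items()):
    desc = (meta.get("description") or meta.get("shortdesc") or "").strip()
    print(f"{meta.get('name', pid)}: {desc[:120]}")
```
-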
I know streams are useful for faster per-chunk reading of large files (e.g. audio/video), and in Node they can be piped, which also balances memory usage (when done correctly). But suppose I have a large JSON file of 500MB (say, from a scraper) that I want to run some string content replacements on. Are streams fit for this kind of purpose? How do you go about altering the JSON file's 'chunks' separately, when the Buffer.toString of a chunk would probably be invalid partial JSON? I guess I could rephrase as: what is the best way to read large, structured text files (JSON, HTML, etc.), manipulate their contents and write them back (without reading them into memory at once)?
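For the plain string-replacement case, the boundary problem has a standard fix: hold back the last len(old) - 1 characters of each chunk, since that's the most a match can straddle. A minimal sketch in Python (the same idea works as a Node Transform stream), assuming the replacement text can't recombine with its neighbours into another match:

```python
def stream_replace(src_path, dst_path, old, new, chunk_size=1 << 20):
    """Rewrite a huge text file chunk by chunk, replacing `old` with `new`."""
    keep = len(old) - 1  # a match can straddle a chunk boundary by this much
    tail = ""
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for chunk in iter(lambda: src.read(chunk_size), ""):
            buf = (tail + chunk).replace(old, new)
            cut = len(buf) - keep
            dst.write(buf[:cut])
            tail = buf[cut:]  # carry the tail into the next chunk
        dst.write(tail)
```

If the edit needs to understand the JSON structure rather than raw text, a streaming (SAX-style) parser is the safer tool: clarinet in Node, ijson in Python.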
-
If you want to improve your life, but your mental health and energy levels are too low to exercise, start with hygiene.
Take showers every day, continuously lowering the water temperature. Use dental floss and a tongue scraper. Brush your teeth twice a day. Wash your face every morning and every evening. Use evidence-based skincare products: adapalene, panthenol, SPF 50+ sunscreen. Keep your toes and nails tidy. Shave routinely.
According to Nadya Tolokonnikova, a prominent Russian dissident who was imprisoned, denying basic hygiene is a _very_ efficient way of breaking someone into submission, one that is often applied to dissidents in Russian prisons. So doing the reverse of that should improve mental health. -
How I solve programming tasks for competitions:
Example: Task from Google Code In. Build a Netflix activity scraper.
Easy, I'll do that and that I'll also need that and to run that I can do that. Perfect
How I solve a task with an algorithm:
Reads: ahh, so I have to do that. Brain freezes, starts staring at the screen, and has no idea whatsoever how to solve it.
-
Hello everyone!
Today, I want to show you a CLI program that I am working on.
It is called Chaker, and it is a Hacker News 'client' written in Go (or Golang) for the terminal.
(The 'client' is in quotes because right now it is more of a web scraper with a UI than an actual client that can do stuff like login/logout, etc.)
It is pretty usable for now, but I am planning on other stuff too.
Check it out!
https://github.com/HoangTuan110/...
-
Having developer skills sometimes comes in handy in certain situations.
In my case I visited a new website, but first I had to configure their cookies.. and it was a list of about 150 radio buttons (150 advertisers), I shit you not.
And so I was like: "No, I refuse to click each one of them". I kept thinking.. hm.. how am I going to do a mass-toggle-off? And then it hit me: if the button "toggle all" toggles all buttons on.. then if I invert the logic of the call, it will turn them all off! And it worked.. it was something like "toggleAll(!-1)", and I did "toggleAll(0)".
That sure saved me some time! Oh yeah, and there are of course other situations, like when you don't want to use a full scraper just to get all the.. I don't know.. menu links out of a page. Console > import jQuery > select all elements with 'a' and call text() on their DOM nodes! It can be done with native JavaScript as well, e.g. document.getElementsByTagName('a'), but yeah, there are plenty of examples.
Hooray for being a developer!
-
Wonderful experience today
I'm scraping data from an old system, saving that data as JSON, and my next step is transforming the data and pushing it to an API (thank god the new system has an API).
Now I stumbled upon an issue: I found it a bit hard to retrieve a file with the scraper library I'm using, and it was also quite difficult to set the specific headers needed to download the file I was looking for instead of navigating to the index of the website. Then I tried a built-in language function to retrieve the files I needed during the scrape; no luck, 'cause I had to log in to the website first.
I didn't want to use a different library since I worked so hard and got so far.
My quick solution: perform a GET request to the website, borrow the session ID cookie, and then use the built-in function's HTTP headers functionality to retrieve the file.
Luckily this is a throwaway script, so being dirty this once is OK. It works now :)
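For the record, the same move in Python, where the "built-in" downloader is stdlib urllib; the login fields and cookie name are guesses:

```python
import urllib.request

import requests

# Step 1: log in normally and let requests capture the session cookie.
s = requests.Session()
s.post("https://example.com/login", data={"user": "u", "pass": "p"})
sid = s.cookies.get("PHPSESSID")  # cookie name is a guess

# Step 2: hand that cookie to the plain downloader via a header.
req = urllib.request.Request(
    "https://example.com/export/report.csv",
    headers={"Cookie": f"PHPSESSID={sid}"},
)
with urllib.request.urlopen(req) as resp, open("report.csv", "wb") as f:
    f.write(resp.read())
```
-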
Hi, I am using a Wikipedia scraper in one of my open-source projects. The data extracted from it is then stored in Elasticsearch... Now I have decided to create a library out of it so that other people can use it too... My question is: should I also include the Elasticsearch storing module in the library, or just the scraper... Please let me know your thoughts.
-
-
Creating a web scraper using regex because there's no API available, knowing it will break the moment the source changes its markup.
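The kind of thing in question, roughly (URL and markup invented). It works right up until someone renames a class:

```python
import re

import requests

html = requests.get("https://example.com/news").text

# Fine today; returns [] the day they rename the class or reorder attributes.
titles = re.findall(r'<h2 class="article-title">(.*?)</h2>', html)
```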
-
So I was working on a web scraper to basically download all listings with detailed info from an e-shop to my database for some analysis.
And I completely forgot about throttling, which is quite important when writing such things in Node.js.
It's funny how in other languages you try to figure out how to make your application faster, and in Node you're trying to make it slower 😄
Anyhow, I apparently hit the poor site with 5000+ simultaneous requests, all of which hit their database (to gather product info). Suffice it to say, the site got visibly slow 🤣
Thankfully I print out where each request is made, so I quickly realised my mistake and killed the process.
Now I hope no-one comes knocking on my door lol
The adventures of being a Node.js dev.
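The usual fix in Node is a concurrency limiter (p-limit is the go-to package). The same idea sketched with Python's asyncio, since a semaphore reads the same in any language; aiohttp and the limit of 10 are assumptions:

```python
import asyncio

import aiohttp

MAX_CONCURRENT = 10  # be kind to the poor e-shop

async def scrape(urls):
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    async with aiohttp.ClientSession() as http:

        async def fetch(url):
            async with sem:  # at most 10 requests in flight, not 5000+
                async with http.get(url) as resp:
                    return await resp.text()

        return await asyncio.gather(*(fetch(u) for u in urls))
```
-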
Amazon is screwing with me... So I was writing that Prime Recently Added Videos scraper, but it turns out that the search results pages' layout changes each time. It's like they're running multiple versions of the search engine that return the page in different layouts...
So after figuring out one of them... the whole thing breaks, since I need to parse a different HTML layout...
-
Is there an HTML scraper library for JS/npm, like Selenium or HtmlAgilityPack, where I can find elements by ID, XPath, element type, and their attached attributes?
-
-
I didn't get a Google foobar invite in all my years of working on professional projects, but I did when I was trying to put together a web scraper because I was bored. Anyway, I got a job shortly afterwards, so I didn't really bother with it.
-
-
Sticking it to the man... or Facebook, sorta.
Using Selenium so I can get all the group feeds in Chronological order rather than Recent Activity... Why the fuck is there still no way to set the default.
Now that I think about it, the better way is to create a service app that checks for updates and loads them into a DB, and a client app that just reads from the DB. Updates come from Selenium/Chrome in a background thread while the UI doesn't need to lag/wait...
fck... all that async code for nothing.... (yea, I'm thinking while I'm writing this... an epiphany moment...)
One thing, and the original question: is there an existing Facebook scraper? OpenGraph doesn't work for group posts or public events, which is what I want the feeds for....
The problem, though, is the AJAX calls for more posts when you scroll down. I am not sure how, in Selenium, to make the driver wait for new content in the DOM... rather than just sleeping the thread for X seconds and checking after.
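To answer the last bit: Selenium can poll a condition instead of sleeping. WebDriverWait re-checks a callable until it returns something truthy or the timeout hits, so you can wait for the post count to actually grow. A sketch with the Python bindings; the selector and URL are guesses:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

POST = (By.CSS_SELECTOR, "div[role='article']")  # selector is a guess

def load_more_posts(driver, timeout=15):
    before = len(driver.find_elements(*POST))
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Poll until the AJAX load actually lands instead of sleeping X seconds.
    WebDriverWait(driver, timeout).until(
        lambda d: len(d.find_elements(*POST)) > before
    )
    return driver.find_elements(*POST)

driver = webdriver.Chrome()
driver.get("https://www.facebook.com/groups/some-group")  # placeholder URL
posts = load_more_posts(driver)
```
-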
A C# remote procedure call library.
Made years ago when I had no real science and engineering knowledge.
https://github.com/scrapes/... -
When I browsed for a food recipes dataset (especially Indian food), I could not find one (that I could use) online. So, I decided to create one.
The dataset can be found here: https://lnkd.in/djdh9nX
It contains the following fields (self-explanatory) - ['RecipeName', 'TranslatedRecipeName', 'Ingredients', 'TranslatedIngredients', 'Prep', 'Cook', 'Total', 'Servings', 'Cuisine', 'Course', 'Diet', 'Instructions', 'TranslatedInstructions']. The dataset contains a CSV and an XLS file. Sometimes, the content in Hindi is not visible in the CSV format.
You might be wondering what the columns with the prefix 'Translated' are. A lot of entries in the dataset were in Hindi. To handle such entries and translate them to English for consistency, I went ahead and used 'googletrans', a Python library that uses the Google Translate API underneath.
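The translation pass presumably looked something like this minimal sketch, using googletrans' classic synchronous API (the library's API shifts between versions, so the pin matters):

```python
from googletrans import Translator  # pip install googletrans==4.0.0rc1

translator = Translator()

def to_english(text):
    """Translate a Hindi cell to English; pass empty cells through."""
    if not text:
        return text
    return translator.translate(text, src="hi", dest="en").text

# e.g. building the Translated* columns with pandas:
# df["TranslatedInstructions"] = df["Instructions"].apply(to_english)
```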
The code for the crawler, cleaning and transformation is on GitHub (repo: https://lnkd.in/dYp3sBc) (@kanishk307).
The dataset has been created using the Archana's Kitchen website (https://lnkd.in/d_bCPWV). It is a great website and hosts a ton of useful content. You should definitely consider visiting it if you are interested.
#python #dataAnalytics #Crawler #Scraper #dataCleaning #dataTransformation