Search - "scraping"
A client wanted me to make a website that compared the user's face to that of a wrestler. We had done a lot of work, but now he wants to switch it...
So, I guess the next week of my work is gonna be scraping porn websites. NSFW, for work.
In my school students can book rooms using a local website. The booking page is reset every night at 00:00, so you have to stay up every night, ready with the F5 key, and hope to get one. As I want my sleep, I needed some solution to this problem. I ended up building a bot that goes to the website and books the room automatically. It uses the Windows Task Scheduler to wake the PC up and run the program. Also built a small GUI that takes your credentials and settings for which room you want. Been working pretty well so far, just waiting for the sysadmin to add a captcha to the site or something to undo my work :D
I was offered to work for a startup in August last year. It required building an online platform with video calling capabilities.
I told them it would be on a learn-and-implement basis, as I didn't know a lot of the web tech. Learnt all of it and kept implementing side by side.
I was promised a share in the company at formation, but wasn't given the same at the time of formation because of some issues in documents.
Yes, I did delay at times on the delivery date of features on the product. It was my first web app, with no prior experience. I did the entire stack myself from handling servers, domains to the entire front end. All of it was done alone by me.
Later, I also did install a proxy server to expand the platform to a forum on a new server.
And yesterday, after a month of no communication from their side, I was told they are scrapping the old site for a new one. As I had all the credentials of the servers except the domain registration control, they transferred the domain to a new registrar and pointed it to a new server. I have a last meeting with them. I have decided to never work with them and I know they aren't going to provide me my share as promised.
I'm still in the 3rd year of my college here in India. I flunked two subjects last semester, for the first time in my life. And for 8 months of work, this is the end result of it by being scammed. I love fitness, but my love for this is more and so I did leave all fitness activities for the time. All that work day and night got me nothing of what I expected.
Though they don't have any of my code or credentials to the server or their user base, they got the new website up very fast.
I had no contract with them. Just did work on the basis of trust. A lesson learnt for sure.
Although, I did learn to create websites completely all alone and I can do that for anyone. I'm happy that I have those skills now.
Since they are still in the startup phase and they don't have a lot of clients, I'm planning to partner with a trusted person and release my code with a different design and branding. The same idea basically. How does that sound to you guys?
I learned that:
- No matter what happens, never ignore your health for anybody or any reason.
- Never trust in business without solid security.
- Web is fun.
- Self-learning is the best form of learning.
- Take business as business, don't let anyone cheat you.
My client is trying to force me to sign an ethics agreement that would allow them to sue me if found in breach of it. At the same time they are scraping eBay's data without their consent and refuse to sign the licence agreement. Apparently they don't understand irony.
When you're reworking a bot because they've realised you're scraping their site and you spot this: GAME ON MF'ers
I always have a guilt complex about saying that I'm a Data Scientist, when I'm actually spending weeks scraping and annotating data into a CSV file.
Client : We need real time analysis.
Me : But we can't just scrape thousands of results and process them on user's click.
Client : Don't do that, Real-time analysis is scraping it once and processing it every time the user demands.
Me : Okay
WHAT THE FUCK !!!!!
me: “Realistically, the only way to pull in this data without replicating and without an API feed is to scrape it from the site”
manager -> to the client: “basically he’s got to hack your system to do it”
So our class had this assignment in Python where we had to code up a simple web scraper that extracts data on the best-seller books on Amazon. My code was ~100 lines long (for a complete newbie in Python, guess the amount of sweat it took) and was able to handle most error scenarios, like random HTTP 503 errors, and used different methods to extract the same piece of data from divs with different ids. The code was decently fast.
All was fine until I came to know the average number of lines it took for the rest of the class was ~60 lines. None of the others implemented the things that I did, like error handling and extracting from different places in the DOM. Now I'm confused whether I have overcomplicated my code or made it kind of "fail-proof".
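For anyone wondering what that extra handling looks like, here is a minimal sketch (not the poster's actual code). It retries only on HTTP 503 and tries several places in the page for the same field. The div ids and the `fetch` parameter are assumptions made up for illustration, and regexes stand in for a real HTML parser only to keep the example standard-library-only.

```python
import re
import time
import urllib.error
import urllib.request


def _default_fetch(url):
    # Plain standard-library download; swap in any HTTP client you prefer.
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", "replace")


def fetch_with_retry(url, fetch=_default_fetch, retries=3, delay=1.0):
    """Fetch a page, retrying with a growing pause when the server answers 503."""
    for attempt in range(retries):
        try:
            return fetch(url)
        except urllib.error.HTTPError as err:
            # Only 503 is treated as transient; re-raise on the last attempt.
            if err.code != 503 or attempt == retries - 1:
                raise
            time.sleep(delay * (attempt + 1))


# Hypothetical ids: the same title may sit under different divs on different pages.
TITLE_PATTERNS = [
    re.compile(r'<div id="title-main"[^>]*>(.*?)</div>', re.S),
    re.compile(r'<div id="title-alt"[^>]*>(.*?)</div>', re.S),
]


def extract_title(html):
    """Return the first title found by any pattern, or None if none match."""
    for pattern in TITLE_PATTERNS:
        match = pattern.search(html)
        if match:
            return match.group(1).strip()
    return None
```

The extra ~40 lines over the class average are mostly these two loops, which is arguably a fair price for a scraper that survives a flaky response.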
TL;DR: it's a long introduction
I've been on this app for quite a while now. As a shy cat watching from a distance and reading all kinds of rants. Anywho I feel comfortable enough to crawl out of my shell and introduce myself. Since I feel you guys together made such a pleasant and safe community, I'm really happy to be a part of it!
Anyway I'm Sam, 24 year old, from the Netherlands. My favorite color is green. Mostly the green you can find in nature. The one that calms you down:). I'm a very introverted person but always very curious and eager to learn new things.
I started to program when I was 12. I did assembly and C++. Because I liked making cheats for online games. Later I learned about C#, Java and Python. Mostly used it for web stuff, scraping, services etc. But also chatbots (for Skype for example).
Currently I'm 2 years in as a data scientist, mostly working in Python.
But on the side as a hobby and with an ambition I have a basic understanding of full stack development.
Mostly Node.js, Express, Mongo, and frontend, no frameworks.
(I will later ask you guys some more questions about that! I could really use some advice!)
Anyway enough about me! Tell a bit about yourselves! Happy to get to know you all a little better!
So I just launched a website where you can create web-scrapers with just the click of a button:
Ok going to rant about other developers this time.
Can you please stop doing just the minimal amount of work on your games/apps?!
I understand you may not have the time to go through with a fine tooth comb but just delay it, delay it and finish the product to a state that doesn't feel half assed and broken right at the get go.
A small note that the thing that triggered me with this is Android Devs at the moment, with Google requiring you support the adaptive icons and a newer SDK, so many Devs are just scraping by and putting in no effort to bring things up to date (also put more effort into adaptive icons rather than just putting your old square Icon on a white background)
This shit is just leading to everything being 'early access' or in a constant 'beta' stage with the promise of polish later.
Don't be that guy, put the extra few days of polish in... Just please...
A manager, a mechanical engineer, and a software analyst are driving back from a convention through the mountains. Suddenly, as they crest a hill, the brakes on the car go out and they fly careening down the mountain. After scraping against numerous guardrails, they come to a stop in the ditch. Everyone gets out of the car to assess the damage.
The manager says, "Let's form a group to collaborate ideas on how we can solve this issue."
The mechanical engineer suggests, "We should disassemble the car and analyze each part for failure."
The software analyst says, "Let's push it back up the hill and see if it does it again."
So I may have got a little fed up with people complaining about problems at work... Apologies to @dfox. I'll stop scraping your website now 😬
So one year ago, when I was second year in college and first year doing coding, I took this fun math class called topics in data science, don't ask why it's a math class.
Anyway, for this class we needed to do a final project. At the time I teamed up with a freshman, a junior and a senior. We talked about our project ideas and I was having random thoughts. One of them was to look at one of the myths of Wikipedia: if you keep clicking on the first link in the main paragraph (not the pronunciation), eventually you will get to the Philosophy page.
The team thought it was a good idea and so we started working.
The process was hard since none of us knew web scraping at the time, and the senior and the junior? They basically didn't do shit, so it was me and the freshman.
At the end, we had 20000 page links and tested their path to philosophy. The attached picture is a visualization of the project, and every node is a page name and every line means the page is connected.
This is the first open project and the first Python project that I have ever done. Idk if it is something good enough that I can put on my resume, but I'm definitely proud of this.
PS: if you recognize the picture, you probably know me. If you were the senior or the junior in the team, I'm not sorry for saying you didn't do shit cuz that's the truth. If you were the freshman, I am very happy to have you as a teammate.
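The first-link walk itself is simple once the links are scraped. Below is a rough sketch, assuming the scraping step has already produced a dict mapping each page to its first link; the function name and the toy data are hypothetical, not the team's code.

```python
def path_to_target(start, first_link, target="Philosophy", max_steps=100):
    """Follow first links from `start`.

    Returns the list of pages visited up to and including `target`,
    or None if the walk hits a dead end, a loop, or the step limit.
    """
    path = [start]
    seen = {start}
    current = start
    while current != target:
        nxt = first_link.get(current)
        if nxt is None or nxt in seen or len(path) > max_steps:
            return None
        path.append(nxt)
        seen.add(nxt)
        current = nxt
    return path


# Toy data standing in for scraped results (made up for illustration).
toy_links = {
    "Cat": "Mammal",
    "Mammal": "Animal",
    "Animal": "Philosophy",
    "Loop A": "Loop B",
    "Loop B": "Loop A",
}
print(path_to_target("Cat", toy_links))
print(path_to_target("Loop A", toy_links))
```

Each consecutive pair in a returned path is one edge of the kind of graph described above, so the same output can feed a visualization.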
My biggest regret is the same as my best decision ever made.
The company I work for specializes in performing integrations and migrations that are supposed to be near impossible.
This means a documented api is a rare sight. We are generally happy if there even IS an (internal) api. Frequently we resort to front-end scraping, custom server side extensions and reverse-engineered clients.
When you’re in the correct mindset it’s an extreme rush to fix issues that cannot be fixed and help clients who have lost most hope. However, if your personal life is rough at the moment or you are not in a perfect mental state for a while it can be a really tough job.
Been here for 3+ years and counting. Love and hate have rarely been so close to each other.
When you're a hardcore web developer, the only 'action' you .get() is when you're writing a login form scraper for your three-legged oauth flow in Python
So a client hired us to rebuild their website, because their current website is being held hostage at their current provider. The provider locked them out of WordPress and says they will shut down the website at the end of the month.
The client wants us to hack into the website and get the files. I told them "no chance in hell", but that their current website will be at our host later today.
What I didn't tell them is that I just scraped their website pages to flat HTML.
Ran a script on production to scrape ~1000 sites continuously and update our ~50.000 productions from the data. On the same server our site was running on. Needless to say, with traffic and scraping, our server had almost 100% CPU and RAM usage all the time for 2 weeks until I realised my fuckup
Hi! I'm new in freelancing. I've created a program that scrapes data from a website, parses it, runs DB queries, and emails the prepared data to the customer for whom I've created this program. The whole program is written in PHP and uses a MySQL table. There's almost no front-end, it's just like an automated background process that runs with a cron job. I've bought and set up a domain and hosting for them (my customer paid it all). I got the core part of the program running after ~2 days, and it took me ~a week to complete the project including adding features and the testing phase. Now, I'd like to know, how much does this kind of project cost? The business operates in Silicon Valley.
So after the original idea got scrapped during a hackathon this week, we created a Slack bot to fetch the most relevant answers from StackOverflow using the user's input. All the user had to do was input a few words and the bot handled all typos, links etc. and returned the link as well as the most upvoted or the accepted answer after scraping it from the website.
The average time to find an answer was around 2 seconds, and we also said that we're planning to use Flask to deploy a web application for the same.
After the presentation, one of the judge-guys called me and told me that "It isn't good enough, will not be used widely" and "It's similar to Quora".
Never ever have I wanted to punch a son of a bitch in the balls more.
The company that I work for has recently recruited a team for Web Development, so they don't have to pay a monthly fee to the previous team who designed their website.
They have over 3000+ products in the old website, and no logical way to import them to the new website. The old team was asking for 300$ to give them an API which would return the product details in an XML format.
Obviously, paying that amount of money wasn't logical for a dying website, so the manager decided to hire someone to manually copy the content from the old admin panel to the new one, that is until I stopped him.
My solution? Write a simple web scraper to login to the old panel and collect data. Boom! 300$ saved from going to waste.
Now, the old team found out about this and, as much as my manager was happy, they were quite angry. So they added a Google reCaptcha to prevent my bot from scraping the old panel.
I spent about 20 minutes, and found out once you're logged in to the old panel, the session is saved in a cookie and you are no longer greeted by a Captcha.
So I rewrote a small portion of my bot, and Boom! Instant karma from manager. We finished publishing the new site, and notified the old team, only to see the precious look on their face. Poor guy, he thought I was a wizard or something 😂😂
That's what you get for overcharging people!
TL;DR: Company's old website team wanted to overcharge us writing an API to fetch 3000+ records.
Wrote a basic web scraper to do the same job in less than an hour.
tldr: maintainers can be assholes
So there's this python package+cli tool that I found interesting while browsing github and thought of contributing to it. Now this repo has around 2000 issues and multiple open PRs so seemed like a good start.
So I submit 2 PRs implementing similar features on different sites (it is a scraping repo). This douche of a maintainer leaves comments about various code conventions not being followed, without specifying what they actually were. Now I had specified that I was new to this repo and would need his help (I guess this is one of the jobs of the reviewer). This piece of shit comments on changes in the PR with one- or two-word sentences like "again", "wtf" and occasionally psychopathic replies. That son of a bitch can't tell what's wrong, like wtf dude, instead of having a long discussion over the comments section of the fucking PR why can't you just point out what exactly is wrong and I'll happily fix that shit, but no, you have to be a douche about it and employ sarcasm. Well FUCK YOU TOO.
Now I can easily scrape motivational quotes, Hell Yeah.
* btw I am building a random quotes generator but want to generate quotes with web scraping *
I suspected that our storage appliances were prematurely pulling disks out of their pools because of heavy I/O from triggered maintenance we've been asked to automate. So I built an application that pulls entries from the event consoles in each site, from queries it makes to their APIs. It then correlates various kinds of data, reformats them for general consumption, and produces a CSV.
From this point, I am completely useless. I was able to make some graphs with Gnumeric, LibreOffice Calc, and (after scraping out all the identifying info) Google Sheets, but the sad truth is that I'm just really bad at desktop office document apps. I wound up just sending the CSV to my boss so he can make it pretty.
When you're web scraping and the site suddenly redirects its URL to their second site so your code becomes useless.
Just successfully converted my entire app from using web scraping for data fetching to a direct API, by reverse-engineering their Android app to get to their private API
App is running much faster and more stable now, feels good
When I was in college I was approached by an entrepreneur whose "search engine" idea consisted of scraping the search results of Google and posing them as his own results (after a little shuffling and filtering). Needless to say I declined.
Been receiving packages every day but today I got good shit..
Ideas for me to try?
4-relay module, 4-MOSFET board, finally the gears for the motors I'm scraping, and more MOSFETs. My mom is saying that I have the mailman for myself lol
So here's how the story goes.
I was in my academic writing class the other day and we were learning about APA formatting for our argumentative essays. We have a blackboard, whiteboard, projector connected to a pc and even a lovely projector screen to present with in the classroom.
I sit at the front right of the room. Closest to the window(it's behind me as all the desks face inwards)
Professor walks up to front of class and says we are going to learn how to format our typed essays properly.
Awesome, I thought. Pulled out my XPS laptop and fired it up. As I was making a new Word document, I hear scratching. I look up and the professor is writing with CHALK on the BLACKBOARD. I was astonished. Making matters worse, she started from the far left of the board from which the glare from the window was the greatest. I could not see anything. And from that point on I knew this class was going to be abysmal.
What was so depressing was my professor never once touched the projector. Scraping and erasing. Over and over. Couldn't see if it was a period or a comma after the first initial.
My eyes were never so dry from squinting, rolling my eyes and face-palming over and over. After an hour and 15 minute class, I was not far away from drowning my XPS in my tears.
The webpage for (basically) the only movie theater chain is slow. The app, goddamn, is worse.
So I made an app to scrape the data and save it in a SQLite db for my use. However, there is one theater which doesn't belong to the same company. So I decided to also include it in the app.
But it sucks! I still have to find a way to automatically get the data from their shitty site.
Anybody have any great tutorials about web scraping with Python? The data science courses I took only covered manipulating and visualizing data, not getting it.
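For the storage half of an app like this, a minimal sketch using Python's built-in `sqlite3` might look like the following. The table layout and the sample rows are assumptions for illustration; the poster's real schema isn't shown. A composite primary key plus `INSERT OR IGNORE` lets the scraper re-run without piling up duplicates, and a `theater` column leaves room for the second chain.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS showtimes (
               theater TEXT NOT NULL,
               movie   TEXT NOT NULL,
               starts  TEXT NOT NULL,
               PRIMARY KEY (theater, movie, starts)
           )"""
    )
    return conn


def save_showtimes(conn, rows):
    """Insert (theater, movie, starts) tuples, skipping ones already stored."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO showtimes (theater, movie, starts) VALUES (?, ?, ?)",
            rows,
        )


conn = open_db()
scraped = [
    ("Downtown", "Some Movie", "2024-01-01T18:00"),   # made-up sample data
    ("Other Chain", "Some Movie", "2024-01-01T19:30"),
]
save_showtimes(conn, scraped)
save_showtimes(conn, scraped)  # second run adds nothing
print(conn.execute("SELECT COUNT(*) FROM showtimes").fetchone()[0])
```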
I'm an aspiring coder working some crappy administrator job just to pay the bills for now. My boss found out that I may actually be more computer literate than I let on.
Boss: "I want you to make X happen automatically if I click here on this spreadsheet"
Me: "X!? That means processing data from 4 different spreadsheets that aren't consistently named and scraping comparison info from the front end of the web CMS we're using"
Boss: "if you say so.. Can you do it?"
Me: "maybe.. Can I install python?"
Me: "what about node.js or ruby?"
Boss: "no.. I don't know what you're talking about but you're not installing anything, just get it done"
Me: "Errm Ok.."
So here I am now, way over my head, loving the fact that I'm unofficially a Dev and coding my first something in PowerShell and VB that will be used in business :)
Sucks that I still have to keep my regular work on target whilst doing this though!
I was assigned a project which was previously done by another fresher, the project used angular and bootstrap. That fucker wrote custom styles for the fucking bootstrap classes!!! Every time I use "btn-primary" the button won't become blue, it becomes white!! Fuck! He even wrote his own fucking styles for the grid classes!!
I was so frustrated, I had a discussion with my CTO, and he told me that after 3 months we'll be scrapping this and moving to a new frontend. So I'm stuck in this hell for 3 months.
Has anyone here worked on news scraping?
I am currently doing my academic project where I need to scrape news headlines. I have built scrapers for some news sources using their native API. I also tried using newsapi.org, but it returns only 10 results.
If anyone has worked on similar projects or knows of any, some advice would be highly appreciated.
Hm... Apparently I've been doing TDD all along... it's just that I don't save the tests in a separate project.
I just keep editing Main() to test whatever i'm working on (each class).
Also the NJTransit site is sneaky as ****. It seems the devs know a bit about how to prevent site scraping by checking Headers and Client information...
Took all afternoon to get this test to pass....
it works in Chrome but not in my code... and even after I spoofed all the headers... including GZIP.... it wouldn't work for multiple requests...
I need to create a new WebClient for each request.... no idea how it knows the difference or why it cares... maybe it's a WebClient bug...
And this is only the test app. Originally it was supposed to be built in React Native but that has its own problems...
Books are too old, the examples don't work with the latest...
But I guess this also has an upside... learn TDD and React rather than just React... hopefully can finish this week...
I'm actually on vacation... yea... i still code like a work day... 10AM - 8PM....
What do you guys think of my new album called Low Hanging Fruit:
- Screen Scraping
- Let's timebox this
- Personal development plan
- Embrace the prototype mindset
- Decision making progress
- I am 15 minutes late
- Let's take this offline
Went to visit a friend at a junkyard I worked at for 6 months and brought back some stuff I found there...
I knew there was a DC motor on this piece... And what a goodie. Too bad I only brought 3 hehehe
I'm about to graduate and I have no idea what I'm doing. I tried learning the basics and even went through a lot of extra stuff. I can only say I dabbled in scripting, web scraping and a little bit of software development. However when I compare myself to my peers, I feel so out of place. I can't confidently say I know even the concepts I practiced. I am really interested in the field but I feel like I'm way behind and this is constantly nagging me. Is this normal or is there anything I can do about it?
After all this time I’m still confused, why was Cambridge Analytica such a huge deal? I feel like a lot of people knew this in years prior, that Facebook/Google were scraping user data and activities to use for personal profiles and hence more directed ad placement. Stuff like Ghostery, Privacy Badger, Disconnect, AdNauseam (RIP its Chrome plug-in) etc. all focused on not allowing these same trackers to get information, so it's not like this case just magically busted the doors wide open, screaming that all those websites you visited are now in Facebook’s database and no one knew.
I just can’t quite understand why everyone got up in arms after this.
Today I wrote my first small Python application as an exercise:
Scraping all past EuroJackpot draws from a website, saving them in a database, sorting them, running some checks and doing some combinations. Everything quite clean in classes and functions.
And the "application" is just 100 lines long. I love how much can be accomplished with just a few lines.
What technologies would you suggest for a web-based project that'll do some data scraping, data preprocessing and also incorporate a few ML models?
I've done data scraping in PHP but now I want to move on and try something new....
Planning to use AWS to host it.
Thanks in advance :)
I really love freelancing and I'm getting a few projects here and there, maybe once in a few months, but I'm currently studying, so those few projects do financially reward me a bit. Is anyone here freelancing? Do you have any tips to get more projects? My projects are normally full Django applications and deployment, web scraping scripts, WordPress sites (yeah fml but some people want WordPress), and full websites with various other CMSs.
I have to download 500 images from bookreads to help a friend out. Thought I'd use this opportunity to learn about web scraping rather than downloading the images by hand, which'd be a plain and long waste of time. I've got a list of books and author names; the process I wanna automate is putting the book name and author name into the search bar, clicking it, and downloading the first image that appears on the new webpage. I'm planning to use Selenium, BeautifulSoup and requests for this project. Is that the right way to go?
me: thinking about scraping for a webapp
Random Guy: walks past me staring me dead in the eyes.
Me out loud: "how do i scrape"
Is there a way to dynamically change your IP address while scraping a website so that you don't get blocked constantly?
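One common answer is to send requests through a rotating pool of proxies. Here is a minimal standard-library sketch of the rotation itself; the proxy addresses are placeholders, you would need proxies you are actually allowed to use, and whether this is acceptable depends on the site's terms.

```python
import itertools
import urllib.request


def rotating_openers(proxies):
    """Yield (proxy, opener) pairs forever, cycling through `proxies` in order."""
    for proxy in itertools.cycle(proxies):
        handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        yield proxy, urllib.request.build_opener(handler)


# Placeholder addresses, not real proxies.
PROXIES = ["http://10.0.0.1:8080", "http://10.0.0.2:8080"]
openers = rotating_openers(PROXIES)

# Each request takes the next proxy in the cycle:
#     proxy, opener = next(openers)
#     html = opener.open("https://example.com/page", timeout=10).read()
```

Slowing down between requests usually helps at least as much as rotating addresses.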
So noob question, is automated web scraping a thing? What would you do if you wanted to grab the same information off similar sites and store it in a table that can be manipulated later? All you would have to do is enter the web site link after you finished coding it. I've used Chrome web scraping extensions but want a more automated solution.
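It is a thing. One way to sketch it: keep a small set of extraction rules per site and write every result into the same table. The site names, field names and patterns below are all hypothetical, and regexes are used only to keep the example standard-library-only (a real HTML parser is usually the safer choice).

```python
import csv
import io
import re

# Hypothetical per-site rules: one regex per field, each with one capture group.
SITE_RULES = {
    "shop-a.example": {"name": r"<h1>(.*?)</h1>", "price": r'class="price">(.*?)<'},
    "shop-b.example": {"name": r"<title>(.*?)</title>", "price": r'data-price="(.*?)"'},
}


def extract_row(site, html):
    """Apply a site's rules to its HTML; missing fields become empty strings."""
    row = {"site": site}
    for field, pattern in SITE_RULES[site].items():
        match = re.search(pattern, html, re.S)
        row[field] = match.group(1).strip() if match else ""
    return row


def rows_to_csv(rows):
    """Serialize the rows as CSV text with a fixed column order."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["site", "name", "price"])
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


pages = {  # stand-ins for downloaded pages
    "shop-a.example": '<h1>Widget</h1><span class="price">9.99</span>',
    "shop-b.example": '<title>Gadget</title><div data-price="5"></div>',
}
print(rows_to_csv([extract_row(site, html) for site, html in pages.items()]))
```

Adding a new site then means adding one entry to the rules, not writing a new scraper.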
I hate that fucking Upwork for having so many fucking scrapers
Most of the jobs are fucking scraping related
Looks like it is the only useful thing they can do for their projects or advertisements...
Webscraper progress delayed for several weeks due to tLinkType and tWildCard. Apparently it's spelled tWildcard
Should I switch to Chrome Headless/Puppeteer for webpage header+footer scraping or stick with Express+Cheerio?