Do all the things like ++ or -- rants, post your own rants, comment on others' rants and build your customized dev avatarSign Up
From the creators of devRant, Pipeless lets you power real-time personalized recommendations and activity feeds using a simple APILearn More
Search - "web scraping"
I was offered to work for a startup in August last year. It required building an online platform with video calling capabilities.
I told them it would be on learn and implement basis as I didn't know a lot of the web tech. Learnt all of it and kept implementing side by side.
I was promised a share in the company at formation, but wasn't given the same at the time of formation because of some issues in documents.
Yes, I did delay at times on the delivery date of features on the product. It was my first web app, with no prior experience. I did the entire stack myself from handling servers, domains to the entire front end. All of it was done alone by me.
Later, I also did install a proxy server to expand the platform to a forum on a new server.
And yesterday after a month of no communication from their side, I was told they are scraping the old site for a new one. As I had all the credentials of the servers except the domain registration control, they transferred the domain to a new registrar and pointed it to a new server. I have a last meeting with them. I have decided to never work with them and I know they aren't going to provide me my share as promised.
I'm still in the 3rd year of my college here in India. I flunked two subjects last semester, for the first time in my life. And for 8 months of work, this is the end result of it by being scammed. I love fitness, but my love for this is more and so I did leave all fitness activities for the time. All that work day and night got me nothing of what I expected.
Though, they don't have any of my code or credentials to the server or their user base, they got the new website up very fast.
I had no contract with them. Just did work on the basis of trust. A lesson learnt for sure.
Although, I did learn to create websites completely all alone and I can do that for anyone. I'm happy that I have those skills now.
Since, they are still in the start up phase and they don't have a lot of clients, I'm planning to partner with a trusted person and release my code with a different design and branding. The same idea basically. How does that sound to you guys?
I learned that:
. No matter what happens, never ignore your health for anybody or any reason.
. Never trust in business without a solid security.
. Web is fun.
. Self-learning is the best form of learning.
. Take business as business, don't let anyone cheat you.20
So our class had this assignment in python where we had to code up a simple web scraper that extracts data of the best seller books on Amazon. My code was ~100 lines long( for a complete newbie in python guess the amount of sweat it took) and was able to handle most error scenarios like random HTML 503 errors and different methods to extract the same piece of data from different id's of divs. The code was decently fast.
All wss fine until I came to know the average number of lines it took for the rest of the class was ~60 lines. None of the others have implemented things that I have implemented like error handling and extracting from different places in the DOM. Now I'm confused if I have complicated my code or have I made it kind of "fail proof".
So one year ago, when I was second year in college and first year doing coding, I took this fun math class called topics in data science, don't ask why it's a math class.
Anyway for this class we needed to do a final project. At the time I teamed up with a freshman, junior and a senior. We talked about our project ideas I was having random thoughts, one of them is to look at one of the myths of wikipedia: if you keep clicking on the first link in the main paragraph, and not the prounounciation, eventually you will get to philosophy page.
The team thought it was a good idea and s o we started working.
The process is hard since noe of us knew web scraping at the time, and the senior and the junior? They basically didn't do shit so it's me and the freshman.
At the end, we had 20000 page links and tested their path to philosophy. The attached picture is a visualization of the project, and every node is a page name and every line means the page is connected.
This is the first open project and the first python project that I have ever done. Idk if it is something good enough that I can out on my resume, but definitely proud of this.
PS: if you recognize the picture, you probably know me. If you were the senior or the junior in the team, I'm not sorry for saying you didn't do shit cuz that's the truth. If you were the freshman, I am very happy to have you as a teamate.2
Tldr; its a long introduction
I've been on this app for quite a while now. As a shy cat watching from a distance and reading all kinds of rants. Anywho I feel comfortable enough to crawl out of my shell and introduce myself. Since I feel you guys together made such a pleasant and safe community, I'm really happy to be a part of it!
Anyway I'm Sam, 24 year old, from the Netherlands. My favorite color is green. Mostly the green you can find in nature. The one that calms you down:). I'm a very introverted person but always very curious and eager to learn new things.
I started to program when I was 12. I did assembly and C++. Because I liked making cheats for online games. Later I learned about C#, Java and Python. Mostly used it for web stuff, scraping, services etc. But also chatbots (for Skype for example).
Currently I'm 2 years in as a data scientist, mostly working in Python.
But on the side as a hobby and with an ambition I have a basic understanding of full stack development.
Mostly Nodejs, express, mongo, and frontend, no frameworks.
(I will later ask you guys some more questions about that! I could really use some advice!)
Anyway enough about me! Tell a bit about yourselves! Happy to get to know you all a little better!12
I’m going through the book automate the boring stuff and I’m working on the chapter with web scraping right.
Well I wanted to just count all of the comic links that are in the xkcd archive as a small exercise to help me get used to and better learn web scraping.
I go through hell trying to do this but after more than a few hours later I finally have done it I returned every link of ONLY the comics, so it was time to start counting them.
I implemented the counting. The total number as of today is 2279 and it my code counted 2278, and I started to lose it.
So I go through this motherfucker manually to see where my loops count and the count on the tags start to differ. I found it, whoever made it went from 403 to 405. The euphoria I felt for this incredibly small task was incredible. (Still haven’t pieced it together yet)
I found the email of the guy who I assume owns the site and I started writing an email that basically said “hey the count of your comics is off by one and you made me rethink existence trying to figure out why, you skipped number 404-”
I look at the gap between 403 and 405 Then the words “Error 404 Not Found” popped into my head. I proceeded to scream for a second and stopped writing the email and now I’m trying to come to terms with this.
TL:DR the guy who runs xkcd comics trolled me with a simple error 404 joke3
So after the original idea getting scraped during a hackathon this week, we created a slack bot to fetch most relevant answers from StackOverflow using user's input. All the user had to do was input few words and the bot handled all typos, links etc and returned the link as well as the most upvoted or the accepted amswer after scraping it from the website.
The average time to find an answer was around 2 seconds, and we also told that we're planning to use flask to deploy a web application for the same.
After the presentation, one of the judge-guys called me and told me that "It isn't good enough, will not be used widely" and "Its similar to Quora".
Never ever have I wanted to punch a son of a bitch in the balls ever.3
So here's my problem. I've been employed at my current company for the last 12 months (next week is my 1 year anniversary) and I've never been as miserable in a development job as this.
I feel so upset and depressed about working in this company that getting out of bed and into the car to come here is soul draining. I used to spend hours in the evenings studying ways to improve my code, and was insanely passionate about the product, but all of this has been exterminated due to the following reasons.
Here's my problems with this place:
1 - Come May 2019 I'm relocating to Edinburgh, Scotland and my current workplace would not allow remote working despite working here for the past year in an office on my own with little interaction with anyone else in the company.
2 - There is zero professionalism in terms of work here, with there being no testing, no planning, no market research of ideas for revenue generation – nothing. This makes life incredibly stressful. This has led to countless situations where product A was expected, but product B was delivered (which then failed to generate revenue) as well as a huge amount of development time being wasted.
3 - I can’t work in a business that lives paycheck to paycheck. I’ve never been somewhere where the salary payment had to be delayed due to someone not paying us on time. My last paycheck was 4 days late.
4 - The management style is far too aggressive and emotion driven for me to be able to express my opinions without some sort of backlash.
5 - My opinions are usually completely smashed down and ignored, and no apology is offered when it turns out that they’re 100% correct in the coming months.
6 - I am due a substantial pay rise due to the increase of my skills, increase of experience, and the time of being in the company, and I think if the business cannot afford to pay £8 per month for email signatures, then I know it cannot afford to give me a pay rise.
7 - Despite having continuously delivered successful web development projects/tasks which have increased revenue, I never receive any form of thanks or recognition. It makes me feel like I am not cared about in this business in the slightest.
8 - The business fails to see potential and growth of its employees, and instead criticises based on past behaviour. 'Josh' (fake name) is a fine example of this. He was always slated by 'Tom' and 'Jerry' as being worthless, and lazy. I trained him in 2 weeks to perform some basic web development tasks using HTML, CSS, Git and SCSS, and he immediately saw his value outside of this company and left achieving a 5k pay rise during. He now works in an environment where he is constantly challenged and has reviews with his line manager monthly to praise him on his excellent work and diverse set of skills. This is not rocket science. This is how you keep employees motivated and happy.
9 - People in the business with the least or zero technical understanding or experience seem to be endlessly defining technical deadlines. This will always result in things going wrong. Before our mobile app development agency agreed on the user stories, they spent DAYS going through the specification with their developers to ensure they’re not going to over promise and under deliver.
10 - The fact that the concept of ‘stealing data’ from someone else’s website by scraping it daily for the information is not something this company is afraid to do, only further bolsters the fact that I do not want to work in such an unethical, pathetic organisation.
11 - I've been told that the MD of the company heard me on the phone to an agency (as a developer, I get calls almost every week), and that if I do it again, that the MD apparently said he would dock my pay for the time that I’m on the phone. Are you serious?! In what world is it okay for the MD of a company to threaten to punish their employees for thinking about leaving?! Why not make an attempt at nurturing them and trying to find out why they’re upset, and try to retain the talent.
Now... I REALLY want to leave immediately. Hand my notice in and fly off. I'll have 4 weeks notice to find a new role, and I'll be on garden leave effective immediately, but it's scary knowing that I may not find a role.
My situation is difficult as I can't start a new role unless it's remote or a local short term contract because my moving situation in May, and as a Junior to Mid Level developer, this isn't the easiest thing to do on the planet.
I've got a few interviews lined up (one of which was a final interview which I completed on Friday) but its still scary knowing that I may not find a new role within 4 weeks.
Advice? Thoughts? Criticisms?
Love you DevRant <33
I just found out google web dev tools let you copy a request as curl command!
Time to scrape some websites baby!8
I had a wonderful run-in with corporate security at a credit card processing company last year (I won't name them this time).
I was asked design an application that allowed users in a secure room to receive instructions for putting gift cards into envelopes, print labels and send the envelopes to the post. There were all sorts of rules about what combinations of cards could go in which envelopes etc etc, but that wasn't the hard part.
These folks had a dedicated label printer for printing the address labels, in their secure room.
The address data was in a database in the server room.
On separate networks.
And there was absolutely no way that the corporate security folks would let an application that had access to a printer that was on a different network also have access to the address data.
So I took a look at the legacy application to see what they did, to hopefully use as a precedent.
They had an unsecured web page (no, not an API, a web page) that listed the addresses to be printed. And a Windows application running on the users' PC that was quietly scraping that page to print the labels.
Luckily, it ceased to be an issue for me, as the whole IT department suddenly got outsourced to India, so it became some Indian's problem to solve.2
The company that I work for has recently recruited a team for Web Development, so they don't have to pay a monthly fee to the previous team who designed their website.
They have over 3000+ products in the old website, and no logical way to import them to the new website. The old team was asking for 300$ to give them an API which would return the product details in an XML format.
Obviously, paying that amount of money wasn't logical for a dying website, so the manager decided to hire someone to manually copy the content from the old admin panel to the new one, that is until I stopped him.
My solution? Write a simple web scraper to login to the old panel and collect data. Boom! 300$ saved from going to waste.
Now, the old team found about this and as much as my manager was happy, they were quite angry. So they implanted a Google reCaptcha to prevent my bot from scraping the old panel.
I spent about 20 minutes, and found out once you're logged in to the old panel, the session is saved in a cookie and you are no longer greeted by a Captcha.
So I re-written a small portion of my bot, and Boom! Instant karma from manager. We finished publishing the new site, and notified the old team, only to see the precious look on their face. Poor guy, he thought I was a wizard or something 😂😂
That's what you get for overcharging people!
TL;DR: Company's old website team wanted to overcharge us writing an API to fetch 3000+ records.
Written a basic web scraper to do the same job in less than an hour.3
Now I can easily scrap motivational quotes, Hell Yeah.
* btw I am building random quotes generator but want to generate quotes with web-scraping *9
Just succesfully converted my entire app from using web scraping data fetching to direct API by reverse-engineering their android app to get to their private API
App is running much faster and more stable now, feels good3
When you're web scraping and the site suddenly redirect their url to their second site so your codes becomes useless.
Anybody have any great tutorials about web scraping with python? The data science courses I took only covered maipulauting and visualizing data not getting it.4
I'm an aspiring coder working some chappy administrator job just to pay the bills for now. My boss found out that I may actually be more computer literate than I let on.
Boss: "I want you to make X happen automatically if I click here on this spreadsheet"
Me "X!? That means processing data from 4 different spreadsheets that aren't consistently named and scraping comparison info from the fronted of the Web cms we're using"
Boss: "if you say so.. Can you do it?"
Me: "maybe.. Can I install python?"
Me: "what about node.js or ruby?"
Boss: "no.. I don't know what you're talking about but you're not installing anything, just get it done"
Me: "Errm Ok.."
So here I am now, way over my head loving the fact that I'm unofficially a Dev and coding my first something in Powershell and vb that will be used in business :)
Sucks that I still have to keep my regular work on target whilst doing this though!2
I'm about to graduate and I have no idea what I'm doing. I tried learning the basics and even went through a lot of extra stuff. I can only say I dabbled in scripting, web scraping and a little bit of software development. However when I compare myself to my peers, I feel so out of place. I can't confidently say I know even the concepts I practiced. I am really interested in the field but I feel like I'm way behind and this is constantly nagging me. Is this normal or is there anything I can do about it?2
What technologies would u suggest for a web based project that'll do some data scraping, data preprocessing and also incorporate a few ML models.
I've done data scraping in php but now I want to move on and try something new....
Planning to use AWS to host it.
Thanks in advance :)5
I have to download 500 images from bookreads to help a friend out. Thought I'd use this opportunity to learn about web scraping rather than downloading the images which'd be a plain and long waste of time. I've got a list of books and author names, the process I wanna automate is putting the book name and author name into the search bar, clicking it, and downloading the first image the appears on the new webpage. I'm planning to use selenium, BeautifulSoup and requests for this project. Is that the right way to go?9
Are you out of your free medium articles?😢 My Scrapy is here for the rescue.💸
This is simple application of web scraping, it scrapes the articles of medium and allows you to read or hear the article. If you use this on computer there will be a number of accents in the option.
The audio feature is provided only to the premium medium users, so here comes My Scrapy to save your 5$/month. 💸
Tech Stack used :
Python, beautiful soup, Django, speech synthesis
PS: This application was built for educational purpose and the source code for this application is not open sourced anywhere.
Fun Fact : You can still read any medium articles if they ask you to upgrade, you must be wondering how? Well, copy the link of the article and browse it in incognito mode on any browser.😂🤣
Try the app and lemme know if you liked it:
I really love freelancing and I'm getting a few projects here and there ,maybe once in a few months.but I'm currently studying so those few projects does financially reward me a bit. Is anyone here freelancing ,do you have any tips to get more projects.My projects are normally full Django applications and deployment, Web scraping scripts ,WordPress sites(yeah fml but some people want WordPress),Full websites with other various cms's.1
Do you prefer audiobooks? Are you an active medium reader? Do you want audio for the medium articles you read? Are you out of your free medium articles?😢 My Scrapy is here for the rescue.💸
This is a simple application of web scraping, it scrapes the articles of medium and allows you to read or hear the article. If you use this on computer there will be a number of accents in the option.
The audio feature is provided only to the premium medium users, so here comes My Scrapy to save your 5$/month. 💸
Tech Stack used :
Python, beautiful soup, Django, speech synthesis
PS: This application was built for educational purpose.
Fun Fact: You can still read any medium articles if they are asking you to upgrade, you must be wondering how? Well, copy the link of the article and browse it in incognito mode on any browser or sign out and read it.😂🤣
Just a legal question here.
Is web scraping legal in USA? I am asking here for the sole reason that I am sure that someone might have developed projects with web scraping.
I've heard that Walmart does it a lot.24
trying to scrap a site since long now. too frustrating to see how google is detecting the bot faster than before. any hacks if you people have please share.
using python for the same.3
Helo everyone , I am planning on learning some new tools and languages side by side working on this project which would be an application for creating or managing some lists for any tasks or some web series and categorize them as on going or completed or planning for now and . I want to web scrap information about that task with some pictures and text information whenever you add an entry to make it catchy and informative.
For example if I add any series name like FRIENDS and label it as on going so to make that entry not look boring my app will add some pictures and texts from google using web scraping.
So which language and tools would be helpful for developoy such application ?
Ps: i am pursuing my undergraduate computer science engineering .
I have basics of python and c++ with some data structures as well as basics of Mysql data base.
So noob question, is automated web scraping a thing? What would you do if you wanted to grab the same information off similar sites and store it in a table that can be manipulated later? All you would have to do is enter the web site link after you finished coding it. I've used Chrome web scraping extensions but want a more automated solution.10
Should I switch to Chrome Headless/Puppeteer for webpage header+footer scraping or stick with Express+Cheerio?1
Recently my teachers have started hassling us to get our ‘better’ projects on github for a ‘sorta’ portfolio
I have a simple C# script I wrote for a class assignment many months ago
Inside that script I call an exe created by using pyinstaller on a simple python program to grab info from the web related to the script’s purpose just to see how pyinstaller and web-scraping works
If I put this on github should it be two separate repositories or one with the python stuff in its own contained folder???
Thank you in advance2