Comments
-
This made me wonder and I found an interesting read on the subject.
https://resources.distilnetworks.com/... -
taigrr8786y @aldoblack I'm pretty sure it's illegal if you have to log in, if you agreed to not do it in ToS. It's considered trespassing, or something. Can't remember exactly why atm.
-
taigrr8786y @electrineer I'm aware that's the case for most things. But I think there was a common law case where using a login + breaking ToS constituted trespassing.
Hence my mention of trespassing. -
Root797676y I doubt it.
The information is public.
If you need a login and have one, it's still available to you.
Now, if you take information given to you and share it with others without access, that could get you in trouble. But as @electrineer said, doing so is breaking a contract, but probably not the law. So it's likely not illegal. -
@Root AT&T got Andrew Auernheimer thrown in jail for accessing information that they mistakenly made public that he was able to receive without a login. He did not profit off of it -- just obtained it.
-
Root797676y @steaksauce Sounds like a paid off judge to me. That's like saying a newspaper could have someone thrown in jail for copying a classified ad the paper had mistakenly published.
Once published, the information is public. If anyone should go to jail, it should be those responsible for publishing it. -
It's written in the HTTP/1.1 Release Papers that you should respectfully not make crawlers, as they generate huge amounts of traffic.
-
I love this debate. I've done it enough times before, so let's dive in:
define scraping? many people simply say using a bot to request a web page, and then act on the returned page.
you could argue a browser is just a scraper. what it scrapes is determined by user input, and its action is to display it.
or that `curl http://somewebsite.com/ >/dev/null` is just scraping; the action you take is to disregard the results completely!
"now that's a bit out there. you're intentionally playing it loose to make all web activity seem like scraping!"
then let's try again: scraping is just browsing, but faster! much faster.
so 1000 users accessing the same site at the same time manually, provided they all come from the same IP, looks near indistinguishable. does it count? up to you.
if you want an interesting read on this, look into LinkedIn v. hiQ -
Scraping, done correctly, should look like normal traffic. Selenium+Chrome is awesome at this because you can’t scrape fast.
-
another question: from the server side, the only real difference is speed of the requests (assuming the bot is smart enough to not put WebScraper/2.1 in its user agent. we're trying to look legit here.)
how fast do humans have to refresh for it to count?
how slow do scripts have to run for it to not?
or is it all about what we actually do to the data? or are we just against the idea that every web request need not start with a human and a keyboard?
I don't think web scraping could ever be illegal. at least not enforceably. it's just not defined what is and isn't scraping. nor can you tell from the server side, assuming the scraper is decent.
so I guess a better question for webadmins is rate limiting: how often is too often in terms of periodically refreshing? -
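For the rate limiting question, here is a minimal sketch of one common approach, a per-client token bucket. This is only an illustration, not something anyone in the thread said they use: the names `TokenBucket` and `allow_request` and the default numbers (1 request per second, bursts of 5) are assumptions picked for the example.

```python
import time


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum tokens the bucket can hold
        self.clock = clock          # injectable so the logic can be tested
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        """Return True if the request may proceed, False if it should be rejected."""
        now = self.clock()
        # Refill according to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Hypothetical per-IP registry; a real server would also expire idle entries.
buckets = {}


def allow_request(client_ip, rate=1.0, capacity=5):
    if client_ip not in buckets:
        buckets[client_ip] = TokenBucket(rate, capacity)
    return buckets[client_ip].allow()
```

With this kind of limiter, "how often is too often" becomes two numbers the admin picks: a human refreshing by hand rarely drains the burst, while a tight loop runs out of tokens almost immediately and can be answered with HTTP 429.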
UPDATE 2: The data that are supposed to be scraped are ONLY FOR INTERNAL USE inside the company. Data analysis mostly. NOT FOR RESALE.
-
@aldoblack you mentioned Walmart in your original post and then with Update 2 gave some verbiage that sounds like Retail Link, either the DSS or OTIF pages. There are companies that do that (Harvest Corp, Retail Solutions, Mookster, Atlas), and a lot of suppliers do it internally (if they don’t get their RL data via EDI). I have a program that scrapes DSS and OTIF data, but I don’t have an RL login so I can’t use it. As long as you use valid RL logins and you work for the supplier (internal), you are fine. If you don’t work for the supplier, WM has a “third party” form that authorizes your access; the company will have to engage the RM for that form.
-
@Root Why does everything have to be a law nowadays for people to conform with something that many people agree upon?
That's like you basically saying: "Well there's no law that I can't drive my car like an absolute asshole, so I will"
No one can stop you from doing it directly, but you'll not make any friends that way. -
@beggarboy the question was about legality. And driving like an asshole is probably illegal, depending on the type of asshole you mean.
-
Root797676y @beggarboy Scraping isn't immoral or mean though? So your analogy doesn't fit. Also, the original question was about legality anyway.
-
@Root ....Scraping is like a small form of DOSing because you are generating a lot of unnecessary exponential traffic.
-
Root797676y @beggarboy Depends on rate, but no matter how you look at it increased traffic isn't a denial of service. Imagine the amount of scraping you would have to do in order to overload a server! Most sites don't have even close to that much content.
You might cost the company a few cents more in hosting, but if you're going to be collecting the same data manually anyway, the traffic would only differ in speed, not volume, so it would cost them the same regardless. -
@beggarboy depends what you mean by unnecessary.
the way I see it, if I'm getting use out of constantly scraping, then I see no problem. if they dislike the speed at which I'm doing it, it's their job to rate limit me or revoke my access to the data. and I take the same approach with the servers I manage.
usually I intentionally throttle myself to the minimum speed needed out of respect for their bandwidth, but I see that as more of a courtesy.
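A rough sketch of what that kind of self-throttling could look like on the client side. The `Throttle` and `fetch_all` names, the user agent string and the 10 second timeout are all made up for illustration; the only real library calls are `urllib.request.Request` and `urllib.request.urlopen` from the standard library.

```python
import time
import urllib.request


class Throttle:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests
        self.clock = clock                # injectable for testing
        self.sleep = sleep                # injectable for testing
        self.last = None                  # time of the previous request

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self.clock()
        if self.last is not None:
            remaining = self.min_interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now


def fetch_all(urls, throttle, user_agent="example-bot/0.1 (hypothetical)"):
    """Fetch each URL in turn, pausing between requests as the throttle dictates."""
    pages = {}
    for url in urls:
        throttle.wait()
        req = urllib.request.Request(url, headers={"User-Agent": user_agent})
        with urllib.request.urlopen(req, timeout=10) as resp:
            pages[url] = resp.read()
    return pages
```

Something like `fetch_all(urls, Throttle(min_interval=2.0))` would then make at most one request every two seconds. Picking the interval is the courtesy part: it only needs to be as short as the freshness you actually need.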
Just a legal question here.
Is web scraping legal in USA? I am asking here for the sole reason that I am sure that someone might have developed projects with web scraping.
I've heard that Walmart does it a lot.
question
scraping
web
legal