Comments
-
C0D4: I almost always go with java-selenium for this kind of thing. If I need something quick and dirty, I'll just abuse file_get_contents() in PHP, or cURL if I need to carry sessions around.
Selenium isn't that hard to work out, as long as you remember that elements must be in the browser's view, or it usually has a heart attack.
-
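For the "carry sessions around" case, here is a rough Python analogue of what a cURL cookie jar gives you (`curl -c jar.txt -b jar.txt`): one opener that stores cookies from each response and sends them back on later requests. It swaps PHP/cURL for Python's standard library (`urllib.request`, `http.cookiejar`); `make_session_opener`, `fetch`, and the User-Agent string are hypothetical names chosen for illustration, not anything from the thread.

```python
import http.cookiejar
import urllib.request


def make_session_opener() -> tuple[urllib.request.OpenerDirector, http.cookiejar.CookieJar]:
    """Build an opener that keeps cookies between requests (a simple 'session')."""
    jar = http.cookiejar.CookieJar()
    # HTTPCookieProcessor stores cookies from responses in `jar`
    # and attaches matching cookies to every later request.
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Hypothetical User-Agent; some sites reject urllib's default one.
    opener.addheaders = [("User-Agent", "quick-and-dirty-scraper/0.1")]
    return opener, jar


def fetch(opener: urllib.request.OpenerDirector, url: str) -> str:
    """GET a page through the shared opener so cookies persist between calls."""
    with opener.open(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Typical use would be `fetch(opener, first_url)` followed by `fetch(opener, second_url)` (both URLs are placeholders); any cookie set by the first response is sent with the second. Like file_get_contents() or cURL, this does not run JavaScript, so pages built client-side still need something like Selenium.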
@C0D4 I would always go for python and bs4 because I don't want the weight of spinning up a headless browser just to pull some text from a page.
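A minimal sketch of this lightweight route, assuming the text you want is present in the page's static HTML. In bs4 this is roughly `BeautifulSoup(html, "html.parser").get_text()`; to keep the example dependency-free it uses Python's built-in `html.parser.HTMLParser` instead, so treat it as a stand-in for bs4, not bs4 itself. `TextExtractor` and `page_text` are hypothetical names.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text runs from static HTML, skipping <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0          # > 0 while inside <script>/<style>
        self.chunks: list[str] = []   # one entry per non-blank text run

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def page_text(html: str) -> list[str]:
    """Parse an HTML string and return its visible text runs in document order."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return parser.chunks
```

For example, `page_text("<h1>Title</h1><script>var x = 1;</script><p>Hello <b>world</b></p>")` returns `["Title", "Hello", "world"]`: the script body is dropped and each text run becomes one chunk. Content that only appears after JavaScript runs will be missing, which is the case where a headless browser earns its weight.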
-
C0D4: @vorticalbox I get that. But for a large site that's being tested, or if the scraping will run often with other people involved, I'll happily spend the time on it.
As I said, for a quick and dirty scrape I wouldn't bother; I'd just use something I can write in 10 minutes and let it rip.
-
C0D4: @vorticalbox Other than being Node-based, at first glance it looks like it works basically the same way as Selenium. Maybe I'm missing something, but how does it differ from Selenium?
-
@C0D4 It doesn't, as far as I know. We used it once at work for testing in code pipelines, and the new studio looks nice.
-
@hitko Oh WOW, my generic advice for most websites fails on a specific type of website?? Unbelievable!!! How could that happen???? What's next, you gonna tell me you don't like my left foot shoved all the way up your rectum??? Imagine my shock!
-
@hitko Maybe, but I've been scraping for years and have only needed a headless browser for one site.
tl;dr: selenium-java (my most recently learned tool) vs. beautifulsoup4 (the one I have the most experience with) or scrapy (average experience, mediocre ability). Which should I use for a web scraping assignment if I'm allowed to use any of them?
We were explicitly told we can use anything we know from class or self-study (with a slight bonus for self-study implementations) for the group project. But would it be OK/fair for me to use beautifulsoup4 or scrapy to pull the data from the assigned site, rather than the selenium-java we were taught in class?
If I did use bs4 or scrapy, my group wouldn't be able to edit the scraper if needed. But data collection is only a small (if immensely important) part of the assignment, and I'd have the bs4 script done a lot quicker than with Selenium, which I learned more recently (for class) and have less experience with.
question