Does anybody know a good open-source web scraping tool for Java?

  • 1
    I've done some web scraping with Selenium, but it's a browser automation tool that isn't really meant for scraping, so it's not the best option.

    If you can work outside Java, I do all my web scraping with Scrapy in Python.
  • 0
    @antorqs thanks! I want to get into web scraping and don't really know where to start... Since I only know Java and couldn't find any good open-source tools for what I'm planning to do, I thought I might ask here. I'm currently learning Python so that I can use Scrapy.
  • 0
    @hufhufhuf don't learn Python, it's terrible. I work in it at my day job, and Kotlin is vastly superior.

    Global functions are used for things like map, len, etc., which feels very procedural.

    The language is dynamically typed, which means types are not checked before the code runs (which is why the standard library has typing errors in it).

    Basic operations like counting the number of occurrences of something in an array require that you instantiate an entire new object (Counter) or write the solution imperatively, since the developers of Python have a serious problem with higher-order functions.

    I can go on.
  • 0
    @Nevoic yeah, I know, a bunch of people have already told me that, and (I mean, I'm still at the very beginning, but still...) I've already stumbled across some weird stuff... But the idea is to use Python with Scrapy for the web scraping, write the parsed contents to a file or something like that, and then use some other language to process the output.
  • 0
    @hufhufhuf do you need JS in your scraper?
  • 0
    @Nevoic why would I need JS?
  • 0
    @hufhufhuf if the stuff you're scraping is rendered via JS.

    To check, disable JS in your browser, and then go to the webpage you're scraping. If it has the content you need, you don't need JS, and can use something like JSoup instead of Selenium.
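
    The same check can be roughly sketched in plain Java (11+), assuming a simple substring match is good enough for a first look. The class and method names here are made up for illustration and are not part of JSoup or Selenium. A plain HTTP client never runs scripts, so whatever it downloads is what a JS-free scraper would see.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper for illustration only; not part of any scraping library.
final class RawHtmlCheck {

    // Downloads the page as the server sends it. No JavaScript is executed,
    // which mirrors loading the page in a browser with JS disabled.
    static String fetchRawHtml(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Returns true if the text you want to scrape is already present in the raw HTML.
    // If it is missing, the content is probably rendered by JS after the page loads.
    static boolean appearsInRawHtml(String rawHtml, String expectedText) {
        return rawHtml != null
                && expectedText != null
                && !expectedText.isEmpty()
                && rawHtml.contains(expectedText);
    }
}
```

    Calling appearsInRawHtml(fetchRawHtml(url), "some text you can see on the page") and getting true suggests an HTML parser alone is enough; false suggests you need something that runs JS.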
  • 0
    @Nevoic :O I did not know that, thanks!
  • 2
    @Nevoic discouraging someone from learning a language like Python because of a bad personal experience is not right.

    I work with Python every day, I've been working with Scrapy for the last two years, and I think it couldn't be simpler for web scraping.

    I'm also a Java dev and I won't say one is superior to the other. Both are quite strong in different scenarios.
  • 1
    Jsoup was fine for Java.
    I use requests + Beautiful Soup in Python, though.