15
StanTheMan
261d

Scrape the Twitter frontend API without any authentication and restriction.

Project Type
Existing open source project
Summary

Scrape the Twitter frontend API without any authentication and restriction.

Description
Twitter's API is annoying to work with and has lots of limitations. Luckily, their frontend (JavaScript) has its own API, which I reverse-engineered. No API rate limits. No restrictions. Extremely fast. You can use this library to get the text of any user's tweets trivially.
  • Extracts user tweets with all metadata
  • Extracts external links, hashtags and mentions from a tweet
  • Extracts reply, favorite and retweet counts of a tweet
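To give a rough idea of what extracting links, hashtags and mentions from a tweet involves, here is a minimal sketch. It is not the library's actual code: it works on plain tweet text with regular expressions instead of the lxml/XPath stack listed for the project, and the name `extract_entities` and the sample tweet are made up for illustration.

```python
import re

# Hypothetical patterns and names for illustration; not the library's API.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")   # '#' not preceded by a word character
MENTION_RE = re.compile(r"(?<!\w)@(\w+)")   # skips e-mail-like text such as me@example.com


def extract_entities(text):
    """Return the links, hashtags and mentions found in a tweet's plain text."""
    links = URL_RE.findall(text)
    # Remove URLs first so a fragment like example.com/a#top is not read as a hashtag.
    without_links = URL_RE.sub(" ", text)
    return {
        "links": links,
        "hashtags": HASHTAG_RE.findall(without_links),
        "mentions": MENTION_RE.findall(without_links),
    }


sample = "Loving #python and #lxml, thanks @StanTheMan! https://example.com/a#frag"
entities = extract_entities(sample)
print(entities)
```

Real tweet markup has more edge cases (shortened links, trailing punctuation after a URL, non-Latin hashtags), so treat this only as a starting point.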
Tech Stack
python3, lxml, xpath
Current Team Size
1
URL
Comments
  • 0
    This is my first PyPI release, so any suggestions and feedback are welcome.
    You are more than welcome to contribute.
  • 1
    Was looking for something exactly like this, sweet!
  • 0
    @bolobz thinking of adding topic modelling in the next release...
  • 1
    This is very nice, thank you so much for this. I'm working on a Twitter sentiment analyzer project (for my university) and using Tweepy is frustrating, indeed.
    I'll check out your code, seems very cool!
  • 2
    Pure curiosity: is this legal?

    I mean, in theory their front-end is "closed" source (even if visible to everyone) and copyrighted.
    And this also allows you to, as you said, use their service without restrictions, for example to create an army of bots, or just to steal credentials with your custom authentication process.

    Awesome project btw. 😜
  • 1
    @JS96 yeah, I am kind of new to the whole scraping world, so I don't know how much of it is legal and all... I hope I don't get into trouble... Although, to be respectful, I have added a max of 25 pages of scraping in one instance... But I know that won't stop someone out there from abusing it...
  • 0
    Being respectful is not the concern here. There are a lot of terms and conditions for developers using the Twitter API and you will be circumventing all of them. This will get you into a HUGE amount of trouble.

    Particularly when GDPR comes into effect in Europe. This has a lot of implications for those who would use such a tool to gather and collect data. The developer t&c’s give you a set of guidelines; if you follow them, you can use them to protect yourself. By circumventing them, you will have no defense.

    This is probably considered breaching someone’s privacy as you haven’t agreed to the rules Twitter have put in place.

    ... on a completely unrelated note, what’s the issue with Twitter’s APIs? Apart from rate limiting, which you know is a “sorry we need to make money from this at some point” kind of thing, I’ve not had any issues. Built many tools using the streaming API without any hassle.
  • 1
    @practiseSafeHex I worked with the Twitter API through Python and the Tweepy module, so I can speak only of this.
    First, the docs were badly written... For example, the main page regarding RateLimit exceptions had outdated examples (related to an old version). I found the updated answer on Stack Overflow (lol) and they linked me to the update notes page, literally hidden.

    Also, it was basically impossible to do an advanced search given a followers list (for example, if I get your followers, I can't filter the list by country, language or geo... This works only with hashtag search, as far as I know, but it's a different method).

    Also, regarding user timelines, the docs said that it was possible to filter by date range... But I had mixed results (sometimes I got tweets between my two dates, sometimes a lot of tweets from before my starting date?!).

    I don't know if recent releases fixed these problems but, honestly, it was fucking bad working on it... Aside from the awful limits