15
Condor
4y

So I just spent the last few hours trying to get the intro of a given Wikipedia article into my Telegram bot. It turns out that Wikipedia does have an API! Unfortunately it's braindead from birth.

First I looked at https://www.mediawiki.org/wiki/API and almost thought it was a Wikipedia article about APIs. I almost skipped right over it in the search results (and it turns out that I should've). Upon opening and reading it, I found a shitload of endpoints that frankly I didn't give a shit about. Come on Wikipedia, just give me the fucking data to read out.

Ctrl-F in that page and I find a tiny little link to https://mediawiki.org/wiki/... which is basically what I needed. There's an example that.. gets the data in XML form. Because JSON is clearly too much to ask for. Are you fucking braindead, Wikipedia? If my application were able to parse XML/HTML/whatever, it would be called a browser. With all due respect, I'm not gonna embed a fucking web browser in a bot. I'll leave that to the Electron "devs" that prefer devouring my RAM instead.

OK, so after that I found in third-party documentation (always a good sign when that's more useful, isn't it) that it does support JSON. Wikipedia just doesn't use it by default. In fact, in the example query, that parameter wasn't even in there. Not including something crucial like that is surely a great way to let people know the feature exists. Massive kudos to you Wikipedia.. but not really. Meanwhile, a parameter that WAS in there by default - for fucking CORS - broke the whole goddamn thing unless I REMOVED it. Yeah, because CORS is so useful in a goddamn fucking API.
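
For reference, the kind of query you end up with looks roughly like this (reconstructed from memory, so double-check the parameter names against the action API docs; the title is just an example):

curl "https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&rvslots=main&titles=Linux&format=json&formatversion=2"
# format=json is the part the example conveniently leaves out
# formatversion=2 supposedly gives saner JSON (arrays instead of page IDs as object keys)
# and note the complete absence of origin= anywhere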

So I finally get a functioning JSON response; now all that's left is parsing it. Again, I only care about the content of the page. So I curl the endpoint and trim off the bits I don't need with jq... and I'm left with this monstrosity.

curl "https://en.wikipedia.org/w/api.php/...=*" | jq -r '.query.pages[0].revisions[0].slots.main.content'

Just how far can you nest your JSON, Wikipedia? Are you trying to find the limits of jq or something here?!
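
For the curious, the full response is shaped roughly like this (from memory, values elided), which is where that jq path comes from:

{
  "query": {
    "pages": [
      {
        "pageid": ...,
        "title": "...",
        "revisions": [
          {
            "slots": {
              "main": {
                "contentmodel": "wikitext",
                "contentformat": "text/x-wiki",
                "content": "...the actual article source, finally..."
              }
            }
          }
        ]
      }
    ]
  }
}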

And THEN.. as the icing on the cake, the content you finally extract doesn't quite look like JSON, nor does it really look like XML, but it has elements of both. I had no idea what to make of it, especially before I'd had a chance to look at the exact structure of that command's output (if you just pipe into jq without a filter, the whole blob is a lot less readable).

Then a friend of mine mentioned Wikitext. Turns out it's not just Wikipedia's API that's braindead; the goddamn output format is too. What the fuck even is Wikitext? It's the Apple of wikis apparently: basically nothing outside of MediaWiki uses it.
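
To give you an idea, a (completely made-up) article in Wikitext looks something like this:

{{Infobox thing
| name = Example
}}
'''Example''' is a thing that does stuff.<ref>some citation</ref> It is related to [[Some other article|other things]].

== History ==

So you get XML-ish <ref> tags sitting next to {{curly template soup}} and [[bracketed links]], all inside one string.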

And apparently I'm not the only one who found Wikipedia's API.. irritating to say the least. See e.g. https://utcc.utoronto.ca/~cks/...

Needless to say, my bot will not be getting Wikipedia integration at this point. I've seen enough. How about you make your API less braindead first, Wikipedia? Hopefully this rant saves someone else the time it takes to wade through this clusterfuck.

Comments
  • 8
    Just for parsing XML you're calling a program a browser? How is that remotely close to the truth? ;D
  • 8
    I'm sure at least one library exists for querying Wikipedia in whichever language you want. This doesn't sound like something you'd want to throw your time at.
  • 4
    Your result is in Wikitext (it is literally written in the response...). You can also query HTML instead, and then just strip all the markup.
    Also, you are requesting revisions, so revisions are what you get...

    Alternatively, use the "extracts" API (it is in the second link you posted), which might even be more suitable for your bot (I do not know what exactly you want...) and should only contain the simplest form of Wikitext (headers); a rough sketch is below.

    And Wikipedia has a sandbox where you can tinker with all the possible parameters (e.g. response type and whatnot).
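
    A rough sketch of what such an extracts query could look like (the title is just an example; check the sandbox for the exact parameters):

    curl "https://en.wikipedia.org/w/api.php?action=query&prop=extracts&exintro=1&explaintext=1&titles=Linux&format=json&formatversion=2" | jq -r '.query.pages[0].extract'
    # exintro=1 -> only the intro section
    # explaintext=1 -> plain text instead of HTML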
  • 1
    @sbiewald fair enough.. might poke at it again later, thanks :)
  • 3
    I thought API access to Wikipedia was done through Wikidata.

    https://m.wikidata.org/wiki/...
    Wikidata:Tools/For programmers - Wikidata
  • 3
    Your language has an XML parser unless it's C or lower. But then you probably already include third-party libs anyway.
    We're all very sorry that a project that has to beg for funding on big red fucking banners didn't update their API from one capable data format to another equally capable data format.
    We're sorry that the blatant laziness of the volunteers caused you this inconvenience.
    We're also sorry that you had to deal with the CORS features of MediaWiki, which uniquely allow it to be accessed from the browser without a proxy. Clearly a mistake; no one would ever want to do that.
    The docs aren't exactly tidy, but overall I'd say they're still on the better end of underfunded gargantuan open-source projects.
  • 5
    @heyheni Wikidata is for accessing the "hard facts": dates, names... It is the data source for the "fact panels" (screenshot attached). If those panels point to Wikidata, only a single source has to be updated to update all the different language versions.
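
    For example, pulling the raw data for an item looks roughly like this (Q42 is just an example item ID):

    curl "https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q42&props=labels|claims&languages=en&format=json"
    # returns the item's labels and statements (dates, names, ...) as JSON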
  • 1
    @homo-lorens I'm so sorry for offending you by ranting about Wikipedia. Not like every open source project and their APIs are underfunded.
  • 1
    Here's what I would call conventionally good APIs:

    - corona.lmao.ninja (https://git.ghnou.su/ghnou/cv uses it)

    - devRant's API (just don't talk about logins)

    - iplist.cc (not very detailed but it does what it needs to)

    - Jikan.moe (I used to run an instance of this for a friend's application, but there is a ratelimited public instance too)

    I could go on for a while. But I'm not paying for any of these. I do however self-host quite a few (doesn't cost anything if it stays in your network, right?), and I used to run a Wikipedia mirror. I can tell you that the English Wikipedia is actually not very large. It was still too large for me to keep around, given that I couldn't do anything with it: I did one download in my Transmission seedbox, waited until I had uploaded as much as I'd downloaded, then deleted it. But it was "only" around 100 GB. That's pretty small in server land. If Wikipedia has actual hosting issues, make it federated. Instead they choose to beg, quite publicly so.
  • 2
    Expanding a little on Jikan: a friend of mine already hit the rate limits on the public instance pretty hard. His application was pretty damn popular, until twist.moe themselves stepped in and asked him to take it down because THEY had too much traffic originating from his application. Which brings me to the only other such online service I can think of that does the begging. Compared to Wikipedia at a fairly meager 100 GB (not accounting for other languages here, but the localized wikis are much smaller, images can be deduplicated, etc.), video hosting is no fucking joke. You can reach Wikipedia's size with under 100 videos, especially when you're dealing with 20-minute anime episodes. If anyone is in a position to beg, it would be them. Not Wikipedia. I hate to say it but that is NOT an excuse for me.
  • 1
    What do I need to do to add an entry on the wiki?