Search - "wikipedia api"
-
"I'll just fetch the data from Wikipedia, it must be really simple using their API"
I've never been so wrong.
-
So I just spent the last few hours trying to get an intro of given Wikipedia articles into my Telegram bot. It turns out that Wikipedia does have an API! But unfortunately it's born as a retard.
First I looked at https://www.mediawiki.org/wiki/API and almost thought that that was a Wikipedia article about APIs. I almost skipped right over it on the search results (and it turns out that I should've). Upon opening and reading that, I found a shitload of endpoints that frankly I didn't give a shit about. Come on Wikipedia, just give me the fucking data to read out.
Ctrl-F in that page and I find a tiny little link to https://mediawiki.org/wiki/... which is basically what I needed. There's an example that.. gets the data in XML form. Because JSON is clearly too much to ask for. Are you fucking braindead Wikipedia? If my application was able to parse XML/HTML/whatevers, that would be called a browser. With all due respect but I'm not gonna embed a fucking web browser in a bot. I'll leave that to the Electron "devs" that prefer raping my RAM instead.
OK so after that I found on third-party documentation (always a good sign when that's more useful, isn't it) that it does support JSON. Retardpedia just doesn't use it by default. In fact in the example query that was a parameter that wasn't even in there. Not including something crucial like that surely is a good way to let people know the feature is there. Massive kudos to you Wikipedia.. but not really. But a parameter that was in there - for fucking CORS - that was in there by default and broke the whole goddamn thing unless I REMOVED it. Yeah because CORS is so useful in a goddamn fucking API.
So I finally get to a functioning JSON response, now all that's left is parsing it. Again, I only care about the content on the page. So I curl the endpoint and trim off the bits I don't need with jq... I was left with this monstrosity.
curl "https://en.wikipedia.org/w/api.php/...=*" | jq -r '.query.pages[0].revisions[0].slots.main.content'
Just how far can you nest your JSON Wikipedia? Are you trying to find the limits of jq or something here?!
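For reference, the same nesting can be walked in Python instead of jq. This is only a sketch: the query parameters below are an assumption about what produces a response shaped like that jq path (they are not the elided URL from the command above), and `build_query_url` and `extract_wikitext` are made-up helper names.

```python
from typing import Optional
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def build_query_url(title: str) -> str:
    """Build a revisions query that asks for JSON explicitly (assumed parameters)."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": title,
        "format": "json",       # request JSON output explicitly
        "formatversion": "2",   # makes query.pages a list, so pages[0] works
    }
    return f"{API_URL}?{urlencode(params)}"


def extract_wikitext(response: dict) -> Optional[str]:
    """Follow query.pages[0].revisions[0].slots.main.content, or return None."""
    try:
        return response["query"]["pages"][0]["revisions"][0]["slots"]["main"]["content"]
    except (KeyError, IndexError, TypeError):
        # Missing page, missing revision, or an error payload instead of a result.
        return None
```

Whatever comes back from `extract_wikitext` is still raw page markup, so this only gets rid of the JSON nesting, not the parsing problem.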
And THEN.. as the icing on the cake, the result doesn't quite look like JSON, nor does it really look like XML, but it has elements of both. I had no idea what to make of this, especially before I had a chance to look at the exact structured output of that command above (if you just pipe into jq without arguments it's much less readable).
Then a friend of mine mentioned Wikitext. Turns out that Wikipedia's API is not only retarded, even the goddamn output is. What the fuck is Wikitext even? It's the Apple of wikis apparently. Only Wikipedia uses it.
And apparently I'm not the only one who found Wikipedia's API.. irritating to say the least. See e.g. https://utcc.utoronto.ca/~cks/...
Needless to say, my bot will not be getting Wikipedia integration at this point. I've seen enough. How about you make your API not retarded first Wikipedia? And hopefully this rant saves someone else the time required to wade through this clusterfuck.
-
Has anyone here ever played with the Wikipedia API? I'm trying to get data about some cities, but I don't know how to pick the city if there are different meanings for the same name (city, river, etc).
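One possible starting point, as a sketch rather than a full answer: the MediaWiki API can report whether a title is a disambiguation page through `prop=pageprops` with `ppprop=disambiguation`, so you can at least detect the ambiguous case before trying a more specific title. The helper names below are made up for illustration, and the response shape assumes `formatversion=2`.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def build_disambiguation_check_url(title: str) -> str:
    """Build a query asking only for the 'disambiguation' page property."""
    params = {
        "action": "query",
        "prop": "pageprops",
        "ppprop": "disambiguation",
        "titles": title,
        "redirects": "1",       # follow redirects to the target page
        "format": "json",
        "formatversion": "2",   # query.pages comes back as a list
    }
    return f"{API_URL}?{urlencode(params)}"


def is_disambiguation(response: dict) -> bool:
    """True if any returned page carries the 'disambiguation' page property."""
    pages = response.get("query", {}).get("pages", [])
    return any("disambiguation" in page.get("pageprops", {}) for page in pages)
```

This does not pick the city for you. It only tells you that the plain name is ambiguous, at which point you still need some rule of your own (for example retrying with the country appended to the title) to choose between the meanings.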
-
Third day of working on my recruitment task, and I'm starting to get pissed. I'm applying for a Junior JS developer position (surprised that they even picked me, I had 1 JS project in my resume, the rest was Java). The task seemed simple: create a website with an autocomplete field which gets the 10 cities with the most polluted air from a given country and gets the cities' descriptions from Wikipedia. But hell no. First, the air quality API that they told me to use sucks horse dick. Like seriously, you can get a fucking timeout while fetching data, because as the author explained, someone decided to make 2 fucking queries per request, one to count all possible results, and then the second one for the actual data. Like, WTF, why would you do that. After I got that shit to work from time to time, it was time for the Wikipedia API. And the shitshow starts again. Because it turns out that you can't filter the results based on the category. Which means that if the city has the same name as a river or some fucking guy doing sports, I won't get the fucking description, because it will simply return info that there is more than 1 result. At this point, I'm so fucking pissed, I am barely keeping it together. I want to work at this company, because the pay is great, there are a lot of opportunities and shit, but god dammit, if I finish this task, I'm getting drunk for 3 days straight.
EDIT: even the author of the air quality API says that it is not a good fit for the given task...