Join devRant
Do all the things like
++ or -- rants, post your own rants, comment on others' rants and build your customized dev avatar
Sign Up
Pipeless API
From the creators of devRant, Pipeless lets you power real-time personalized recommendations and activity feeds using a simple API
Learn More
Related Rants
Been working on trying to get JMdict (relatively comprehensive Japanese dictionary file) into a database so I can do some analysis on the data therein, and it's been a bit of a pain. The KANJIDIC XML file had me thinking it'd be fairly straightforward, but this thing uses just about every trick possible to complicate what one would think would be a straightforward dictionary file:
* Readings and Spellings/Kanji usage are done in a many-to-many manner, with the only thing tying them together being an arbitrary ID. Not everything is related, however, as there can be certain readings that only apply to specific spellings within the group and vice versa. In short, there's no way to really meaningfully establish a headword fora given entry.
* Definitions are buried within broader Sense groups, which clumsily attach metadata and have the same many-to-many (except when not) structure as the readings/spellings.
Suffice to say, this has made coming up with a logical database schema for it a bit more interesting than usual.
It's at least an improvement over the original format, however, which had a couple different ways of setting up the headword section and could splatter tagging information across any part of a given entry. Fine if you're going to grep the flat file, but annoying if you're looking for something more nuanced.
Was looking online last night to see if anyone had a PHP class written to handle entries and didn't turn anything up, but *did* find this amusing exchange from a while back where the creator basically said, "I like my idiosyncratic format and it works for me. Deal with it!": https://sci.lang.japan.narkive.com/...
Grateful to the creator for producing the dictionary I've used most in my studies over the years, but still...
rant
idiocyncracies
japanese
dictionary
xml