Slowly getting better at regex problems! Warning: lots of non-computer linguistic geekiness ahead.

Been working recently on some PHP functions for a project that replicate the furigana (Chinese character annotation) feature available over at JP.SE.

Managed to get the basic cases down fairly quickly:

[Chinese character][reading] => <ruby><rb>Chinese character</rb><rt>reading</rt></ruby>
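The basic case above boils down to a single regex substitution. Here is a minimal sketch in Python rather than the project's PHP. It assumes a hypothetical input markup of the form [漢字]{かんじ} (square brackets around the base text, curly braces around the reading); if the real markup uses different delimiters, only the pattern needs to change.

```python
import re

# Hypothetical input markup: [base text]{reading}. The real JP.SE-style
# syntax may differ; only this pattern would need to change.
FURIGANA = re.compile(r"\[([^\[\]{}]+)\]\{([^{}]+)\}")


def to_ruby(text: str) -> str:
    """Replace every [base]{reading} pair in text with an HTML <ruby> element."""
    return FURIGANA.sub(r"<ruby><rb>\1</rb><rt>\2</rt></ruby>", text)


if __name__ == "__main__":
    print(to_ruby("[漢字]{かんじ}を書く"))
```

Text outside the brackets, such as the trailing を書く, passes through untouched.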

However, I realized this evening that there are patterns where this occurs twice within a single word, such as the following:

[Chinese character][helper Japanese character(s)][Chinese character][optional word ending][reading for the whole thing]

Managed to get it working for both cases initially, but then I found that adding a Japanese character to either of my test strings (see graphic) caused the annotations to fall grossly out of sync. The next two hours disappeared pretty fast before I discovered the problem: I was removing the wrong string length from the annotation string, and had just happened to luck out with a test case where it worked the first time.
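For the two-kanji pattern, one way to avoid tracking string offsets by hand (the kind of bookkeeping that caused the sync bug above) is to let the regex engine do the alignment. Treat each kana run in the base text as a literal anchor, turn each kanji run into a lazy capture group, and match the resulting pattern against the whole-word reading. This is a minimal Python sketch, not the project's actual PHP code. The names split_reading and word_to_ruby are hypothetical, and treating "kana" as the Unicode hiragana and katakana blocks (U+3040 to U+30FF) is a simplifying assumption. Lazy groups take the shortest reading that still lets the rest of the word match, so genuinely ambiguous words could still be mis-split. When the reading doesn't fit at all, the sketch falls back to annotating the whole word.

```python
import re

# Assumption: "kana" means the Unicode hiragana and katakana blocks
# (U+3040 to U+30FF); everything else is treated as part of a kanji run.
KANA = r"\u3040-\u30ff"
KANA_RUN = re.compile(rf"[{KANA}]+")
RUN = re.compile(rf"[{KANA}]+|[^{KANA}]+")


def split_reading(base: str, reading: str):
    """Align a whole-word reading with the kanji runs in base.

    Kana runs become literal anchors and kanji runs become lazy capture
    groups. Returns (segment, reading_or_None) pairs, or None when the
    reading cannot be aligned with base.
    """
    runs = RUN.findall(base)
    pattern = "".join(
        re.escape(run) if KANA_RUN.fullmatch(run) else "(.+?)" for run in runs
    )
    match = re.fullmatch(pattern, reading)
    if match is None:
        return None
    readings = iter(match.groups())
    return [
        (run, None) if KANA_RUN.fullmatch(run) else (run, next(readings))
        for run in runs
    ]


def word_to_ruby(base: str, reading: str) -> str:
    """Annotate only the kanji runs; fall back to one <ruby> for the whole word."""
    parts = split_reading(base, reading)
    if parts is None:
        return f"<ruby><rb>{base}</rb><rt>{reading}</rt></ruby>"
    return "".join(
        seg if rt is None else f"<ruby><rb>{seg}</rb><rt>{rt}</rt></ruby>"
        for seg, rt in parts
    )


if __name__ == "__main__":
    print(word_to_ruby("取り消す", "とりけす"))
```

For example, word_to_ruby("取り消す", "とりけす") builds the pattern (.+?)り(.+?)す, aligns と with 取 and け with 消, and leaves the helper り and the ending す unannotated. Because no lengths are subtracted by hand, adding characters to either string can't knock the annotations out of sync. At worst, the match fails and the whole word gets a single annotation.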

Probably going to do a code review of it with the intern next time he's in. One of the things I've been stressing to him lately is that however easy a task may be for a human, there are all kinds of extra things that need to be tracked in order for a computer to be able to follow your logic.

  • 10
    The fully rendered result, if anyone is curious what it looks like.
  • 1
    I was going to ask if you were a linguist, then found out from your profile that you are! This is great, my dude!
  • 2
    @AleCx04 Languages are my first passion. Unfortunately, I haven't had a regular partner to practice speaking Japanese with since more or less graduating from college. On the bright side, that's kind of how I got back into programming about 10 years ago: building tools to help me stay in practice and study new material on my own.
  • 2
    @Kaji You make me think of Larry Wall, the creator of Perl. He's a linguist too. I have no clue where or how he got his mad skills, but you reminded me of him.

    Japanese is on my bucket list of languages I'd love to learn. I know English and Spanish and speak fluent Portuguese, and Portuguese is the one I have no one to practice with. My wife is a linguist :D She speaks English, Spanish, and French, and she has been helping me with French, although I have a horrible accent.

    I lived in South Korea for almost 4 years and took classes, but it didn't stick. I want to get back into it. One of my closest friends over there spoke English, his native Korean, Chinese, and Japanese, and I got pretty excited to learn that I wouldn't butcher Asian languages (I butcher French).
  • 1
    @AleCx04 I hear ya on the butchery! I dated a Vietnamese girl for 9 years, and while I learned to pronounce her family members' names well enough, any time I tried forming sentences I was swiftly told to stop because she found it painful.
  • 1
    When I was doing my senior year of college in Japan, I managed to test completely out of the exchange student program and take courses alongside the Japanese students, which was a lot of fun. Since I was already doing an independent study on Japanese (which ended up branching into Classical Japanese), I decided to take Chinese on the side. My logic was that by learning a third language through my second, I'd better appreciate the nuances between the two, and between one of them and English I could more or less triangulate the meaning of anything uncertain in the third. The reading stuck (I'm a kanji addict), but if you ask me to read aloud, you're more likely to get a basic outline in English. I know just enough Chinese to be dangerous: I get the main idea, but not the details.
  • 0
    Hurts my eyes