12
Condor
31d

So I just had a bit of a shower thought. Suppose you could get the linguists to break a language down and define all the rules that make up that language as if it were a protocol, exceptions included. Given an arbitrary string of text, could you match it against those rules, break it down into the information it contains, and then use that information with a new rule set to construct a new valid sentence containing the same information? Would you just have made the ultimate translator?
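
To make the idea concrete, here is a minimal sketch of that parse, extract, regenerate pipeline. Everything in it is an assumption for illustration: the tiny lexicons, the concept names like `DOG`, and the single sentence pattern it accepts. A real language would need vastly more rules, which is what the comments below argue about.

```python
# Toy "language as protocol" translator: parse -> meaning -> generate.
# The lexicons and the single sentence pattern are illustrative assumptions.

# English surface word -> language-neutral concept
EN_NOUNS = {"dog": "DOG", "cat": "CAT"}
EN_VERBS = {"sees": "SEE", "chases": "CHASE"}

# Concept -> German forms. Nouns carry (nominative, accusative) phrases
# because German marks the object's case on the article.
DE_NOUNS = {
    "DOG": ("der Hund", "den Hund"),
    "CAT": ("die Katze", "die Katze"),
}
DE_VERBS = {"SEE": "sieht", "CHASE": "jagt"}


def parse_english(sentence):
    """Match 'the <noun> <verb> the <noun>' and extract its meaning."""
    words = sentence.lower().rstrip(".").split()
    if len(words) != 5 or words[0] != "the" or words[3] != "the":
        raise ValueError(f"no rule matches: {sentence!r}")
    try:
        return {
            "agent": EN_NOUNS[words[1]],
            "action": EN_VERBS[words[2]],
            "patient": EN_NOUNS[words[4]],
        }
    except KeyError as unknown:
        raise ValueError(f"unknown word: {unknown}") from None


def generate_german(meaning):
    """Build a German sentence from the extracted meaning."""
    subject = DE_NOUNS[meaning["agent"]][0]   # nominative form
    obj = DE_NOUNS[meaning["patient"]][1]     # accusative form
    verb = DE_VERBS[meaning["action"]]
    sentence = f"{subject} {verb} {obj}."
    return sentence[0].upper() + sentence[1:]


def translate(sentence):
    return generate_german(parse_english(sentence))


print(translate("The cat chases the dog."))  # Die Katze jagt den Hund.
```

Anything outside the one hard-coded pattern raises `ValueError`, so the sketch also shows where the approach starts to strain: every new construction, exception, or ambiguity needs another hand-written rule.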

Comments
  • 3
    That is the current understanding. But language is complex. The first Duden (the most used dictionary in DACH) had 27000 words; the 28th edition has 148000 words, with many words newly added or removed.
  • 6
    In theory, yes. In practice the rules are not that strict, and real text contains errors as well, which makes automatic analysis harder.
  • 5
    Wouldn't really work because language is a lot more than only structure (grammar) and tokens (vocabulary). It's also how the tokens are interrelated with each other, invoking side aspects of meaning.

    You can express the same underlying topic in different words e.g. for framing events in different narratives, and even if the facts remain the same, the meaning can change drastically. That's a key principle in subtle propaganda.

    Language isn't like a taxonomy that you can define. It's a self-referential system, basically a huge circular loop. That's no accident, because it's how the mind works.

    It's also why taxonomic object hierarchies haven't been working out as well as people thought when they were high on multi-level inheritance crack in the 90s, only to discover that you will always run into catdogs sooner or later.
  • 3
    It's not crazy, but it's way harder than anyone anticipated. The computational failure of these sorts of analyses of linguistics highlights fundamental problems in efforts such as the minimalist program, and going further back, even the foundations of structural linguistics.

    As anyone who has thought about language knows, there _are_ rules, of a sort. But reductionist approaches have up to now captured only some aspects of how language works. We essentially have a bunch of different linguistic theories, each of which captures in great detail some part of the logic behind some part of the process. Attacking language in the same way that we attacked, say, physics, has been at best only a partial success.

    This is why statistical and ML-founded approaches to machine translation have proved to be much more successful.

    It's not hopeless though: there is still room for new sorts of linguistic theory.
  • 4
    The whole problem behind this idea is that natural language is ambiguous and constantly changing, dialects included.

    Try writing a C compiler that handles 3 different meanings for the 'return' keyword based on some other context, changes its behaviour if a comment contains sarcasm, and adapts to different syntax variants according to the programmer's place of birth.

    Some researchers even stopped any attempt at transcribing spoken language into text before translation and are now using neural nets to directly translate the audio signal into another language, and it works surprisingly well.
  • 1
    @deadlyRants My theory is that this works because neural nets found the origin of language, or a base relative close to it.
  • 2
    Very often text refers to some implication of the statement (e.g. sarcasm refers to the implications of the statement regarding the speaker), or to the phonetic properties of the text (puns), or to some falsely implied but convenient logical connection (doublespeak).
    All of these are difficult for computers to even identify, and when done right none of them produce noticeable syntactic anomalies.
  • 2
    @stop Another reason could be that spoken language is usually orders of magnitude simpler than written language.
  • 1
    @homo-lorens I would say it's rather complicated. The text and the phonetics influence each other. "I love you" can be negated purely by the way it is spoken.
  • 2
    @stop Certainly, but tone is much simpler for a neural net to crack than the rich context of written text, or the grammatical structures that would represent this negation. I think this is why audio translation works much better, plus the fact that spoken language usually stresses the important parts in a sentence.
  • 3
    @homo-lorens Some parts make it easier, but in my opinion spoken language is harder because:
    1. Not everyone speaks the same way, especially when it comes to the inside jokes people sometimes use.
    2. Everyone speaks in a dialect. For example, how would you translate "Nahd"?
    3. Not everyone uses the same words/sounds with the same meaning.
    As a whole, I see spoken language as understanding the text, plus understanding what modifies the text, plus understanding the differences from the common language.
  • 1
    What you're talking about is something Noam Chomsky formalized with his grammars for generating language sentences. This is what natural language processing folks do, but in practice those grammar rules are not easy to construct at all because of the irregular nature of most languages.
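
    As a rough illustration of such a generative grammar, here is a minimal sketch of a context-free grammar that produces sentences by expanding rules at random. The grammar, its symbol names, and the `generate` helper are all made up for this example.

```python
import random

# Hypothetical toy grammar: nonterminal -> list of possible expansions.
# Any symbol that is not a key is treated as a terminal word.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["the", "N"], ["the", "ADJ", "N"]],
    "VP": [["V", "NP"], ["V"]],
    "N": [["linguist"], ["compiler"]],
    "ADJ": [["irregular"], ["ambiguous"]],
    "V": [["parses"], ["sleeps"]],
}


def generate(symbol="S", rng=random):
    """Recursively expand a symbol into a list of terminal words."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal: emit the word itself
    expansion = rng.choice(GRAMMAR[symbol])
    words = []
    for part in expansion:
        words.extend(generate(part, rng))
    return words


print(" ".join(generate(rng=random.Random(0))))
```

    Even this tiny grammar can produce "the compiler sleeps the linguist", because nothing tells it that "sleeps" takes no object. Patching that needs yet another rule, which is the construction problem described above.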

    Which is why modern translation systems use deep learning to learn representations that drive translations instead (but yes, there is a lot of domain knowledge that goes into translation). NLP is a pretty fascinating topic.

    @Fast-Nop yes but replace @Condor 's "linguists" with deep learning with modern techniques like attention, memory, and recurrence (though practically nobody uses RNNs anymore) and they can learn good enough embeddings and representations to do decently accurate translation. Of course this has to be tuned by experts, but the system works quite well in practice.
  • 1
    People have tried it already. Actually, the first MT engines were rule-based. We had several translators working on this (that's when I worked at Lionbridge, a long time ago).

    But AI gets better results. Language is more than different words or sentences. Sometimes the strings match but carry a different feeling. A totally normal phrase in English might sound pretentious in another language even when perfectly translated.

    That's why we tend to use native translators whenever possible.

    Language is not so easy.
  • 0
    @stop You have a gross misunderstanding of what machine learning is.
  • 0
    @junon probably.
  • 3
    @junon "Machine Learning" is really just another name for "block chain", which is another name for "cloud" (aka SaaS - Service as a Service).
  • 3
    @Fast-Nop and all of it is Blazing Fast.