Comment by ickoonite - Hacker Neue

ickoonite Feb 1, 2023 parent

> but the level of information you can encode with a syllabary is much higher

[citation needed]

It’s not clear how you can encode any more information with hiragana/katakana - the Japanese syllabaries - than you can with an alphabet. Indeed, it’s fairly clear the reverse is true - you can only really encode sounds for which the syllabary has symbols; conversely, as English demonstrates, you can encode a vast array of sounds while only having 26 distinct letters.

culi Feb 1, 2023

Hmm I'm not sure you're completely clear on how syllabaries (including katakana, hiragana, kanji, etc) work. You can use them to encode anything as well

Orthographic English is probably the best example to show the inefficiencies alphabets sometimes bring. The English language has ~24 consonants which are often well-represented, but then you have things like "ng" or "sh", each of which is actually a single phoneme that we lack a symbol for. On the flip side, English has an unusually large number of vowel phonemes, around 13 monophthongs and 7 diphthongs. Yet we have only 5 symbols for vowels, often use them in very ambiguous ways, and end up with strange combinations of them, leading to ghoti:

https://en.wikipedia.org/wiki/Ghoti

The point is you only have 26 letters, but now you end up having to memorize a vast array of combinations and how they work in different contexts. You're really not saving yourself any more memory space than if you'd learned a syllabary

ickoonite OP Feb 1, 2023

I suppose it depends what you consider efficient: I would counter that using a mere 26 letters to encode all the varying sounds of English is wonderfully parsimonious, an incredibly efficient use of those characters. Such an efficient encoding does however, as you point out, make decoding more cumbersome, as it requires memorisation of the specific pronunciations of strings of letters up to and including whole words. In that sense, however, it is very similar to the Japanese (ab)use of kanji, which - as I pointed out at the very top of this thread - has the same problem. For a given kanji, you need to see it in context to be able to have a reasonable chance of pronouncing it correctly (and sometimes even that isn’t enough).

What I’m slightly puzzled by is your apparent confusion as to what a syllabary is: as I gently tried to hint in my reply (and someone else has now more explicitly pointed out), hiragana and katakana are syllabaries; kanji is not, even if it is occasionally used that way (当て字). I’m not sure to what extent that undermines what you were trying to say.

But, to engage with the substance of your point on the efficiency of Japanese syllabaries, we first have to put aside the fact that they retain two distinct systems to encode the same sounds (a baroque inefficiency surely without peer in any other language). It is true that modern kana allow for efficient decoding - there is almost no ambiguity in the sounds, は for ha/wa excepted. That reliable decoding does, however, impose a fairly hard limit on the number of sounds they can express, so I am not sure what you mean when you say “[y]ou can use them to encode anything as well”.

Zababa Feb 1, 2023

If we're bringing accenting into this (as with ghoti, that uses the "o" from "women"), then syllabaries are far from optimal as well, since Japanese has different ways of accenting each word that are not encoded into the syllabaries themselves. You then end up having to memorize a vast array of combinations and how they work in different contexts. So it's not really a syllabary but an alphabet with more letters. Which is totally fine, but then calling it a syllabary creates confusion since people expect to be able to pronounce words easily, which they can't with only the word written (just like with "women" or "ghoti").

Kanjis are not a syllabary. They were originally ideograms, but they aren't really any more: some of them "make sense", some don't really. So they become mostly another layer of mapping symbols to meaning, except this time you have tens of thousands that can't be decomposed properly into smaller parts (like words with letters), which is terrible for many reasons.

On the other hand kanjis offer you the opportunity to play around with different meanings, in a way that you just can't in English. That makes Japanese richer and more interesting, at the cost of being a harder language. I'm glad both exist.

smsm42 Feb 5, 2023

Practically, the difference is not that big. There are many sounds English letters can't encode, and for others it cheats by saying "let's pretend th sounds like this, despite it having little to do with t or h, and then zh sounds like that, and ae like this, and so on". You could do the same with a syllabary - and to some measure Japanese does, aided with special marks and other tricks, but as English inevitably misses some sounds, so does Japanese. It's inevitable - look at the IPA symbol set to see how many are needed, and I'm sure even that doesn't cover all the possibilities.

What you lose with a syllabary is the ability to encode some patterns - like Czech "strč prst skrz krk" - pretty much no way to encode it in Japanese I think, unless you resort to a lot of cheating like inserting "u" everywhere and then declaring "u is silent" (which is pretty normal for Japanese in general but in this case kinda looks like cheating). But tbh English encoding wouldn't adequately describe how it's pronounced either.

ksenzee Feb 1, 2023

I think they meant the encoding is more efficient, so you can encode more information with fewer characters.

Tor3 Feb 1, 2023

If you think of Japanese writing as a whole (kanji, hiragana, katakana where needed) then indeed you can encode more information in less space. Which is easy to see if you compare the Japanese sections with the English sections of dual-language user manuals (those that actually include exactly the same amount of information, of course). The Japanese sections are about 30% shorter than their English counterparts (or those in any other language written with Latin letters). One manual I looked at was 60 pages in Japanese, 90 in English, including illustrations (same in both).

ickoonite OP Feb 1, 2023

Absolutely. Japanese Twitter is another great example.

I’ve often liked to describe kanji as a form of compression: the problem is the encoding and decoding are done in your head rather than by a computer.

smsm42 Feb 6, 2023

But since you need more bits to encode a single character, at least in most common encodings without inventing a custom one, it's not really much more efficient.
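To make the characters-versus-bits distinction concrete, here is a minimal sketch that counts characters and encoded bytes for one word in each script. The helper name `encoded_sizes` and the sample pair "Tokyo" / "東京" are illustrative assumptions, not taken from the thread, and a single word pair says nothing definitive about running text.

```python
def encoded_sizes(text: str) -> dict[str, int]:
    """Return the character count and encoded byte counts for `text`.

    Hypothetical helper for illustration only.
    """
    return {
        "chars": len(text),                               # Unicode code points
        "utf8_bytes": len(text.encode("utf-8")),          # 1 byte per ASCII letter, 3 per common kanji
        "utf16_bytes": len(text.encode("utf-16-le")),     # 2 bytes per BMP code point, no BOM
    }


# The same place name written with Latin letters and with kanji.
for sample in ("Tokyo", "東京"):
    print(sample, encoded_sizes(sample))
```

For this pair the kanji form uses fewer characters (2 versus 5) but slightly more UTF-8 bytes (6 versus 5), while in UTF-16 it is smaller (4 versus 10 bytes). So whether the saving in characters survives as a saving in bits depends on the encoding chosen.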
