
I feel like machine translation is the unsung hero of the recent AI wave. Gone are the days of just barely being able to discern the meaning of Google Translate. Now I can just read it.

I don't know how useful LLMs will ultimately turn out to be for most things, but a freaking universal translator that allows me to understand any language? Incredible!


Machine translation has certainly become better, and that's amazing and wonderful to see. Definitely an amazing thing that has come out of the AI boom.

However, it has led many websites (like reddit) to enable it automatically, and if you already speak the language you have to find a way to opt out on each website separately. Colloquial language that uses lots of idioms, especially, still gets translated quite weirdly.

It's a bit sad that websites can't rely on the languages the browser advertises, since basically every browser advertises English, so they often auto-translate from English anyway if they detect a non-English IP address.

What do you mean "every browser advertises English"?

In my experience, users who genuinely don't want English will most definitely have their browser language set to the language they do want.

I think what you might be seeing is that many users are OK with English even if it's not their native language.

Not sure that every browser advertises English, but mine certainly does. However, as I'm in Portugal, many websites ignore what my browser says and send me to translated versions, I assume based on my IP. That causes problems because the translations are often quite bad, and they do it with redirects to PT URLs so I can't share links with people who don't speak the language.
I have the same problem in Argentina. Worse, I'm pretty sure that Google and other search engines decide that I don't deserve to receive good information because I live in a Spanish-speaking country, so they send me to terrible low-quality pages because often that's all that's available in Spanish.
Does "advertises" in this context mean what's put in the "Accept-Language" HTTP header? Might be worth seeing what that value specifically is the next time this happens. A "clever" IP-based language choice server-side seems far too complicated and error-prone, but I guess that's what makes it so "clever."
Yeah, I've seen this a few times in the backend code that decides this. The standard should be to use the Accept-Language header, but all the time when people write their own code on top of frameworks (or maybe use niche shitty ones) they just geoip for language.

For business use cases, sometimes it's based on the default language of the company you're an employee of.
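
For what it's worth, honoring the header is not much code. Here is a minimal sketch in Python of picking a language from Accept-Language instead of geoip. The function names (parse_accept_language, choose_language), the lowercase `supported` set, and the "en" default are all made up for illustration, and the matching is a simplification (exact tag, then primary subtag), not the full lookup algorithm from the RFCs.

```python
def parse_accept_language(header):
    """Return the language tags from an Accept-Language value, best first."""
    ranked = []
    for position, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # q defaults to 1 when it is omitted
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q as "not acceptable"
        if quality > 0:
            # Sort by descending quality, then by original position.
            ranked.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(ranked)]


def choose_language(header, supported, default="en"):
    """Pick the first supported language the browser asked for.

    `supported` is assumed to be a set of lowercase tags such as {"en", "pt"}.
    """
    for tag in parse_accept_language(header):
        if tag == "*":
            return default
        primary = tag.split("-")[0]
        for candidate in (tag, primary):
            if candidate in supported:
                return candidate
    return default


# A browser in Portugal that advertises English should still get English.
print(choose_language("en-US,en;q=0.9", {"en", "pt", "es"}))
```

Geoip (or the company default) could still be used as the fallback when the header is missing, and an explicit user choice stored in a cookie should probably win over both.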

Try to use any Google site while traveling. I have two languages in my Accept-Language header, but Google always gives me a language based on location if I'm not logged in. There are also many other sites that do the same, often without any option to change the language.
Early in my career I spent a lot of time thinking that HTML was antiquated. "Obviously they had 20th century ideas on what websites would be. As if we're all just publishing documents." But the beauty of HTML eventually clicked for me: it's describing the semantics of a structured piece of data, which means you can render a perfectly valid view of it however you want if you've got the right renderer!

I imagine language choice to be the same idea: they're just different views of the same data. Yes, there's a canonical language which, in many cases, contains information that gets lost when translated (see: opinions on certain books really needing to be read in their original language).

I think Chrome got it right at one point where it would say "This looks like it's in French. Want to translate it? Want me to always do this?" (Though I expect Chrome to eventually get it wrong as they keep over-fitting their ad engagement KPIs)

This is all a coffee morning way of saying: I believe that the browser must own the rendering choices. Don't reimplement pieces of the browser in your website!

> I imagine language choice to be the same idea: they're just different views of the same data

This is a tempting illusion, but the evidence implies it’s false. Translation is simulation, not emulation.

What evidence are you thinking of?

The parent comment is essentially correct that translations of the same material into different languages represent different views of the same data. A human translator must put in quite a bit of effort establishing what underlying situation is being described by a stretch of language.

Machine translations don't do this; they attempt to map one piece of language to another piece of (a different) language directly.

Relatedly, I tend to think of translations as somewhat similar to a lossy system like those used in (say) image compression.

i.e. a compressed JPEG of an image can retain quite a lot of the detail of the original, but it can introduce its own artifacts and lose some of the details.

For things where the overall shape and picture is all that's required, that's fine. For things where the fine details matter, it's less fine.

Translations seem to be similar in that regard.

Yeah! I don't know what methods Safari on iOS uses, but in general translation has become pretty magical. It feels like we've kind of sleepwalked through the invention of the Universal Translator. I just haven't heard as much gushing about it as I feel it deserves. I can just effortlessly read a sciency news article originally written in Portuguese!
In terms of translation quality, it's still DeepL > Google > Apple, with Apple a fair bit behind and generally more stilted (and far fewer languages).
A nice thing with LLMs is that you can ask them for a more comprehensive and detailed translation, and explain the nuances and ambiguities rather than trying to match the style of the original. This is great for things like group chats in a foreign language, where it’s full of colloquial expressions, shorthand, and typos.
