Very neat, love how there’s a formality level selection! Google Translate has such a strong tendency to use very formal language (at least when translating into Thai) that it’s almost useless in real life. Some English-to-Thai examples I've tried so far have been quite natural.
Very nice! Thanks for the links.
I just did some more research, which you might find interesting. Thinking actually makes LLMs translate worse!
https://nuenki.app/blog/the_more_llms_think_the_worse_they_t...
Also very interesting! Excellent research design and presentation, too.
Your results accord with my own (much less systematic) tests of the translation of short texts by reasoning models. The issue becomes fuzzier with the translation of longer texts, where quality is more difficult to evaluate objectively. I'll drop you an email with some thoughts.
I built something kinda similar, and made it open source. It picks the top x models based on my research, translates with them, then has a final judge model critique, compare, and synthesise a combined best translation. You can try it at https://nuenki.app/translator if you're interested, and my data is at https://nuenki.app/blog
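Roughly, the shape of that pipeline looks like the Python sketch below. To be clear, this is not the actual open-source code, just an illustration of "take the top x, translate with each, hand the candidates to a judge". Every name here (`ensemble_translate`, `stub_judge`, the `model_*` entries) is hypothetical, the models are stand-in functions rather than real API calls, and the judge is a simple majority vote standing in for an LLM that would critique and synthesise.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple

# Hypothetical types: a translator maps source text to a translation,
# a judge maps (source text, candidate translations) to one final translation.
Translator = Callable[[str], str]
Judge = Callable[[str, Sequence[str]], str]


def ensemble_translate(
    text: str,
    ranked_models: Sequence[Tuple[str, Translator]],
    judge: Judge,
    top_x: int = 3,
) -> str:
    """Translate with the top_x ranked models, then let a judge pick or merge."""
    if top_x < 1:
        raise ValueError("top_x must be at least 1")
    # ranked_models is assumed to be sorted best-first already.
    chosen = list(ranked_models)[:top_x]
    candidates = [model(text) for _name, model in chosen]
    return judge(text, candidates)


def stub_judge(source: str, candidates: Sequence[str]) -> str:
    """Stand-in for an LLM judge: returns the most frequent candidate."""
    return Counter(candidates).most_common(1)[0][0]


# Stand-in models with canned outputs; a real setup would call LLM APIs here.
ranked = [
    ("model_a", lambda s: "Bonjour"),
    ("model_b", lambda s: "Salut"),
    ("model_c", lambda s: "Bonjour"),
    ("model_d", lambda s: "Coucou"),
]

result = ensemble_translate("Hello", ranked, stub_judge, top_x=3)
print(result)
```

In a real version the judge would be another model call that sees the source text plus all candidates and writes a combined translation, rather than just voting.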