Also very interesting! Excellent research design and presentation, too.
Your results accord with my own (much less systematic) tests of reasoning models translating short texts. The issue becomes fuzzier with longer texts, where quality is more difficult to evaluate objectively. I'll drop you an email with some thoughts.
https://nuenki.app/blog/the_more_llms_think_the_worse_they_t...