I made a tool that translates sentences as you browse, for language immersion[0]. I solved this by giving the model a code (specifically, "483") to return whenever it would refuse. Then, if I detect that code in the output, I fail over to another model+provider.
I also have a few heuristics (e.g. "I can't translate" in many different languages) to catch cases where the model refuses without returning the code.
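A minimal sketch of the sentinel-code-plus-heuristics failover described above. It assumes each model+provider can be wrapped as a plain function from sentence to output. The provider functions, the `looks_like_refusal` and `translate_with_failover` names, and the sample refusal phrases are hypothetical and for illustration only, not the tool's actual code.

```python
from typing import Callable, Optional, Sequence

# The sentinel the prompt asks the model to return instead of refusing.
REFUSAL_CODE = "483"

# Hypothetical sample of refusal phrases in a few languages; a real list would be longer.
REFUSAL_PHRASES = (
    "i can't translate",
    "i cannot translate",
    "no puedo traducir",
    "je ne peux pas traduire",
    "ich kann das nicht übersetzen",
)

# Hypothetical provider type: any function that takes a sentence and returns model output.
Provider = Callable[[str], str]


def looks_like_refusal(output: str) -> bool:
    """Return True if the output contains the sentinel code or a known refusal phrase."""
    text = output.strip().lower()
    if REFUSAL_CODE in text:
        return True
    return any(phrase in text for phrase in REFUSAL_PHRASES)


def translate_with_failover(sentence: str, providers: Sequence[Provider]) -> Optional[str]:
    """Try each provider in order, failing over whenever the output looks like a refusal."""
    for provider in providers:
        output = provider(sentence)
        if not looks_like_refusal(output):
            return output
    return None  # every provider refused
```

Matching the sentinel as a substring tolerates models that wrap it in extra text, at the cost of flagging genuine translations that happen to contain "483"; an exact match on the stripped output is the stricter alternative.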
It works pretty well.
[0] https://nuenki.app