I made a tool which translates sentences as you browse, for immersion[0]. I solved this by giving the model a code (specifically, "483") to return in any refusal. Then, if I detect that in the output, I fail over to another model+provider.
I also have a few heuristics (e.g. matching "I can't translate" in many different languages) to catch refusals where the model doesn't return the code.
It works pretty well.
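
Roughly, the check-and-fail-over logic looks like the sketch below. This is not my actual code, just a minimal illustration: the phrase list, the function names, and the idea of passing backends as plain callables are all made up for the example. Only the "483" code comes from the real setup.

```python
from typing import Callable, Iterable, Optional

REFUSAL_CODE = "483"  # the code the model is told to return when it refuses

# Hypothetical phrase list; a real one would cover many more languages.
REFUSAL_PHRASES = (
    "i can't translate",
    "i cannot translate",
    "no puedo traducir",
    "je ne peux pas traduire",
)


def looks_like_refusal(output: str) -> bool:
    """Return True if the output contains the refusal code or a known refusal phrase."""
    # Normalise curly apostrophes and case so "I can’t" matches "i can't".
    text = output.replace("\u2019", "'").casefold()
    # Note: a plain substring check will also flag a legitimate translation
    # that happens to contain "483"; a stricter match may be preferable.
    if REFUSAL_CODE in text:
        return True
    return any(phrase in text for phrase in REFUSAL_PHRASES)


def translate_with_failover(
    sentence: str,
    backends: Iterable[Callable[[str], str]],
) -> Optional[str]:
    """Try each model+provider in order and return the first non-refusal output."""
    for backend in backends:
        output = backend(sentence)
        if not looks_like_refusal(output):
            return output
    return None  # every backend refused
```

Each backend here is just a function that takes a sentence and returns the model's raw output, so the first entry would wrap the primary model and the rest the fallbacks.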
It can be as simple as discussing one’s own religion.