Thanks for the feedback, very interesting.
In the meantime I've seen this model, https://huggingface.co/FPHam/Karen_TheEditor_V2_STRICT_Mistr... and I'd be interested in how it compares (though it seems to be fine-tuned specifically for American English).
We use about 4-6 calls per improvement, split across Anthropic and OpenAI models. Interestingly, we really couldn't get sufficiently good performance from just one model. The models can be good or bad at different tasks, even when one task doesn't seem materially harder than the other.
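To make the idea concrete, here is a rough sketch of routing each step of a multi-call pipeline to whichever model handles it best. This is only an illustration, not the actual setup: the step names, `run_pipeline`, and the two `provider_*` functions are hypothetical stand-ins, and a real version would call the Anthropic and OpenAI clients where the stubs are.

```python
from typing import Callable, Dict, List

# A "model" here is anything that maps a prompt string to a completion string.
Model = Callable[[str], str]


def run_pipeline(text: str, steps: List[str], routes: Dict[str, Model]) -> str:
    """Pass the text through each step, using the model routed to that step."""
    for step in steps:
        model = routes[step]              # pick the model assigned to this step
        text = model(f"{step}: {text}")   # one call per step
    return text


# Hypothetical stand-ins for real provider clients (assumption: no network calls).
def provider_a(prompt: str) -> str:
    _task, _, body = prompt.partition(": ")
    return body.strip().capitalize()


def provider_b(prompt: str) -> str:
    _task, _, body = prompt.partition(": ")
    return body if body.endswith(".") else body + "."


# Hypothetical step names; each step is sent to a different stub.
STEPS = ["fix_grammar", "fix_punctuation"]
ROUTES = {"fix_grammar": provider_a, "fix_punctuation": provider_b}

result = run_pipeline("this is a draft", STEPS, ROUTES)
print(result)
```

Keeping the routing in a plain dictionary means a step can be moved to a different provider after comparing outputs, without touching the pipeline loop.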