
It's good to see more open models targeting on-device inference. We need more stops on the fast<>good quality spectrum than just piper and VITS.

My first impressions of it:

* The cloning was decent at imitating voices, but the prosody was quite bad

* There's noticeable crackling in the GGUF models, and the quality drop from the base model to Q8 was significant

* Q4 models are apparently bugged on platforms outside of Linux

* Speed is nowhere near realtime: even with all the latency reductions enabled (Q4 backbone, pre-encoding, ONNX codec decoder), it was lucky to hit a real-time factor of 4x

> Optimised for on-device deployment - provided in GGML format, ready to run on phones, laptops, or even Raspberry Pis

All of this testing was on a beefy 24-core AMD with 64 GiB of RAM. There's no way this model would come close to realtime on any Pi I know of.
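For anyone wanting to reproduce the 4x figure, here's a minimal sketch of how I'd measure real-time factor: wall-clock synthesis time divided by duration of the audio produced, where RTF > 1 means slower than realtime. The `synthesize` callable and the 24 kHz sample rate are placeholders, not this model's actual API.

```python
import time

def real_time_factor(synthesize, text, sample_rate=24000):
    """Measure RTF = synthesis wall-clock time / audio duration.

    `synthesize` is a hypothetical TTS call that takes a string and
    returns raw audio samples; swap in the real engine's interface.
    An RTF of 4x means 4 seconds of compute per second of audio.
    """
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    duration = len(samples) / sample_rate  # seconds of audio produced
    return elapsed / duration
```

For stable numbers, run it over a handful of sentences of varying length and average, since short utterances are dominated by per-call overhead.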
