I've only done some light benchmarking so far. I had it running on a two-core DigitalOcean machine under sustained write load to test for race bugs, and it was replicating 1K+ writes per second. But honestly, I haven't even tried optimizing the code yet. I'm mainly focused on correctness right now. I'd bet it could get a lot faster.
Do you have any benchmark results, by any chance? It’s a bit of a can of worms, though, so I would understand if you didn’t want to get into it at this time.