As someone who has managed many public Ceph clusters at Linode (now Akamai) since 2016 for both block and object storage, I wish the Hetzner engineers good luck!
There are a _lot_ of challenges in keeping the clusters secure, reliable, and performant. Make sure you have systems or tools in place to prevent abuse. Be aware of the little nuances of Ceph, like what time lifecycle policies kick off, or when dynamic bucket resharding will kick in (and block client writes!).
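To make the resharding point concrete, here's a rough sketch of the kind of check I mean. It assumes the trigger is simply "objects per shard exceeds rgw_max_objs_per_shard", and that the default is 100,000; verify both against your release. The function names and the input shape (name, num_objects, num_shards per bucket) are made up for illustration and are not part of Ceph.

```python
# Hypothetical helpers, not part of Ceph: estimate how close buckets are to
# the dynamic resharding trigger so you can reshard on your own schedule.

# Assumed default of the rgw_max_objs_per_shard option; check your release.
DEFAULT_MAX_OBJS_PER_SHARD = 100_000


def reshard_headroom(num_objects, num_shards,
                     max_objs_per_shard=DEFAULT_MAX_OBJS_PER_SHARD):
    """Return how many more objects fit before the per-shard limit is hit."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    return num_shards * max_objs_per_shard - num_objects


def buckets_near_reshard(buckets, warn_fraction=0.9,
                         max_objs_per_shard=DEFAULT_MAX_OBJS_PER_SHARD):
    """Return names of buckets whose fill ratio is at or above warn_fraction.

    `buckets` is an iterable of dicts with the (assumed) keys
    'name', 'num_objects', and 'num_shards'.
    """
    flagged = []
    for bucket in buckets:
        capacity = bucket["num_shards"] * max_objs_per_shard
        if bucket["num_objects"] / capacity >= warn_fraction:
            flagged.append(bucket["name"])
    return flagged


if __name__ == "__main__":
    sample = [
        {"name": "logs", "num_objects": 95_000, "num_shards": 1},
        {"name": "media", "num_objects": 500_000, "num_shards": 11},
    ]
    print(buckets_near_reshard(sample))
    print(reshard_headroom(950_000, 11))
```

Something like this, fed from whatever per-bucket stats you already collect, lets you alert before a busy bucket crosses the line instead of finding out from the customer.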
If possible, conduct extensive failure testing in a lab environment under simulated load to see how your clusters will really behave when failures eventually happen. Triple-check all of your tunables and your pool configuration. Some things, like a pool's erasure coding profile, are set in stone, and once you have customer data on your clusters, there is no turning back.