
pylotlight
Isn't that the exact problem that k8s workloads solve by scaling onto new nodes first, etc.? No downtime required.

loloquwowndueo
Right, but Incus is not k8s. You can stand up spares and switch traffic, but it's not built-in functionality and requires extra orchestration.
goku12
It is built-in functionality [1] and requires no extra orchestration. In a cluster setup, you would be using virtualized storage (Ceph-based) and virtualized networking (OVN). You can replace a container/VM on one host with another on a different host with the same storage volumes, network, and address. This is what k8s does with pod migrations too (edit: except the address).

There are a couple of differences, though. The first is the pet vs. cattle treatment of containers by Incus and k8s respectively. Incus tries to resurrect dead containers as faithfully as possible. This means that Incus treats container crashes like system crashes, and its recovery involves a systemd bootup inside the container (a kernel boot too, in the case of VMs). This is what accounts for the delay. K8s, on the other hand, doesn't care about dead containers/pods at all. It just creates another pod, likely with a different address, and expects the application to handle the interruption.

Another difference is the orchestration mechanism behind this. K8s, as you may be aware, uses control loops on controller nodes to detect the crash and initiate the recovery, which is then mediated by the kubelets on the worker nodes. Incus seems to have the orchestrator on all nodes: the members make decisions based on consensus and manage the recovery process themselves.
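
To make that concrete, the recovery described in [1] boils down to a couple of commands and one config key. A rough sketch from memory of the docs (member name and threshold value are placeholders):

  # planned maintenance: move instances off a member, then bring them back
  incus cluster evacuate server2
  incus cluster restore server2

  # automatic healing: if a member stays offline for longer than this many
  # seconds, its instances are restarted on the remaining members
  incus config set cluster.healing_threshold 30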

[1] https://linuxcontainers.org/incus/docs/main/howto/cluster_ma...

mdaniel
> and address. This is what k8s does with pod migrations too.

That's not true of Pods; each Pod has its own distinct network identity. You're correct about the network, though, since AFAIK the Service and Pod CIDRs are fixed for the lifespan of the k8s cluster.

You spoke to it further down, but guarded it with "likely", and I can say with certainty that it's not just likely: it unconditionally does. That's not to say address re-use isn't possible over a long enough time horizon, but that bookkeeping is delegated to the CNI.

---

Your "dead container" one also has some nuance, in that kubelet will for sure restart a failed container, in place, with the same network identity. When fresh identity comes into play is if the Node fails, or the control loop determines something in the Pod's configuration has changed (env-vars, resources, scheduling constraints, etc) in which case it will be recreated, even if by coincidence on the same Node

moondev
> I can say with certainty that it's not just likely: it unconditionally does. That's not to say address re-use isn't possible over a long enough time horizon, but that bookkeeping is delegated to the CNI

You are 100% wrong, then. The kube-ovn CNI enables static address assignment and "sticky" IPAM on both pods and KubeVirt VMs.

https://kubeovn.github.io/docs/v1.12.x/en/guide/static-ip-ma...
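
e.g. something like this (annotation name as I recall it from that doc; the address and image are made up):

  apiVersion: v1
  kind: Pod
  metadata:
    name: static-ip-pod
    annotations:
      ovn.kubernetes.io/ip_address: 10.16.0.15    # pin the pod to this IP
  spec:
    containers:
      - name: app
        image: nginx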

mdaniel
Heh, I knew I was going to get in trouble since the CNI could do whatever it likes, but I felt safe due to Pods having mostly random identities. In that moment I had forgotten about StatefulSets, which (and here I agree with your linked CNI's docs) would actually be a great candidate for static address assignment.

Sorry for the lapse; I'll try to be more careful about using "unconditional" to describe pluggable software.
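
For anyone following along: the StatefulSet property in question is the stable per-replica identity, i.e. ordinal pod names plus per-pod DNS via a headless Service, which is why a fixed address per replica maps onto it so naturally. A minimal vanilla sketch (names are arbitrary):

  apiVersion: v1
  kind: Service
  metadata:
    name: web
  spec:
    clusterIP: None        # headless: gives each pod its own stable DNS record
    selector:
      app: web
    ports:
      - port: 80
  ---
  apiVersion: apps/v1
  kind: StatefulSet
  metadata:
    name: web
  spec:
    serviceName: web       # ties the pods to the headless service above
    replicas: 2
    selector:
      matchLabels:
        app: web
    template:
      metadata:
        labels:
          app: web
      spec:
        containers:
          - name: nginx
            image: nginx

  # replicas come up as web-0 and web-1, reachable at
  # web-0.web.<namespace>.svc.cluster.local regardless of where they're scheduled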

moondev
All good, and I'll cheer you on regarding the composability of k8s for sure.
goku12
I agree with everything you pointed out; it was all in my mind too. However, I avoided those points on purpose for the sake of brevity. It was getting too long-winded and convoluted for my liking. Thanks for adding a separate clarification, though.
