I’m having an issue since I dist-upgraded a node to Debian Bullseye: the pods on that node are getting killed by the kubelet every 5 minutes.

`kubectl describe pods` shows this event:
```
Normal SandboxChanged 31s (x8 over 40m) kubelet Pod sandbox changed, it will be killed and re-created.
```
And the kubelet logs show this:

```
Dec 10 16:38:16 controller01 kubelet[148695]: I1210 16:38:16.002180 148695 kuberuntime_container.go:714] "Killing container with a grace period" pod="kube-system/metallb-speaker-zdx7b" podUID=b1ae7e94-734d-4bf3-b2c3-ad3c4ced26cf containerName="metallb-speaker" containerID="containerd://33ad83d0fc2eeace3b0ec8ee938fed2904d318b76a399d5d77e932110284d516" gracePeriod=30
```
After increasing the log level, I found this:

```
Dec 10 15:41:40 controller01 kubelet[145763]: I1210 15:41:40.050522 145763 cgroup_manager_linux.go:267] "The cgroup has some missing controllers" cgroupName=[kubepods besteffort] controllers=map[cpuset:{}]
```
Looks like I’m entering this condition: https://github.com/kubernetes/kubernetes/blob/v1.22.4/pkg/kubelet/cm/cgroup_manager_linux.go#L267
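From my reading of that file, the check amounts to something like the sketch below: on cgroup v2 the kubelet reads the cgroup's `cgroup.controllers` file and treats the cgroup as missing if any controller it expects (including `cpuset`) is absent, which then triggers the sandbox re-creation. The function name and the controller list here are my own simplification, not the actual kubelet code:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// Simplified approximation of the kubelet's cgroup v2 existence check:
// a cgroup only "exists" if cgroup.controllers lists every controller
// the kubelet expects to manage. (List below is my assumption.)
var neededControllers = []string{"cpu", "cpuset", "memory", "hugetlb", "pids"}

func missingControllers(cgroupPath string) ([]string, error) {
	data, err := os.ReadFile(filepath.Join(cgroupPath, "cgroup.controllers"))
	if err != nil {
		return nil, err
	}
	enabled := map[string]bool{}
	for _, c := range strings.Fields(string(data)) {
		enabled[c] = true
	}
	var missing []string
	for _, c := range neededControllers {
		if !enabled[c] {
			missing = append(missing, c)
		}
	}
	return missing, nil
}

func main() {
	path := "/sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice"
	missing, err := missingControllers(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if len(missing) > 0 {
		// This is the situation the kubelet logs as "The cgroup has some
		// missing controllers", after which it kills and re-creates the sandbox.
		fmt.Printf("cgroup %s is missing controllers: %v\n", path, missing)
	}
}
```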
And, indeed, the `cpuset` cgroup controller seems to disappear after a while for some reason:

```
root@controller01 [~] head -n 1 /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/**/cgroup.controllers
==> /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/cgroup.controllers <==
cpuset cpu io memory hugetlb pids rdma
==> /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod2f18b18e_162b_4d41_933d_f819047802ac.slice/cgroup.controllers <==
cpuset cpu io memory hugetlb pids rdma
==> /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podb1ae7e94_734d_4bf3_b2c3_ad3c4ced26cf.slice/cgroup.controllers <==
cpuset cpu io memory hugetlb pids rdma
```

And a while later, `cpuset` is gone from the pod slices:

```
root@controller01 [~] head -n 1 /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/**/cgroup.controllers
==> /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/cgroup.controllers <==
cpuset cpu io memory hugetlb pids rdma
==> /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod2f18b18e_162b_4d41_933d_f819047802ac.slice/cgroup.controllers <==
cpu io memory hugetlb pids rdma
==> /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podb1ae7e94_734d_4bf3_b2c3_ad3c4ced26cf.slice/cgroup.controllers <==
cpu io memory hugetlb pids rdma
```
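In case it helps anyone observe the same thing, a minimal poller along these lines can catch the moment a controller drops out of a slice's `cgroup.controllers` (the path, interval, and all names here are just my own illustration, not anything from the kubelet or containerd):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
	"time"
)

// Polls every pod slice under the burstable slice and reports when a
// controller that was listed in cgroup.controllers at one poll is gone
// at the next one.
func main() {
	base := "/sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice"
	last := map[string]map[string]bool{}

	for {
		matches, _ := filepath.Glob(filepath.Join(base, "*", "cgroup.controllers"))
		for _, file := range matches {
			data, err := os.ReadFile(file)
			if err != nil {
				continue // the slice may have been removed between glob and read
			}
			current := map[string]bool{}
			for _, c := range strings.Fields(string(data)) {
				current[c] = true
			}
			for c := range last[file] {
				if !current[c] {
					fmt.Printf("%s: controller %q disappeared from %s\n",
						time.Now().Format(time.RFC3339), c, file)
				}
			}
			last[file] = current
		}
		time.Sleep(5 * time.Second)
	}
}
```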
I found this post on SO that seems to be related but was removed: https://web.archive.org/web/20211128193108/https://stackoverflow.com/questions/70118302/upgrading-from-to-linux-5-10-0-9-amd64-debian-bullseye-repeatedly-kills-etcd
I’m running k8s 1.22.4 (“the hard way”) with containerd 1.4.12 on Linux 5.10.70-1 (Debian Bullseye default kernel).
Any idea what could cause that cgroup controller to disappear? The pods scheduled on that node have a CPU request but no CPU limit.