Description
We have a containerised application that uses UDP sockets with SO_RCVTIMEO. When it runs as a pod, the syscall returns with EINTR roughly every 10 seconds. The system is configured with the static cpumanager policy and reserved-cpus. When the CPU manager reconcile loop runs (by default every 10 seconds), it tries to set the cpuset, and we think the container is frozen and then put back to running at that point. That is when the container application sees EINTR.
systemd --version
systemd 234
+PAM -AUDIT +SELINUX -IMA +APPARMOR -SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 -IDN default-hierarchy=hybrid
sudo cat /var/lib/kubelet/cpu_manager_state
{"policyName":"static","defaultCpuSet":"0-3,6,8-51,54,56-95","entries":{"38e714ed-b6f5-402d-8303-4877245856f8":{"trex":"4,52"},"b516c5cd-2082-45da-9932-dbc7d129129e":{"trex-sriov":"5,7,53,55"}},"checksum":3895732860}
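To make the state file above easier to read, here is a minimal Python sketch (not kubelet code) that lists which containers hold exclusive CPUs. The helper names `parse_cpuset` and `exclusive_cpus` are made up for illustration, and the outer keys of `entries` are assumed to be pod UIDs.

```python
import json

# Content of /var/lib/kubelet/cpu_manager_state as shown above.
STATE = (
    '{"policyName":"static","defaultCpuSet":"0-3,6,8-51,54,56-95",'
    '"entries":{"38e714ed-b6f5-402d-8303-4877245856f8":{"trex":"4,52"},'
    '"b516c5cd-2082-45da-9932-dbc7d129129e":{"trex-sriov":"5,7,53,55"}},'
    '"checksum":3895732860}'
)


def parse_cpuset(spec):
    """Expand a cpuset list such as "0-3,6" into a sorted list of CPU ids."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def exclusive_cpus(state_json):
    """Map (outer key, container name) to the CPUs assigned to that container."""
    state = json.loads(state_json)
    result = {}
    for pod_key, containers in state["entries"].items():
        for name, spec in containers.items():
            result[(pod_key, name)] = parse_cpuset(spec)
    return result


if __name__ == "__main__":
    for (pod_key, name), cpus in exclusive_cpus(STATE).items():
        print(pod_key, name, cpus)
```

Containers that do not appear under `entries` run on `defaultCpuSet`, which is the shared pool the reconcile loop keeps rewriting.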
Log from the periodic cpumanager reconcile loop, which updates the cpuset for the containers:
2021-07-07T21:10:11.167630+00:00 pool1-dc309-1-wk1-n26 kubelet[24156]: I0707 23:10:11.167565 24156 cpu_manager.go:407] "ReconcileState: ignoring terminated container" pod="kube-system/eric-tm-external-connectivity-frontend-speaker-2krg5" containerID="9d83fc3b4c8e7e8728368f5bf09919a59641393ccf36c72856d5d76f4865e012"
The system is not using cgroup v2
sudo ls /sys/fs/cgroup/cgroup.controllers
ls: cannot access '/sys/fs/cgroup/cgroup.controllers': No such file or directory
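The same check can be scripted. This is a minimal sketch, assuming the cgroup filesystem is mounted at /sys/fs/cgroup; `is_cgroup_v2` is a hypothetical helper name. It relies on the fact that a unified (v2) hierarchy exposes `cgroup.controllers` at its mount root.

```python
import os


def is_cgroup_v2(root="/sys/fs/cgroup"):
    """Return True if `root` looks like a unified (v2) cgroup mount."""
    return os.path.isfile(os.path.join(root, "cgroup.controllers"))


if __name__ == "__main__":
    print("cgroup v2" if is_cgroup_v2() else "not cgroup v2 (v1 or hybrid)")
```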
The test application that can be used to recreate the issue is iperf2 (https://sourceforge.net/projects/iperf2/) run inside a container.
There are two pods running iperf:
server: iperf -s -u -e -i 30 -p 5201
client: iperf -c -u -p 5201 -t 300 -i 300 -z -e -b 100pps
With strace we can see the EINTR. Because the server does not handle EINTR, it creates a new connection every time it gets one.
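For illustration, handling EINTR in a receive loop with a timeout roughly means retrying the call while keeping the original deadline, so that repeated interruptions do not extend the wait. The sketch below is in Python and is not iperf's code; `retry_on_eintr` and its parameters are hypothetical names. Python 3.5 and later already retry most interrupted system calls internally (PEP 475), so the explicit loop is here only to make the pattern visible.

```python
import time


def retry_on_eintr(call, timeout, clock=time.monotonic):
    """Run call(remaining_seconds), retrying if it is interrupted (EINTR).

    `call` performs one blocking operation with the given timeout, for
    example a receive on a socket. The overall deadline is fixed up front,
    so an interruption does not restart the full timeout.
    """
    deadline = clock() + timeout
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError("timed out after repeated interruptions")
        try:
            return call(remaining)
        except InterruptedError:
            # EINTR: the call was interrupted before completing.
            # Try again with whatever time is left.
            continue


if __name__ == "__main__":
    import socket

    # Loopback demo: send one datagram to ourselves and receive it.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("127.0.0.1", 0))
        sock.sendto(b"ping", sock.getsockname())

        def recv_once(remaining):
            sock.settimeout(remaining)
            return sock.recvfrom(2048)

        data, _addr = retry_on_eintr(recv_once, timeout=5.0)
        print(data)
```

An application that treats EINTR as a fatal receive error, instead of retrying like this, would restart its session on every interruption, which matches the behaviour described above.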
This issue is seen with runc v1.0.0-rc91 and above. With v1.0.0-rc10 it is never observed. We are aware that there are a lot of changes between these two versions.
We did kernel tracing, and it looks like something is putting the container into the frozen state, at which point the system call gets EINTR. We then tested whether the CPU manager is involved. If we change the reconcile period, the EINTR interval follows the new period, and if the CPU manager policy is not static, the issue is never seen with either the old or the new runc version. If we swap runc versions while keeping all other settings the same, the behaviour changes. So we are fairly sure that this is something runc introduces. We have also tried v1.0.0, and the issue is seen there as well.
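Besides kernel tracing, one way to check the freeze/thaw correlation from user space is to poll the container's `freezer.state` file and record transitions. This is a minimal sketch, assuming the freeze goes through the cgroup v1 freezer controller and that it is mounted under /sys/fs/cgroup/freezer. The container's cgroup path is a placeholder to be filled in, and `watch_freezer` is a hypothetical helper. A very short freeze can fall between two polls, so an empty result does not prove that no freeze happened.

```python
import time


def read_freezer_state(path):
    """Return the current state string, e.g. THAWED, FREEZING or FROZEN."""
    with open(path) as f:
        return f.read().strip()


def watch_freezer(path, duration, interval=0.001, read=read_freezer_state):
    """Poll `path` for `duration` seconds and return (timestamp, state) changes.

    The first entry is always the state seen on the first poll.
    """
    changes = []
    last = None
    end = time.monotonic() + duration
    while time.monotonic() < end:
        state = read(path)
        if state != last:
            changes.append((time.time(), state))
            last = state
        time.sleep(interval)
    return changes


if __name__ == "__main__":
    import sys

    # Usage: script.py /sys/fs/cgroup/freezer/<container cgroup>/freezer.state
    for ts, state in watch_freezer(sys.argv[1], duration=30.0):
        print(f"{ts:.6f} {state}")
```

Comparing the printed timestamps with the EINTR timestamps in the strace output would show whether the two line up with the reconcile period.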
The strace of the iperf server was taken with strace -f -p -o; it shows the EINTR and interrupted system call entries.
Uploading server-trace-non-working.txt…