cgroup: devices updates appear to be broken #2366

This affects both versions, but in quite different ways:

* [x] For cgroupv1, #2205 highlighted that on device cgroup updates, we temporarily block all devices. This results in spurious errors in the container (such as programs being unable to open `/dev/null`). We've seen this happen on customer systems under Kubernetes, so this is definitely a real issue.
  + This is more complicated than it first appears because `runc` implements the spec incorrectly here -- technically `runc` is a black-list by default, and users have to convert `runc` to be a white-list. Aside from not following the spec, this is a worrying security stance.

* [ ] For cgroupv2, devices cgroup updates are implemented by appending a new BPF program to the cgroup. Because an access is permitted only if every attached program allows it, only new denials have an effect, so updates are implemented incorrectly. (EDIT: This also means that we "leak" eBPF programs and thus after 64+ applications we start getting errors -- see #2474.)
  + ~~Unfortunately this is a bit complicated to fix, but I have figured out how to do it. We need to make an eBPF map of type `BPF_MAP_TYPE_PROG_ARRAY` and then tail-call into it in a small stub eBPF program which we attach to the actual cgroup. This will allow us to atomically update the devices cgroup rules (there is no way to atomically replace an eBPF program with `BPF_F_ALLOW_MULTI` -- and without any program, all device accesses would be permitted).~~
  + Ignore the above -- you cannot `bpf_tail_call` from cgroup programs. So we will need to instead implement it through an eBPF map (which we can atomically replace by mis-using `BPF_MAP_TYPE_ARRAY_OF_MAPS`).
  + This is all slightly complicated by the fact that the entire API is fd-based (and we don't have our own monitor process, so we can't stash away the fd). Luckily there is a lookup-by-id system which we can use to get the file descriptor. However, the ids can be recycled, so we'll need to be careful not to start touching the wrong eBPF map -- and unlike eBPF programs, eBPF maps don't store information about when they were created.
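
To illustrate why appending programs only ever adds denials, here is a minimal Go sketch that models the attached programs as plain functions. It is not `runc` code: `device`, `program`, `allowOnly` and `accessAllowed` are hypothetical names made up for this example, and the only behaviour it assumes is that an access succeeds only when every attached program allows it.

```go
package main

import "fmt"

// device identifies a device node by type and major/minor numbers.
type device struct {
	kind         rune // 'c' for character, 'b' for block
	major, minor int64
}

// program stands in for one attached eBPF device program: it returns
// true to allow an access and false to deny it. (Hypothetical model.)
type program func(d device) bool

// allowOnly builds a white-list program that permits exactly the
// listed devices and denies everything else.
func allowOnly(devs ...device) program {
	allowed := make(map[device]bool, len(devs))
	for _, d := range devs {
		allowed[d] = true
	}
	return func(d device) bool { return allowed[d] }
}

// accessAllowed models how multiple programs on one cgroup combine:
// the access succeeds only if every attached program allows it.
// With no programs attached, everything is permitted.
func accessAllowed(attached []program, d device) bool {
	for _, p := range attached {
		if !p(d) {
			return false
		}
	}
	return true
}

func main() {
	null := device{'c', 1, 3} // /dev/null
	zero := device{'c', 1, 5} // /dev/zero

	// Initial rules: only /dev/null is allowed.
	attached := []program{allowOnly(null)}

	// "Update" by appending a program that also allows /dev/zero.
	// The original program is still attached and still denies it.
	attached = append(attached, allowOnly(null, zero))
	fmt.Println("zero allowed after permissive update:", accessAllowed(attached, zero))

	// Appending a stricter program, by contrast, does take effect.
	attached = append(attached, allowOnly())
	fmt.Println("null allowed after restrictive update:", accessAllowed(attached, null))
}
```

Under this model, a correct update has to replace the rules in place instead of stacking another program on top, and the list of attached programs also grows with every update.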

Part of #2315.
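
For the cgroupv1 case above, one way to avoid the temporary block-everything window is to write only the difference between the current and the target rule sets. The Go sketch below is a simplified illustration under stated assumptions, not `runc`'s implementation: `rule` and `transition` are hypothetical names, rules are treated as opaque exact-match strings, and overlapping or wildcard rules (which a real fix would have to handle) are ignored.

```go
package main

import (
	"fmt"
	"sort"
)

// rule is a simplified white-list entry such as "c 1:3 rwm".
// Treating rules as opaque strings is an assumption of this sketch.
type rule string

// transition returns the entries that would need to be denied and
// allowed to move from the current rule set to the target rule set.
// Entries present in both sets appear in neither result, so they are
// never revoked during the update.
func transition(current, target []rule) (deny, allow []rule) {
	cur := make(map[rule]bool, len(current))
	for _, r := range current {
		cur[r] = true
	}
	tgt := make(map[rule]bool, len(target))
	for _, r := range target {
		tgt[r] = true
	}

	for r := range cur {
		if !tgt[r] {
			deny = append(deny, r)
		}
	}
	for r := range tgt {
		if !cur[r] {
			allow = append(allow, r)
		}
	}

	// Sort for deterministic output; map iteration order is random.
	sort.Slice(deny, func(i, j int) bool { return deny[i] < deny[j] })
	sort.Slice(allow, func(i, j int) bool { return allow[i] < allow[j] })
	return deny, allow
}

func main() {
	current := []rule{"c 1:3 rwm", "c 1:5 rwm"} // /dev/null, /dev/zero
	target := []rule{"c 1:3 rwm", "c 1:8 rwm"}  // /dev/null, /dev/random

	deny, allow := transition(current, target)
	fmt.Printf("deny:  %q\n", deny)
	fmt.Printf("allow: %q\n", allow)
}
```

In this example the `/dev/null` entry is in both sets, so it is left alone and stays accessible for the whole update.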
