Skip to content

Conversation

@kolyshkin
Copy link
Contributor

@kolyshkin kolyshkin commented Jul 25, 2025

To this day, cgroup managers did not have an ability to add a process
to an existing cgroup. One might use cgroups.WriteCgroupProc, but
this is cgroupfs operation and does not take into account systemd.

Let's introduce AddPid, which will add a process, identified by PID,
to the cgroup (or, optionally, a sub cgroup) of the manager.

Implementation for systemd requires coreos/go-systemd#458 (in coreos/[email protected]) and systemd >= v238.

Currently a draft until coreos/go-systemd#458 is merged and released.

Being tested in opencontainers/runc#4822.

@kolyshkin kolyshkin changed the title Add pid Implement AddPid for cgroup managers Jul 25, 2025
@kolyshkin kolyshkin force-pushed the add-pid branch 2 times, most recently from 42cf963 to e434adf Compare July 25, 2025 23:49
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Jul 26, 2025
The main benefit here is when we are using a systemd cgroup driver,
we actually ask systemd to add a PID, rather than doing it ourselves.
This way, we can add rootless exec PID to a cgroup.

The implementation requires
opencontainers/cgroups#26.

Signed-off-by: Kir Kolyshkin <[email protected]>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Jul 26, 2025
The main benefit here is when we are using a systemd cgroup driver,
we actually ask systemd to add a PID, rather than doing it ourselves.
This way, we can add rootless exec PID to a cgroup.

The implementation requires
opencontainers/cgroups#26.

Signed-off-by: Kir Kolyshkin <[email protected]>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Jul 26, 2025
The main benefit here is when we are using a systemd cgroup driver,
we actually ask systemd to add a PID, rather than doing it ourselves.
This way, we can add rootless exec PID to a cgroup.

The implementation requires
opencontainers/cgroups#26.

Signed-off-by: Kir Kolyshkin <[email protected]>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Jul 27, 2025
The main benefit here is when we are using a systemd cgroup driver,
we actually ask systemd to add a PID, rather than doing it ourselves.
This way, we can add rootless exec PID to a cgroup.

The implementation requires
opencontainers/cgroups#26.

Signed-off-by: Kir Kolyshkin <[email protected]>
@kolyshkin kolyshkin force-pushed the add-pid branch 2 times, most recently from e071502 to ba8c4b2 Compare July 28, 2025 22:28
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Jul 28, 2025
The main benefit here is when we are using a systemd cgroup driver,
we actually ask systemd to add a PID, rather than doing it ourselves.
This way, we can add rootless exec PID to a cgroup.

The implementation requires
opencontainers/cgroups#26.

Signed-off-by: Kir Kolyshkin <[email protected]>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Jul 28, 2025
The main benefit here is when we are using a systemd cgroup driver,
we actually ask systemd to add a PID, rather than doing it ourselves.
This way, we can add rootless exec PID to a cgroup.

The implementation requires
opencontainers/cgroups#26.

Signed-off-by: Kir Kolyshkin <[email protected]>
@kolyshkin kolyshkin force-pushed the add-pid branch 2 times, most recently from 3c8e7d5 to b1c847c Compare August 6, 2025 23:34
To this day, cgroup managers did not have an ability to add a process
to an existing cgroup. One might use cgroups.WriteCgroupProc, but
this is cgroupfs operation and does not take into account systemd.

Let's introduce AddPid, which will add a process, identified by PID,
to the cgroup (or, optionally, a sub cgroup) of the manager.

This will allow runtimes like runc to ask systemd to move the process
into a proper unit, instead of using cgroupfs directly.

Implementation for systemd requires [email protected] (see [1])
and systemd >= v238.

[1]: coreos/go-systemd#458
Signed-off-by: Kir Kolyshkin <[email protected]>
@kolyshkin kolyshkin marked this pull request as ready for review August 20, 2025 19:11
@kolyshkin
Copy link
Contributor Author

@AkihiroSuda @cyphar @odinuge @dims PTAL

@kolyshkin kolyshkin added this to the 0.0.5 milestone Aug 20, 2025
@kolyshkin kolyshkin requested a review from dims August 20, 2025 21:24
@kolyshkin
Copy link
Contributor Author

@dims it appears that you're not a part of opencontainers org here on github; I guess that prevents adding you as a maintainer here (despite being listed in MAINTAINERS file). I have no idea who should invite you to the org (I can not), but @caniszczyk might help here.

@kolyshkin
Copy link
Contributor Author

@opencontainers/cgroups-maintainers PTAL 🙏🏻

Copy link
Contributor

@haircommander haircommander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a test could be nice but that doesn't block IMO

@kolyshkin
Copy link
Contributor Author

I think a test could be nice but that doesn't block IMO

We're testing this in runc. It would be awesome to write a proper test suite specific to these packages but I currently don't have time for it.

@kolyshkin
Copy link
Contributor Author

OK, so we have three LGTMs*, so I am merging this.

  • two of those doesn't count by github because @dims and @haircommander are not members of opencontainers.org -- this need to be fixed

@kolyshkin kolyshkin merged commit 05139c1 into opencontainers:main Sep 5, 2025
14 checks passed
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Sep 24, 2025
It makes sense to make runc exec benefit from clone2(CLONE_INTO_CGROUP),
if it is available. Since it requires a recent kernel and might not work,
implement a fallback to older way of joining the cgroup.

Based on work done in
 - https://go-review.googlesource.com/c/go/+/417695
 - coreos/go-systemd#458
 - opencontainers/cgroups#26
 - opencontainers#4822

Signed-off-by: Kir Kolyshkin <[email protected]>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Sep 24, 2025
It makes sense to make runc exec benefit from clone2(CLONE_INTO_CGROUP),
if it is available. Since it requires a recent kernel and might not work,
implement a fallback to older way of joining the cgroup.

Based on work done in
 - https://go-review.googlesource.com/c/go/+/417695
 - coreos/go-systemd#458
 - opencontainers/cgroups#26
 - opencontainers#4822

Signed-off-by: Kir Kolyshkin <[email protected]>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Sep 24, 2025
It makes sense to make runc exec benefit from clone2(CLONE_INTO_CGROUP),
if it is available. Since it requires a recent kernel and might not work,
implement a fallback to older way of joining the cgroup.

Based on work done in
 - https://go-review.googlesource.com/c/go/+/417695
 - coreos/go-systemd#458
 - opencontainers/cgroups#26
 - opencontainers#4822

Signed-off-by: Kir Kolyshkin <[email protected]>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Sep 26, 2025
It makes sense to make runc exec benefit from clone2(CLONE_INTO_CGROUP),
if it is available. Since it requires a recent kernel and might not work,
implement a fallback to older way of joining the cgroup.

Based on work done in
 - https://go-review.googlesource.com/c/go/+/417695
 - coreos/go-systemd#458
 - opencontainers/cgroups#26
 - opencontainers#4822

Regarding E2BIG check in shouldRetryWithoutCgroupFD. The clone3 syscall
first appeared in kernel v5.3 via commit [1], which added a check that
if the size of clone_args structure passed from the userspace is larger
than known to kernel, and the "unknown" part contains non-zero values,
E2BIG is returned. A similar check was already used in other similar
scenarios at the time, and later in kernel v5.4, this was generalized by
patch series [2].

[1]: torvalds/linux@7f192e3
[2]: https://lore.kernel.org/all/[email protected]/#r

Signed-off-by: Kir Kolyshkin <[email protected]>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Sep 26, 2025
It makes sense to make runc exec benefit from clone2(CLONE_INTO_CGROUP),
if it is available. Since it requires a recent kernel and might not work,
implement a fallback to older way of joining the cgroup.

Based on work done in
 - https://go-review.googlesource.com/c/go/+/417695
 - coreos/go-systemd#458
 - opencontainers/cgroups#26
 - opencontainers#4822

Regarding E2BIG check in shouldRetryWithoutCgroupFD. The clone3 syscall
first appeared in kernel v5.3 via commit [1], which added a check that
if the size of clone_args structure passed from the userspace is larger
than known to kernel, and the "unknown" part contains non-zero values,
E2BIG is returned. A similar check was already used in other similar
scenarios at the time, and later in kernel v5.4, this was generalized by
patch series [2].

[1]: torvalds/linux@7f192e3
[2]: https://lore.kernel.org/all/[email protected]/#r

Signed-off-by: Kir Kolyshkin <[email protected]>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Sep 26, 2025
It makes sense to make runc exec benefit from clone2(CLONE_INTO_CGROUP),
if it is available. Since it requires a recent kernel and might not work,
implement a fallback to older way of joining the cgroup.

Based on:
 - https://go-review.googlesource.com/c/go/+/417695
 - coreos/go-systemd#458
 - opencontainers/cgroups#26
 - opencontainers#4822

Signed-off-by: Kir Kolyshkin <[email protected]>
cyphar pushed a commit to cyphar/runc that referenced this pull request Oct 8, 2025
It makes sense to make runc exec benefit from clone2(CLONE_INTO_CGROUP),
if it is available. Since it requires a recent kernel and might not work,
implement a fallback to older way of joining the cgroup.

Based on:
 - https://go-review.googlesource.com/c/go/+/417695
 - coreos/go-systemd#458
 - opencontainers/cgroups#26
 - opencontainers#4822

Signed-off-by: Kir Kolyshkin <[email protected]>
(cherry picked from commit 5af4dd4)
Signed-off-by: Aleksa Sarai <[email protected]>
cyphar pushed a commit to cyphar/runc that referenced this pull request Oct 8, 2025
It makes sense to make runc exec benefit from clone2(CLONE_INTO_CGROUP),
if it is available. Since it requires a recent kernel and might not work,
implement a fallback to older way of joining the cgroup.

Based on:
 - https://go-review.googlesource.com/c/go/+/417695
 - coreos/go-systemd#458
 - opencontainers/cgroups#26
 - opencontainers#4822

Signed-off-by: Kir Kolyshkin <[email protected]>
(cherry picked from commit 5af4dd4)
Signed-off-by: Aleksa Sarai <[email protected]>
cyphar pushed a commit to cyphar/runc that referenced this pull request Oct 8, 2025
It makes sense to make runc exec benefit from clone2(CLONE_INTO_CGROUP),
if it is available. Since it requires a recent kernel and might not work,
implement a fallback to older way of joining the cgroup.

Based on:
 - https://go-review.googlesource.com/c/go/+/417695
 - coreos/go-systemd#458
 - opencontainers/cgroups#26
 - opencontainers#4822

Signed-off-by: Kir Kolyshkin <[email protected]>
(cherry picked from commit 5af4dd4)
Signed-off-by: Aleksa Sarai <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants