
@mismithhisler mismithhisler commented Apr 21, 2025

Description

Nomad was holding the namespace file handle open and leaving it to be closed eventually by garbage collection. In batch jobs that ran very quickly, the leaked file handle could still be open when Nomad attempted to unmount the namespace, causing the unmount to fail.

In addition to closing the file handle, we now pass the MNT_DETACH flag when unmounting, so that if a namespace file handle is ever left open, the namespace is still unmounted once the last reference is released and no resources are leaked.

Fixes GH#25610

Testing & Reproduction steps

See GH#25610 for reproduction steps. You can also observe Nomad holding the namespace file open by shimming in an exec.Command("fuser", "-v", nsPath) call right before the unmount. The issue reproduces with both the Podman and exec2 task drivers.
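One way to do that shimming, as a throwaway debugging aid: run fuser -v against the namespace path immediately before the unmount and log whatever it reports. The names fuserCmd and logNSHolders are hypothetical, and the fuser binary must be installed on the client.

```go
package main

import (
	"log"
	"os/exec"
)

// fuserCmd builds the diagnostic command. "-v" must be a string argument;
// fuser then lists each process that has the file open.
func fuserCmd(nsPath string) *exec.Cmd {
	return exec.Command("fuser", "-v", nsPath)
}

// logNSHolders is a temporary debugging shim (hypothetical) meant to be
// called right before the namespace is unmounted.
func logNSHolders(nsPath string) {
	out, err := fuserCmd(nsPath).CombinedOutput()
	// fuser exits non-zero both when no process holds the file and on real
	// errors, so log the output and the error together instead of failing.
	log.Printf("fuser -v %s (err=%v):\n%s", nsPath, err, out)
}

func main() {
	// Example path only; use the namespace path that is about to be unmounted.
	logNSHolders("/var/run/netns/example")
}
```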

Links

Contributor Checklist

  • Changelog Entry If this PR changes user-facing behavior, please generate and add a
    changelog entry using the make cl command.
  • Testing Please add tests to cover any new functionality or to demonstrate bug fixes and
    ensure regressions will be caught.
  • Documentation If the change impacts user-facing functionality such as the CLI, API, UI,
    and job configuration, please update the Nomad website documentation to reflect this. Refer to
    the website README for docs guidelines. Please also consider whether the
    change requires notes within the upgrade guide.

Reviewer Checklist

  • Backport Labels Please add the correct backport labels as described by the internal
    backporting document.
  • Commit Type Ensure the correct merge method is selected which should be "squash and merge"
    in the majority of situations. The main exceptions are long-lived feature branches or merges where
    history should be preserved.
  • Enterprise PRs If this is an enterprise only PR, please add any required changelog entry
    within the public repository.

@mismithhisler mismithhisler self-assigned this Apr 21, 2025
@mismithhisler mismithhisler requested review from a team as code owners April 21, 2025 15:18
tgross previously approved these changes Apr 21, 2025

@tgross tgross left a comment

LGTM! Nice work on this

@mismithhisler mismithhisler requested a review from tgross April 21, 2025 20:08
@mismithhisler mismithhisler added backport/1.10.x backport to 1.10.x release line backport/ent/1.9.x+ent Changes are backported to 1.9.x+ent backport/ent/1.8.x+ent Changes are backported to 1.8.x+ent labels Apr 21, 2025

@tgross tgross left a comment

LGTM!

@mismithhisler mismithhisler merged commit 6036ab8 into main Apr 21, 2025
36 of 38 checks passed
@mismithhisler mismithhisler deleted the f-close-ns-file-handle branch April 21, 2025 20:25
@github-actions

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 20, 2025

Labels

backport/ent/1.8.x+ent Changes are backported to 1.8.x+ent backport/ent/1.9.x+ent Changes are backported to 1.9.x+ent backport/1.10.x backport to 1.10.x release line

Development

Successfully merging this pull request may close these issues.

nsutil.UnmountNS: CNI-created network namespace not cleaned up for short-lived jobs

2 participants