nsutil.UnmountNS: CNI-created network namespace not cleaned up for short-lived jobs #25610

@jonasdemoor

Nomad version

Nomad v1.8.11+ent
BuildDate 2025-03-11T09:23:02Z
Revision f1d10f7f43b943002a505307ae896f8176c038e4+CHANGES

Operating system and Environment details

$ cat /etc/os-release 
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
$ uname -a
Linux nomadclndev03 6.1.0-32-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.129-1 (2025-03-06) x86_64 GNU/Linux
$ podman version
Client:       Podman Engine
Version:      5.2.2
API Version:  5.2.2
Go Version:   go1.23.1
Built:        Tue Sep 17 17:43:50 2024
OS/Arch:      linux/amd64
$ apt-cache policy nomad-driver-podman 
nomad-driver-podman:
  Installed: 0.6.2-1
  Candidate: 0.6.2-1
  Version table:
 *** 0.6.2-1 500
        500 http://aptly.ugent.be hashicorp/bookworm amd64 Packages
        100 /var/lib/dpkg/status
     0.6.1-1 500
        500 http://aptly.ugent.be hashicorp/bookworm amd64 Packages
     0.6.0-1 500
        500 http://aptly.ugent.be hashicorp/bookworm amd64 Packages

Issue

We are experiencing a race condition when running short-lived workloads with CNI. Specifically, nsutil.UnmountNS (func UnmountNS(nsPath string) error) fails during the garbage collection (GC) process because the target network namespace is still marked as busy.
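
For context, here is a minimal sketch of the failure mode as we understand it (the helper below is hypothetical, not Nomad's actual nsutil code): the CNI network namespace is a file bind-mounted under /var/run/netns/<alloc-id>, and unmounting it while another process still holds the namespace open returns EBUSY, which surfaces in the client log as "device or resource busy".

// Hypothetical sketch of the failure mode; unmountNetns is illustrative only.
package nsdebug

import (
	"errors"
	"fmt"

	"golang.org/x/sys/unix"
)

func unmountNetns(nsPath string) error {
	// If the container runtime (e.g. podman/conmon) or the just-exited task
	// still holds the namespace open when GC runs, the unmount returns EBUSY.
	if err := unix.Unmount(nsPath, 0); err != nil {
		if errors.Is(err, unix.EBUSY) {
			return fmt.Errorf("failed to unmount NS: at %s: %w", nsPath, err)
		}
		return err
	}
	return nil
}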

This issue is only triggered when:

  • CNI networking is enabled (network { mode = "cni/private" })
  • The workload completes very quickly (e.g., exit 0 immediately)

If either of the following changes is made, the issue is no longer reproducible:

  • Removing the CNI configuration (i.e., not setting network.mode)
  • Making the job run slightly longer (e.g., using sleep 45)

We are currently mitigating this by adding an artificial delay (sleep 45) to our short-lived batch jobs. However, a more robust resolution would be ideal; let us know if further logs or traces would be helpful.
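
One direction that might make GC more tolerant (purely a sketch on our side, not a claim about where a fix belongs in Nomad) is retrying the unmount with a short backoff, so that the brief window in which the runtime still holds the namespace does not fail the whole postrun hook. The helper name, attempt count, and backoff below are hypothetical.

// Hypothetical retry-with-backoff wrapper around the namespace unmount;
// the retry policy is illustrative only.
package nsdebug

import (
	"errors"
	"fmt"
	"time"

	"golang.org/x/sys/unix"
)

func unmountNetnsWithRetry(nsPath string, attempts int, backoff time.Duration) error {
	var err error
	for i := 0; i < attempts; i++ {
		err = unix.Unmount(nsPath, 0)
		if err == nil {
			return nil
		}
		// Only EBUSY is worth retrying; anything else is a real failure.
		if !errors.Is(err, unix.EBUSY) {
			return err
		}
		time.Sleep(backoff)
	}
	return fmt.Errorf("failed to unmount NS: at %s after %d attempts: %w", nsPath, attempts, err)
}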

Reproduction steps

  1. Submit the job definition provided below.
  2. Tail the log output on the client.
  3. Let several periodic allocations run.
  4. Within a few runs (typically 3–4), the error occurs.

Expected Result

The short-lived CNI workload terminates cleanly and GC proceeds without errors: specifically, the container and its associated network namespace should be removed without hitting a "device or resource busy" error.

Actual Result

GC fails to unmount the network namespace with a "device or resource busy" error, resulting in noisy logs and potential resource leaks.

Job file (if appropriate)

job "cni-debug" {
  type        = "batch"
  namespace   = "default"
  datacenters = ["S10"]

  periodic {
    crons            = ["*/1 * * * * *"]
    prohibit_overlap = false
  }

  # Pin to a specific host for easier testing
  constraint {
    attribute = "${attr.unique.hostname}"
    value     = "hostname"
  }

  group "debug" {
    restart {
      attempts = 0
      mode     = "fail"
    }

    reschedule {
      attempts  = 0
      unlimited = false
    }

    count = 1

    network {
      mode = "cni/private"
    }

    task "sleep" {
      driver = "podman"

      config {
        image   = "busybox:latest"
        command = "/bin/sh"
        args    = ["-c", "exit 0"]
      }
    }
  }
}

Nomad Server logs (if appropriate)

N/A

Nomad Client logs (if appropriate)

2025-04-07T11:24:18.681+0200 [ERROR] client.alloc_runner: postrun failed: alloc_id=66f88f81-ea9c-15ea-c113-068bf09aca81 error="hook \"network\" failed: failed to unmount NS: at /var/run/netns/66f88f81-ea9c-15ea-c113-068bf09aca81: device or resource busy"
