- 
                Notifications
    
You must be signed in to change notification settings  - Fork 2.2k
 
libct/cg: fix an error of cgroup path removal #4520
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
36d06e4    to
    fa15928      
    Compare
  
    | 
           @AkihiroSuda Could you please help to see this patch has fixed the issue #4518 or not?  | 
    
| 
           Sorry, I'm on trip to kubecon, can't test this PR right now cc @samiam (the original reporter of the issue)  | 
    
| if os.IsNotExist(err) { | ||
| // Please keep this error eraser, or else it will return ErrNotExist | ||
| // for cgroupv2. | ||
| // Please see https://github.com/opencontainers/runc/issues/4518 | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to have a bats test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it’s hard to simulate the similar error.
But maybe we can split this func, and add a go unit test. I don’t know it’s worth to do or not. But we should verify this patch is correct or not at first.
| 
           It appears to work via buildkit. But I'm new to buildkit, so my testing could be flawed. Buildkit changes include changing the Dockerfile and the name of the image to ensure I was using my local one. Build local images: Run the tests again: Who knew removing a directory could be so complicated? 😁  | 
    
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
          
 Yes, so complicated, because cgroupfs is a pseudo-filesystem, not a real physical fs. Thanks for your check.  | 
    
        
          
                libcontainer/cgroups/utils.go
              
                Outdated
          
        
      | // Please keep this error eraser, or else it will return ErrNotExist | ||
| // for cgroupv2. | ||
| // Please see https://github.com/opencontainers/runc/issues/4518 | ||
| err = nil | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we return nil here? Resetting the err combined with the comment made me go look if the err was used in a defer (or if there was something special); returning nil here makes it more clear it's a regular "we can ignore this".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On that matter, we should do the same below in the infos loop, and also return early if there's an error;
for _, info := range infos {
	if info.IsDir() {
		// We should remove subcgroup first.
		if err := RemovePath(filepath.Join(path, info.Name())); err != nil {
			return err
		}
	}
}That would allow use to remove the if err == nil check outside of the loop, and just;
return rmdir(path, true)❓ Question though; should the infos loop also ignore os.IsNotExist() ?
for _, info := range infos {
	if !info.IsDir() {
		continue
	}
	// We should remove subcgroup first.
	if err := RemovePath(filepath.Join(path, info.Name())); err != nil && !os.IsNotExist(err) {
		return err
	}
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question though; should the
infosloop also ignoreos.IsNotExist()?
I think there is no chance we will get ErrNotExist error in here.
On that matter, we should do the same below in the
infosloop, and also return early if there's an error;
👍
80008d7    to
    af8c523      
    Compare
  
    
          
  | 
    
          
 And the error from  Indeed, /sys/fs/cgroup is mounted read-only in the container in question: 
 I'm not quite sure how the cgroup directory is created in the first place in such case, but it's definitely there. If someone can shed some light that would be great. My guess is we need some special handing for such case. It's read only meaning we can't remove it no matter what. But it can be gone if there are no processes, so we can wait. (the readdir logic is obviously wrong and need to be fixed nevertheless)  | 
    
          
 How does runc write the init process pid to a ro cgroup?  | 
    
          
 Yes, it indeed retry for any errors(except ErrNotExist) in ‘release-1.1’, how about sleep and retry in rmdir for all errors except ENOENT? @kolyshkin  | 
    
| 
           Some of my analysis above are wrong. But let's continue the discussion in the issue (#4518).  | 
    
af8c523    to
    d5497de      
    Compare
  
    When fall back to the traditional path walk removal after rmdir, there is an error if the path suddenly gone, we should ignore this ErrNotExist error when we open the cgroup path. Signed-off-by: lfbzhm <[email protected]>
Co-authored-by: Sebastiaan van Stijn <[email protected]> Signed-off-by: lifubang <[email protected]>
If we remove a non-exist dir in a ro mount point, it will return EROFS in `unix.Rmdir`, so we need to check first. Test step: ```bash root@acmubuntu:/opt/bb# mkdir from to root@acmubuntu:/opt/bb# touch from/test root@acmubuntu:/opt/bb# mount -o bind,ro from to root@acmubuntu:/opt/bb# ls to test root@acmubuntu:/opt/bb# touch to/test1 touch: cannot touch 'to/test1': Read-only file system root@acmubuntu:/opt/bb# mkdir to/test1 mkdir: cannot create directory ‘to/test1’: Read-only file system root@acmubuntu:/opt/bb# ls to/test1 ls: cannot access 'to/test1': No such file or directory root@acmubuntu:/opt/bb# rmdir to/test1 rmdir: failed to remove 'to/test1': Read-only file system root@acmubuntu:/opt/bb# ``` Signed-off-by: lifubang <[email protected]>
d5497de    to
    c8ce25e      
    Compare
  
    
For cgroup path removal, when we try a fast way to remove the cgroup
path, there may be a small possibility race to return an error, though
the path was removed successfully, for example: EINTR, or other errors.
Then we will fall back to the traditional path walk removal, but there
is a regression if the path has been sucessfully removed, which was
introduced by d3d7f7d in #4102. We should erase the ErrNotExist error
when we open the cgroup path.
Fix: #4518