[LTS 9.4] CVE-2025-21786 #406
Conversation
The "draft" status is only there to prevent an accidental merge; the PR is ready for review.
The reason the pwq refcount was able to hit zero was that the initial pwq reference was put in […]. So for the issue to occur, the following must happen at around the same time:
This can be triggered under high memory pressure while writing to […]. I don't think we should bother picking this, since the Fixes commit was introduced in 6.11 and wasn't backported to any stable kernels. The CVE fix itself is only present on 6.12+ kernels upstream, so I think it's safe to say we don't need to bother with this.
Thanks @kerneltoast for shedding more light on this issue.
Closing as not applicable. Thank you @pvts-mat and @kerneltoast.
VULN-54096
Problem
https://access.redhat.com/security/cve/CVE-2025-21786
Background
The workqueue system allows kernel code to defer some tasks to be executed asynchronously - the "generic async execution mechanism", as expressed in
kernel/workqueue.c's header comment. A piece of work to be executed is called a work item. It's represented by a simple struct
work_struct coupling a function defining the job with some additional data:
kernel-src-tree/include/linux/workqueue_types.h
Lines 16 to 23 in 7339233
The work items are put, through the API, on work queues
kernel-src-tree/kernel/workqueue.c
Line 335 in 7339233
The type of work queue a work item is put on determines how it will be executed.
From there they are distributed to internal pool work queues
kernel-src-tree/kernel/workqueue.c
Line 256 in 7339233
where they await execution by kernel threads called workers. Those can be easily observed with any process-listing tool like
ps or top (as the kworker/* threads). The workers are gathered in work pools
kernel-src-tree/kernel/workqueue.c
Line 184 in 7339233
Each work pool has a single pool work queue and zero or more workers associated. Each CPU has two work pools assigned - one for normal work items and the other for high priority ones. Apart from CPU-bound pools there are also unbound work pools (with unbound work queues mentioned in the CVE), the number of which is dynamic. (This variety of work pools exists for balancing the tradeoff between having high locality of execution (and thus efficiency) for the CPU-bound work pools and much simpler load balancing with the unbound ones.)
It's possible for the work items in a work pool to become deadlocked. For this reason a work queue can contain a rescue worker
kernel-src-tree/kernel/workqueue.c
Line 348 in 7339233
which can pick up any work item from the work pool, break the deadlock and push execution forward. The rescuer's thread function
rescuer_thread is the subject of the CVE's fix e769461 in the mainline kernel.
Analysis
The bug
Following the KASAN logs from https://lore.kernel.org/lkml/CAKHoSAvP3iQW+GwmKzWjEAOoPvzeWeoMO0Gz7Pp3_4kxt-RMoA@mail.gmail.com/ it can be seen that the use-after-free scenario unfolded as follows:
The rescuer thread released the pool workqueue with
put_pwq(…) at
kernel-src-tree/kernel/workqueue.c
Line 3516 in d40797d
It was sure the pool wouldn't go away, per the comment at
kernel-src-tree/kernel/workqueue.c
Lines 3513 to 3514 in d40797d
before the worker_detach_from_pool(…) call at
kernel-src-tree/kernel/workqueue.c
Line 3526 in d40797d
Simultaneously, some regular worker from the same pool released it as well
at
kernel-src-tree/kernel/kthread.c
Line 844 in d40797d
The pool workqueue, guarded by the Read-Copy-Update (RCU) mechanism, was destroyed soon after by the idle thread 0, along with its worker pool:
The rescuer thread continued execution, hitting the
worker_detach_from_pool(…) call, which attempted to remove the rescuer worker from the workers list of a pool which no longer existed. See
kernel-src-tree/kernel/workqueue.c
Line 2709 in d40797d
list_del's implementation in
kernel-src-tree/include/linux/list.h
Lines 193 to 197 in d40797d
The fix
The core of the fix is moving the
put_pwq(…) call after the worker_detach_from_pool(…) call to ensure the pool's refcount remains greater than zero at the moment of detaching the rescuer from it. Before:
kernel-src-tree/kernel/workqueue.c
Lines 3512 to 3528 in d40797d
After:
kernel-src-tree/kernel/workqueue.c
Lines 3519 to 3535 in e769461
Although the moved function changed to
put_pwq_unlocked(…), it's actually the same put_pwq(…) call, but wrapped in the raw_spin_lock_irq(…)/raw_spin_unlock_irq(…) pair:
kernel-src-tree/kernel/workqueue.c
Lines 1662 to 1664 in e769461
This can be seen even more clearly in the original proposition of the fix given by Tejun Heo in the mailing list https://lore.kernel.org/lkml/[email protected]/:
This wrapping was not necessary before because the
pool->lock was already being held at the time of the put_pwq(pwq) call, see
kernel-src-tree/kernel/workqueue.c
Line 3473 in d40797d
Applicability: no
The affected file
kernel/workqueue.c is unconditionally compiled into every kernel:
kernel-src-tree/kernel/Makefile
Line 9 in 499f93a
so it's part of any LTS 9.4 build regardless of the configuration used.
However, the CVE-2025-21786 bug fixed by the e769461 patch does not apply to the code found under the
ciqlts9_4 revision, and applying the patch, while not harmful on the functional level, shouldn't be done. The arguments are listed below.
The "Fixes" commit is missing from the LTS 9.4 history
The e769461 fix names 68f8305 as the commit introducing the bug, and that commit is missing from the LTS 9.4 history of
kernel/workqueue.c; neither was it backported - see workqueue-history.txt.
Commit e769461's message explicitly blames changes introduced in 68f8305:
The "code waiting for the rescuer" removed in 68f8305 is present in the
ciqlts9_4 revision:
kernel-src-tree/kernel/workqueue.c
Lines 3720 to 3721 in 389d406
The put_pwq(…) call is not placed randomly
Examining git history shows that the authors of the workqueue mechanism - Lai Jiangshan and Tejun Heo - took great care to place the grab/put functions in proper places. See commit 77668c8, which introduced the
put_pwq(…) call.
(In fact, this commit pre-emptively fixed the CVE-2023-1281 bug (not a CVE back then), which only re-surfaced after the 68f8305 commit - it addresses the same problem.)
Commit 13b1d62, in turn, dealt with the placement of the
worker_detach_from_pool(…) call and explicitly related it to the put_pwq(…) call:
It's only the "put_unbound_pool() will wait for it to detach" part which turned false after the introduction of 68f8305 - which, again, was not applied in LTS 9.4.
Using the patched version is not without any cost
From the short bug and fix analysis it should be rather clear (hopefully) that applying the CVE-2025-21786 patch is just a matter of holding a reference a little longer. It would therefore seem that it couldn't hurt to apply the patch "just in case". However, putting aside the nevertheless nonzero degree of uncertainty about the harmlessness of this treatment, doing it requires unnecessary locking/unlocking of
&pwq->pool->lock around the put_pwq(pwq) call (see the fix). In general it's always better to avoid unnecessary locks, as they hurt performance and can introduce deadlocking problems not present before.
RedHat's "Affected" classification doesn't hold much weight
A counter-argument to not backporting the patch could be RedHat's listing of "Red Hat Enterprise Linux 9" as "Affected" on the CVE-2025-21786 bug's page https://access.redhat.com/security/cve/CVE-2025-21786.
However, RH's "Affected" may in actuality mean either "affected, confirmed" or "not investigated yet":
This stands in contrast to the "not affected" classification, which actually means only "not affected, confirmed".