Commit 506f7e9
rds: ib: Add cm_id generation scheme in order to detect new ones
In rds_rdma_cm_event_handler_cmn() we have a simple address compare of
the supplied cm_id and the one stored in the IB Connection struct, ic:
if (ic->i_cm_id != cm_id) {
But this address compare is flaky. Although the address is the same,
the cm_id per se, does not need to be the same.
The cm_id may be destroyed and re-created. Since we here talk about
free() + kzalloc(), there is a fair probability that the new cm_id
ends up at the very same address as the old one.
If that happens, there is a dissonance between the RDS IB Connection
state and the state of the cm_id.
This bug used to manifest itself as a NULL ptr deref in
rds_rdma_cm_event_handler_cmn(), but that was before commit
a71a60c ("rds: ib: Remove incorrect update of the path record sl
and qos_class fields").
Kernels with a71a60c may observe the following crash:
BUG: kernel NULL pointer dereference, address: 0000000000000034
RIP: 0010:dma_direct_unmap_sg+0x5a/0x220
Call Trace:
rds_ib_recv_cqe_handler+0xed/0x395 [rds_rdma]
poll_rcq+0x72/0xa0 [rds_rdma]
rds_ib_rx+0xb9/0x2b0 [rds_rdma]
rds_ib_tasklet_fn_recv+0x2f/0x40 [rds_rdma]
tasklet_action_common+0x153/0x240
handle_softirqs+0xe4/0x2ac
__irq_exit_rcu+0xab/0xd0
common_interrupt+0x85/0xa0
Drgn analyses of the crash:
>>> trace = prog.crashed_thread().stack_trace()
>>> trace
#0 sg_dma_is_bus_address (./include/linux/scatterlist.h:297:11)
#1 dma_direct_unmap_sg (kernel/dma/direct.c:452:7)
#2 ib_dma_unmap_sg_attrs (./include/rdma/ib_verbs.h:4232:3)
#3 ib_dma_unmap_sg (./include/rdma/ib_verbs.h:4294:2)
#4 rds_ib_recv_cqe_handler (net/rds/ib_recv.c:1397:2)
#5 poll_rcq (net/rds/ib_cm.c:777:4)
#6 rds_ib_rx (net/rds/ib_cm.c:870:2)
#7 rds_ib_recv_cb (net/rds/ib_cm.c:916:2)
#8 rds_ib_tasklet_fn_recv (net/rds/ib_cm.c:925:2)
#9 tasklet_action_common (kernel/softirq.c:795:7)
#10 handle_softirqs (kernel/softirq.c:561:3)
#11 __do_softirq (kernel/softirq.c:595:2)
#12 invoke_softirq (kernel/softirq.c:435:3)
#13 __irq_exit_rcu (kernel/softirq.c:644:3)
#14 common_interrupt (arch/x86/kernel/irq.c:280:1)
#15 asm_common_interrupt+0x26/0x2b (./arch/x86/include/asm/idtentry.h:693)
#16 cpuidle_enter_state (drivers/cpuidle/cpuidle.c:288:5)
#17 cpuidle_enter (drivers/cpuidle/cpuidle.c:385:9)
#18 cpuidle_idle_call (kernel/sched/idle.c:230:19)
#19 do_idle (kernel/sched/idle.c:326:4)
#20 cpu_startup_entry (kernel/sched/idle.c:424:3)
#21 rest_init (init/main.c:747:2)
#22 start_kernel (init/main.c:1105:2)
#23 x86_64_start_reservations (arch/x86/kernel/head64.c:507:2)
#24 x86_64_start_kernel (arch/x86/kernel/head64.c:488:2)
#25 secondary_startup_64+0x15d/0x15f (arch/x86/kernel/head_64.S:413)
>>> trace[4]["ic"].i_recvs[0]
(struct rds_ib_recv_work){
.r_ibinc = (struct rds_ib_incoming *)0x0,
.r_frag = (struct rds_page_frag *)0x0,
.r_wr = (struct ib_recv_wr){
.next = (struct ib_recv_wr *)0x0,
.wr_id = (u64)0,
.wr_cqe = (struct ib_cqe *)0x0,
.sg_list = (struct ib_sge *)0xffffb91c1a5b5030,
.num_sge = (int)5,
},
.r_sge = (struct ib_sge [8]){
{
.addr = (u64)33819643904,
.length = (u32)48,
.lkey = (u32)18181,
},
{
.addr = (u64)0, <=== Note
.length = (u32)4096,
.lkey = (u32)18181,
},
{
.addr = (u64)0, <=== Note
.length = (u32)4096,
.lkey = (u32)18181,
},
{
.addr = (u64)0, <=== Note
.length = (u32)4096,
.lkey = (u32)18181,
},
{
.addr = (u64)0, <=== Note
.length = (u32)4096,
.lkey = (u32)18181,
},
},
.r_ic = (struct rds_ib_connection *)0x0,
.r_posted = (int)0,
}
The prosaic version: We do get an event and processes a recv CQE, but
when performing DMA unmap, the addresses for SGE entry 1-4 are all
zero. That explains the crash.
If you look at rds_ib_recv_init_ring(), the state will be exactly what
we see above when this function has been called. So, how can this
happen?
We do get an RDMA_CM_EVENT_ROUTE_RESOLVED event with a new cm_id which
happens to have the same address as the one stored in the RDS IB
Connection (ic). Now, without a71a60c, we do not crash in
rds_rdma_cm_event_handler_cmn() but calls:
trans->cm_initiate_connect()
rds_ib_cm_initiate_connect()
rds_ib_recv_init_ring()
and the latter does:
sge = recv->r_sge;
sge->addr = ic->i_recv_hdrs_dma[i];
sge->length = sizeof(struct rds_header);
sge->lkey = ic->i_mr->lkey;
for (j = 1; j < num_send_sge; j++) {
sge = recv->r_sge + j;
sge->addr = 0; <==== Note
sge->length = PAGE_SIZE;
sge->lkey = ic->i_mr->lkey;
}
This fits 100% with the r_sge pasted above.
We fix this bug by introducing a generation number and embed that in
the last three bits of cm_id->context, since these bits are zero due
to alignment requirement.
When we use cm_id->context to find the RDS Connection (conn), we mask
away the three lower bits. When we create the cm_id, we insert the
next generation and make a copy of it in the ic.
In rds_rdma_cm_event_handler_cmn(), when we have established that the
two addresses are equal, we add an additional check to see if the
cm_id generation is the same between the rdma_cm supplied cm_id and
the generation stored in ic. If the generations differ, we increment
the s_ib_cm_id_resurrected statistics counter and return.
When handling an incoming connect, we set the cm_id generation count
stored in ic to zero.
Orabug: 37799171
Fixes: 3ae09d7 ("net/rds: Don't block workqueues "cma_wq" and "cm.wq"")
Signed-off-by: Håkon Bugge <[email protected]>
Reviewed-by: Sharath Srinivasan <[email protected]>
Signed-off-by: Alok Tiwari <[email protected]>1 parent d24ea44 commit 506f7e9
4 files changed
+31
-8
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
92 | 92 | | |
93 | 93 | | |
94 | 94 | | |
| 95 | + | |
| 96 | + | |
| 97 | + | |
| 98 | + | |
| 99 | + | |
95 | 100 | | |
96 | 101 | | |
97 | 102 | | |
| |||
360 | 365 | | |
361 | 366 | | |
362 | 367 | | |
| 368 | + | |
363 | 369 | | |
364 | 370 | | |
365 | 371 | | |
| |||
648 | 654 | | |
649 | 655 | | |
650 | 656 | | |
| 657 | + | |
651 | 658 | | |
652 | 659 | | |
653 | 660 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1828 | 1828 | | |
1829 | 1829 | | |
1830 | 1830 | | |
| 1831 | + | |
1831 | 1832 | | |
1832 | 1833 | | |
1833 | 1834 | | |
| |||
2153 | 2154 | | |
2154 | 2155 | | |
2155 | 2156 | | |
2156 | | - | |
| 2157 | + | |
2157 | 2158 | | |
2158 | 2159 | | |
2159 | 2160 | | |
| |||
2358 | 2359 | | |
2359 | 2360 | | |
2360 | 2361 | | |
2361 | | - | |
2362 | | - | |
| 2362 | + | |
| 2363 | + | |
| 2364 | + | |
2363 | 2365 | | |
2364 | 2366 | | |
2365 | 2367 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | 1 | | |
2 | | - | |
| 2 | + | |
3 | 3 | | |
4 | 4 | | |
5 | 5 | | |
| |||
112 | 112 | | |
113 | 113 | | |
114 | 114 | | |
| 115 | + | |
115 | 116 | | |
116 | 117 | | |
117 | 118 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
53 | 53 | | |
54 | 54 | | |
55 | 55 | | |
| 56 | + | |
56 | 57 | | |
57 | 58 | | |
58 | 59 | | |
| |||
111 | 112 | | |
112 | 113 | | |
113 | 114 | | |
| 115 | + | |
114 | 116 | | |
115 | 117 | | |
116 | 118 | | |
| |||
199 | 201 | | |
200 | 202 | | |
201 | 203 | | |
| 204 | + | |
| 205 | + | |
| 206 | + | |
| 207 | + | |
| 208 | + | |
| 209 | + | |
| 210 | + | |
| 211 | + | |
202 | 212 | | |
203 | 213 | | |
204 | 214 | | |
| |||
389 | 399 | | |
390 | 400 | | |
391 | 401 | | |
392 | | - | |
| 402 | + | |
| 403 | + | |
393 | 404 | | |
394 | 405 | | |
395 | 406 | | |
| |||
398 | 409 | | |
399 | 410 | | |
400 | 411 | | |
401 | | - | |
| 412 | + | |
| 413 | + | |
402 | 414 | | |
403 | 415 | | |
404 | 416 | | |
| |||
409 | 421 | | |
410 | 422 | | |
411 | 423 | | |
412 | | - | |
| 424 | + | |
413 | 425 | | |
414 | 426 | | |
415 | 427 | | |
416 | 428 | | |
417 | 429 | | |
418 | | - | |
| 430 | + | |
419 | 431 | | |
420 | 432 | | |
421 | 433 | | |
| |||
425 | 437 | | |
426 | 438 | | |
427 | 439 | | |
| 440 | + | |
428 | 441 | | |
429 | 442 | | |
430 | 443 | | |
| |||
0 commit comments