Commit 0405046
rds: ib: Add cm_id generation scheme in order to detect new ones
In rds_rdma_cm_event_handler_cmn() we have a simple address compare of
the supplied cm_id and the one stored in the IB Connection struct, ic:
if (ic->i_cm_id != cm_id) {
But this address compare is flaky. Although the address is the same,
the cm_id per se, does not need to be the same.
The cm_id may be destroyed and re-created. Since we here talk about
free() + kzalloc(), there is a fair probability that the new cm_id
ends up at the very same address as the old one.
If that happens, there is a dissonance between the RDS IB Connection
state and the state of the cm_id.
This bug used to manifest itself as a NULL ptr deref in
rds_rdma_cm_event_handler_cmn(), but that was before commit
a71a60c ("rds: ib: Remove incorrect update of the path record sl
and qos_class fields").
Kernels with a71a60c may observe the following crash:
BUG: kernel NULL pointer dereference, address: 0000000000000034
RIP: 0010:dma_direct_unmap_sg+0x5a/0x220
Call Trace:
rds_ib_recv_cqe_handler+0xed/0x395 [rds_rdma]
poll_rcq+0x72/0xa0 [rds_rdma]
rds_ib_rx+0xb9/0x2b0 [rds_rdma]
rds_ib_tasklet_fn_recv+0x2f/0x40 [rds_rdma]
tasklet_action_common+0x153/0x240
handle_softirqs+0xe4/0x2ac
__irq_exit_rcu+0xab/0xd0
common_interrupt+0x85/0xa0
Drgn analyses of the crash:
>>> trace = prog.crashed_thread().stack_trace()
>>> trace
#0 sg_dma_is_bus_address (./include/linux/scatterlist.h:297:11)
#1 dma_direct_unmap_sg (kernel/dma/direct.c:452:7)
#2 ib_dma_unmap_sg_attrs (./include/rdma/ib_verbs.h:4232:3)
#3 ib_dma_unmap_sg (./include/rdma/ib_verbs.h:4294:2)
#4 rds_ib_recv_cqe_handler (net/rds/ib_recv.c:1397:2)
#5 poll_rcq (net/rds/ib_cm.c:777:4)
#6 rds_ib_rx (net/rds/ib_cm.c:870:2)
#7 rds_ib_recv_cb (net/rds/ib_cm.c:916:2)
#8 rds_ib_tasklet_fn_recv (net/rds/ib_cm.c:925:2)
#9 tasklet_action_common (kernel/softirq.c:795:7)
#10 handle_softirqs (kernel/softirq.c:561:3)
#11 __do_softirq (kernel/softirq.c:595:2)
#12 invoke_softirq (kernel/softirq.c:435:3)
#13 __irq_exit_rcu (kernel/softirq.c:644:3)
#14 common_interrupt (arch/x86/kernel/irq.c:280:1)
#15 asm_common_interrupt+0x26/0x2b (./arch/x86/include/asm/idtentry.h:693)
#16 cpuidle_enter_state (drivers/cpuidle/cpuidle.c:288:5)
#17 cpuidle_enter (drivers/cpuidle/cpuidle.c:385:9)
#18 cpuidle_idle_call (kernel/sched/idle.c:230:19)
#19 do_idle (kernel/sched/idle.c:326:4)
#20 cpu_startup_entry (kernel/sched/idle.c:424:3)
#21 rest_init (init/main.c:747:2)
#22 start_kernel (init/main.c:1105:2)
#23 x86_64_start_reservations (arch/x86/kernel/head64.c:507:2)
#24 x86_64_start_kernel (arch/x86/kernel/head64.c:488:2)
#25 secondary_startup_64+0x15d/0x15f (arch/x86/kernel/head_64.S:413)
>>> trace[4]["ic"].i_recvs[0]
(struct rds_ib_recv_work){
.r_ibinc = (struct rds_ib_incoming *)0x0,
.r_frag = (struct rds_page_frag *)0x0,
.r_wr = (struct ib_recv_wr){
.next = (struct ib_recv_wr *)0x0,
.wr_id = (u64)0,
.wr_cqe = (struct ib_cqe *)0x0,
.sg_list = (struct ib_sge *)0xffffb91c1a5b5030,
.num_sge = (int)5,
},
.r_sge = (struct ib_sge [8]){
{
.addr = (u64)33819643904,
.length = (u32)48,
.lkey = (u32)18181,
},
{
.addr = (u64)0, <=== Note
.length = (u32)4096,
.lkey = (u32)18181,
},
{
.addr = (u64)0, <=== Note
.length = (u32)4096,
.lkey = (u32)18181,
},
{
.addr = (u64)0, <=== Note
.length = (u32)4096,
.lkey = (u32)18181,
},
{
.addr = (u64)0, <=== Note
.length = (u32)4096,
.lkey = (u32)18181,
},
},
.r_ic = (struct rds_ib_connection *)0x0,
.r_posted = (int)0,
}
The prosaic version: We do get an event and processes a recv CQE, but
when performing DMA unmap, the addresses for SGE entry 1-4 are all
zero. That explains the crash.
If you look at rds_ib_recv_init_ring(), the state will be exactly what
we see above when this function has been called. So, how can this
happen?
We do get an RDMA_CM_EVENT_ROUTE_RESOLVED event with a new cm_id which
happens to have the same address as the one stored in the RDS IB
Connection (ic). Now, without a71a60c, we do not crash in
rds_rdma_cm_event_handler_cmn() but calls:
trans->cm_initiate_connect()
rds_ib_cm_initiate_connect()
rds_ib_recv_init_ring()
and the latter does:
sge = recv->r_sge;
sge->addr = ic->i_recv_hdrs_dma[i];
sge->length = sizeof(struct rds_header);
sge->lkey = ic->i_mr->lkey;
for (j = 1; j < num_send_sge; j++) {
sge = recv->r_sge + j;
sge->addr = 0; <==== Note
sge->length = PAGE_SIZE;
sge->lkey = ic->i_mr->lkey;
}
This fits 100% with the r_sge pasted above.
We fix this bug by introducing a generation number and embed that in
the last three bits of cm_id->context, since these bits are zero due
to alignment requirement.
When we use cm_id->context to find the RDS Connection (conn), we mask
away the three lower bits. When we create the cm_id, we insert the
next generation and make a copy of it in the ic.
In rds_rdma_cm_event_handler_cmn(), when we have established that the
two addresses are equal, we add an additional check to see if the
cm_id generation is the same between the rdma_cm supplied cm_id and
the generation stored in ic. If the generations differ, we increment
the s_ib_cm_id_resurrected statistics counter and return.
When handling an incoming connect, we set the cm_id generation count
stored in ic to zero.
Orabug: 38105083
Fixes: 3ae09d7 ("net/rds: Don't block workqueues "cma_wq" and "cm.wq"")
Signed-off-by: Håkon Bugge <[email protected]>
Reviewed-by: Sharath Srinivasan <[email protected]>
Signed-off-by: Brian Maly <[email protected]>1 parent 047fcbb commit 0405046
4 files changed
+31
-8
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
92 | 92 | | |
93 | 93 | | |
94 | 94 | | |
| 95 | + | |
| 96 | + | |
| 97 | + | |
| 98 | + | |
| 99 | + | |
95 | 100 | | |
96 | 101 | | |
97 | 102 | | |
| |||
358 | 363 | | |
359 | 364 | | |
360 | 365 | | |
| 366 | + | |
361 | 367 | | |
362 | 368 | | |
363 | 369 | | |
| |||
629 | 635 | | |
630 | 636 | | |
631 | 637 | | |
| 638 | + | |
632 | 639 | | |
633 | 640 | | |
634 | 641 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1836 | 1836 | | |
1837 | 1837 | | |
1838 | 1838 | | |
| 1839 | + | |
1839 | 1840 | | |
1840 | 1841 | | |
1841 | 1842 | | |
| |||
2214 | 2215 | | |
2215 | 2216 | | |
2216 | 2217 | | |
2217 | | - | |
| 2218 | + | |
2218 | 2219 | | |
2219 | 2220 | | |
2220 | 2221 | | |
| |||
2419 | 2420 | | |
2420 | 2421 | | |
2421 | 2422 | | |
2422 | | - | |
2423 | | - | |
| 2423 | + | |
| 2424 | + | |
| 2425 | + | |
2424 | 2426 | | |
2425 | 2427 | | |
2426 | 2428 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | 1 | | |
2 | | - | |
| 2 | + | |
3 | 3 | | |
4 | 4 | | |
5 | 5 | | |
| |||
98 | 98 | | |
99 | 99 | | |
100 | 100 | | |
| 101 | + | |
101 | 102 | | |
102 | 103 | | |
103 | 104 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
52 | 52 | | |
53 | 53 | | |
54 | 54 | | |
| 55 | + | |
55 | 56 | | |
56 | 57 | | |
57 | 58 | | |
| |||
110 | 111 | | |
111 | 112 | | |
112 | 113 | | |
| 114 | + | |
113 | 115 | | |
114 | 116 | | |
115 | 117 | | |
| |||
198 | 200 | | |
199 | 201 | | |
200 | 202 | | |
| 203 | + | |
| 204 | + | |
| 205 | + | |
| 206 | + | |
| 207 | + | |
| 208 | + | |
| 209 | + | |
| 210 | + | |
201 | 211 | | |
202 | 212 | | |
203 | 213 | | |
| |||
388 | 398 | | |
389 | 399 | | |
390 | 400 | | |
391 | | - | |
| 401 | + | |
| 402 | + | |
392 | 403 | | |
393 | 404 | | |
394 | 405 | | |
| |||
397 | 408 | | |
398 | 409 | | |
399 | 410 | | |
400 | | - | |
| 411 | + | |
| 412 | + | |
401 | 413 | | |
402 | 414 | | |
403 | 415 | | |
| |||
408 | 420 | | |
409 | 421 | | |
410 | 422 | | |
411 | | - | |
| 423 | + | |
412 | 424 | | |
413 | 425 | | |
414 | 426 | | |
415 | 427 | | |
416 | 428 | | |
417 | | - | |
| 429 | + | |
418 | 430 | | |
419 | 431 | | |
420 | 432 | | |
| |||
424 | 436 | | |
425 | 437 | | |
426 | 438 | | |
| 439 | + | |
427 | 440 | | |
428 | 441 | | |
429 | 442 | | |
| |||
0 commit comments