Commit 87d79f0

MDEV-31949 parallel slave xa round-robin distribution
The XA-Prepare group of events and its XA-"complete" terminator are now distributed Round-Robin across parallel slave workers. The former hash-based policy was proven to contribute to execution latency, being prone to create a big queue (many times larger than the size of the worker pool) of binlog-ordered transactions to commit.

Log of changes:

MDEV-31949 intermediate commit:
Made XA RR more robust
- XAC may proceed even to binlog in parallel with its (then earlier) XAP
TODO:
- convert big numbers of spin time to a mutex/cond signal
- cleanup (e.g. generalize is_explicit_XA() to cover the async XAC)
CAVEAT: especially with a big # of workers the XAC spin wait might run out of predefined loops (see the "convert" item of TODO).
Cover Brandon's catch of an apparent lack of unpinning by the xid owner XAP.

MDEV-31949: intermediate commit fixes
XAC is not retrying anymore, and it also waits much more patiently for XAP's xid release.
todo: test on an env where the XID_cache_element::uninitialized() assert fired, to see any effect of XAC not retrying.

MDEV-31949: intermediate commit to let XAC not find xid initially
This prevents its retry, which might have (had) something to do with lf_hash asserts and segfaults.

MDEV-31949 intermediate commit to cover duplicate xid
"Unlikely" duplicate xids have to be treated with spin-waiting. XA-prepare keeps trying to insert into the xid cache until it succeeds. Although this is not a duplicate case, XA-commit may do a similar wait for the xid from its parent XAP.

MDEV-31949 incr commit: XAP1 vs XAP2
Cleanup and improvements over Brandon's previous two commits, to replace both. Collapse into XAP1 vs XAP2. The XAP_k-i(same_xid) -> XAC_k(same_xid) dependency is covered by marking XAC's rgi context with SPECULATE_WAIT.

MDEV-31949 Incremental functional commit to address
- log-slave-updates=0
- xa rollback
Todo:
0. cleanup
1. add mtr tests
2. implement mutex wait by XAC (to limit the spin number)

MDEV-31949 incremental commit implements pthread cond wait by XAC, featuring
- a new processlist stage introduced
- wait_for_commit::COND_wait_xa_commit added to the class
- wait_for_commit* XID_cache_element::waiter added to the class
- little cleanup
TODO:
- find optimal SPIN_MAX

Duplicate xid proper handling, briefly tested with up to 32 workers.
- sliding window is made 2 * |W|
- one more XID_cache_element::p_waiter is added to serve in C2 -> P3 control
- some cleanup
TODO: complete testing and cleanup; prove the window size

MDEV-31949: Initial MTR test commit
Note it will cause the server to crash with tests 4a and 4b.
Also fixed a master-side assertion error.

MDEV-31949 optimization not to W4PC in a certain case and ...
restore the assigned XAP sliding window size back to the original |W|. It is safe now that a XAP and a XAC use separate wait conditions.

MDEV-31949 fixes around rpl_xa_concurrent_xap_xac
1. the XAP duplicate xid window registration is moved up to cover all gco situations;
2. the window object gets destroyed for real;
3. the test is simplified to remove 4b in favor of a new to-be-done P1(fail)->C2-P3 branch.
Fixed an lsu=0 false wakeup as seen by rpl.rpl_parallel_optimistic_xa_lsu_off.

MDEV-31949: Test fixes/improvements
Fixed rpl_xa_prepare_gtid_fail. The problem was that now, with concurrent XAP/XAC, the XAP signalled the XAC to complete, after which the XAP would continue to update gtid_slave_pos (with an induced error) and fail, but the XAC would already have completed. To be fixed with MDEV-21777; the change to the test is just to restart the slave with the position of the XAC, as opposed to the XAP, because it is now able to complete.
In rpl_xa_concurrent_xap_xac, test case 4b is repurposed to ensure the XAP duplicate xid wait case, such that if the prior XAP fails, the later in-wait XAP rolls back successfully.

squash! MDEV-31949 parallel slave xa round-robin distribution

MDEV-31949 fixes to P1,R2,P3
XA-ROLLBACK (R2) did not call wakeup of the subsequent commit. Fixed by making it find `xid` in the usual place.

MDEV-31949: Extended test for ROLLBACK case
Restructured and renamed rpl_xa_concurrent_xap_xac to rpl_xa_concurrent_2pc, and moved its test cases into an include file with a parameterized completion event, either COMMIT or ROLLBACK. The main test then calls this included file under COMMIT and ROLLBACK variations to ensure the behavior is correct for both cases.
Additionally updated the XA ROLLBACK logic in the code so it doesn't call acquire_xid() after binlogging, because the rollback case gets the XID beforehand.

Missed include/rpl_xa_concurrent_2pc.inc in last commit

The condition leading XAC to possibly wait for xid is corrected. Before the change XAC could do some extra work inside acquire_xid.

Cleanup commit.
- xa rollback is made consistent with commit wrt xid waiting and logging;
- rpl_xa_concurrent_2pc extended and refined to reflect the above;
- simplified logic around `is_async_xac`;
- memory_order_relaxed within locked mutex.

MDEV-32257 dangling XA-rollback in binlog from empty XA in pseudo_slave_mode
This commit protects the slave from crashing at execution of an orphan XA-rollback in the MDEV-31949 branch.

This commit complements the preceding one.
Fixes to a MDEV-32257-like scenario to prove the orphan XA-"complete" does not binlog. The test part for the previous commit. Earlier show-binlog-events are removed as being superseded by logic checks.

Fixes to failing tests.
- rpl.rpl_xa_concurrent_2pc runs only on debug builds
- rpl.rpl_parallel_xa_same_xid showed a race in handling the XA-commit (gtid k) -> XA-start (gtid k+n) dependency.
The duplicate xid pass protocol is reinforced. Parallel slave XAC_k now marks its xa delete intent, so XAP_k+n either sees that or marks the xid itself. XAC_k does not express the intent when the xid has already gotten an XAP waiter (in which case things go as before).

MDEV-31949: Cleanup and fix typo

rpl_xa_empty_transaction test made deterministic

MDEV-32347 ASAN xid_t::eq/event_xid_t::serialize poison, SIGSEGV in serialize_xid
The unexpected use case of rolling back an XA-prepare was illegitimate (MDEV-32455 is reported to that effect), but it showed a vulnerability in accessing properties of THD::lex, such as the xid of a past statement. That is fixed.
The assert has to be removed altogether, at least until MDEV-32455 gets fixed.

Complete MDEV-32347 fixes to cover the is_async_xac branch of access to xid.

Fix Commit ID binlog filter

MDEV-32347 a review note addressed.

Cleanup: added docs to new functions and a few cosmetics.

Correcting P -> C dependency handling and cleanup
P (slave_applier_reset_xa_trans) -> C (xid_cache_search_maybe_wait) is made compatible with C (xid_cache_delete) -> P (xid_cache_insert_maybe_wait). The latter pair employs CAS to guarantee synchronization about a waiter. C -> P is also simplified so P -> C reflects that. A large difference between the two is that in C -> P the xid record is eliminated from the cache, which forces reliance on lf_hash_search() as the condition for the pthread-cond-signal wait.

The concluding commit prior to collapsing the branch.
- renaming
- function header comments added
- hyperoptimization of `wfc->parent_commit_started` is removed for the reason of not having been proved safe
- the size of the XAP sliding window is doubled to account for a possible XAP_k -> XAP_k+2|W|-1 dependency.
Say k=1 and the # of workers is 4. Transactions are distributed RR, so it is possible to have T^*_1 -> T^*_8. This is seen from the worker queues. The queue develops downward:

    W1   W2   W3   W4
    1^*  2    3    4
    5    6    7    8^*

Worker #1 has been assigned T_1 and T_5. Worker #4 can take on its T_8 when T_1 is still at the beginning of its processing, so even before the XA START of that XAP.
This analysis was done a couple of weeks ago, but I have not found a commit planned to cover it.

Fixed a possible access to a non-existing thd->rgi_slave->commit_orderer, spotted by ASAN/MSAN builds.

TODO: explain/fix rpl.rpl_xa_concurrent_2pc, rpl.rpl_xa_prepare_gtid_fail
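The worker-queue picture in the message (k=1, four workers, T_1 -> T_8) can be reproduced with a few lines of Python. This is only an illustrative sketch and not code from the patch: `distribute_round_robin` and `farthest_dependent` are hypothetical helper names, and the second one merely evaluates the k + 2|W| - 1 bound as stated in the commit message.

```python
def distribute_round_robin(n_transactions, n_workers):
    """Return one queue per worker holding 1-based transaction numbers."""
    queues = [[] for _ in range(n_workers)]
    for t in range(1, n_transactions + 1):
        # Transaction t goes to worker ((t - 1) mod |W|), i.e. plain round-robin.
        queues[(t - 1) % n_workers].append(t)
    return queues


def farthest_dependent(k, n_workers):
    """Farthest transaction that may depend on T_k per the message: k + 2|W| - 1."""
    return k + 2 * n_workers - 1


if __name__ == "__main__":
    queues = distribute_round_robin(8, 4)
    # Print the queues "developing downward", one row per round of assignment.
    for row in zip(*queues):
        print(" ".join(f"{t:<3}" for t in row))
    print("farthest dependent of T_1:", farthest_dependent(1, 4))
```

Under these assumptions the first worker's queue holds T_1 and T_5 and the fourth worker's queue holds T_4 and T_8, which matches the layout described in the message and motivates a window of 2 * |W| entries.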
1 parent ee5cadd commit 87d79f0

24 files changed: +2381 -101 lines changed

mysql-test/include/show_binlog_events2.inc

Lines changed: 11 additions & 0 deletions
@@ -1,3 +1,9 @@
+# ==== Usage ====
+#
+# [--let $binlog_file= [<FILENAME> | LAST]]
+# [--let $binlog_start= <POSITION> ]
+# [--let $filter_cid= [0 | 1]
+
 if ($binlog_start)
 {
 --let $_binlog_start=$binlog_start
@@ -14,4 +20,9 @@ if ($binlog_file)
 --replace_result "$_from_binlog_start" "from <binlog_start>" $MYSQLTEST_VARDIR MYSQLTEST_VARDIR
 --replace_column 2 # 5 #
 --replace_regex /\/\* xid=.* \*\//\/* XID *\// /table_id: [0-9]+/table_id: #/ /file_id=[0-9]+/file_id=#/ /GTID [0-9]+-[0-9]+-[0-9]+/GTID #-#-#/
+if ($filter_cid)
+{
+--replace_regex /\/\* xid=.* \*\//\/* XID *\// /table_id: [0-9]+/table_id: #/ /file_id=[0-9]+/file_id=#/ /GTID [0-9]+-[0-9]+-[0-9]+/GTID #-#-#/ / cid=[0-9]+//
+
+}
 --eval show binlog events $_in_binlog_file from $_binlog_start
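The effect of the extra `/ cid=[0-9]+//` pair can be approximated outside mysqltest. The sketch below applies the same patterns with Python's `re` module; `mask_line` is a hypothetical helper and the sample strings in the usage note are made up for illustration, not taken from a real binlog listing.

```python
import re

# (pattern, replacement) pairs mirroring the --replace_regex line in the diff.
MASKS = [
    (r"/\* xid=.* \*/", "/* XID */"),
    (r"table_id: [0-9]+", "table_id: #"),
    (r"file_id=[0-9]+", "file_id=#"),
    (r"GTID [0-9]+-[0-9]+-[0-9]+", "GTID #-#-#"),
]
# Extra pair that is only present when $filter_cid is set.
CID_MASK = (r" cid=[0-9]+", "")


def mask_line(line, filter_cid=False):
    """Apply the masking rules in order; optionally strip the ' cid=N' suffix."""
    rules = MASKS + ([CID_MASK] if filter_cid else [])
    for pattern, replacement in rules:
        line = re.sub(pattern, replacement, line)
    return line
```

For a hypothetical input such as `BEGIN GTID 0-1-3 cid=7`, calling `mask_line(line, filter_cid=True)` masks the GTID and drops the commit id, while the default call keeps ` cid=7` in place.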
Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
+CREATE TABLE ta (c INT KEY) engine=Aria;
+XA START 'xid_a';
+INSERT INTO ta VALUES (1);
+XA END 'xid_a';
+XA PREPARE 'xid_a';
+Warnings:
+Warning 1030 Got error 131 "Command not supported by the engine" from storage engine Aria
+LOAD INDEX INTO CACHE c KEY(PRIMARY);
+Table Op Msg_type Msg_text
+test.c preload_keys Error XAER_RMFAIL: The command cannot be executed when global transaction is in the PREPARED state
+test.c preload_keys Error XAER_RMFAIL: The command cannot be executed when global transaction is in the PREPARED state
+test.c preload_keys error Corrupt
+Warnings:
+Warning 1196 Some non-transactional changed tables couldn't be rolled back
+XA ROLLBACK 'xid_a';
+CREATE TABLE ti (c INT KEY) engine=Innodb;
+XA START 'xid_i';
+INSERT INTO ti VALUES (1);
+XA END 'xid_i';
+XA PREPARE 'xid_i';
+LOAD INDEX INTO CACHE c KEY(PRIMARY);
+Table Op Msg_type Msg_text
+test.c preload_keys Error XAER_RMFAIL: The command cannot be executed when global transaction is in the PREPARED state
+test.c preload_keys Error XAER_RMFAIL: The command cannot be executed when global transaction is in the PREPARED state
+test.c preload_keys error Corrupt
+XA COMMIT 'xid_i';
+SELECT * FROM ti;
+c
+include/show_binlog_events.inc
+Log_name Pos Event_type Server_id End_log_pos Info
+master-bin.000001 # Gtid # # GTID #-#-#
+master-bin.000001 # Query # # use `test`; CREATE TABLE ta (c INT KEY) engine=Aria
+master-bin.000001 # Gtid # # BEGIN GTID #-#-#
+master-bin.000001 # Annotate_rows # # INSERT INTO ta VALUES (1)
+master-bin.000001 # Table_map # # table_id: # (test.ta)
+master-bin.000001 # Write_rows_v1 # # table_id: # flags: STMT_END_F
+master-bin.000001 # Query # # COMMIT
+master-bin.000001 # Gtid # # XA START X'7869645f61',X'',1 GTID #-#-#
+master-bin.000001 # Query # # XA END X'7869645f61',X'',1
+master-bin.000001 # XA_prepare # # XA PREPARE X'7869645f61',X'',1
+master-bin.000001 # Gtid # # GTID #-#-#
+master-bin.000001 # Query # # XA ROLLBACK X'7869645f61',X'',1
+master-bin.000001 # Gtid # # GTID #-#-#
+master-bin.000001 # Query # # use `test`; CREATE TABLE ti (c INT KEY) engine=Innodb
+master-bin.000001 # Gtid # # XA START X'7869645f69',X'',1 GTID #-#-#
+master-bin.000001 # Annotate_rows # # INSERT INTO ti VALUES (1)
+master-bin.000001 # Table_map # # table_id: # (test.ti)
+master-bin.000001 # Write_rows_v1 # # table_id: # flags: STMT_END_F
+master-bin.000001 # Query # # XA END X'7869645f69',X'',1
+master-bin.000001 # XA_prepare # # XA PREPARE X'7869645f69',X'',1
+master-bin.000001 # Gtid # # GTID #-#-#
+master-bin.000001 # Query # # XA ROLLBACK X'7869645f69',X'',1
+drop table ta,ti;
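The binlog listing prints each xid as hex literals, so `X'7869645f61'` and `X'7869645f69'` are simply the byte strings of `'xid_a'` and `'xid_i'` used in the statements. A small Python sketch for decoding them when reading such result files (`decode_xid_literal` is a hypothetical helper, and it assumes the xid bytes are plain ASCII):

```python
def decode_xid_literal(literal):
    """Decode an X'..' hex literal, as printed in the binlog Info column, to text."""
    if not (literal.startswith("X'") and literal.endswith("'")):
        raise ValueError(f"not a hex literal: {literal!r}")
    # Strip the X'...' wrapper and turn the hex digits back into bytes.
    return bytes.fromhex(literal[2:-1]).decode("ascii")


if __name__ == "__main__":
    for lit in ("X'7869645f61'", "X'7869645f69'", "X''"):
        print(lit, "->", repr(decode_xid_literal(lit)))
```

The empty `X''` in the second position of each triple decodes to an empty string.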
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+--source include/have_binlog_format_row.inc
+--source include/have_innodb.inc
+
+CREATE TABLE ta (c INT KEY) engine=Aria;
+XA START 'xid_a';
+INSERT INTO ta VALUES (1);
+XA END 'xid_a';
+XA PREPARE 'xid_a';
+
+#--error ER_XAER_RMFAIL
+LOAD INDEX INTO CACHE c KEY(PRIMARY);
+
+XA ROLLBACK 'xid_a';
+
+CREATE TABLE ti (c INT KEY) engine=Innodb;
+XA START 'xid_i';
+INSERT INTO ti VALUES (1);
+XA END 'xid_i';
+XA PREPARE 'xid_i';
+
+# --error ER_XAER_RMFAIL
+LOAD INDEX INTO CACHE c KEY(PRIMARY);
+
+XA COMMIT 'xid_i';
+SELECT * FROM ti;
+
+#
+--source include/show_binlog_events.inc
+
+drop table ta,ti;