ghost

@ghost ghost commented Jul 1, 2014

There must be one space before the equal sign.

@vuvova vuvova self-assigned this Jul 1, 2014
@vuvova
Member

vuvova commented Jul 1, 2014

No, there should not be. This is the coding style that is used everywhere in the MariaDB (or MySQL) sources. No space before, one space after.

@vuvova vuvova closed this Jul 1, 2014
@ghost
Author

ghost commented Jul 1, 2014

server/client/mysql.cc

line 2279

if (!(inchar = (uchar) *++pos))

Why do we see different code styles everywhere?

Why not use the same code style?

@ghost
Author

ghost commented Jul 1, 2014

The same file, line 2287:
if ((com= find_command((char) inchar)))

This line follows the style you described, but not every place follows the same rule.

@ghost
Author

ghost commented Jul 1, 2014

The same file, line 2435:
out=line;

So it seems the rule is that there are no rules: you can do anything you want, as long as the code runs?
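
The rule stated above (no space before `=`, one space after) can be checked mechanically against the three quoted lines from client/mysql.cc. The following is only a rough sketch: `follows_assignment_style` is a hypothetical helper, not something from the MariaDB sources, and it ignores string literals, compound assignments and shift-assignments.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of the MariaDB sources.
// Returns true when every plain '=' in the line has no space before it
// and exactly one space after it. Comparison operators are skipped.
bool follows_assignment_style(const std::string &line)
{
  for (std::size_t i= 0; i < line.size(); i++)
  {
    if (line[i] != '=')
      continue;
    const char prev= i > 0 ? line[i - 1] : '\0';
    const char next= i + 1 < line.size() ? line[i + 1] : '\0';
    if (next == '=')
    {
      i++;                       // "==": skip both characters
      continue;
    }
    if (prev == '!' || prev == '<' || prev == '>')
      continue;                  // "!=", "<=", ">="
    const char after_next= i + 2 < line.size() ? line[i + 2] : '\0';
    if (prev == ' ' || prev == '\t')
      return false;              // space before '='
    if (next != ' ' || after_next == ' ')
      return false;              // not exactly one space after '='
  }
  return true;
}
```

Under these assumptions, only the line 2287 example passes; the lines quoted from 2279 (space before `=`) and 2435 (no space after `=`) are reported as deviations.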

spetrunia added a commit that referenced this pull request May 23, 2016
Variant #4 of the fix.

Make ORDER BY optimization functions take into account multiple
equalities. This is done in several places:
- remove_const() checks whether we can sort the first table in the
  join, or we need to put rows into temp.table and then sort.
- test_if_order_by_key() checks whether there are indexes that
  can be used to produce the required ordering
- make_unireg_sortorder() constructs sort criteria for filesort.
elenst added a commit that referenced this pull request Feb 6, 2017
Test logic essentially depends on group_concat_max_len and result
truncation, preserve it
nirbhayc pushed a commit that referenced this pull request Feb 8, 2017
Test logic essentially depends on group_concat_max_len and result
truncation, preserve it
ankitkumar031 pushed a commit to ankitkumar031/server that referenced this pull request Apr 16, 2017
dr-m added a commit that referenced this pull request May 5, 2017
This only merges MDEV-12253, adapting it to MDEV-12602 which is already
present in 10.2 but not yet in the 10.1 revision that is being merged.

TODO: Error handling in crash recovery needs to be improved.
If a page cannot be decrypted (or read), we should cleanly abort
the startup. If innodb_force_recovery is specified, we should
ignore the problematic page and apply redo log to other pages.
Currently, the test encryption.innodb-redo-badkey randomly fails
like this (the last messages are from cmake -DWITH_ASAN):

2017-05-05 10:19:40 140037071685504 [Note] InnoDB: Starting crash recovery from checkpoint LSN=1635994
2017-05-05 10:19:40 140037071685504 [ERROR] InnoDB: Missing MLOG_FILE_NAME or MLOG_FILE_DELETE before MLOG_CHECKPOINT for tablespace 1
2017-05-05 10:19:40 140037071685504 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[2201] with error Data structure corruption
2017-05-05 10:19:41 140037071685504 [Note] InnoDB: Starting shutdown...
=================================================================
==5226==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0x612000018588 in thread T0
    #0 0x736750 in operator delete(void*) (/mariadb/server/build/sql/mysqld+0x736750)
    #1 0x1e4833f in LatchCounter::~LatchCounter() /mariadb/server/storage/innobase/include/sync0types.h:599:4
    #2 0x1e480b8 in LatchMeta<LatchCounter>::~LatchMeta() /mariadb/server/storage/innobase/include/sync0types.h:786:17
    #3 0x1e35509 in sync_latch_meta_destroy() /mariadb/server/storage/innobase/sync/sync0debug.cc:1622:3
    #4 0x1e35314 in sync_check_close() /mariadb/server/storage/innobase/sync/sync0debug.cc:1839:2
    #5 0x1dfdc18 in innodb_shutdown() /mariadb/server/storage/innobase/srv/srv0start.cc:2888:2
    #6 0x197e5e6 in innobase_init(void*) /mariadb/server/storage/innobase/handler/ha_innodb.cc:4475:3
abarkov pushed a commit that referenced this pull request Jun 19, 2017
…RT in subquery

The bug happens because of a combination of unfortunate circumstances:

1. Arguments args[0] and args[2] of Item_func_concat point recursively
(through Item_direct_view_ref's) to the same Item_func_conv_charset.
Both args[0]->args[0]->ref[0] and args[2]->args[0]->ref[0] refer to
this Item_func_conv_charset.

2. When Item_func_concat::args[0]->val_str() is called,
Item_func_conv_charset::val_str() writes its result to
Item_func_conv_charset::tmp_value.

3. Then, for optimization purposes (to avoid copying),
Item_func_substr::val_str() initializes Item_func_substr::tmp_value
to point to the buffer fragment owned by Item_func_conv_charset::tmp_value.
Item_func_substr::tmp_value is returned as a result of
Item_func_concat::args[0]->val_str().

4. Due to optimization to avoid memory reallocs,
Item_func_concat::val_str() remembers the result of args[0]->val_str()
in "res" and further uses "res" to collect the return value.

5. When Item_func_concat::args[2]->val_str() is called,
Item_func_conv_charset::tmp_value gets overwritten (see #1),
which effectively overwrites args[0]'s Item_func_substr::tmp_value (see #3),
which effectively overwrites "res" (see #4).

This patch does the following:

a. Changes Item_func_conv_charset::val_str(String *str) to use
   tmp_value and str the other way around. After this change tmp_value
   is used to store a temporary result, while str is used to return the value.
   This fixes the second problem (without SUBSTR):
     SELECT CONCAT(t2,'-',t2) c2
       FROM (SELECT CONVERT(t USING latin1) t2 FROM t1) sub;
   As Item_func_concat::val_str() supplies two different buffers when calling
   args[0]->val_str() and args[2]->val_str(), in the new reduction the result
   created during args[0]->val_str() does not get overwritten by
   args[2]->val_str().

b. Fixing the same problem in val_str() for similar classes

   Item_func_to_base64
   Item_func_from_base64
   Item_func_weight_string
   Item_func_hex
   Item_func_unhex
   Item_func_quote
   Item_func_compress
   Item_func_uncompress
   Item_func_des_encrypt
   Item_func_des_decrypt
   Item_func_conv_charset
   Item_func_reverse
   Item_func_soundex
   Item_func_aes_encrypt
   Item_func_aes_decrypt
   Item_func_buffer

c. Fixing Item_func::val_str_from_val_str_ascii() the same way.
   Now Item_str_ascii_func::ascii_buff is used for temporary value,
   while the parameter passed to val_str() is used to return the result.
   This fixes the same problem when conversion (from ASCII to e.g. UCS2)
   takes place. See the ctype_ucs.test for example queries that returned
   wrong results before the fix.

d. Some Item_func descendant classes had temporary String buffers
   (tmp_value and tmp_str), but did not really use them.
   Removing these temporary buffers from:

   Item_func_decode_histogram
   Item_func_format
   Item_func_binlog_gtid_pos
   Item_func_spatial_collection

e. Removing Item_func_buffer::tmp_value, because it's not used any more.

f. Renaming Item_func_[un]compress::buffer to "tmp_value",
   for consistency with other classes.

Note, this patch does not fix the following classes
(although they have a similar problem):

   Item_str_conv
   Item_func_make_set
   Item_char_typecast

They have complex implementations, and simply swapping "tmp_value"
and "str" won't work. These classes will be fixed separately.
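
The aliasing described in steps 1 to 5, and the swap from item (a), can be sketched outside the server. This is a simplified model under stated assumptions: `ConvNode` and `concat_sketch` are hypothetical stand-ins for Item_func_conv_charset and Item_func_concat::val_str(), and std::string replaces the server's String class.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for Item_func_conv_charset: one node shared by
// two arguments of a CONCAT-like caller.
struct ConvNode
{
  std::string source;     // value to be "converted"
  std::string tmp_value;  // internal buffer owned by the node

  // Old behaviour: the result is returned in the internal buffer,
  // so every caller gets a pointer to the same storage.
  std::string *val_str_old(std::string *)
  {
    tmp_value= source;
    return &tmp_value;
  }

  // New behaviour: tmp_value holds only the temporary result,
  // the value is returned in the buffer supplied by the caller.
  std::string *val_str_new(std::string *str)
  {
    tmp_value= source;
    *str= tmp_value;
    return str;
  }
};

// Models CONCAT(node, '-', node): collects the result in the buffer
// returned by the first argument, as described in step 4.
std::string concat_sketch(ConvNode &node, bool use_new)
{
  std::string buf0, buf2;  // two different buffers, one per argument
  std::string *res= use_new ? node.val_str_new(&buf0)
                            : node.val_str_old(&buf0);
  res->append("-");
  std::string *res2= use_new ? node.val_str_new(&buf2)
                             : node.val_str_old(&buf2);
  const std::string tail= *res2;  // copy first, res and res2 may alias
  res->append(tail);
  return *res;
}
```

For a node whose source is "abc", the old variant loses the separator because the second call overwrites the buffer that "res" points to, while the new variant keeps the two results apart.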
vuvova pushed a commit that referenced this pull request Jun 22, 2017
vuvova pushed a commit that referenced this pull request Jun 22, 2017
abarkov pushed a commit that referenced this pull request Oct 6, 2017
Fixing the asymmetry in the array field_types_merge_rules[][]
which caused data loss when mixing FLOAT + BIGINT in UNIONs
or hybrid functions:

1. FLOAT  + INT    = DOUBLE
2. FLOAT  + BIGINT = FLOAT
3. INT    + FLOAT  = DOUBLE
4. BIGINT + FLOAT  = DOUBLE

Now FLOAT + BIGINT (as in #2) also produces DOUBLE, like the cases #1,#3,#4 do.
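
The asymmetry can be illustrated with a reduced merge table. This is a sketch, not the real field_types_merge_rules[][]: the enum, the table names and every entry other than the four FLOAT/INT/BIGINT combinations listed above are assumptions made for illustration.

```cpp
#include <cassert>

enum FieldType { T_INT, T_BIGINT, T_FLOAT, T_DOUBLE, N_TYPES };

typedef FieldType MergeTable[N_TYPES][N_TYPES];

// Before the fix: FLOAT + BIGINT = FLOAT, but BIGINT + FLOAT = DOUBLE.
// Entries not mentioned in the commit message are illustrative guesses.
static const MergeTable rules_before=
{
  /* INT    */ { T_INT,    T_BIGINT, T_DOUBLE, T_DOUBLE },
  /* BIGINT */ { T_BIGINT, T_BIGINT, T_DOUBLE, T_DOUBLE },
  /* FLOAT  */ { T_DOUBLE, T_FLOAT,  T_FLOAT,  T_DOUBLE },
  /* DOUBLE */ { T_DOUBLE, T_DOUBLE, T_DOUBLE, T_DOUBLE }
};

// After the fix: FLOAT + BIGINT also produces DOUBLE.
static const MergeTable rules_after=
{
  /* INT    */ { T_INT,    T_BIGINT, T_DOUBLE, T_DOUBLE },
  /* BIGINT */ { T_BIGINT, T_BIGINT, T_DOUBLE, T_DOUBLE },
  /* FLOAT  */ { T_DOUBLE, T_DOUBLE, T_FLOAT,  T_DOUBLE },
  /* DOUBLE */ { T_DOUBLE, T_DOUBLE, T_DOUBLE, T_DOUBLE }
};

// A merge table is symmetric when merging a with b gives the same
// type as merging b with a, for every pair of types.
bool is_symmetric(const MergeTable &rules)
{
  for (int a= 0; a < N_TYPES; a++)
    for (int b= a + 1; b < N_TYPES; b++)
      if (rules[a][b] != rules[b][a])
        return false;
  return true;
}
```

A check of this kind fails for the "before" table and passes for the "after" table, which is the property the fix restores.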
dr-m pushed a commit that referenced this pull request Dec 8, 2017
dr-m pushed a commit that referenced this pull request Feb 20, 2018
Fixes this report:
==3165==ERROR: AddressSanitizer: use-after-poison on address 0x61e0000270a0 at pc 0x00000114b78c bp 0x7f15d65fe120 sp 0x7f15d65fd8d0
WRITE of size 1366 at 0x61e0000270a0 thread T28
    #0 0x114b78b in __asan_memcpy fun/cpp_projects/llvm_toolchain/llvm/projects/compiler-rt/lib/asan/asan_interceptors_memintrinsics.cc:23
    #1 0x208208d in TABLE::init(THD*, TABLE_LIST*) work/mariadb/sql/table.cc:4662:3
    #2 0x19df85b in open_table(THD*, TABLE_LIST*, Open_table_context*) work/mariadb/sql/sql_base.cc:1993:10
    #3 0x19eb968 in open_and_process_table(THD*, LEX*, TABLE_LIST*, unsigned int*, unsigned int, Prelocking_strategy*, bool, Open_table_context*) work/mariadb/sql/sql_base.cc:3483:14
    #4 0x19e7c05 in open_tables(THD*, DDL_options_st const&, TABLE_LIST**, unsigned int*, unsigned int, Prelocking_strategy*) work/mariadb/sql/sql_base.cc:4001:14
    #5 0x19f4dac in open_and_lock_tables(THD*, DDL_options_st const&, TABLE_LIST*, bool, unsigned int, Prelocking_strategy*) work/mariadb/sql/sql_base.cc:4879:7
    #6 0x1627263 in open_and_lock_tables(THD*, TABLE_LIST*, bool, unsigned int) work/mariadb/sql/sql_base.h:487:10
    #7 0x1c3839c in mysql_execute_command(THD*) work/mariadb/sql/sql_parse.cc:5113:13
    #8 0x1c1b72c in mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool) work/mariadb/sql/sql_parse.cc:7980:18
    #9 0x1c13464 in handle_bootstrap_impl(THD*) work/mariadb/sql/sql_parse.cc:1044:5
    #10 0x1c11ff7 in do_handle_bootstrap(THD*) work/mariadb/sql/sql_parse.cc:1096:3
    #11 0x1c11d14 in handle_bootstrap work/mariadb/sql/sql_parse.cc:1079:3
    #12 0x115a6ae in __asan::AsanThread::ThreadStart(unsigned long, __sanitizer::atomic_uintptr_t*) fun/cpp_projects/llvm_toolchain/llvm/projects/compiler-rt/lib/asan/asan_thread.cc:259
    #13 0x7f15fe1407fb in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x77fb)
    #14 0x7f15fbb64b5e in clone /build/glibc-itYbWN/glibc-2.26/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95
svoj pushed a commit that referenced this pull request Mar 22, 2018
srv_last_monitor_time: make all accesses relaxed atomic

WARNING: ThreadSanitizer: data race (pid=12041)
  Write of size 8 at 0x000003949278 by thread T26 (mutexes: write M226445748578513120):
    #0 thd_destructor_proxy storage/innobase/handler/ha_innodb.cc:314:14 (mysqld+0x19b5505)

  Previous read of size 8 at 0x000003949278 by main thread:
    #0 innobase_init(void*) storage/innobase/handler/ha_innodb.cc:4180:11 (mysqld+0x1a03404)
    #1 ha_initialize_handlerton(st_plugin_int*) sql/handler.cc:522:31 (mysqld+0xc5ec73)
    #2 plugin_initialize(st_mem_root*, st_plugin_int*, int*, char**, bool) sql/sql_plugin.cc:1447:9 (mysqld+0x134908d)
    #3 plugin_init(int*, char**, int) sql/sql_plugin.cc:1729:15 (mysqld+0x13484f0)
    #4 init_server_components() sql/mysqld.cc:5345:7 (mysqld+0xbf720f)
    #5 mysqld_main(int, char**) sql/mysqld.cc:5940:7 (mysqld+0xbf107d)
    #6 main sql/main.cc:25:10 (mysqld+0xbe971b)

  Location is global 'srv_running' of size 8 at 0x000003949278 (mysqld+0x000003949278)

WARNING: ThreadSanitizer: data race (pid=27869)
  Atomic write of size 4 at 0x7b4800000c00 by thread T8:
    #0 __tsan_atomic32_exchange llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cc:589 (mysqld+0xbd4eac)
    #1 TTASEventMutex<GenericPolicy>::exit() storage/innobase/include/ib0mutex.h:467:7 (mysqld+0x1a8d4cb)
    #2 PolicyMutex<TTASEventMutex<GenericPolicy> >::exit() storage/innobase/include/ib0mutex.h:609:10 (mysqld+0x1a7839e)
    #3 fil_validate() storage/innobase/fil/fil0fil.cc:5535:2 (mysqld+0x1abd913)
    #4 fil_validate_skip() storage/innobase/fil/fil0fil.cc:204:9 (mysqld+0x1aba601)
    #5 fil_aio_wait(unsigned long) storage/innobase/fil/fil0fil.cc:5296:2 (mysqld+0x1abbae6)
    #6 io_handler_thread storage/innobase/srv/srv0start.cc:340:3 (mysqld+0x21abe1e)

  Previous read of size 4 at 0x7b4800000c00 by main thread (mutexes: write M1273, write M1271):
    #0 TTASEventMutex<GenericPolicy>::state() const storage/innobase/include/ib0mutex.h:530:10 (mysqld+0x21c66e2)
    #1 sync_array_detect_deadlock(sync_array_t*, sync_cell_t*, sync_cell_t*, unsigned long) storage/innobase/sync/sync0arr.cc:746:14 (mysqld+0x21c1c7a)
    #2 sync_array_wait_event(sync_array_t*, sync_cell_t*&) storage/innobase/sync/sync0arr.cc:465:6 (mysqld+0x21c1708)
    #3 TTASEventMutex<GenericPolicy>::enter(unsigned int, unsigned int, char const*, unsigned int) storage/innobase/include/ib0mutex.h:516:6 (mysqld+0x1a8c206)
    #4 PolicyMutex<TTASEventMutex<GenericPolicy> >::enter(unsigned int, unsigned int, char const*, unsigned int) storage/innobase/include/ib0mutex.h:635:10 (mysqld+0x1a782c3)
    #5 fil_mutex_enter_and_prepare_for_io(unsigned long) storage/innobase/fil/fil0fil.cc:1131:3 (mysqld+0x1a9a92e)
    #6 fil_io(IORequest const&, bool, page_id_t const&, page_size_t const&, unsigned long, unsigned long, void*, void*, bool) storage/innobase/fil/fil0fil.cc:5082:2 (mysqld+0x1ab8de2)
    #7 buf_flush_write_block_low(buf_page_t*, buf_flush_t, bool) storage/innobase/buf/buf0flu.cc:1112:3 (mysqld+0x1cb970a)
    #8 buf_flush_page(buf_pool_t*, buf_page_t*, buf_flush_t, bool) storage/innobase/buf/buf0flu.cc:1270:3 (mysqld+0x1cb7d70)
    #9 buf_flush_try_neighbors(page_id_t const&, buf_flush_t, unsigned long, unsigned long) storage/innobase/buf/buf0flu.cc:1493:9 (mysqld+0x1cc9674)
    #10 buf_flush_page_and_try_neighbors(buf_page_t*, buf_flush_t, unsigned long, unsigned long*) storage/innobase/buf/buf0flu.cc:1565:13 (mysqld+0x1cbadf3)
    #11 buf_do_flush_list_batch(buf_pool_t*, unsigned long, unsigned long) storage/innobase/buf/buf0flu.cc:1825:3 (mysqld+0x1cbbcb8)
    #12 buf_flush_batch(buf_pool_t*, buf_flush_t, unsigned long, unsigned long, flush_counters_t*) storage/innobase/buf/buf0flu.cc:1895:16 (mysqld+0x1cbb459)
    #13 buf_flush_do_batch(buf_pool_t*, buf_flush_t, unsigned long, unsigned long, flush_counters_t*) storage/innobase/buf/buf0flu.cc:2065:2 (mysqld+0x1cbcfe1)
    #14 buf_flush_lists(unsigned long, unsigned long, unsigned long*) storage/innobase/buf/buf0flu.cc:2167:8 (mysqld+0x1cbd5a3)
    #15 log_preflush_pool_modified_pages(unsigned long) storage/innobase/log/log0log.cc:1400:13 (mysqld+0x1eefc3b)
    #16 log_make_checkpoint_at(unsigned long, bool) storage/innobase/log/log0log.cc:1751:10 (mysqld+0x1eefb16)
    #17 buf_dblwr_create() storage/innobase/buf/buf0dblwr.cc:335:2 (mysqld+0x1cd2141)
    #18 innobase_start_or_create_for_mysql() storage/innobase/srv/srv0start.cc:2539:10 (mysqld+0x21b4d8e)
    #19 innobase_init(void*) storage/innobase/handler/ha_innodb.cc:4193:8 (mysqld+0x1a5e3d7)
    #20 ha_initialize_handlerton(st_plugin_int*) sql/handler.cc:522:31 (mysqld+0xc74d33)
    #21 plugin_initialize(st_mem_root*, st_plugin_int*, int*, char**, bool) sql/sql_plugin.cc:1447:9 (mysqld+0x1376d5d)
    #22 plugin_init(int*, char**, int) sql/sql_plugin.cc:1729:15 (mysqld+0x13761c0)
    #23 init_server_components() sql/mysqld.cc:5348:7 (mysqld+0xc0d0ff)
    #24 mysqld_main(int, char**) sql/mysqld.cc:5943:7 (mysqld+0xc06f9d)
    #25 main sql/main.cc:25:10 (mysqld+0xbff71b)

WARNING: ThreadSanitizer: data race (pid=29031)
  Write of size 8 at 0x0000039e48e0 by thread T15:
    #0 srv_monitor_thread storage/innobase/srv/srv0srv.cc:1699:24 (mysqld+0x21a254e)

  Previous write of size 8 at 0x0000039e48e0 by thread T14:
    #0 srv_refresh_innodb_monitor_stats() storage/innobase/srv/srv0srv.cc:1165:24 (mysqld+0x21a3124)
    #1 srv_error_monitor_thread storage/innobase/srv/srv0srv.cc:1836:3 (mysqld+0x21a2d40)

  Location is global 'srv_last_monitor_time' of size 8 at 0x0000039e48e0 (mysqld+0x0000039e48e0)
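
The last report shows two threads writing the same global without synchronization. One way to make such accesses relaxed atomic is sketched below with std::atomic; this is an assumption for illustration, the actual commit may use the server's own atomic primitives, and all names here are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <ctime>

// Hypothetical stand-in for the global srv_last_monitor_time.
static std::atomic<std::time_t> last_monitor_time_sketch{0};

// Writer side: called by the monitor threads.
void set_last_monitor_time(std::time_t now)
{
  last_monitor_time_sketch.store(now, std::memory_order_relaxed);
}

// Reader side: a relaxed load is free of data races, but it gives
// no ordering guarantees with respect to other memory accesses.
std::time_t get_last_monitor_time()
{
  return last_monitor_time_sketch.load(std::memory_order_relaxed);
}
```

Relaxed ordering is enough when the value is only a timestamp that no other data depends on; it removes the data race without adding fences.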
abarkov added a commit that referenced this pull request May 11, 2018
1. Adding THD::convert_string(LEX_CSTRING *to,...) as a wrapper
   for convert_string(LEX_STRING *to,...), as LEX_CSTRING
   is now frequently used for conversion purpose.
   This reduced duplicate code in TEXT_STRING_sys,
   TEXT_STRING_literal, TEXT_STRING_filesystem grammar rules in *.yy

2. Adding yet another THD::convert_string() with an extra parameter
   "bool simple_copy_is_possible". This further reduced the
   repeated code in the mentioned grammar rules in *.yy

3. Deriving Lex_ident_cli_st from Lex_string_with_metadata_st,
   as they have very similar functionality. Moving m_quote
   from Lex_ident_cli_st to Lex_string_with_metadata_st,
   as m_quote will be used later to optimize string literals anyway
   (e.g. avoid redundant copying on the tokenizer stage).
   Adjusting Lex_input_stream::get_text() accordingly.

4. Moving the remainder of the code in TEXT_STRING_sys, TEXT_STRING_literal,
   TEXT_STRING_filesystem grammar rules as new methods in THD:
   - make_text_string_sys()
   - make_text_string_connection()
   - make_text_string_filesystem()
   and changing *.yy to use these new methods.
   This reduced the amount of similar code in
   sql_yacc.yy and sql_yacc_ora.yy.

5. Removing duplicate code in Lex_input_stream::body_utf8_append_ident()
   by reusing THD::make_text_string_sys(). Thanks to #3 and #4.

6. Making THD members charset_is_system_charset,
   charset_is_collation_connection, charset_is_character_set_filesystem
   private, as they are not needed externally any more.
abarkov added a commit that referenced this pull request May 23, 2018
…larations

1. Adding LEX::make_item_sysvar() and reusing it
   in sql_yacc.yy and sql_yacc_ora.yy.
   Removing the "opt_component" rule.

2. Renaming rules to better reflect their purpose:
   - keyword to keyword_ident
   - keyword_sp to keyword_label
   - keyword_sp_not_data_type to keyword_sp_var_and_label

   Also renaming:
   - sp_decl_ident_keyword to keyword_sp_decl for naming consistency
   - keyword_alias to keyword_table_alias,
     for consistency with ident_table_alias
   - keyword_sp_data_type to keyword_data_type,
     as it has nothing SP-specific.

3. Moving GLOBAL_SYM, LOCAL_SYM, SESSION_SYM from
   keyword_sp_var_and_label to a separate rule keyword_sysvar_type.
   We don't have system variables with these names anyway.
   Adding ident_sysvar_name and using it in the grammar that needs
   a system variable name instead of ident_or_text.
   This removed a number of shift/reduce conflicts
   between GLOBAL_SYM/LOCAL_SYM/SESSION_SYM as a variable scope and
   as a variable name.

4. Moving keywords BEGIN_SYM, END (in both *.yy files)
   and EXCEPTION_SYM (in sql_yacc_ora.yy) into a separate
   rule keyword_sp_block_section, because in Oracle verb keywords
   (COMMIT, DO, HANDLER, OPEN, REPAIR, ROLLBACK, SAVEPOINT, SHUTDOWN, TRUNCATE)
   are good variable names and can appear in e.g. DECLARE,
   while block keywords (BEGIN, END, EXCEPTION) are not good variable names
   and cannot appear in DECLARE.

5. Further splitting keyword_directly_not_assignable in sql_yacc_ora.yy:
   moving keyword_sp_verb_clause out. Renaming the rest of
   keyword_directly_not_assignable to keyword_sp_head,
   which represents keywords that can appear in optional
   clauses in CREATE PROCEDURE/FUNCTION/TRIGGER.

6. Renaming keyword_sp_verb_clause to keyword_verb_clause,
   as now it does not contain anything SP-specific.

   As a result of #4,#5,#6, the rule keyword_directly_not_assignable
   was replaced with three separate rules:
   - keyword_sp_block
   - keyword_sp_head
   - keyword_verb_clause
   Adding the same rules in sql_yacc.yy, for unification.

7. Adding keyword_sp_head and keyword_verb_clause into keyword_sp_decl.
   This fixes MDEV-16244.

8. Reorganizing the rest of keyword-related rules into two groups:
  a. Rules defining a list of keywords and consisting of only terminal symbols:
    - keyword_sp_var_not_label
    - keyword_sp_head
    - keyword_sp_verb_clause
    - keyword_sp_block_section
    - keyword_sysvar_type

  b. Rules that combine the above lists into keyword places:
    - keyword_table_alias
    - keyword_ident
    - keyword_label
    - keyword_sysvar_name
    - keyword_sp_decl
  Rules from the group "b" use on the right side only rules
  from the group "a" (with optional terminal symbols added).
  Rules from the group "b" DO NOT mutually use each other any more.
  This makes them easier to read (and see the difference between them).

  Sorting the right sides of the group "b" keyword rules alphabetically,
  for yet better readability.
andrelkin added a commit that referenced this pull request Jun 7, 2018
           specific temporary errors

The optimistic parallel slave's worker thread could face a run-time error due to
the algorithm's specifics which allows for conflicts like the reported
"Can't find record in 'table'".
A typical stack is like

{noformat}
#0  handler::print_error (this=0x61c00008f8a0, error=149, errflag=0) at handler.cc:3650
#1  0x0000555555e95361 in write_record (thd=thd@entry=0x62a0000a2208, table=table@entry=0x61f00008ce88, info=info@entry=0x7fffdee356d0) at sql_insert.cc:1944
#2  0x0000555555ea7767 in mysql_insert (thd=thd@entry=0x62a0000a2208, table_list=0x61b00012ada0, fields=..., values_list=..., update_fields=..., update_values=..., duplic=<optimized out>, ignore=<optimized out>) at sql_insert.cc:1039
#3  0x0000555555efda90 in mysql_execute_command (thd=thd@entry=0x62a0000a2208) at sql_parse.cc:3927
#4  0x0000555555f0cc50 in mysql_parse (thd=0x62a0000a2208, rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>) at sql_parse.cc:7449
#5  0x00005555566d4444 in Query_log_event::do_apply_event (this=0x61200005b9c8, rgi=<optimized out>, query_arg=<optimized out>, q_len_arg=<optimized out>) at log_event.cc:4508
#6  0x00005555566d639e in Query_log_event::do_apply_event (this=<optimized out>, rgi=<optimized out>) at log_event.cc:4185
#7  0x0000555555d738cf in Log_event::apply_event (rgi=0x61d0001ea080, this=0x61200005b9c8) at log_event.h:1343
#8  apply_event_and_update_pos_apply (ev=ev@entry=0x61200005b9c8, thd=thd@entry=0x62a0000a2208, rgi=rgi@entry=0x61d0001ea080, reason=<optimized out>) at slave.cc:3479
#9  0x0000555555d8596b in apply_event_and_update_pos_for_parallel (ev=ev@entry=0x61200005b9c8, thd=thd@entry=0x62a0000a2208, rgi=rgi@entry=0x61d0001ea080) at slave.cc:3623
#10 0x00005555562aca83 in rpt_handle_event (qev=qev@entry=0x6190000fa088, rpt=rpt@entry=0x62200002bd68) at rpl_parallel.cc:50
#11 0x00005555562bd04e in handle_rpl_parallel_thread (arg=arg@entry=0x62200002bd68) at rpl_parallel.cc:1258
{noformat}

Here {{handler::print_error}} computes whether to write the
current error to the error log when --log-warnings > 1. The decision flag is consulted
by {{my_message_sql()}}, which may eventually be called.
In the bug case the decision is to log.
However, in the optimistic mode the slave applier attempts to resolve
any conflict with rollback and retry until success. Hence the
logging is at least extraneous.

The case is fixed by refining the flags computation for my_message_sql()
to downgrade it to the warning level when the error comes from an *optimistically*
running ({{rpl_group_info::SPECULATE_OPTIMISTIC}}) parallel slave
thread. As this change has the side effect of no longer polluting the slave worker's THD::main_da,
the slave warning reporting is slightly refined so that {{convert_handler_error()}} does
not log another warning message when there's one already.
Secondly, post temporary error {{convert_kill_to_deadlock_error()}} is also
refined to accept a "manual" (not being in THD da) error code. This change
is necessary to force pseudo-deadlock error reporting and a consequent retry.
andrelkin added a commit that referenced this pull request Jun 8, 2018
           specific temporary errors

The optimistic parallel slave's worker thread could face a run-time error due to
the algorithm's specifics which allows for conflicts like the reported
"Can't find record in 'table'".
A typical stack is like

{noformat}
#0  handler::print_error (this=0x61c00008f8a0, error=149, errflag=0) at handler.cc:3650
#1  0x0000555555e95361 in write_record (thd=thd@entry=0x62a0000a2208, table=table@entry=0x61f00008ce88, info=info@entry=0x7fffdee356d0) at sql_insert.cc:1944
#2  0x0000555555ea7767 in mysql_insert (thd=thd@entry=0x62a0000a2208, table_list=0x61b00012ada0, fields=..., values_list=..., update_fields=..., update_values=..., duplic=<optimized out>, ignore=<optimized out>) at sql_insert.cc:1039
#3  0x0000555555efda90 in mysql_execute_command (thd=thd@entry=0x62a0000a2208) at sql_parse.cc:3927
#4  0x0000555555f0cc50 in mysql_parse (thd=0x62a0000a2208, rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>) at sql_parse.cc:7449
#5  0x00005555566d4444 in Query_log_event::do_apply_event (this=0x61200005b9c8, rgi=<optimized out>, query_arg=<optimized out>, q_len_arg=<optimized out>) at log_event.cc:4508
#6  0x00005555566d639e in Query_log_event::do_apply_event (this=<optimized out>, rgi=<optimized out>) at log_event.cc:4185
#7  0x0000555555d738cf in Log_event::apply_event (rgi=0x61d0001ea080, this=0x61200005b9c8) at log_event.h:1343
#8  apply_event_and_update_pos_apply (ev=ev@entry=0x61200005b9c8, thd=thd@entry=0x62a0000a2208, rgi=rgi@entry=0x61d0001ea080, reason=<optimized out>) at slave.cc:3479
#9  0x0000555555d8596b in apply_event_and_update_pos_for_parallel (ev=ev@entry=0x61200005b9c8, thd=thd@entry=0x62a0000a2208, rgi=rgi@entry=0x61d0001ea080) at slave.cc:3623
#10 0x00005555562aca83 in rpt_handle_event (qev=qev@entry=0x6190000fa088, rpt=rpt@entry=0x62200002bd68) at rpl_parallel.cc:50
#11 0x00005555562bd04e in handle_rpl_parallel_thread (arg=arg@entry=0x62200002bd68) at rpl_parallel.cc:1258
{noformat}

Here {{handler::print_error}} computes whether to write the
current error to the error log when --log-warnings > 1. The decision flag is consulted
by {{my_message_sql()}}, which may eventually be called.
In the bug case the decision is to log.
However, in the optimistic mode the slave applier attempts to resolve
any conflict with rollback and retry until success. Hence the
logging is at least extraneous.

The case is fixed by adding a new flag {{ME_LOG_AS_WARN}} which
{{handler::print_error}} may propagate further on through {{my_error}}
when the error comes from an optimistically running slave worker thread.

The new flag effectively requests the warning level for the errlog record,
while the thread's DA records the actual error (which is regarded as a temporary one
by the parallel slave error handler).
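
The effect of the flag can be sketched as follows. This is a toy model under stated assumptions: the bit value, the `Report` struct and `report_error_sketch` are hypothetical and only mirror the behaviour described above (warning level in the error log, actual error kept in the DA); they are not the server's definitions.

```cpp
#include <cassert>

enum LogLevel { LEVEL_ERROR, LEVEL_WARNING };

// Hypothetical bit value; the real ME_LOG_AS_WARN is defined in the
// server sources and may differ.
static const unsigned SKETCH_ME_LOG_AS_WARN= 1U << 0;

struct Report
{
  int da_error;        // error recorded in the diagnostics area
  LogLevel log_level;  // level used for the error log record
};

// Stand-in for the print_error -> my_error path.
Report report_error_sketch(int error, bool optimistic_worker)
{
  // Only an optimistically running worker requests the downgrade.
  const unsigned flags= optimistic_worker ? SKETCH_ME_LOG_AS_WARN : 0U;
  Report r;
  r.da_error= error;  // the DA still records the actual error
  r.log_level= (flags & SKETCH_ME_LOG_AS_WARN) ? LEVEL_WARNING
                                                : LEVEL_ERROR;
  return r;
}
```

In this model the retry logic still sees error 149 from the stack above, while the log record for an optimistic worker is written at the warning level.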
andrelkin added a commit that referenced this pull request Jun 11, 2018
vlad-lesin added a commit that referenced this pull request Aug 17, 2021
…==========

This is the initial patch to show how gap lock is inherited during
purge. The tests are for debugging.

The code can be used to remove the gap lock inheritance code from the purge process.

===============================================================================
Extend gap locks in row_search_mvcc().

This is preliminary code without thorough testing. The general logic is
the following:

1) Use two directions to extend gap locks - FORWARD and BACKWARD only if
"direction" argument of row_search_mvcc() is 0, otherwise use only
FORWARD.

2) FORWARD and BACKWARD do not really mean forward or backward
iteration through B-tree leaves. FORWARD corresponds to the same
direction which was chosen in row_search_mvcc() when the "moves_up"
variable is set, while BACKWARD means the opposite direction.

3) If "direction" argument of row_search_mvcc() is 0 then cursor
position is stored in local cursor object before going to the next record in
BACKWARD direction.

4) When the first non-delete-marked record is reached in BACKWARD scan,
mini-transaction is committed, the cursor position is restored from local
cursor object and scan direction is changed.
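
The four steps above can be modelled on a flat array of records. This is a toy sketch, not InnoDB code: records are reduced to a delete-marked flag, `both_directions` stands for the case where the "direction" argument is 0, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ScanRange
{
  std::ptrdiff_t first;  // where the BACKWARD scan stopped
  std::ptrdiff_t last;   // where the FORWARD scan stopped
};

// Moves from start in the given direction (step is +1 or -1) and stops
// at the first non-delete-marked record, or just past the array end.
static std::ptrdiff_t scan_sketch(const std::vector<bool> &delete_marked,
                                  std::ptrdiff_t start, int step)
{
  const std::ptrdiff_t size=
    static_cast<std::ptrdiff_t>(delete_marked.size());
  std::ptrdiff_t pos= start + step;
  while (pos >= 0 && pos < size &&
         delete_marked[static_cast<std::size_t>(pos)])
    pos+= step;
  return pos;
}

ScanRange extend_sketch(const std::vector<bool> &delete_marked,
                        std::ptrdiff_t start, bool both_directions)
{
  const std::ptrdiff_t saved= start;  // step 3: store the position
  std::ptrdiff_t first= start;
  if (both_directions)
    first= scan_sketch(delete_marked, start, -1);  // BACKWARD scan
  // Step 4: restore the stored position and change direction.
  const std::ptrdiff_t last= scan_sketch(delete_marked, saved, +1);
  ScanRange r= { first, last };
  return r;
}
```

The real code additionally commits the mini-transaction between the two scans and works with a persistent cursor, which this model leaves out.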

Currently all InnoDB tests pass.

Things to do:

1) Copy changes in row_sel(),
2) Remove gap lock inheritance from purge process,
3) Add more cases in mtr test:
  a) spatial indexes
  b) tests for row_sel()
  c) ...
  d) PROFIT!!!

Notes for code reviewer:
I am counting on a preliminary review to be sure I am moving in the correct
direction and have not made any obvious errors, so please don't pay
attention to code formatting and incomplete testing.

===============================================================================
row_sel() changes

===============================================================================
do not inherit gap locks on purge

===============================================================================
Test for row_search_mvcc

===============================================================================
Foreign keys constraints check fix.

The problem with the current fix is that its complexity is n*m, because
there will be one pass over the parent's gap for each record in the child's gap.

Duplicates check is not implemented.

===============================================================================
This is a try to implement the way of row_sel() testing.

The idea is to have special debug variable innobase_debug_que_eval_sql,
when this variable is set, the internal InnoDB query parser is invoked,
and the result is sent to user.

===============================================================================
Add forward scan and insert intention locking for insert operation.

Removed backward scan from foreign key constraints check.

Removed backward scan from row_search_mvcc(), leave it only for the case
of ROW_SEL_EXACT_PREFIX(i.e. for ORDER BY ... DESC).

Removed backward scan from rol_sel() (except ORDER BY ... DESC).

Added forward scan for secondary indexes duplicates check.

===============================================================================
Add debug check.

If a purgeable record has a gap lock, the next record must have a gap lock too.

===============================================================================
Some debug tests. Can be useful for research.

===============================================================================
Code cleanup.

===============================================================================
Revert "This is a try to implement the way of row_sel() testing."

This reverts commit dc33c72c3ba69989e11f377aa902ed9b32f8854a.

===============================================================================
Do not set LOCK_REC_NOT_GAP for delete-marked records in row_search_mvcc()

===============================================================================
Check that if lock_update_delete() is invoked from the purge process, the
deleted record is delete-marked.

===============================================================================
Do not take into account insert intention locks on debug check.

===============================================================================
The current commit solves the following issues:

-------------------------------------------------------------------------------
I. If some record is deleted by rollback, its lock is inherited as a gap
lock by the next record. If the next record is then purged while the
lock is still held, the debug check will fail.

The scenario is the following:

1) Some thread executes "INSERT" and checks the clustered index for
duplicates. It sets a shared lock on the checked record (let's call it
record A), converting the implicit lock to an explicit one. Note that the
record's transaction id is the same as the current transaction id:
-------------------
0x000055f1e65c6bd6 in lock_rec_create_low (c_lock=0x0, thr=0x0, type_mode=1059, space=11, page_no=3, page=0x2e167079c000 "l\206", <incomplete sequence \372\221>, heap_no=9, index=0x149444393cd0,
    trx=0x55522d18d188, holds_trx_mutex=true) at ./storage/innobase/lock/lock0lock.cc:1466
1466            lock->type_mode = (type_mode & ~LOCK_TYPE_MASK) | LOCK_REC;
(rr) bt
\#0  0x000055f1e65c6bd6 in lock_rec_create_low (c_lock=0x0, thr=0x0, type_mode=1059, space=11, page_no=3, page=0x2e167079c000 "l\206", <incomplete sequence \372\221>, heap_no=9, index=0x149444393cd0,
    trx=0x55522d18d188, holds_trx_mutex=true) at ./storage/innobase/lock/lock0lock.cc:1466
\#1  0x000055f1e65c2b29 in lock_rec_create (c_lock=0x0, thr=0x0, type_mode=1059, block=0x2e1670080560, heap_no=9, index=0x149444393cd0, trx=0x55522d18d188, caller_owns_trx_mutex=true)
    at ./storage/innobase/include/lock0lock.ic:133
\#2  0x000055f1e65c83fe in lock_rec_add_to_queue (type_mode=1059, block=0x2e1670080560, heap_no=9, index=0x149444393cd0, trx=0x55522d18d188, caller_owns_trx_mutex=true)
    at ./storage/innobase/lock/lock0lock.cc:1941
\#3  0x000055f1e65d228f in lock_rec_convert_impl_to_expl_for_trx (block=0x2e1670080560, rec=0x2e167079c299 "\200", index=0x149444393cd0, trx=0x55522d18d188, heap_no=9)
    at ./storage/innobase/lock/lock0lock.cc:5832
\#4  0x000055f1e65d2537 in lock_rec_convert_impl_to_expl (block=0x2e1670080560, rec=0x2e167079c299 "\200", index=0x149444393cd0, offsets=0x663f4cace6d0)
    at ./storage/innobase/lock/lock0lock.cc:5886
\#5  0x000055f1e65d32d9 in lock_clust_rec_read_check_and_lock (flags=0, block=0x2e1670080560, rec=0x2e167079c299 "\200", index=0x149444393cd0, offsets=0x663f4cace6d0, mode=LOCK_S, gap_mode=1024,
    thr=0x611370025b88) at ./storage/innobase/lock/lock0lock.cc:6194
\#6  0x000055f1e666b03c in row_ins_set_shared_rec_lock (type=1024, block=0x2e1670080560, rec=0x2e167079c299 "\200", index=0x149444393cd0, offsets=0x663f4cace6d0, thr=0x611370025b88)
    at ./storage/innobase/row/row0ins.cc:1427
\#7  0x000055f1e666cf4f in row_ins_duplicate_error_in_clust (flags=0, cursor=0x663f4cace9e0, entry=0x61137c044900, thr=0x611370025b88)
    at ./storage/innobase/row/row0ins.cc:2360
\#8  0x000055f1e666db48 in row_ins_clust_index_entry_low (flags=0, mode=2, index=0x149444393cd0, n_uniq=1, entry=0x61137c044900, n_ext=0, thr=0x611370025b88)
    at ./storage/innobase/row/row0ins.cc:2658
\#9  0x000055f1e666f0cb in row_ins_clust_index_entry (index=0x149444393cd0, entry=0x61137c044900, thr=0x611370025b88, n_ext=0)
    at ./storage/innobase/row/row0ins.cc:3146
\#10 0x000055f1e666f4a8 in row_ins_index_entry (index=0x149444393cd0, entry=0x61137c044900, thr=0x611370025b88) at ./storage/innobase/row/row0ins.cc:3265
\#11 0x000055f1e666f9a4 in row_ins_index_entry_step (node=0x611370025658, thr=0x611370025b88) at ./storage/innobase/row/row0ins.cc:3416
\#12 0x000055f1e666fd2a in row_ins (node=0x611370025658, thr=0x611370025b88) at ./storage/innobase/row/row0ins.cc:3553
\#13 0x000055f1e66700c6 in row_ins_step (thr=0x611370025b88) at ./storage/innobase/row/row0ins.cc:3677
\#14 0x000055f1e668c7e0 in row_insert_for_mysql (mysql_rec=0x61137005c460 "\377", prebuilt=0x611370024d50) at ./storage/innobase/row/row0mysql.cc:1408
\#15 0x000055f1e6556c1f in ha_innobase::write_row (this=0x611370033a00, record=0x61137005c460 "\377") at ./storage/innobase/handler/ha_innodb.cc:8284
\#16 0x000055f1e6378051 in handler::ha_write_row (this=0x611370033a00, buf=0x61137005c460 "\377") at ./sql/handler.cc:6118
\#17 0x000055f1e60f21c4 in write_record (thd=0xa48640010a8, table=0x61137005b8c8, info=0x663f4cacfa00) at ./sql/sql_insert.cc:1939
\#18 0x000055f1e60f00f4 in mysql_insert (thd=0xa48640010a8, table_list=0xa4864010d40, fields=..., values_list=..., update_fields=..., update_values=..., duplic=DUP_ERROR, ignore=false)
    at ./sql/sql_insert.cc:1066
\#19 0x000055f1e61149d7 in mysql_execute_command (thd=0xa48640010a8) at ./sql/sql_parse.cc:4220
\#20 0x000055f1e611faf6 in mysql_parse (thd=0xa48640010a8,
    rawbuf=0xa48640109e0 "INSERT INTO t6 (col1,col2, col_int, col_string, col_text) VALUES /* NULL */ (NULL,NULL,NULL,REPEAT(SUBSTR(CAST( NULL AS CHAR),1,1), 10),REPEAT(SUBSTR(CAST( NULL AS CHAR),1,1), @fill_amount) ), (NULL,N"..., length=347, parser_state=0x663f4cad0670, is_com_multi=false, is_next_command=false) at ./sql/sql_parse.cc:7796
-------------------

2) Then a duplicate key is found in row_ins_duplicate_error_in_clust(),
and the transaction is rolled back. When it is rolled back, the lock is
inherited by the next record (let's call it record B) as a gap lock:
-------------------
\#0  lock_rec_create_low (c_lock=0x0, thr=0x0, type_mode=547, space=11, page_no=3, page=0x2e167079c000 "l\206", <incomplete sequence \372\221>, heap_no=33, index=0x149444393cd0, trx=0x55522d18d188,
    holds_trx_mutex=false) at ./storage/innobase/lock/lock0lock.cc:1467
\#1  0x000055f1e65c2b29 in lock_rec_create (c_lock=0x0, thr=0x0, type_mode=547, block=0x2e1670080560, heap_no=33, index=0x149444393cd0, trx=0x55522d18d188, caller_owns_trx_mutex=false)
    at ./storage/innobase/include/lock0lock.ic:133
\#2  0x000055f1e65c83fe in lock_rec_add_to_queue (type_mode=547, block=0x2e1670080560, heap_no=33, index=0x149444393cd0, trx=0x55522d18d188, caller_owns_trx_mutex=false)
    at ./storage/innobase/lock/lock0lock.cc:1941
\#3  0x000055f1e65c9fee in lock_rec_inherit_to_gap (heir_block=0x2e1670080560, block=0x2e1670080560, heir_heap_no=33, heap_no=9)
    at ./storage/innobase/lock/lock0lock.cc:2580
\#4  0x000055f1e65cc057 in lock_update_delete (block=0x2e1670080560, rec=0x2e167079c299 "\200", from_purge=false)
    at ./storage/innobase/lock/lock0lock.cc:3559
\#5  0x000055f1e6788e57 in btr_cur_optimistic_delete (cursor=0xa486405aff0, flags=0, mtr=0x663f4cacf250, from_purge=false)
    at ./storage/innobase/btr/btr0cur.cc:5252
\#6  0x000055f1e68af6aa in row_undo_ins_remove_clust_rec (node=0xa486405af80) at ./storage/innobase/row/row0uins.cc:141
\#7  0x000055f1e68b05b5 in row_undo_ins (node=0xa486405af80, thr=0xa4864042d78) at ./storage/innobase/row/row0uins.cc:518
\#8  0x000055f1e66d7d80 in row_undo (node=0xa486405af80, thr=0xa4864042d78) at ./storage/innobase/row/row0undo.cc:298
\#9  0x000055f1e66d7f2d in row_undo_step (thr=0xa4864042d78) at ./storage/innobase/row/row0undo.cc:351
\#10 0x000055f1e663e6a0 in que_thr_step (thr=0xa4864042d78) at ./storage/innobase/que/que0que.cc:1039
\#11 0x000055f1e663e8c1 in que_run_threads_low (thr=0xa4864042d78) at ./storage/innobase/que/que0que.cc:1103
\#12 0x000055f1e663ea73 in que_run_threads (thr=0xa4864042d78) at ./storage/innobase/que/que0que.cc:1143
\#13 0x000055f1e6733cb9 in trx_rollback_to_savepoint_low (trx=0x55522d18d188, savept=0x55522d18e198) at ./storage/innobase/trx/trx0roll.cc:107
\#14 0x000055f1e6733f5f in trx_rollback_to_savepoint (trx=0x55522d18d188, savept=0x55522d18e198) at ./storage/innobase/trx/trx0roll.cc:148
\#15 0x000055f1e6734756 in trx_rollback_last_sql_stat_for_mysql (trx=0x55522d18d188) at ./storage/innobase/trx/trx0roll.cc:281
\#16 0x000055f1e654fb65 in innobase_rollback (hton=0x55f1e8c17968, thd=0xa48640010a8, rollback_trx=false) at ./storage/innobase/handler/ha_innodb.cc:4875
\#17 0x000055f1e636dfb4 in ha_rollback_trans (thd=0xa48640010a8, all=false) at ./sql/handler.cc:1708
\#18 0x000055f1e6262a1b in trans_rollback_stmt (thd=0xa48640010a8) at ./sql/transaction.cc:565
\#19 0x000055f1e611b5e4 in mysql_execute_command (thd=0xa48640010a8) at ./sql/sql_parse.cc:6067
\#20 0x000055f1e611faf6 in mysql_parse (thd=0xa48640010a8,
    rawbuf=0xa48640109e0 "INSERT INTO t6 (col1,col2, col_int, col_string, col_text) VALUES /* NULL */ (NULL,NULL,NULL,REPEAT(SUBSTR(CAST( NULL AS CHAR),1,1), 10),REPEAT(SUBSTR(CAST( NULL AS CHAR),1,1), @fill_amount) ), (NULL,N"..., length=347, parser_state=0x663f4cad0670, is_com_multi=false, is_next_command=false) at ./sql/sql_parse.cc:7796
-------------------

3) Purge is invoked and tries to purge record B. Record B has a gap
lock, but the record next to record B does not have a gap lock, so the
debug check fails.

But the lock initially acquired in step 1 is not a gap lock:
-----------------
dberr_t
row_ins_duplicate_error_in_clust(...)
{
...
        if (cursor->low_match >= n_unique) {
...
                        if (flags & BTR_NO_LOCKING_FLAG) {
                                /* Do nothing if no-locking is set */
                                err = DB_SUCCESS;
                        } else if (trx->duplicates) {
                                /* If the SQL-query will update or replace
                                duplicate key we will take X-lock for
                                duplicates ( REPLACE, LOAD DATAFILE REPLACE,
                                INSERT ON DUPLICATE KEY UPDATE). */
                                err = row_ins_set_exclusive_rec_lock(
                                        LOCK_REC_NOT_GAP,
                                        btr_cur_get_block(cursor),
                                        rec, cursor->index, offsets, thr);
                        } else {
                                err = row_ins_set_shared_rec_lock(
                                        LOCK_REC_NOT_GAP,
                                        btr_cur_get_block(cursor), rec,
                                        cursor->index, offsets, thr);
                        }
        }
...
}
-----------------

Then lock_rec_inherit_to_gap() copies this non-gap lock as a gap lock
to the next record when the transaction is rolled back and the record is
being deleted with btr_cur_optimistic_delete().

So rollback converts the lock into a gap lock, which is wrong; the lock
should simply be deleted.
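
The type_mode values in the two traces above (1059 in step 1, 547 in
step 2) can be decoded to see this conversion. This is a minimal sketch,
assuming InnoDB's usual flag layout: LOCK_REC = 32, LOCK_GAP = 512, a
4-bit mode mask, and the mode order IS, IX, S, X, AUTO_INC. Only
LOCK_REC_NOT_GAP = 1024 is confirmed by the trace (type=1024 in the
row_ins_set_shared_rec_lock frame); the other values should be checked
against lock0lock.h and lock0types.h of the branch under review.
describe_type_mode() is a hypothetical helper, not an InnoDB function.

```cpp
#include <cstdint>
#include <string>

// Assumed flag values; verify against the InnoDB headers.
constexpr uint32_t LOCK_MODE_MASK = 0xF;
constexpr uint32_t LOCK_REC = 32;
constexpr uint32_t LOCK_GAP = 512;
constexpr uint32_t LOCK_REC_NOT_GAP = 1024;  // matches type=1024 in the trace
constexpr uint32_t LOCK_INSERT_INTENTION = 2048;

// Hypothetical helper: renders a type_mode bitmask as readable text.
std::string describe_type_mode(uint32_t type_mode)
{
  // Assumed order of the basic lock modes.
  static const char *const modes[] = {"IS", "IX", "S", "X", "AUTO_INC"};
  const uint32_t mode = type_mode & LOCK_MODE_MASK;
  std::string out = mode < 5 ? modes[mode] : "?";
  if (type_mode & LOCK_REC) out += "|REC";
  if (type_mode & LOCK_GAP) out += "|GAP";
  if (type_mode & LOCK_REC_NOT_GAP) out += "|REC_NOT_GAP";
  if (type_mode & LOCK_INSERT_INTENTION) out += "|INSERT_INTENTION";
  return out;
}
```

Under these assumptions describe_type_mode(1059) yields
"X|REC|REC_NOT_GAP" and describe_type_mode(547) yields "X|REC|GAP", so
the only difference between the two locks is that REC_NOT_GAP was
replaced by GAP.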

For this purpose a convert_lock_to_gap flag is added to the
lock_rec_inherit_to_gap() function arguments. When this flag is not set,
lock_rec_inherit_to_gap() ignores non-gap locks, and this flag is set
when lock_rec_inherit_to_gap() is invoked from rollback.
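
The effect of the flag can be sketched on a toy model. This is not the
patch itself: toy_lock and inherit_to_gap() are hypothetical names, the
constant LOCK_GAP = 512 is an assumption (LOCK_REC_NOT_GAP = 1024 matches
type=1024 in the trace), and the other filters that the real
lock_rec_inherit_to_gap() applies are left out.

```cpp
#include <cstdint>
#include <vector>

constexpr uint32_t LOCK_GAP = 512;           // assumed value
constexpr uint32_t LOCK_REC_NOT_GAP = 1024;  // matches type=1024 in the trace

// Toy stand-in for a record lock: owner and type_mode bits only.
struct toy_lock {
  uint32_t trx_id;
  uint32_t type_mode;
};

// Returns the locks the heir record would receive from the donor record.
// With convert_lock_to_gap == false, locks marked LOCK_REC_NOT_GAP are
// ignored; with true, every donor lock is passed on as a gap lock.
std::vector<toy_lock> inherit_to_gap(const std::vector<toy_lock> &donor,
                                     bool convert_lock_to_gap)
{
  std::vector<toy_lock> heir;
  for (const toy_lock &l : donor) {
    if ((l.type_mode & LOCK_REC_NOT_GAP) && !convert_lock_to_gap)
      continue;  // non-gap lock: nothing to inherit
    heir.push_back({l.trx_id, (l.type_mode & ~LOCK_REC_NOT_GAP) | LOCK_GAP});
  }
  return heir;
}
```

In this model a donor lock with type_mode 1059 is dropped when the flag
is false and becomes 547 when the flag is true.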

-------------------------------------------------------------------------------
II. When a locking read is in progress and the requested ordinary lock
cannot be granted for a delete-marked record because of a conflicting lock,
the mtr is committed, the page latch is released, and the purge thread can
try to purge the record. The debug check will fail because the record next
to the delete-marked, ordinary-locked one is not ordinary-locked.

To solve this issue, a hash table of scanned record ids (page_id, heap_no
pairs) is stored in trx_t. After the locking read is finished at the end of
row_search_mvcc() and row_sel(), the hash table is cleaned up.

When the persistent cursor is restored after the lock is granted and the
transaction thread is woken up, and the record stored in the cursor has
been purged, the position is set to the previous or next record
depending on the direction of scanning.

-------------------------------------------------------------------------------
Warning: the current implementation for row_sel() is wrong, because its
behaviour on a conflicting lock differs from that of row_search_mvcc():
when the transaction is suspended and woken up, execution does not leave
row_search_mvcc(), while for row_sel() all steps needed to suspend and
wake up the thread are executed outside of row_sel(), at a higher layer.

===============================================================================
A new system variable is added to test row_sel().

An initial test is also added.
vlad-lesin added a commit that referenced this pull request Aug 17, 2021
…==========

This is the initial patch to show how a gap lock is inherited during
purge. The tests are for debugging.

The code can be used to remove the gap lock inheritance code from the purge
process.

===============================================================================
Extend gap locks in row_search_mvcc().

This is preliminary code without thorough testing. The general logic is
the following:

1) Use two directions to extend gap locks, FORWARD and BACKWARD, only if
the "direction" argument of row_search_mvcc() is 0; otherwise use only
FORWARD.

2) FORWARD and BACKWARD do not literally mean forward or backward
iteration through B-tree leaves. FORWARD corresponds to the direction
chosen in row_search_mvcc() when the "moves_up" variable is set, while
BACKWARD means the opposite direction.

3) If the "direction" argument of row_search_mvcc() is 0, the cursor
position is stored in a local cursor object before going to the next
record in the BACKWARD direction.

4) When the first non-delete-marked record is reached in the BACKWARD scan,
the mini-transaction is committed, the cursor position is restored from the
local cursor object, and the scan direction is changed.

Currently all InnoDB tests pass.

Things to do:

1) Copy changes in row_sel(),
2) Remove gap lock inheritance from purge process,
3) Add more cases in mtr test:
  a) spatial indexes
  b) tests for row_sel()
  c) ...
  d) PROFIT!!!

Notes for code reviewer:
I am counting on a preliminary review to be sure I am moving in the right
direction and have not made any obvious errors, so please don't pay
attention to code formatting and incomplete testing.

===============================================================================
row_sel() changes

===============================================================================
do not inherit gap locks on purge

===============================================================================
Test for row_search_mvcc

===============================================================================
Foreign keys constraints check fix.

The problem with the current fix is that its complexity is n*m, because
there will be one pass over the parent gap for each record in the child's gap.

Duplicates check is not implemented.

===============================================================================
This is a try to implement the way of row_sel() testing.

The idea is to have a special debug variable, innobase_debug_que_eval_sql.
When this variable is set, the internal InnoDB query parser is invoked
and the result is sent to the user.

===============================================================================
Add forward scan and insert intention locking for insert operation.

Removed backward scan from foreign key constraints check.

Removed backward scan from row_search_mvcc(), leaving it only for the case
of ROW_SEL_EXACT_PREFIX (i.e. for ORDER BY ... DESC).

Removed backward scan from row_sel() (except ORDER BY ... DESC).

Added forward scan for secondary indexes duplicates check.

===============================================================================
Add debug check.

If a purgeable record has a gap lock, the next record must have a gap lock too.
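
The invariant behind this check can be sketched on a toy page. This is
only an illustration: toy_rec and purge_gap_check() are hypothetical
names, and tracking gap locks per transaction id is an assumption about
how the real check compares the two records.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Toy record: the ids of the transactions holding a gap-type lock on it.
struct toy_rec {
  bool delete_marked;
  std::set<uint32_t> gap_lock_trx;
};

// Returns true if purging page[i] keeps the invariant: every transaction
// holding a gap lock on the purgeable record also holds one on the
// successor record.
bool purge_gap_check(const std::vector<toy_rec> &page, std::size_t i)
{
  assert(i < page.size());
  if (i + 1 >= page.size())
    return true;  // toy model only: no successor on this page
  for (uint32_t trx : page[i].gap_lock_trx)
    if (page[i + 1].gap_lock_trx.count(trx) == 0)
      return false;  // successor lacks the gap lock, the check fails
  return true;
}
```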

===============================================================================
Some debug tests. Can be useful for research.

===============================================================================
Code cleanup.

===============================================================================
Revert "This is a try to implement the way of row_sel() testing."

This reverts commit dc33c72c3ba69989e11f377aa902ed9b32f8854a.

===============================================================================
Do not set LOCK_REC_NOT_GAP for delete-marked records in row_search_mvcc()

===============================================================================
Check that if lock_update_delete() is invoked from the purge process, the
deleted record is delete-marked.

===============================================================================
Do not take into account insert intention locks on debug check.

===============================================================================
The current commit solves the following issues:

-------------------------------------------------------------------------------
I. If some record is deleted by rollback, its lock is inherited as a gap
lock by the next record. If the next record is then purged while the
lock is still held, the debug check will fail.

The scenario is the following:

1) Some thread executes "INSERT" and checks the clustered index for
duplicates. It sets a shared lock on the checked record (let's call it
record A), converting the implicit lock to an explicit one. Note that the
record's transaction id is the same as the current transaction id:
-------------------
0x000055f1e65c6bd6 in lock_rec_create_low (c_lock=0x0, thr=0x0, type_mode=1059, space=11, page_no=3, page=0x2e167079c000 "l\206", <incomplete sequence \372\221>, heap_no=9, index=0x149444393cd0,
    trx=0x55522d18d188, holds_trx_mutex=true) at ./storage/innobase/lock/lock0lock.cc:1466
1466            lock->type_mode = (type_mode & ~LOCK_TYPE_MASK) | LOCK_REC;
(rr) bt
\#0  0x000055f1e65c6bd6 in lock_rec_create_low (c_lock=0x0, thr=0x0, type_mode=1059, space=11, page_no=3, page=0x2e167079c000 "l\206", <incomplete sequence \372\221>, heap_no=9, index=0x149444393cd0,
    trx=0x55522d18d188, holds_trx_mutex=true) at ./storage/innobase/lock/lock0lock.cc:1466
\#1  0x000055f1e65c2b29 in lock_rec_create (c_lock=0x0, thr=0x0, type_mode=1059, block=0x2e1670080560, heap_no=9, index=0x149444393cd0, trx=0x55522d18d188, caller_owns_trx_mutex=true)
    at ./storage/innobase/include/lock0lock.ic:133
\#2  0x000055f1e65c83fe in lock_rec_add_to_queue (type_mode=1059, block=0x2e1670080560, heap_no=9, index=0x149444393cd0, trx=0x55522d18d188, caller_owns_trx_mutex=true)
    at ./storage/innobase/lock/lock0lock.cc:1941
\#3  0x000055f1e65d228f in lock_rec_convert_impl_to_expl_for_trx (block=0x2e1670080560, rec=0x2e167079c299 "\200", index=0x149444393cd0, trx=0x55522d18d188, heap_no=9)
    at ./storage/innobase/lock/lock0lock.cc:5832
\#4  0x000055f1e65d2537 in lock_rec_convert_impl_to_expl (block=0x2e1670080560, rec=0x2e167079c299 "\200", index=0x149444393cd0, offsets=0x663f4cace6d0)
    at ./storage/innobase/lock/lock0lock.cc:5886
\#5  0x000055f1e65d32d9 in lock_clust_rec_read_check_and_lock (flags=0, block=0x2e1670080560, rec=0x2e167079c299 "\200", index=0x149444393cd0, offsets=0x663f4cace6d0, mode=LOCK_S, gap_mode=1024,
    thr=0x611370025b88) at ./storage/innobase/lock/lock0lock.cc:6194
\#6  0x000055f1e666b03c in row_ins_set_shared_rec_lock (type=1024, block=0x2e1670080560, rec=0x2e167079c299 "\200", index=0x149444393cd0, offsets=0x663f4cace6d0, thr=0x611370025b88)
    at ./storage/innobase/row/row0ins.cc:1427
\#7  0x000055f1e666cf4f in row_ins_duplicate_error_in_clust (flags=0, cursor=0x663f4cace9e0, entry=0x61137c044900, thr=0x611370025b88)
    at ./storage/innobase/row/row0ins.cc:2360
\#8  0x000055f1e666db48 in row_ins_clust_index_entry_low (flags=0, mode=2, index=0x149444393cd0, n_uniq=1, entry=0x61137c044900, n_ext=0, thr=0x611370025b88)
    at ./storage/innobase/row/row0ins.cc:2658
\#9  0x000055f1e666f0cb in row_ins_clust_index_entry (index=0x149444393cd0, entry=0x61137c044900, thr=0x611370025b88, n_ext=0)
    at ./storage/innobase/row/row0ins.cc:3146
\#10 0x000055f1e666f4a8 in row_ins_index_entry (index=0x149444393cd0, entry=0x61137c044900, thr=0x611370025b88) at ./storage/innobase/row/row0ins.cc:3265
\#11 0x000055f1e666f9a4 in row_ins_index_entry_step (node=0x611370025658, thr=0x611370025b88) at ./storage/innobase/row/row0ins.cc:3416
\#12 0x000055f1e666fd2a in row_ins (node=0x611370025658, thr=0x611370025b88) at ./storage/innobase/row/row0ins.cc:3553
\#13 0x000055f1e66700c6 in row_ins_step (thr=0x611370025b88) at ./storage/innobase/row/row0ins.cc:3677
\#14 0x000055f1e668c7e0 in row_insert_for_mysql (mysql_rec=0x61137005c460 "\377", prebuilt=0x611370024d50) at ./storage/innobase/row/row0mysql.cc:1408
\#15 0x000055f1e6556c1f in ha_innobase::write_row (this=0x611370033a00, record=0x61137005c460 "\377") at ./storage/innobase/handler/ha_innodb.cc:8284
\#16 0x000055f1e6378051 in handler::ha_write_row (this=0x611370033a00, buf=0x61137005c460 "\377") at ./sql/handler.cc:6118
\#17 0x000055f1e60f21c4 in write_record (thd=0xa48640010a8, table=0x61137005b8c8, info=0x663f4cacfa00) at ./sql/sql_insert.cc:1939
\#18 0x000055f1e60f00f4 in mysql_insert (thd=0xa48640010a8, table_list=0xa4864010d40, fields=..., values_list=..., update_fields=..., update_values=..., duplic=DUP_ERROR, ignore=false)
    at ./sql/sql_insert.cc:1066
\#19 0x000055f1e61149d7 in mysql_execute_command (thd=0xa48640010a8) at ./sql/sql_parse.cc:4220
\#20 0x000055f1e611faf6 in mysql_parse (thd=0xa48640010a8,
    rawbuf=0xa48640109e0 "INSERT INTO t6 (col1,col2, col_int, col_string, col_text) VALUES /* NULL */ (NULL,NULL,NULL,REPEAT(SUBSTR(CAST( NULL AS CHAR),1,1), 10),REPEAT(SUBSTR(CAST( NULL AS CHAR),1,1), @fill_amount) ), (NULL,N"..., length=347, parser_state=0x663f4cad0670, is_com_multi=false, is_next_command=false) at ./sql/sql_parse.cc:7796
-------------------

2) Then a duplicate key is found in row_ins_duplicate_error_in_clust(),
and the transaction is rolled back. When it is rolled back, the lock is
inherited by the next record (let's call it record B) as a gap lock:
-------------------
\#0  lock_rec_create_low (c_lock=0x0, thr=0x0, type_mode=547, space=11, page_no=3, page=0x2e167079c000 "l\206", <incomplete sequence \372\221>, heap_no=33, index=0x149444393cd0, trx=0x55522d18d188,
    holds_trx_mutex=false) at ./storage/innobase/lock/lock0lock.cc:1467
\#1  0x000055f1e65c2b29 in lock_rec_create (c_lock=0x0, thr=0x0, type_mode=547, block=0x2e1670080560, heap_no=33, index=0x149444393cd0, trx=0x55522d18d188, caller_owns_trx_mutex=false)
    at ./storage/innobase/include/lock0lock.ic:133
\#2  0x000055f1e65c83fe in lock_rec_add_to_queue (type_mode=547, block=0x2e1670080560, heap_no=33, index=0x149444393cd0, trx=0x55522d18d188, caller_owns_trx_mutex=false)
    at ./storage/innobase/lock/lock0lock.cc:1941
\#3  0x000055f1e65c9fee in lock_rec_inherit_to_gap (heir_block=0x2e1670080560, block=0x2e1670080560, heir_heap_no=33, heap_no=9)
    at ./storage/innobase/lock/lock0lock.cc:2580
\#4  0x000055f1e65cc057 in lock_update_delete (block=0x2e1670080560, rec=0x2e167079c299 "\200", from_purge=false)
    at ./storage/innobase/lock/lock0lock.cc:3559
\#5  0x000055f1e6788e57 in btr_cur_optimistic_delete (cursor=0xa486405aff0, flags=0, mtr=0x663f4cacf250, from_purge=false)
    at ./storage/innobase/btr/btr0cur.cc:5252
\#6  0x000055f1e68af6aa in row_undo_ins_remove_clust_rec (node=0xa486405af80) at ./storage/innobase/row/row0uins.cc:141
\#7  0x000055f1e68b05b5 in row_undo_ins (node=0xa486405af80, thr=0xa4864042d78) at ./storage/innobase/row/row0uins.cc:518
\#8  0x000055f1e66d7d80 in row_undo (node=0xa486405af80, thr=0xa4864042d78) at ./storage/innobase/row/row0undo.cc:298
\#9  0x000055f1e66d7f2d in row_undo_step (thr=0xa4864042d78) at ./storage/innobase/row/row0undo.cc:351
\#10 0x000055f1e663e6a0 in que_thr_step (thr=0xa4864042d78) at ./storage/innobase/que/que0que.cc:1039
\#11 0x000055f1e663e8c1 in que_run_threads_low (thr=0xa4864042d78) at ./storage/innobase/que/que0que.cc:1103
\#12 0x000055f1e663ea73 in que_run_threads (thr=0xa4864042d78) at ./storage/innobase/que/que0que.cc:1143
\#13 0x000055f1e6733cb9 in trx_rollback_to_savepoint_low (trx=0x55522d18d188, savept=0x55522d18e198) at ./storage/innobase/trx/trx0roll.cc:107
\#14 0x000055f1e6733f5f in trx_rollback_to_savepoint (trx=0x55522d18d188, savept=0x55522d18e198) at ./storage/innobase/trx/trx0roll.cc:148
\#15 0x000055f1e6734756 in trx_rollback_last_sql_stat_for_mysql (trx=0x55522d18d188) at ./storage/innobase/trx/trx0roll.cc:281
\#16 0x000055f1e654fb65 in innobase_rollback (hton=0x55f1e8c17968, thd=0xa48640010a8, rollback_trx=false) at ./storage/innobase/handler/ha_innodb.cc:4875
\#17 0x000055f1e636dfb4 in ha_rollback_trans (thd=0xa48640010a8, all=false) at ./sql/handler.cc:1708
\#18 0x000055f1e6262a1b in trans_rollback_stmt (thd=0xa48640010a8) at ./sql/transaction.cc:565
\#19 0x000055f1e611b5e4 in mysql_execute_command (thd=0xa48640010a8) at ./sql/sql_parse.cc:6067
\#20 0x000055f1e611faf6 in mysql_parse (thd=0xa48640010a8,
    rawbuf=0xa48640109e0 "INSERT INTO t6 (col1,col2, col_int, col_string, col_text) VALUES /* NULL */ (NULL,NULL,NULL,REPEAT(SUBSTR(CAST( NULL AS CHAR),1,1), 10),REPEAT(SUBSTR(CAST( NULL AS CHAR),1,1), @fill_amount) ), (NULL,N"..., length=347, parser_state=0x663f4cad0670, is_com_multi=false, is_next_command=false) at ./sql/sql_parse.cc:7796
-------------------

3) Purge is invoked and tries to purge record B. Record B has a gap
lock, but the record next to record B does not have a gap lock, so the
debug check fails.

But the lock initially acquired in step 1 is not a gap lock:
-----------------
dberr_t
row_ins_duplicate_error_in_clust(...)
{
...
        if (cursor->low_match >= n_unique) {
...
                        if (flags & BTR_NO_LOCKING_FLAG) {
                                /* Do nothing if no-locking is set */
                                err = DB_SUCCESS;
                        } else if (trx->duplicates) {
                                /* If the SQL-query will update or replace
                                duplicate key we will take X-lock for
                                duplicates ( REPLACE, LOAD DATAFILE REPLACE,
                                INSERT ON DUPLICATE KEY UPDATE). */
                                err = row_ins_set_exclusive_rec_lock(
                                        LOCK_REC_NOT_GAP,
                                        btr_cur_get_block(cursor),
                                        rec, cursor->index, offsets, thr);
                        } else {
                                err = row_ins_set_shared_rec_lock(
                                        LOCK_REC_NOT_GAP,
                                        btr_cur_get_block(cursor), rec,
                                        cursor->index, offsets, thr);
                        }
        }
...
}
-----------------

Then lock_rec_inherit_to_gap() copies this non-gap lock as a gap lock
to the next record when the transaction is rolled back and the record is
being deleted with btr_cur_optimistic_delete().

So rollback converts the lock into a gap lock, which is wrong; the lock
should simply be deleted.

For this purpose a convert_lock_to_gap flag is added to the
lock_rec_inherit_to_gap() function arguments. When this flag is not set,
lock_rec_inherit_to_gap() ignores non-gap locks, and this flag is set
when lock_rec_inherit_to_gap() is invoked from rollback.

-------------------------------------------------------------------------------
II. When a locking read is in progress and the requested ordinary lock
cannot be granted for a delete-marked record because of a conflicting lock,
the mtr is committed, the page latch is released, and the purge thread can
try to purge the record. The debug check will fail because the record next
to the delete-marked, ordinary-locked one is not ordinary-locked.

To solve this issue, a hash table of scanned record ids (page_id, heap_no
pairs) is stored in trx_t. After the locking read is finished at the end of
row_search_mvcc() and row_sel(), the hash table is cleaned up.

When the persistent cursor is restored after the lock is granted and the
transaction thread is woken up, and the record stored in the cursor has
been purged, the position is set to the previous or next record
depending on the direction of scanning.
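
The bookkeeping described above could look roughly like the following
sketch. The names rec_id, rec_id_hash and scanned_recs are hypothetical
and are not taken from the patch; page_id is modelled as a (space,
page_no) pair, which is an assumption. How purge or the debug check
consults the table is not described in this message and is not modelled.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>

// Identifies one record: the page it lives on plus its heap number.
struct rec_id {
  uint32_t space;
  uint32_t page_no;
  uint32_t heap_no;
  bool operator==(const rec_id &o) const
  {
    return space == o.space && page_no == o.page_no && heap_no == o.heap_no;
  }
};

struct rec_id_hash {
  std::size_t operator()(const rec_id &r) const noexcept
  {
    // Fold space and page_no into one 64-bit key, then mix in heap_no.
    const uint64_t page = (uint64_t{r.space} << 32) | r.page_no;
    return std::hash<uint64_t>{}(page) ^
           (std::hash<uint32_t>{}(r.heap_no) * 0x9e3779b97f4a7c15ULL);
  }
};

// Per-transaction set of records visited by the current locking read.
class scanned_recs {
public:
  void remember(const rec_id &r) { recs_.insert(r); }
  bool contains(const rec_id &r) const { return recs_.count(r) != 0; }
  std::size_t size() const { return recs_.size(); }
  // Called when the locking read finishes.
  void clear() { recs_.clear(); }

private:
  std::unordered_set<rec_id, rec_id_hash> recs_;
};
```

A set is enough here because the text only needs membership of
(page_id, heap_no) pairs; clear() corresponds to the clean-up at the end
of the locking read.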

-------------------------------------------------------------------------------
Warning: the current implementation for row_sel() is wrong, because its
behaviour on a conflicting lock differs from that of row_search_mvcc():
when the transaction is suspended and woken up, execution does not leave
row_search_mvcc(), while for row_sel() all steps needed to suspend and
wake up the thread are executed outside of row_sel(), at a higher layer.

===============================================================================
A new system variable is added to test row_sel().

An initial test is also added.
kevgs added a commit that referenced this pull request Sep 8, 2021
  Read of size 8 at 0x7fecf2e75fc8 by thread T2 (mutexes: write M1318):
    #0 tpool::thread_pool_generic::submit_task(tpool::task*) /home/kevgs/work/m/bb-10.6-kevgs/build_tsan/../tpool/tpool_generic.cc:823:9 (mariadbd+0x25fd2d2)
    #1 (anonymous namespace)::aio_uring::thread_routine((anonymous namespace)::aio_uring*) /home/kevgs/work/m/bb-10.6-kevgs/build_tsan/../tpool/aio_liburing.cc:173:20 (mariadbd+0x260b21b)
    #2 void std::__invoke_impl<void, void (*)((anonymous namespace)::aio_uring*), (anonymous namespace)::aio_uring*>(std::__invoke_other, void (*&&)((anonymous namespace)::aio_uring*), (anonymous namespace)::aio_uring*&&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61:14 (mariadbd+0x260c62a)
    #3 std::__invoke_result<void (*)((anonymous namespace)::aio_uring*), (anonymous namespace)::aio_uring*>::type std::__invoke<void (*)((anonymous namespace)::aio_uring*), (anonymous namespace)::aio_uring*>(void (*&&)((anonymous namespace)::aio_uring*), (anonymous namespace)::aio_uring*&&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:96:14 (mariadbd+0x260c4ba)
    #4 void std::thread::_Invoker<std::tuple<void (*)((anonymous namespace)::aio_uring*), (anonymous namespace)::aio_uring*> >::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:253:13 (mariadbd+0x260c442)
    #5 std::thread::_Invoker<std::tuple<void (*)((anonymous namespace)::aio_uring*), (anonymous namespace)::aio_uring*> >::operator()() /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:260:11 (mariadbd+0x260c3c5)
    #6 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)((anonymous namespace)::aio_uring*), (anonymous namespace)::aio_uring*> > >::_M_run() /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:211:13 (mariadbd+0x260c189)
    #7 <null> <null> (libstdc++.so.6+0xd230f)

  Previous write of size 8 at 0x7fecf2e75fc8 by main thread:
    #0 tpool::task::task(void (*)(void*), void*, tpool::task_group*) /home/kevgs/work/m/bb-10.6-kevgs/build_tsan/../tpool/task.cc:40:46 (mariadbd+0x260a138)
    #1 tpool::aiocb::aiocb() /home/kevgs/work/m/bb-10.6-kevgs/build_tsan/../tpool/tpool.h:147:13 (mariadbd+0x2355943)
    #2 void std::_Construct<tpool::aiocb>(tpool::aiocb*) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_construct.h:109:38 (mariadbd+0x2355845)
    #3 tpool::aiocb* std::__uninitialized_default_n_1<false>::__uninit_default_n<tpool::aiocb*, unsigned long>(tpool::aiocb*, unsigned long) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_uninitialized.h:579:3 (mariadbd+0x235576c)
    #4 tpool::aiocb* std::__uninitialized_default_n<tpool::aiocb*, unsigned long>(tpool::aiocb*, unsigned long) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_uninitialized.h:638:14 (mariadbd+0x23556e9)
    #5 tpool::aiocb* std::__uninitialized_default_n_a<tpool::aiocb*, unsigned long, tpool::aiocb>(tpool::aiocb*, unsigned long, std::allocator<tpool::aiocb>&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_uninitialized.h:704:14 (mariadbd+0x2355641)
    #6 std::vector<tpool::aiocb, std::allocator<tpool::aiocb> >::_M_default_initialize(unsigned long) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_vector.h:1606:4 (mariadbd+0x2354f3d)
    #7 std::vector<tpool::aiocb, std::allocator<tpool::aiocb> >::vector(unsigned long, std::allocator<tpool::aiocb> const&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_vector.h:512:9 (mariadbd+0x2354a19)
    #8 tpool::cache<tpool::aiocb>::cache(unsigned long, tpool::cache_notification_mode) /home/kevgs/work/m/bb-10.6-kevgs/build_tsan/../tpool/tpool_structs.h:73:20 (mariadbd+0x2354784)
    #9 io_slots::io_slots(int, int) /home/kevgs/work/m/bb-10.6-kevgs/build_tsan/../storage/innobase/os/os0file.cc:93:3 (mariadbd+0x235343b)
    #10 os_aio_init() /home/kevgs/work/m/bb-10.6-kevgs/build_tsan/../storage/innobase/os/os0file.cc:3780:22 (mariadbd+0x234ebce)
    #11 srv_start(bool) /home/kevgs/work/m/bb-10.6-kevgs/build_tsan/../storage/innobase/srv/srv0start.cc:1190:6 (mariadbd+0x256720c)
    #12 innodb_init(void*) /home/kevgs/work/m/bb-10.6-kevgs/build_tsan/../storage/innobase/handler/ha_innodb.cc:4188:8 (mariadbd+0x1ed3bda)
    #13 ha_initialize_handlerton(st_plugin_int*) /home/kevgs/work/m/bb-10.6-kevgs/build_tsan/../sql/handler.cc:659:31 (mariadbd+0xf7be06)
    #14 plugin_initialize(st_mem_root*, st_plugin_int*, int*, char**, bool) /home/kevgs/work/m/bb-10.6-kevgs/build_tsan/../sql/sql_plugin.cc:1463:9 (mariadbd+0x160fa1b)
    #15 plugin_init(int*, char**, int) /home/kevgs/work/m/bb-10.6-kevgs/build_tsan/../sql/sql_plugin.cc:1756:15 (mariadbd+0x160f07f)
    #16 init_server_components() /home/kevgs/work/m/bb-10.6-kevgs/build_tsan/../sql/mysqld.cc:5043:7 (mariadbd+0xd70fb2)
    #17 mysqld_main(int, char**) /home/kevgs/work/m/bb-10.6-kevgs/build_tsan/../sql/mysqld.cc:5655:7 (mariadbd+0xd6a9d7)
    #18 main /home/kevgs/work/m/bb-10.6-kevgs/build_tsan/../sql/main.cc:34:10 (mariadbd+0xd65d18)

Here T2 accesses tpool::task while the main thread is still initializing it!
aio_uring accesses io_slots, and thus io_slots should be initialized before it.
So, fix this by changing the order of initialization.
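
The ordering idea can be illustrated with a minimal sketch. This is not the actual patch: `Slots`, `Poller` and `AioSubsystem` are hypothetical stand-ins for io_slots, aio_uring and their owner. It assumes only that the consumer thread is started by a constructor and that C++ constructs members in declaration order.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for io_slots: a pool of pre-built task objects.
struct Slots {
  explicit Slots(std::size_t n) : tasks(n, 42) {}
  std::vector<int> tasks;
};

// Hypothetical stand-in for aio_uring: its constructor starts a thread
// that immediately reads the slots.
class Poller {
public:
  explicit Poller(const Slots &slots)
      : sum_(0), thread_([this, &slots] {
          long s = 0;
          for (int t : slots.tasks) s += t;
          sum_.store(s);
        }) {}
  ~Poller() { if (thread_.joinable()) thread_.join(); }

  // Waits for the background thread and returns what it computed.
  long wait_sum() {
    assert(thread_.joinable());
    thread_.join();
    return sum_.load();
  }

private:
  std::atomic<long> sum_;  // declared first, so initialized before thread_
  std::thread thread_;
};

// Members are constructed in declaration order, so the slots are fully
// built before the poller thread can touch them. Destruction runs in
// reverse order, so the thread is joined before the slots go away.
struct AioSubsystem {
  explicit AioSubsystem(std::size_t n) : slots(n), poller(slots) {}
  Slots slots;    // constructed first
  Poller poller;  // its thread starts only after slots exists
};
```

Swapping the two member declarations in `AioSubsystem` would start the thread before `slots` is constructed, which is the kind of ordering problem the report above points at.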
kevgs added a commit that referenced this pull request Sep 8, 2021
  Read of size 8 at 0x7fecf2e75fc8 by thread T2 (mutexes: write M1318):
    #0 tpool::thread_pool_generic::submit_task(tpool::task*) /tpool/tpool_generic.cc:823:9 (mariadbd+0x25fd2d2)
    #1 (anonymous namespace)::aio_uring::thread_routine((anonymous namespace)::aio_uring*) /tpool/aio_liburing.cc:173:20 (mariadbd+0x260b21b)
    #2 void std::__invoke_impl<void, void (*)((anonymous namespace)::aio_uring*), (anonymous namespace)::aio_uring*>(std::__invoke_other, void (*&&)((anonymous namespace)::aio_uring*), (anonymous namespace)::aio_uring*&&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61:14 (mariadbd+0x260c62a)
    #3 std::__invoke_result<void (*)((anonymous namespace)::aio_uring*), (anonymous namespace)::aio_uring*>::type std::__invoke<void (*)((anonymous namespace)::aio_uring*), (anonymous namespace)::aio_uring*>(void (*&&)((anonymous namespace)::aio_uring*), (anonymous namespace)::aio_uring*&&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:96:14 (mariadbd+0x260c4ba)
    #4 void std::thread::_Invoker<std::tuple<void (*)((anonymous namespace)::aio_uring*), (anonymous namespace)::aio_uring*> >::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:253:13 (mariadbd+0x260c442)
    #5 std::thread::_Invoker<std::tuple<void (*)((anonymous namespace)::aio_uring*), (anonymous namespace)::aio_uring*> >::operator()() /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:260:11 (mariadbd+0x260c3c5)
    #6 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)((anonymous namespace)::aio_uring*), (anonymous namespace)::aio_uring*> > >::_M_run() /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:211:13 (mariadbd+0x260c189)
    #7 <null> <null> (libstdc++.so.6+0xd230f)

  Previous write of size 8 at 0x7fecf2e75fc8 by main thread:
    #0 tpool::task::task(void (*)(void*), void*, tpool::task_group*) /tpool/task.cc:40:46 (mariadbd+0x260a138)
    #1 tpool::aiocb::aiocb() /tpool/tpool.h:147:13 (mariadbd+0x2355943)
    #2 void std::_Construct<tpool::aiocb>(tpool::aiocb*) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_construct.h:109:38 (mariadbd+0x2355845)
    #3 tpool::aiocb* std::__uninitialized_default_n_1<false>::__uninit_default_n<tpool::aiocb*, unsigned long>(tpool::aiocb*, unsigned long) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_uninitialized.h:579:3 (mariadbd+0x235576c)
    #4 tpool::aiocb* std::__uninitialized_default_n<tpool::aiocb*, unsigned long>(tpool::aiocb*, unsigned long) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_uninitialized.h:638:14 (mariadbd+0x23556e9)
    #5 tpool::aiocb* std::__uninitialized_default_n_a<tpool::aiocb*, unsigned long, tpool::aiocb>(tpool::aiocb*, unsigned long, std::allocator<tpool::aiocb>&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_uninitialized.h:704:14 (mariadbd+0x2355641)
    #6 std::vector<tpool::aiocb, std::allocator<tpool::aiocb> >::_M_default_initialize(unsigned long) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_vector.h:1606:4 (mariadbd+0x2354f3d)
    #7 std::vector<tpool::aiocb, std::allocator<tpool::aiocb> >::vector(unsigned long, std::allocator<tpool::aiocb> const&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_vector.h:512:9 (mariadbd+0x2354a19)
    #8 tpool::cache<tpool::aiocb>::cache(unsigned long, tpool::cache_notification_mode) /tpool/tpool_structs.h:73:20 (mariadbd+0x2354784)
    #9 io_slots::io_slots(int, int) /storage/innobase/os/os0file.cc:93:3 (mariadbd+0x235343b)
    #10 os_aio_init() /storage/innobase/os/os0file.cc:3780:22 (mariadbd+0x234ebce)
    #11 srv_start(bool) /storage/innobase/srv/srv0start.cc:1190:6 (mariadbd+0x256720c)
    #12 innodb_init(void*) /storage/innobase/handler/ha_innodb.cc:4188:8 (mariadbd+0x1ed3bda)
    #13 ha_initialize_handlerton(st_plugin_int*) /sql/handler.cc:659:31 (mariadbd+0xf7be06)
    #14 plugin_initialize(st_mem_root*, st_plugin_int*, int*, char**, bool) /sql/sql_plugin.cc:1463:9 (mariadbd+0x160fa1b)
    #15 plugin_init(int*, char**, int) /sql/sql_plugin.cc:1756:15 (mariadbd+0x160f07f)
    #16 init_server_components() /sql/mysqld.cc:5043:7 (mariadbd+0xd70fb2)
    #17 mysqld_main(int, char**) /sql/mysqld.cc:5655:7 (mariadbd+0xd6a9d7)
    #18 main /sql/main.cc:34:10 (mariadbd+0xd65d18)

I think the report is incorrect: such a race condition is not possible.
I've checked this by reading the code and adding assertions.
Namely, no aio I/O is possible before the end of os_aio_init().
Most probably it's a bug in TSAN. But the patch fixes around 5 related
reports, and this is a step toward TSAN usefulness. Currently it produces too
much noise.

std::unique_ptr is a step toward https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#r11-avoid-calling-new-and-delete-explicitly
There is no std::make_unique() in C++11, however.
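
A minimal sketch of how std::unique_ptr can be used under C++11 without std::make_unique(). The names `Slot`, `make_unique_cxx11` and `create_slot` are hypothetical and do not come from the patch; the helper only covers single objects, not arrays.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical resource type, used for illustration only.
struct Slot {
  explicit Slot(int slot_id) : id(slot_id) {}
  int id;
};

// C++11-compatible stand-in for std::make_unique (single objects only).
// The explicit `new` is confined to this one helper.
template <typename T, typename... Args>
std::unique_ptr<T> make_unique_cxx11(Args &&...args) {
  return std::unique_ptr<T>(new T(std::forward<Args>(args)...));
}

// Ownership is returned to the caller; no matching `delete` is needed.
std::unique_ptr<Slot> create_slot(int id) {
  assert(id >= 0);
  return make_unique_cxx11<Slot>(id);
}
```

The object is released automatically when the returned pointer goes out of scope or is reset.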
kevgs added a commit that referenced this pull request Sep 8, 2021
WARNING: ThreadSanitizer: data race (pid=1503350)
  Write of size 8 at 0x0000067b1f20 by thread T3:
    #0 os_file_sync_posix(int) /storage/innobase/os/os0file.cc:895:5 (mariadbd+0x23493f6)
    #1 os_file_flush_func(int) /storage/innobase/os/os0file.cc:983:8 (mariadbd+0x2349204)
    #2 file_os_io::flush() /storage/innobase/log/log0log.cc:326:10 (mariadbd+0x22eaaa9)
    #3 log_file_t::flush() /storage/innobase/log/log0log.cc:440:18 (mariadbd+0x22eb2d0)
    #4 log_t::file::flush() /storage/innobase/log/log0log.cc:507:29 (mariadbd+0x22ebe69)
    #5 log_write_flush_to_disk_low(unsigned long) /storage/innobase/log/log0log.cc:629:17 (mariadbd+0x22ed3f3)
    #6 log_write_up_to(unsigned long, bool, bool, completion_callback const*) /storage/innobase/log/log0log.cc:829:3 (mariadbd+0x22ecb04)
    #7 log_checkpoint_low(unsigned long, unsigned long) /storage/innobase/buf/buf0flu.cc:1734:5 (mariadbd+0x20d37f1)
    #8 buf_flush_sync_for_checkpoint(unsigned long) /storage/innobase/buf/buf0flu.cc:1947:7 (mariadbd+0x20d4193)
    #9 buf_flush_page_cleaner() /storage/innobase/buf/buf0flu.cc:2186:9 (mariadbd+0x20cdad7)
    #10 void std::__invoke_impl<void, void (*)()>(std::__invoke_other, void (*&&)()) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61:14 (mariadbd+0x20c3aaa)
    #11 std::__invoke_result<void (*)()>::type std::__invoke<void (*)()>(void (*&&)()) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:96:14 (mariadbd+0x20c39bd)
    #12 void std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:253:13 (mariadbd+0x20c3965)
    #13 std::thread::_Invoker<std::tuple<void (*)()> >::operator()() /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:260:11 (mariadbd+0x20c3905)
    #14 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run() /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:211:13 (mariadbd+0x20c37f9)
    #15 <null> <null> (libstdc++.so.6+0xd230f)

  Previous write of size 8 at 0x0000067b1f20 by main thread:
    #0 os_file_sync_posix(int) /storage/innobase/os/os0file.cc:895:5 (mariadbd+0x23493f6)
    #1 os_file_flush_func(int) /storage/innobase/os/os0file.cc:983:8 (mariadbd+0x2349204)
    #2 fil_space_t::flush_low() /storage/innobase/fil/fil0fil.cc:504:5 (mariadbd+0x205cad5)
    #3 fil_flush_file_spaces() /storage/innobase/fil/fil0fil.cc:2947:13 (mariadbd+0x206523f)
    #4 log_checkpoint() /storage/innobase/buf/buf0flu.cc:1777:5 (mariadbd+0x20cd069)
    #5 buf_flush_wait_flushed(unsigned long) /storage/innobase/buf/buf0flu.cc:1867:5 (mariadbd+0x20ccf95)
    #6 log_make_checkpoint() /storage/innobase/buf/buf0flu.cc:1793:3 (mariadbd+0x20cc4c9)
    #7 buf_dblwr_t::create() /storage/innobase/buf/buf0dblwr.cc:216:3 (mariadbd+0x209076a)
    #8 srv_start(bool) /storage/innobase/srv/srv0start.cc:1685:20 (mariadbd+0x256b514)
    #9 innodb_init(void*) /storage/innobase/handler/ha_innodb.cc:4188:8 (mariadbd+0x1ed406a)
    #10 ha_initialize_handlerton(st_plugin_int*) /sql/handler.cc:659:31 (mariadbd+0xf7c246)
    #11 plugin_initialize(st_mem_root*, st_plugin_int*, int*, char**, bool) /sql/sql_plugin.cc:1463:9 (mariadbd+0x160fe6b)
    #12 plugin_init(int*, char**, int) /sql/sql_plugin.cc:1756:15 (mariadbd+0x160f4cf)
    #13 init_server_components() /sql/mysqld.cc:5043:7 (mariadbd+0xd713f2)
    #14 mysqld_main(int, char**) /sql/mysqld.cc:5655:7 (mariadbd+0xd6ae17)
    #15 main /sql/main.cc:34:10 (mariadbd+0xd66158)

This is a correct report by TSAN for an obvious case: an unprotected global
counter. Fix it by making the counter std::atomic.
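
A minimal sketch of that kind of fix, assuming a plain statistics counter that several threads increment. The names `n_flushes`, `count_flush` and `run_flushers` are hypothetical, and relaxed memory ordering is an assumption that only holds when the counter does not guard any other data.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical global statistics counter. With a plain integer type,
// concurrent increments would be a data race.
static std::atomic<std::uint64_t> n_flushes{0};

// May be called concurrently from any thread.
inline void count_flush() {
  n_flushes.fetch_add(1, std::memory_order_relaxed);
}

// Runs n_threads threads that each count per_thread flushes and returns
// how much the counter grew.
std::uint64_t run_flushers(unsigned n_threads, unsigned per_thread) {
  assert(n_threads > 0);
  const std::uint64_t before = n_flushes.load(std::memory_order_relaxed);
  std::vector<std::thread> threads;
  for (unsigned i = 0; i < n_threads; i++)
    threads.emplace_back([per_thread] {
      for (unsigned j = 0; j < per_thread; j++) count_flush();
    });
  for (auto &t : threads) t.join();
  return n_flushes.load(std::memory_order_relaxed) - before;
}
```

Because every increment is atomic, no update is lost regardless of how the threads interleave.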
kevgs added a commit that referenced this pull request Sep 8, 2021
WARNING: ThreadSanitizer: data race (pid=1506937)
  Write of size 8 at 0x0000067ab740 by thread T6:
    #0 buf_page_get_low(page_id_t, unsigned long, unsigned long, buf_block_t*, unsigned long, mtr_t*, dberr_t*, bool) /storage/innobase/buf/buf0buf.cc:2946:8 (mariadbd+0x2014c7f)
    #1 buf_page_get_gen(page_id_t, unsigned long, unsigned long, buf_block_t*, unsigned long, mtr_t*, dberr_t*, bool) /storage/innobase/buf/buf0buf.cc:3047:10 (mariadbd+0x2016216)
    #2 btr_cur_search_to_nth_level_func(dict_index_t*, unsigned long, dtuple_t const*, page_cur_mode_t, unsigned long, btr_cur_t*, ssux_lock_impl<true>*, mtr_t*, unsigned long) /storage/innobase/btr/btr0cur.cc:1613:10 (mariadbd+0x1fb5bff)
    #3 btr_pcur_open_low(dict_index_t*, unsigned long, dtuple_t const*, page_cur_mode_t, unsigned long, btr_pcur_t*, unsigned long, mtr_t*) /storage/innobase/include/btr0pcur.ic:439:8 (mariadbd+0x24ddead)
    #4 row_search_on_row_ref(btr_pcur_t*, unsigned long, dict_table_t const*, dtuple_t const*, mtr_t*) /storage/innobase/row/row0row.cc:1215:7 (mariadbd+0x24dd537)
    #5 row_purge_reposition_pcur(unsigned long, purge_node_t*, mtr_t*) /storage/innobase/row/row0purge.cc:81:23 (mariadbd+0x24c5369)
    #6 row_purge_reset_trx_id(purge_node_t*, mtr_t*) /storage/innobase/row/row0purge.cc:748:6 (mariadbd+0x24c90c7)
    #7 row_purge_record_func(purge_node_t*, unsigned char*, que_thr_t const*, bool) /storage/innobase/row/row0purge.cc:1174:4 (mariadbd+0x24c8262)
    #8 row_purge(purge_node_t*, unsigned char*, que_thr_t*) /storage/innobase/row/row0purge.cc:1218:18 (mariadbd+0x24c5af3)
    #9 row_purge_step(que_thr_t*) /storage/innobase/row/row0purge.cc:1267:3 (mariadbd+0x24c5996)
    #10 que_thr_step(que_thr_t*) /storage/innobase/que/que0que.cc:653:9 (mariadbd+0x23d5298)
    #11 que_run_threads_low(que_thr_t*) /storage/innobase/que/que0que.cc:709:25 (mariadbd+0x23d3f29)
    #12 que_run_threads(que_thr_t*) /storage/innobase/que/que0que.cc:729:2 (mariadbd+0x23d3bdf)
    #13 srv_task_execute() /storage/innobase/srv/srv0srv.cc:1692:3 (mariadbd+0x2562841)
    #14 purge_worker_callback(void*) /storage/innobase/srv/srv0srv.cc:1864:10 (mariadbd+0x255f361)
    #15 tpool::task_group::execute(tpool::task*) /tpool/task_group.cc:55:9 (mariadbd+0x260a5ca)
    #16 tpool::task::execute() /tpool/task.cc:47:16 (mariadbd+0x260adf6)
    #17 tpool::thread_pool_generic::worker_main(tpool::worker_data*) /tpool/tpool_generic.cc:550:11 (mariadbd+0x25fc590)
    #18 void std::__invoke_impl<void, void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*>(std::__invoke_memfun_deref, void (tpool::thread_pool_generic::*&&)(tpool::worker_data*), tpool::thread_pool_generic*&&, tpool::worker_data*&&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:74:14 (mariadbd+0x26061b5)
    #19 std::__invoke_result<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*>::type std::__invoke<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*>(void (tpool::thread_pool_generic::*&&)(tpool::worker_data*), tpool::thread_pool_generic*&&, tpool::worker_data*&&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:96:14 (mariadbd+0x2605f57)
    #20 void std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::_M_invoke<0ul, 1ul, 2ul>(std::_Index_tuple<0ul, 1ul, 2ul>) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:253:13 (mariadbd+0x2605ecb)
    #21 std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::operator()() /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:260:11 (mariadbd+0x2605e35)
    #22 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> > >::_M_run() /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:211:13 (mariadbd+0x2605ac9)
    #23 <null> <null> (libstdc++.so.6+0xd230f)

  Previous write of size 8 at 0x0000067ab740 by thread T8:
    #0 buf_page_get_low(page_id_t, unsigned long, unsigned long, buf_block_t*, unsigned long, mtr_t*, dberr_t*, bool) /storage/innobase/buf/buf0buf.cc:2946:8 (mariadbd+0x2014c7f)
    #1 buf_page_get_gen(page_id_t, unsigned long, unsigned long, buf_block_t*, unsigned long, mtr_t*, dberr_t*, bool) /storage/innobase/buf/buf0buf.cc:3047:10 (mariadbd+0x2016216)
    #2 btr_cur_search_to_nth_level_func(dict_index_t*, unsigned long, dtuple_t const*, page_cur_mode_t, unsigned long, btr_cur_t*, ssux_lock_impl<true>*, mtr_t*, unsigned long) /storage/innobase/btr/btr0cur.cc:1613:10 (mariadbd+0x1fb5bff)
    #3 btr_pcur_open_low(dict_index_t*, unsigned long, dtuple_t const*, page_cur_mode_t, unsigned long, btr_pcur_t*, unsigned long, mtr_t*) /storage/innobase/include/btr0pcur.ic:439:8 (mariadbd+0x24ddead)
    #4 row_search_on_row_ref(btr_pcur_t*, unsigned long, dict_table_t const*, dtuple_t const*, mtr_t*) /storage/innobase/row/row0row.cc:1215:7 (mariadbd+0x24dd537)
    #5 row_purge_reposition_pcur(unsigned long, purge_node_t*, mtr_t*) /storage/innobase/row/row0purge.cc:81:23 (mariadbd+0x24c5369)
    #6 row_purge_reset_trx_id(purge_node_t*, mtr_t*) /storage/innobase/row/row0purge.cc:748:6 (mariadbd+0x24c90c7)
    #7 row_purge_record_func(purge_node_t*, unsigned char*, que_thr_t const*, bool) /storage/innobase/row/row0purge.cc:1174:4 (mariadbd+0x24c8262)
    #8 row_purge(purge_node_t*, unsigned char*, que_thr_t*) /storage/innobase/row/row0purge.cc:1218:18 (mariadbd+0x24c5af3)
    #9 row_purge_step(que_thr_t*) /storage/innobase/row/row0purge.cc:1267:3 (mariadbd+0x24c5996)
    #10 que_thr_step(que_thr_t*) /storage/innobase/que/que0que.cc:653:9 (mariadbd+0x23d5298)
    #11 que_run_threads_low(que_thr_t*) /storage/innobase/que/que0que.cc:709:25 (mariadbd+0x23d3f29)
    #12 que_run_threads(que_thr_t*) /storage/innobase/que/que0que.cc:729:2 (mariadbd+0x23d3bdf)
    #13 trx_purge(unsigned long, bool) /storage/innobase/trx/trx0purge.cc:1271:2 (mariadbd+0x25841b4)
    #14 srv_do_purge(unsigned long*) /storage/innobase/srv/srv0srv.cc:1784:20 (mariadbd+0x2563224)
    #15 purge_coordinator_callback_low() /storage/innobase/srv/srv0srv.cc:1881:35 (mariadbd+0x2562b3b)
    #16 purge_coordinator_callback(void*) /storage/innobase/srv/srv0srv.cc:1910:3 (mariadbd+0x255f4ab)
    #17 tpool::task_group::execute(tpool::task*) /tpool/task_group.cc:55:9 (mariadbd+0x260a5ca)
    #18 tpool::task::execute() /tpool/task.cc:47:16 (mariadbd+0x260adf6)
    #19 tpool::thread_pool_generic::worker_main(tpool::worker_data*) /tpool/tpool_generic.cc:550:11 (mariadbd+0x25fc590)
    #20 void std::__invoke_impl<void, void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*>(std::__invoke_memfun_deref, void (tpool::thread_pool_generic::*&&)(tpool::worker_data*), tpool::thread_pool_generic*&&, tpool::worker_data*&&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:74:14 (mariadbd+0x26061b5)
    #21 std::__invoke_result<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*>::type std::__invoke<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*>(void (tpool::thread_pool_generic::*&&)(tpool::worker_data*), tpool::thread_pool_generic*&&, tpool::worker_data*&&) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:96:14 (mariadbd+0x2605f57)
    #22 void std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::_M_invoke<0ul, 1ul, 2ul>(std::_Index_tuple<0ul, 1ul, 2ul>) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:253:13 (mariadbd+0x2605ecb)
    #23 std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::operator()() /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:260:11 (mariadbd+0x2605e35)
    #24 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> > >::_M_run() /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:211:13 (mariadbd+0x2605ac9)
    #25 <null> <null> (libstdc++.so.6+0xd230f)

  Location is global 'buf_dbg_counter' of size 8 at 0x0000067ab740 (mariadbd+0x67ab740)

  The obvious fix is to make the counter atomic.
kevgs added a commit that referenced this pull request Sep 8, 2021
  Write of size 1 at 0x0000067abe08 by thread T3 (mutexes: write M1372):
    #0 buf_flush_page_cleaner() /storage/innobase/buf/buf0flu.cc:2366:29 (mariadbd+0x20cea7c)
    #1 void std::__invoke_impl<void, void (*)()>(std::__invoke_other, void (*&&)()) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61:14 (mariadbd+0x20c3a8a)
    #2 std::__invoke_result<void (*)()>::type std::__invoke<void (*)()>(void (*&&)()) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:96:14 (mariadbd+0x20c399d)
    #3 void std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:253:13 (mariadbd+0x20c3945)
    #4 std::thread::_Invoker<std::tuple<void (*)()> >::operator()() /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:260:11 (mariadbd+0x20c38e5)
    #5 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run() /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:211:13 (mariadbd+0x20c37d9)
    #6 <null> <null> (libstdc++.so.6+0xd230f)

  Previous read of size 1 at 0x0000067abe08 by main thread:
    #0 logs_empty_and_mark_files_at_shutdown() /storage/innobase/log/log0log.cc:1094:6 (mariadbd+0x22eeff3)
    #1 innodb_shutdown() /storage/innobase/srv/srv0start.cc:1970:3 (mariadbd+0x256ffd6)
    #2 innobase_end(handlerton*, ha_panic_function) /storage/innobase/handler/ha_innodb.cc:4265:3 (mariadbd+0x1ee3fc4)
    #3 ha_finalize_handlerton(st_plugin_int*) /sql/handler.cc:595:5 (mariadbd+0xf7bac9)
    #4 plugin_deinitialize(st_plugin_int*, bool) /sql/sql_plugin.cc:1266:9 (mariadbd+0x1611789)
    #5 reap_plugins() /sql/sql_plugin.cc:1342:7 (mariadbd+0x160e17d)
    #6 plugin_shutdown() /sql/sql_plugin.cc:2050:7 (mariadbd+0x1611f42)
    #7 clean_up(bool) /sql/mysqld.cc:1923:3 (mariadbd+0xd67a4c)
    #8 unireg_abort /sql/mysqld.cc:1835:3 (mariadbd+0xd67605)
    #9 mysqld_main(int, char**) /sql/mysqld.cc:5741:7 (mariadbd+0xd6b36a)
    #10 main /sql/main.cc:34:10 (mariadbd+0xd661a8)

  Location is global 'buf_page_cleaner_is_active' of size 1 at 0x0000067abe08 (mariadbd+0x67abe08)
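
The report above does not spell out the fix. One common way to remove this kind of race on a one-byte flag is to make the flag atomic. The sketch below assumes that approach; `cleaner_is_active`, `cleaner_thread_body` and `start_and_wait_for_cleaner` are hypothetical names, not the InnoDB ones:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for an "is the background thread active" flag.
static std::atomic<bool> cleaner_is_active{false};

// Background thread: does its work, then announces that it has finished.
void cleaner_thread_body() {
  // ... flushing work would happen here ...
  cleaner_is_active.store(false, std::memory_order_release);
}

// Main thread at shutdown: polls the flag until the background thread
// has cleared it. Returns true once the flag is observed as false.
bool start_and_wait_for_cleaner() {
  cleaner_is_active.store(true, std::memory_order_release);
  std::thread cleaner(cleaner_thread_body);
  while (cleaner_is_active.load(std::memory_order_acquire))
    std::this_thread::yield();
  cleaner.join();
  return !cleaner_is_active.load(std::memory_order_acquire);
}
```

The write in the background thread and the read in the main thread are now both atomic operations, so the pair no longer counts as a data race.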
kevgs added a commit that referenced this pull request Sep 8, 2021
WARNING: ThreadSanitizer: data race (pid=1510842)
  Write of size 8 at 0x0000067b1e98 by main thread:
    #0 os_file_pwrite(IORequest const&, int, unsigned char const*, unsigned long, unsigned long, dberr_t*) /storage/innobase/os/os0file.cc:2928:2 (mariadbd+0x234c5ac)
    #1 os_file_write_func(IORequest const&, char const*, int, void const*, unsigned long, unsigned long) /storage/innobase/os/os0file.cc:2963:20 (mariadbd+0x234c019)
    #2 file_os_io::write(char const*, unsigned long, st_::span<unsigned char const>) /storage/innobase/log/log0log.cc:320:10 (mariadbd+0x22eaa50)
    #3 log_file_t::write(unsigned long, st_::span<unsigned char const>) /storage/innobase/log/log0log.cc:434:18 (mariadbd+0x22eb1d8)
    #4 log_t::file::write(unsigned long, st_::span<unsigned char>) /storage/innobase/log/log0log.cc:496:29 (mariadbd+0x22ebb55)
    #5 log_write_buf(unsigned char*, unsigned long, unsigned long, unsigned long, unsigned long) /storage/innobase/log/log0log.cc:614:14 (mariadbd+0x22f1b51)
    #6 log_write(bool) /storage/innobase/log/log0log.cc:755:2 (mariadbd+0x22ed2ec)
    #7 log_write_up_to(unsigned long, bool, bool, completion_callback const*) /storage/innobase/log/log0log.cc:817:5 (mariadbd+0x22eca44)
    #8 log_checkpoint_low(unsigned long, unsigned long) /storage/innobase/buf/buf0flu.cc:1734:5 (mariadbd+0x20d37c1)
    #9 log_checkpoint() /storage/innobase/buf/buf0flu.cc:1787:10 (mariadbd+0x20cd155)
    #10 buf_flush_wait_flushed(unsigned long) /storage/innobase/buf/buf0flu.cc:1867:5 (mariadbd+0x20ccf8f)
    #11 log_make_checkpoint() /storage/innobase/buf/buf0flu.cc:1793:3 (mariadbd+0x20cc4c9)
    #12 buf_dblwr_t::create() /storage/innobase/buf/buf0dblwr.cc:216:3 (mariadbd+0x209076a)
    #13 srv_start(bool) /storage/innobase/srv/srv0start.cc:1685:20 (mariadbd+0x256b4aa)
    #14 innodb_init(void*) /storage/innobase/handler/ha_innodb.cc:4188:8 (mariadbd+0x1ed40da)
    #15 ha_initialize_handlerton(st_plugin_int*) /sql/handler.cc:659:31 (mariadbd+0xf7c2b6)
    #16 plugin_initialize(st_mem_root*, st_plugin_int*, int*, char**, bool) /sql/sql_plugin.cc:1463:9 (mariadbd+0x160fedb)
    #17 plugin_init(int*, char**, int) /sql/sql_plugin.cc:1756:15 (mariadbd+0x160f53f)
    #18 init_server_components() /sql/mysqld.cc:5043:7 (mariadbd+0xd71462)
    #19 mysqld_main(int, char**) /sql/mysqld.cc:5655:7 (mariadbd+0xd6ae87)
    #20 main /sql/main.cc:34:10 (mariadbd+0xd661c8)

  Previous write of size 8 at 0x0000067b1e98 by thread T3:
    #0 os_file_pwrite(IORequest const&, int, unsigned char const*, unsigned long, unsigned long, dberr_t*) /storage/innobase/os/os0file.cc:2928:2 (mariadbd+0x234c5ac)
    #1 os_file_write_func(IORequest const&, char const*, int, void const*, unsigned long, unsigned long) /storage/innobase/os/os0file.cc:2963:20 (mariadbd+0x234c019)
    #2 file_os_io::write(char const*, unsigned long, st_::span<unsigned char const>) /storage/innobase/log/log0log.cc:320:10 (mariadbd+0x22eaa50)
    #3 log_file_t::write(unsigned long, st_::span<unsigned char const>) /storage/innobase/log/log0log.cc:434:18 (mariadbd+0x22eb1d8)
    #4 log_t::file::write(unsigned long, st_::span<unsigned char>) /storage/innobase/log/log0log.cc:496:29 (mariadbd+0x22ebb55)
    #5 log_write_checkpoint_info(unsigned long) /storage/innobase/log/log0log.cc:911:14 (mariadbd+0x22edd4e)
    #6 log_checkpoint_low(unsigned long, unsigned long) /storage/innobase/buf/buf0flu.cc:1755:3 (mariadbd+0x20d3a3d)
    #7 buf_flush_sync_for_checkpoint(unsigned long) /storage/innobase/buf/buf0flu.cc:1947:7 (mariadbd+0x20d4163)
    #8 buf_flush_page_cleaner() /storage/innobase/buf/buf0flu.cc:2186:9 (mariadbd+0x20cdab1)
    #9 void std::__invoke_impl<void, void (*)()>(std::__invoke_other, void (*&&)()) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61:14 (mariadbd+0x20c3aaa)
    #10 std::__invoke_result<void (*)()>::type std::__invoke<void (*)()>(void (*&&)()) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:96:14 (mariadbd+0x20c39bd)
    #11 void std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:253:13 (mariadbd+0x20c3965)
    #12 std::thread::_Invoker<std::tuple<void (*)()> >::operator()() /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:260:11 (mariadbd+0x20c3905)
    #13 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run() /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:211:13 (mariadbd+0x20c37f9)
    #14 <null> <null> (libstdc++.so.6+0xd230f)

  Location is global 'os_n_file_writes' of size 8 at 0x0000067b1e98 (mariadbd+0x67b1e98)

  Make variable atomic.
spetrunia added a commit that referenced this pull request Jan 11, 2022
In Histogram_json_hb::point_selectivity(), do return selectivity of 0.0
when the histogram says so.

The logic of "Do not return 0.0 estimate as it causes a multiply-by-zero
meltdown in cost and cardinality calculations" is moved into
records_in_column_ranges(), where it is done *once* per column pair (as
opposed to once per range, which can cause the error to add up
to a large number when there are many ranges).
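
To illustrate why the place of the zero-guard matters, here is a small sketch. It assumes a hypothetical lower bound `MIN_ROWS` and hypothetical helper names; it is not the code of records_in_column_ranges():

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical lower bound used to avoid a zero estimate.
constexpr double MIN_ROWS = 1.0;

// Guard applied per range: every empty range still contributes MIN_ROWS,
// so the error grows with the number of ranges.
double rows_clamped_per_range(const std::vector<double> &range_rows) {
  double total = 0.0;
  for (double r : range_rows) total += std::max(r, MIN_ROWS);
  return total;
}

// Guard applied once: the raw per-range estimates are summed first and
// only the final sum is kept away from zero.
double rows_clamped_once(const std::vector<double> &range_rows) {
  double total = 0.0;
  for (double r : range_rows) total += r;
  return std::max(total, MIN_ROWS);
}
```

For 100 ranges that the histogram estimates at 0 rows each, the per-range guard yields 100 rows, while the single guard yields 1 row.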
spetrunia added a commit that referenced this pull request Jan 19, 2022
spetrunia added a commit that referenced this pull request Mar 17, 2022
janlindstrom added a commit that referenced this pull request Jun 12, 2022
janlindstrom added a commit that referenced this pull request Jun 13, 2022
mariadb-RuchaDeodhar added a commit that referenced this pull request Sep 29, 2022
The underlying cause of all the bugs listed below is the same. This patch fixes
all of them:
1) MDEV-25028: ASAN use-after-poison in
base_list_iterator::next or Assertion `sl->join == 0' upon
INSERT .. RETURNING via PS
2) MDEV-25187: Assertion `inited == NONE || table->open_by_handler'
failed or Direct leak in init_dynamic_array2 upon INSERT .. RETURNING
and memory leak in init_dynamic_array2
3) MDEV-28740: crash in INSERT RETURNING subquery in prepared statements
4) MDEV-27165: crash in base_list_iterator::next

Analysis:
Consider this statement:
INSERT(#1)...SELECT(#2)...(SELECT(#3)...) RETURNING (SELECT(#4)...)

When RETURNING is encountered, add_slave() changes how the selects are linked.
It makes builtin_select (#1) a slave of SELECT(#2). This loses the
already existing slave (#3), which is the nested select of the SELECT of
INSERT...SELECT. In fact, builtin_select (#1) should not be a slave of
SELECT(#2), because it is not nested within it. In addition, the push_select()
call used to get the correct context also changed how the selects are linked.
During reinit_stmt_before_use(), we expect the selects to
be cleaned up and to have join=0. Since these selects are not linked correctly,
the clean-up does not happen correctly, so join is not NULL. Hence the crash.

Fix:
If we are parsing RETURNING, set is_parsing_returning= true for the
current select. Get rid of add_slave(). In place of push_select(), use
push_context() to have the correct context (the context of builtin_select)
to resolve items in item_list, and add these items to the item_list of
builtin_select.
illuusio pushed a commit to illuusio/server that referenced this pull request Aug 14, 2023
Follow-up patch for #848616 to match upstream
andrelkin added a commit that referenced this pull request Oct 16, 2023
- renaming
- function header comments added
- hyperoptimization of `wfc->parent_commit_started` is removed
  because it has not been proved safe
- the size of the XAP sliding window is doubled to account for
  a possible XAP_k -> XAP_k+2|W|-1 dependency.
  Say k=1, and the # of Workers is 4. Transactions are distributed RR,
  so it's possible to have T^*_1 -> T^*_8. This can be seen from the worker
  queues. The queues develop downward:

  W1 ...  W4
  1^* 2 3 4
  5   6 7 8^*

  Worker #1 is assigned
  T_1 and T_5. Worker #4 can take on its T_8 when T_1 is still at the
  beginning of its processing, so even before XA START of that XAP.

  This analysis was done a couple of weeks ago, but I have not found a
  commit planned to cover it.
andrelkin added a commit that referenced this pull request Oct 16, 2023
The XA-Prepare group of events and its XA-"complete" terminator
are now distributed Round-Robin across parallel slave workers.
The former hash-based policy was shown to contribute to execution
latency, being prone to create a big queue (many times larger than
the size of the worker pool) of binlog-ordered transactions
to commit.

Log of changes:

MDEV-31949 intermediate commit: Made XA RR more robust

- XAC may proceed even to binlog in parallel with its (then earlier) XAP
TODO:
- convert big numbers of spin time to a mutex/cond signal
- cleanup (e.g. generalize is_explicit_XA() to cover the async XAC).

CAVEAT: esp. with a big # of workers, the XAC spin wait might run out of
predefined loops (see the "convert" TODO item).

Cover Brandon's catch of an apparent lack of unpinning by the xid owner XAP.

MDEV-31949: intermediate commit fixes

XAC is not retrying anymore, and
it also waits much more patiently for XAP's xid release

todo: test on an env where the XID_cache_element::uninitialized() assert
      fired, so as to see any effect of XAC not retrying.

MDEV-31949: intermediate commit to let XAC not find xid initially

which prevents its retry that might have (had) something to do with lf_hash
asserts, segfaults.

MDEV-31949 intermediate commit to cover duplicate xid

"unlikely" duplicate xids have to be treated with spin-waiting.
XA-prepare keeps trying to insert into the xid cache until it succeeds.
Although this is not a duplicate case, XA-commit may do a similar wait
for xid from its parent XAP.

MDEV-31949 incr commit: XAP1 vs XAP2

Cleanup and improvements over Brandon's previous two commits,
replacing both.

Collapse into XAP1 vs XAP2.

XAP_k-i(same_xid) -> XAC_k(same_xid) dependency is covered

with marking XAC's rgi context with SPECULATE_WAIT.

MDEV-31949 Incremental functional commit to address

- log-slave-updates=0
- xa rollback

Todo:
   0. cleanup
   1. add mtr tests
   2. implement mutex wait by XAC (to limit the spin number)

MDEV-31949 incremental commit implements pthread cond wait by XAC

to feature
- a new processlist stage introduced
- wait_for_commit::COND_wait_xa_commit added to the class
- wait_for_commit* XID_cache_element::waiter added to the class
- little cleanup

TODO:
- find optimal SPIN_MAX

Duplicate xid proper handling

briefly tested with up to 32 workers.

- sliding window is made 2 * |W|
- one more XID_cache_element::p_waiter is added to serve in C2 -> P3
  control.
- some cleanup

TODO: complete testing and cleanup
      prove the window size

MDEV-31949: Initial MTR test commit

Note it will cause the server to crash with tests 4a
and 4b

Also fixed a master-side assertion error

MDEV-31949 optimization not to W4PC in a certain case and ...

restore the assigned XAP sliding window size back to the original |W|.
It's safe now that a XAP and a XAC use separate wait conditions.

MDEV-31949 fixes around rpl_xa_concurrent_xap_xac

1. the XAP duplicate xid window registration is moved up
   to cover all gco situations;
2. the window object gets destroyed for real;
3. the test is simplified to remove 4b in favor of a new to-be-done
   P1(fail)->C2-P3 branch.

Fixed lsu=0 false wakeup as seen by rpl.rpl_parallel_optimistic_xa_lsu_off

MDEV-31949: Test fixes/improvements

Fixed rpl_xa_prepare_gtid_fail. The problem was that now,
with concurrent XAP/XAC, the XAP signalled the XAC to
complete, after which the XAP would continue to update
gtid_slave_pos (with an induced error) and fail, but the
XAC would already have completed. This is to be fixed with MDEV-21777,
but the change to the test is just to restart the slave
with the position of the XAC, as opposed to the XAP,
because it is now able to complete.

In rpl_xa_concurrent_xap_xac, test case 4b is repurposed
to ensure the XAP duplicate xid wait case, such that
if the prior XAP fails, the later in-wait XAP rolls
back successfully.

squash! MDEV-31949 parallel slave xa round-robin distribution

MDEV-31949 fixes to P1,R2,P3

XA-ROLLBACK (R2) did not wake up the subsequent commit.
Fixed by making it find `xid` in the usual place.

MDEV-31949: Extended test for ROLLBACK case

Restructured and renamed rpl_xa_concurrent_xap_xac to
rpl_xa_concurrent_2pc, and moved its test cases into
an include file with a parameterized completion event,
either COMMIT or ROLLBACK. The main test then calls
this included file under COMMIT and ROLLBACK variations
to ensure the behavior is correct for both cases.

Additionally updated the XA ROLLBACK logic in the code
so it doesn't call acquire_xid() after binlogging,
because the rollback case gets the XID beforehand.

Missed include/rpl_xa_concurrent_2pc.inc in last commit

The condition that leads XAC to possibly wait for the xid is corrected.

Before the change, XAC could do some extra work inside acquire_xid.

Cleanup commit.

- xa rollback is made consistent with commit wrt xid waiting and
  logging;
- rpl_xa_concurrent_2pc extended and refined to reflect the above;
- simplified logic around `is_async_xac`;
- memory_order_relaxed within locked mutex.

MDEV-32257 dangling XA-rollback in binlog from empty XA in pseudo_slave_mode

This commit protects the slave from crashing at execution of
an orphan XA-rollback in MDEV-31949 branch.

This commit complements the preceding one.

Fixes to a MDEV-32257-like scenario to prove the orphan XA-"complete" does not binlog.

The test part for the previous commit.

Earlier show-binlog-events are removed as being superseded
by logic checks.

Fixes to failing tests.

- rpl.rpl_xa_concurrent_2pc runs only on debug builds
- rpl.rpl_parallel_xa_same_xid showed race in handling
    XA-commit (gtid k) -> XA-start (gtid k+n)
  dependency.
  The duplicate xid pass protocol is reinforced.
  Parallel slave XAC_k now marks its xa delete intent, so
  XAP_k+n either sees that or marks xid itself.
  XAC_k does not express the intent when the xid has already
  gotten an XAP waiter (in which case things go as before).

MDEV-31949: Cleanup and fix typo

rpl_xa_empty_transaction test made deterministic

MDEV-32347 ASAN xid_t::eq/event_xid_t::serialize poison, SIGSEGV in serialize_xid

Unexpected use case to rollback XA-prepare was illegitimate -
MDEV-32455 is reported to that effect - but it showed a vulnerability
to access properties of THD::lex such as xid of a past statement.

That is fixed. The assert has to be removed altogether until
MDEV-32455 gets fixed.

Complete MDEV-32347 fixes to cover is_async_xac

branch of access to xid.

Fix Commit ID binlog filter

MDEV-32347 a review note addressed.

Cleanup: added docs to new functions and few cosmetics.

Correcting P -> C dependency handling and cleanup

P (slave_applier_reset_xa_trans) -> C (xid_cache_search_maybe_wait)
is made compatible with
C (xid_cache_delete) -> P (xid_cache_insert_maybe_wait).
The latter pair employs CAS to guarantee synchronization about
a waiter.

C -> P is also simplified so P -> C reflects that.
A large difference between the two is that in C -> P the xid record
is eliminated from the cache, which forces reliance on
lf_hash_search() as the wait condition for the pthread-cond-signal wait.

The concluding commit prior to collapse the branch.

- renaming
- function header comments added
- hyperoptimization of `wfc->parent_commit_started` is removed
  because it has not been proved safe
- the size of the XAP sliding window is doubled to account for
  a possible XAP_k -> XAP_k+2|W|-1 dependency.
  Say k=1, and the # of Workers is 4. Transactions are distributed RR,
  so it's possible to have T^*_1 -> T^*_8. This can be seen from the worker
  queues. The queues develop downward:

  W1 ...  W4
  1^* 2 3 4
  5   6 7 8^*

  Worker #1 is assigned
  T_1 and T_5. Worker #4 can take on its T_8 when T_1 is still at the
  beginning of its processing, so even before XA START of that XAP.

  This analysis was done a couple of weeks ago, but I have not found a
  commit planned to cover it.

Fixed a possibly non-existing access to thd->rgi_slave->commit_orderer spotted by the ASAN/MSAN builds.

TODO: explain/fix rpl.rpl_xa_concurrent_2pc, rpl.rpl_xa_prepare_gtid_fail
andrelkin added a commit that referenced this pull request Oct 18, 2023
XA-Prepare group of events

  XA START xid
  ...
  XA END xid
  XA PREPARE xid

and its XA-"complete" terminator

  XA COMMIT or
  XA ROLLBACK

are now distributed Round-Robin across the slave parallel workers.
The former hash-based policy was shown to contribute to execution
latency by creating a big queue (many times larger than the size
of the worker pool) of binlog-ordered transactions
to commit.

Acronyms and notations used below:

  XAP := XA-Prepare event or the whole prepared XA group of events
  XAC := XA-"complete", which is a solitary group of events
  |W| := the size of the slave worker pool
  Subscripts like `_k' denote order in a corresponding sequence
     (e.g binlog file).

KEY CHANGES:

The parallel slave
------------------
driver thread now maintains a list of XAP:s currently
in processing. Its purpose is to avoid "wild" parallel execution of XA:s
with duplicate xids (unlikely, but that's the user's right).
The list is arranged as a sliding window of size 2*|W| to account
for XAP_k -> XAP_k+2|W|-1, the largest possible (in the group-of-events
count sense) dependency.
Say k=1, and |W|, the # of Workers, is 4. As transactions are distributed
Round-Robin, it's possible to have T^*_1 -> T^*_8 as the largest
dependency ('*' marks the dependents) at runtime.
This can be seen from the worker queues, as in the picture below.
Let the worker queues Q_i develop downward:

  Q1 ...  Q4
  1^* 2 3 4
  5   6 7 8^*

Worker #1 is assigned T_1 and T_5.
Worker #4 can take on its T_8 when T_1 is still at the
beginning of its processing, so even before XA START of that XAP.
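
The arithmetic behind the 2*|W| window can be checked with a short sketch. The helper names below are hypothetical and only restate the round-robin assignment and the XAP_k -> XAP_k+2|W|-1 bound from the text; they are not the driver thread's code:

```cpp
#include <cassert>

// 1-based worker index that a 1-based transaction number lands on when
// transactions are handed out round-robin to n_workers workers.
unsigned worker_of(unsigned txn, unsigned n_workers) {
  assert(txn >= 1 && n_workers >= 1);
  return (txn - 1) % n_workers + 1;
}

// The farthest dependent named in the text: XAP_k -> XAP_k+2|W|-1.
unsigned farthest_dependent(unsigned k, unsigned n_workers) {
  return k + 2 * n_workers - 1;
}

// True when `other` falls inside a window of 2*|W| consecutive
// transactions that starts at transaction k.
bool covered_by_window(unsigned k, unsigned other, unsigned n_workers) {
  return other >= k && other - k < 2 * n_workers;
}
```

With |W| = 4, T_1 and T_5 both map to worker 1, T_8 maps to worker 4, and the pair (T_1, T_8) is exactly the widest one a window of size 8 still covers.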

XA related
----------
XID_cache_element is extended with two pointers to resolve
two types of dependencies: the duplicate xid XAP_k -> XAP_k+i
and the ordinary completion on the prepare XAP_k -> XAC_k+j.
The former is handled by a wait-for-xid protocol conducted by
xid_cache_delete() and xid_cache_insert_maybe_wait().
The latter is handled analogously by xid_cache_search_maybe_wait() and
slave_applier_reset_xa_trans().
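
As a rough illustration of the duplicate-xid wait, here is a toy model built on a mutex and a condition variable. It is only a sketch under simplifying assumptions: the class `ToyXidCache` and its methods are hypothetical, and it does not reproduce the lock-free hash or the waiter pointers that the commit describes:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <set>
#include <string>
#include <thread>

// Toy xid cache: at most one owner per xid at any time.
class ToyXidCache {
  std::mutex mtx;
  std::condition_variable released;
  std::set<std::string> active;
public:
  // "Insert, maybe wait": blocks while an earlier transaction still
  // owns the same xid, then registers the caller as the new owner.
  void insert_maybe_wait(const std::string &xid) {
    std::unique_lock<std::mutex> lk(mtx);
    released.wait(lk, [&] { return active.count(xid) == 0; });
    active.insert(xid);
  }
  // "Delete": drops the xid and wakes up any waiting inserter.
  void remove(const std::string &xid) {
    {
      std::lock_guard<std::mutex> lk(mtx);
      active.erase(xid);
    }
    released.notify_all();
  }
  bool contains(const std::string &xid) {
    std::lock_guard<std::mutex> lk(mtx);
    return active.count(xid) != 0;
  }
};

// Two transactions use the same xid; the later one can only register
// after the earlier one has released it.
bool duplicate_xid_demo() {
  ToyXidCache cache;
  cache.insert_maybe_wait("x1");    // the earlier XAP owns the xid
  bool second_inserted = false;
  std::thread later([&] {
    cache.insert_maybe_wait("x1");  // the later XAP with the same xid
    second_inserted = true;
  });
  cache.remove("x1");               // the completion releases the xid
  later.join();
  bool ok = second_inserted && cache.contains("x1");
  cache.remove("x1");
  return ok;
}
```

The predicate form of `wait()` makes the order of the release and the second insert irrelevant: if the xid is already free when the later transaction arrives, it does not block at all.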

XA-"complete" is allowed to go forward before its XAP parent
has released the xid (all recovery concerns are covered in MDEV-21496,
MDEV-21777).
Yet XAC still waits for it at a critical
point of execution, which is when it "completes" the work in the Engine.

CAVEAT: storage/innobase/trx/trx0undo.cc changes are due to possibly
        fixed MDEV-32144,
	TODO: to be verified.

Thanks to Brandon Nesterenko at mariadb.com for initial review and
a lot of creative efforts to advance with this work!
bnestere added a commit that referenced this pull request Nov 4, 2024
There seem to be 2 ASAN issues using mysqltest.cc (at least
using test binlog.binlog_autocommit_off_no_hang):

 1. (Fixed by this test) cur_con is not NULLed when freeing
    connections. At backtrace time, it can be read (though
    the backtrace is likely caused by point 2).

 2. (Still to be fixed) There is a leak in mariadb_lib.c line 3863:

```
      OPT_SET_EXTENDED_VALUE(&mysql->options, tls_verification_callback, arg1);
```

    with stack
=================================================================
==288928==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 1568 byte(s) in 4 object(s) allocated from:
    #0 0x5ceaa4ebad7d in calloc (./server/build116_asan/client/mariadb-test+0x1dcd7d) (BuildId: 21b3097a37beada873a1eaa15c1ea1f16d3ad6d7)
    #1 0x5ceaa4f6457f in mysql_optionsv ./server/libmariadb/libmariadb/mariadb_lib.c:3863:7
    #2 0x5ceaa4f698fd in mysql_init ./server/libmariadb/libmariadb/mariadb_lib.c:1320:3
    #3 0x5ceaa4f5a06f in mariadb_reconnect ./server/libmariadb/libmariadb/mariadb_lib.c:2104:3
    #4 0x5ceaa4f58c8b in mthd_my_send_cmd ./server/libmariadb/libmariadb/mariadb_lib.c:394:9
    #5 0x5ceaa4f5add5 in ma_simple_command ./server/libmariadb/libmariadb/mariadb_lib.c:472:10
    #6 0x5ceaa4f75044 in mysql_send_query ./server/libmariadb/libmariadb/mariadb_lib.c:2524:10
    #7 0x5ceaa4efc226 in wrap_mysql_send_query(st_mysql*, char const*, unsigned long) ./server/client/../tests/nonblock-wrappers.h:211:1
    #8 0x5ceaa4f2f8ce in run_query_normal(st_connection*, st_command*, int, char const*, unsigned long, st_dynamic_string*, st_dynamic_string*) ./server/client/mysqltest.cc:8230:9
    #9 0x5ceaa4f36676 in run_query(st_connection*, st_command*, int) ./server/client/mysqltest.cc:9652:5
    #10 0x5ceaa4f3a0ea in main ./server/client/mysqltest.cc:10484:2
    #11 0x7efaf322a1c9 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #12 0x7efaf322a28a in __libc_start_main csu/../csu/libc-start.c:360:3
    #13 0x5ceaa4e1fd44 in _start (./server/build116_asan/client/mariadb-test+0x141d44) (BuildId: 21b3097a37beada873a1eaa15c1ea1f16d3ad6d7)
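
Point 1 above (a current-connection pointer that is not reset when the connections are freed) follows a general pattern that can be sketched as below. All names here are hypothetical, including `cur_con_sketch`; this is not the mysqltest.cc code:

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Connection { int id; };

// Owning storage for all connections plus a non-owning "current" pointer.
static std::vector<std::unique_ptr<Connection>> connections;
static Connection *cur_con_sketch = nullptr;

// Creates a connection and makes it the current one.
Connection *open_connection(int id) {
  connections.push_back(std::unique_ptr<Connection>(new Connection{id}));
  cur_con_sketch = connections.back().get();
  return cur_con_sketch;
}

// Frees every connection and resets the non-owning pointer, so later
// code (for example a backtrace handler) cannot read freed memory.
void close_all_connections() {
  connections.clear();
  cur_con_sketch = nullptr;
}

// Reads through the current pointer only when it is still valid.
int current_connection_id_or(int fallback) {
  return cur_con_sketch ? cur_con_sketch->id : fallback;
}
```

Without the reset in `close_all_connections()`, the last call would dereference a dangling pointer, which is the kind of access ASAN reports.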
andrelkin pushed a commit that referenced this pull request Apr 24, 2025
…utine

When the SELECT sub-statement executes a stored function that is defined
to modify a non-transactional table, like

   delimiter |;
   create function f_ia(arg int)
   returns integer
   begin
     insert into ti_pk set a=1;
     insert into ta set a=1;
     insert into ti_pk set a=arg;
     return 1;
   end |
   delimiter ;|

any modified records that the function has succeeded
on must be binlogged as a "side effect" of CREATE-SELECT.

It is expected that a failing CREATE-SELECT like

  --error ER_DUP_ENTRY
  set statement binlog_format = ROW for create table t_y (a int) engine=aria select f_ia(1 /* err in Innodb after Aria stmt is done */) as a;

leaves behind the following state:

  include/show_binlog_events.inc
  Log_name	Pos	Event_type	Server_id	End_log_pos	Info
  master-bin.000001	#	Gtid	#	#	BEGIN GTID #-#-#
  master-bin.000001	#	Table_map	#	#
  table_id: # (test.  ta)
  master-bin.000001	#	Write_rows_v1	#	#
  table_id: # flags:   STMT_END_F
  master-bin.000001	#	Query	#	#	COMMIT
  select * from ta;
  a
  1
  select count(*) = 0 from ti_pk;
  true

However, this does not hold for the binlog part.
The reason is that, prior to the MDEV-34150 fixes, the CREATE-SELECT's
error phase left the binlog caches intact (the file:pos are from 10.11 c06c362)

to defer their reset to the rollback phase of the top-level

/* the statement cache gets binlogged */

where the side-effect changes get binlogged.

The MDEV-34150 fixes harmed (+#4 line) the statement cache in particular
in the error phase (file:pos are from 395db6f, the current 11.8)
/* The caches incl the statement cache are gone */
/* 'cos of MDEV-34150 */
+#4  0x00005d75f9b6a92e in THD::binlog_remove_rows_events (this=0x52c000240288) at log.cc:579

Apparently it should not have been there, as proper emptying (either
with reset for the transactional cache or flush and then reset for the
statement cache) is (must be) always done via binlog_rollback of the
top-level statement.
To honor the above requirement, the case is fixed by removing the
thd->binlog_remove_rows_events() call and its definition.
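
The rule stated above (reset for the transactional cache, flush and then reset for the statement cache, both done by the top-level rollback) can be modelled with a toy sketch. The struct `ToyBinlogCaches` and its members are hypothetical and much simpler than the server's binlog cache code:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of the two per-session binlog caches and the binlog itself.
struct ToyBinlogCaches {
  std::vector<std::string> stmt_cache;  // non-transactional changes
  std::vector<std::string> trx_cache;   // transactional changes
  std::vector<std::string> binlog;      // what ends up written

  // Rollback of the top-level statement: the statement cache is flushed
  // to the binlog and then reset; the transactional cache is only reset.
  void rollback_top_level() {
    binlog.insert(binlog.end(), stmt_cache.begin(), stmt_cache.end());
    stmt_cache.clear();
    trx_cache.clear();
  }
};

// Models the failing CREATE-SELECT: one InnoDB row and one Aria row are
// cached, then the statement errors out and is rolled back.
std::vector<std::string> failing_create_select_demo() {
  ToyBinlogCaches c;
  c.trx_cache.push_back("ti_pk: a=1");  // transactional, must vanish
  c.stmt_cache.push_back("ta: a=1");    // side effect, must be logged
  c.rollback_top_level();
  return c.binlog;
}
```

If the statement cache were emptied earlier, in the error phase, the `ta` row would never reach the binlog, which is the regression the commit message describes.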
andrelkin pushed a commit that referenced this pull request Apr 25, 2025
andrelkin pushed a commit that referenced this pull request Apr 25, 2025
Tested with rpl.rpl_create_select_row.

Reviewed-by Brandon Nesterenko.