-
Notifications
You must be signed in to change notification settings - Fork 129
Uninitialized data in the TokuDB recovery log
A TokuDB test run with valgrind reported an uninitialized data error when writing into the TokuDB recovery log.
==1032== Syscall param write(buf) points to uninitialised byte(s)
==1032== at 0x3EFA60E4ED: ??? (in /lib64/libpthread-2.12.so)
==1032== by 0xB894038: toku_os_full_write(int, void const*, unsigned long) (file.cc:249)
==1032== by 0xB83248A: write_outbuf_to_logfile(tokulogger*, __toku_lsn*) (logger.cc:513)
==1032== by 0xB83326C: toku_logger_maybe_fsync(tokulogger*, __toku_lsn, int, bool) (logger.cc:836)
==1032== by 0xB8327DE: toku_logger_fsync_if_lsn_not_fsynced(tokulogger*, __toku_lsn) (logger.cc:586)
==1032== by 0xB8493E6: toku_txn_maybe_fsync_log(tokulogger*, __toku_lsn, bool) (txn.cc:600)
==1032== by 0xB7B4EBB: toku_txn_commit(__toku_db_txn*, unsigned int, void (*)(__toku_txn_progress*, void*), void*, bool, bool) (ydb_txn.cc:198)
==1032== by 0xB7B55E9: locked_txn_commit_with_progress(__toku_db_txn*, unsigned int, void (*)(__toku_txn_progress*, void*), void*) (ydb_txn.cc:338)
==1032== by 0xB798B04: commit_txn_with_progress(__toku_db_txn*, unsigned int, THD*) (hatoku_hton.cc:638)
==1032== by 0xB798E82: tokudb_commit(handlerton*, THD*, bool) (hatoku_hton.cc:679)
==1032== by 0x655597: ha_commit_low(THD*, bool, bool) (handler.cc:1496)
==1032== by 0x726681: TC_LOG_MMAP::commit(THD*, bool) (log.cc:2642)
==1032== by 0x6553F0: ha_commit_trans(THD*, bool, bool) (handler.cc:1439)
==1032== by 0x8A825F: trans_commit_stmt(THD*) (transaction.cc:434)
==1032== by 0x7E95E2: mysql_execute_command(THD*) (sql_parse.cc:4997)
TokuDB should not be storing uninitialized data into its files, so we need to find the the code that neglected to set the memory being written into a known state. An update SQL statement, which uses a read modify write algorithm, was being committed at the time of the valgrind error, so we started searching for the error in the TokuDB update logic.
Updates in MySQL are typically use a read, modify, write algorithm. We discovered that the value being written into the fractal tree by the update function had one uninitialized byte. This byte contained the null bits. We further discovered that the value read from the fractal tree by the TokuDB update function contained this uninitialized byte. So, the input to the update statement was uninitialized. Some previous SQL statement must have written an uninitialized null byte.
We found a previous alter table statement that added a column to the table. In addition, we found that the column addition update callback was the source of the uninitialized null byte.
The column addition update callback sets or clears a bit in the null bytes array of the new row for every nullable column in the row. The initial state of the new null bytes array is random since it mallocs some memory but does not zero it. The column addition algorithm sets or clears the correct set of bits, but it does not set all of the bits in the bit vector since it ignores those bits that do not have a corresponding nullable column. Valgrind detects that not all of the bits were set and reports the uninitialized data error.
So, no logic error here. I changed the column addition callback to zero the initial state of the null bits array prior to the same code setting or clearing the nullable column's null bit. I believe that the end result is that the correct bits are set or clear, and valgrind no longer complains.
I did not realize that valgrind tracks the state of each memory bit. Here is a simple program that shows this.
1 #include <stdio.h>
2 #include <assert.h>
3 #include <stdlib.h>
4 #include <unistd.h>
5 #include <memory.h>
6 static void setbits(unsigned char *b, unsigned max_bits) {
7 for (unsigned j = 0; j < max_bits; j++) {
8 if (j & 1)
9 *b |= 1<<j;
10 else
11 *b &= ~(1<<j);
12 }
13 }
14 int main(int argc, char *argv[]) {
15 assert(argc == 2);
16 unsigned max_bits = atoi(argv[1]);
17 unsigned char *b = new unsigned char[1];
18 setbits(b, max_bits);
19 write(1000, b, 1);
20 delete [] b;
21 return 0;
22 }
If we only write the first 4 bits, then valgrind reports an uninitialized data issue.
$ valgrind ./setbits 4
$ valgrind ./setbits 4
==26368== Memcheck, a memory error detector
==26368== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==26368== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==26368== Command: ./setbits 4
==26368==
==26368== Syscall param write(buf) points to uninitialised byte(s)
==26368== at 0x3EF9ADB690: __write_nocancel (in /lib64/libc-2.12.so)
==26368== by 0x400781: main (setbits.cc:19)
==26368== Address 0x4c21040 is 0 bytes inside a block of size 1 alloc'd
==26368== at 0x4A07152: operator new[](unsigned long) (vg_replace_malloc.c:363)
==26368== by 0x400756: main (setbits.cc:17)
==26368==
==26368==
==26368== HEAP SUMMARY:
==26368== in use at exit: 0 bytes in 0 blocks
==26368== total heap usage: 1 allocs, 1 frees, 1 bytes allocated
==26368==
==26368== All heap blocks were freed -- no leaks are possible
==26368==
==26368== For counts of detected and suppressed errors, rerun with: -v
==26368== Use --track-origins=yes to see where uninitialised values come from
==26368== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 6 from 6)
If we write all 8 bits, then valgrind is silent as it should be.
$ valgrind ./setbits 8
==26288== Memcheck, a memory error detector
==26288== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==26288== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==26288== Command: ./setbits 8
==26288==
==26288==
==26288== HEAP SUMMARY:
==26288== in use at exit: 0 bytes in 0 blocks
==26288== total heap usage: 1 allocs, 1 frees, 1 bytes allocated
==26288==
==26288== All heap blocks were freed -- no leaks are possible
==26288==
==26288== For counts of detected and suppressed errors, rerun with: -v
==26288== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 6 from 6)