Thursday, December 23, 2010

Bug in TokuDB

I just found a bug on MariaDB version. I was able to replicate it 4-5 times.

MariaDB [test]> show create table video;

CREATE TABLE `video` (
  `id` int(4) NOT NULL AUTO_INCREMENT,
  `artist` varchar(100) NOT NULL,
  `title` varchar(100) NOT NULL,
  `video` varchar(20) NOT NULL,
  `date` date NOT NULL,
  PRIMARY KEY (`id`),
  KEY `video_artist_ix` (`artist`),
  KEY `video` (`video`),
  KEY `date_id` (`date`,`id`)
);



MariaDB [test]> create table x_toku(
 `id` int(4) NOT NULL AUTO_INCREMENT,
 `artist` varchar(100) NOT NULL,
 PRIMARY KEY (`id`),
 KEY `video_artist_ix` (`artist`) 
) ENGINE=TOKUDB ;
Query OK, 0 rows affected (18.51 sec)



MariaDB [test]> insert into x_toku select id, artist from video;
Query OK, 10875440 rows affected (54.39 sec)
Records: 10875440  Duplicates: 0  Warnings: 0



MariaDB [test]> delete from x_toku where artist like 'ma%';
Query OK, 274705 rows affected (22.49 sec)



MariaDB [test]> select count(*) from x_toku where artist like 'ma%';
+----------+
| count(*) |
+----------+
|        2 |
+----------+
1 row in set (0.00 sec)
*** THIS MUST BE WRONG, DIDN'T WE DELETED EVERYTHING ***



*** THE DATA PROBABLY COMES FROM THE INDEX ***
MariaDB [test]> explain select count(*) from x_toku where artist like 'ma%';
+----+-------------+--------+-------+-----------------+-----------------+---------+------+------+--------------------------+
| id | select_type | table  | type  | possible_keys   | key             | key_len | ref  | rows | Extra                    |
+----+-------------+--------+-------+-----------------+-----------------+---------+------+------+--------------------------+
|  1 | SIMPLE      | x_toku | range | video_artist_ix | video_artist_ix | 102     | NULL |    2 | Using where; Using index |
+----+-------------+--------+-------+-----------------+-----------------+---------+------+------+--------------------------+
1 row in set (0.00 sec)
*** YEP, USING INDEX ***



*** LETS TRY WITHOUT INDEX SCAN ***
MariaDB [test]> select count(*) from x_toku ignore index(video_artist_ix) where artist like 'ma%';
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (5.51 sec)
*** THIS STILL MUST BE WRONG, DIDN'T WE DELETED EVERYTHING ***



*** LETS SEE THE DATA ***
MariaDB [test]> select * from x_toku where artist like 'ma%';
+---------+------------------+
| id      | artist           |
+---------+------------------+
| 7953132 | mallu+adult2     |
| 6640134 | Martin+Nieverra1 |
+---------+------------------+
2 rows in set (0.00 sec)
*** HMMM ***



*** MAY BE IT GETS THE DATA IS MATERIALIZED FROM THE INDEX ***
MariaDB [test]> explain select * from x_toku where artist like 'ma%';
+----+-------------+--------+-------+-----------------+-----------------+---------+------+------+--------------------------+
| id | select_type | table  | type  | possible_keys   | key             | key_len | ref  | rows | Extra                    |
+----+-------------+--------+-------+-----------------+-----------------+---------+------+------+--------------------------+
|  1 | SIMPLE      | x_toku | range | video_artist_ix | video_artist_ix | 102     | NULL |    2 | Using where; Using index |
+----+-------------+--------+-------+-----------------+-----------------+---------+------+------+--------------------------+
1 row in set (0.03 sec)
*** YEP, USING INDEX ***



*** LETS SEE THE DATA WITHOUT INDEX SCAN ***
MariaDB [test]> select * from x_toku ignore index(video_artist_ix) where artist like 'ma%';
+----------+----------------------------------+
| id       | artist                           |
+----------+----------------------------------+
| 15292754 | Matt+Pokora+Catch+Me+If+You+Can1 |
+----------+----------------------------------+
1 row in set (5.24 sec)
*** DOUBLE HMMM ***


*** LETS SEE WHAT RECORD IS ACTUALLY REAL ***
MariaDB [test]> select * from x_toku where id in (7953132,6640134,15292754);
+----------+----------------------------------+
| id       | artist                           |
+----------+----------------------------------+
| 15292754 | Matt+Pokora+Catch+Me+If+You+Can1 |
+----------+----------------------------------+
1 row in set (0.00 sec)

3 comments:

Bradley C. Kuszmaul said...

Hi, this Bradley Kuszmaul, chief architect at Tokutek. We're looking into this.

-Bradley

Bradley C. Kuszmaul said...

Nikolay, thanks for finding this bug. We reproduced it and we found that it is an error in our Fast Loader, a feature we introduced in v4.0 for accelerating loading into an empty index. We have fixed the bug and are currently testing a new release which we expect to release for General Availability early next week.

We have apprised all our customers of the problem and are working with them to determine if their tables or indexes have been corrupted - fortunately it turns out that this bug causes data inconsistencies or data loss rarely and only under very specific circumstances. Details on the bug are available on the Known Issues page on the Support tab of our website.

Again, on behalf of ourselves and our customers, please accept our thanks for finding this important bug and reporting it to us.

NMMM.NU said...

Bug was fixed in TokuDB 4.1.3, here you are the "dump" of the test.

=====================

insert into video_toku select * from video;

Query OK, 10875440 rows affected (1 min 43.00 sec)

=====================

Size on disk
558M

Size on show table status
846M

=====================

update video_toku set artist=concat(artist,' ok');

Query OK, 10873941 rows affected, 1635 warnings (8 min 30.91 sec)
Rows matched: 10875440 Changed: 10873941 Warnings: 1635

=====================

delete from video_toku where artist like 'ni%';

Query OK, 68155 rows affected (18.89 sec)

=====================

insert into video_bh select * from video_toku where date = '2010-11-11' ;

Query OK, 10853 rows affected (0.23 sec)
Records: 10853 Duplicates: 0 Warnings: 0

=====================

insert into video_bh select * from video_toku where artist like 'a%' ;

Query OK, 790356 rows affected (9.37 sec)
Records: 790356 Duplicates: 0 Warnings: 0

=====================

select count(*) from video_toku where date = '2010-11-11' ;

1 row in set (0.11 sec)

=====================

select count(*) from video_toku where artist like 'a%' ;

1 row in set (0.46 sec)

=====================

select count(*) from video_toku;

1 row in set (6.61 sec)

=====================

select min(date), max(date) from video_toku;

1 row in set (0.01 sec)

=====================

Concurrent operation returned lock timout