Bug #758788
Comment #21

Tim (tchadwick-b) wrote on 2012-01-04:

Hi All,

I am responsible for a set of MySQL instances that are playing the application-level shard game. We have well over 100K partitioned tables per instance, so we're really talking about 600K+ tables with a dictionary cache of over 6GB.

We've seen the corruption in a handful of tables upon restart, and the errors point to the table definitions being corrupt. When I saw that innodb_dict_size_limit was available, and knowing that we were well past the usual size of a large dictionary, I was excited to see whether reducing the dictionary cache could eliminate this issue. However, before implementing the feature I noticed this bug, so I did considerable testing with the random query generator, probing harsh transaction and ALTER conditions around the 100M threshold suggested in this bug report. I was able to confirm that the limit was applied to the dictionary cache, and I did not see a crash. I was also able to reproduce the behavior Yasufumi reports using very small innodb_dict_size_limit values.

The production environment in which I next implemented the feature is a site mirror of the production "primary" (not a replication architecture; the sites operate independently). It is therefore a production environment that can tolerate downtime, but only briefly, so that failover for the primary site is maintained.

The summary is that the functionality fell over.

Here are the initial, relevant details for two machines, denoted 01 and 02:

# 01
mysql> SHOW ENGINE INNODB STATUS;
Dictionary cache 6434650431 (212499088 + 6222151343)
Dictionary memory allocated 6222151343

# 02
mysql> SHOW ENGINE INNODB STATUS;
Dictionary cache 5662601749 (212499088 + 5450102661)
Dictionary memory allocated 5450102661
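Each "Dictionary cache" line above reports a total followed by two components in parentheses. The second component equals the "Dictionary memory allocated" figure on the next line. For 01, for example, 212499088 + 6222151343 = 6434650431.

Below is a minimal Python sketch for pulling these numbers out of captured status text and checking that they add up. The regular expressions assume the line format shown here, and the helper name parse_dict_cache is hypothetical. This comment does not say what the first component (212499088, identical on both hosts) measures.

```python
import re

# Status excerpts copied from the comment (hosts 01 and 02, before any change).
STATUS = {
    "01": "Dictionary cache 6434650431 (212499088 + 6222151343)\n"
          "Dictionary memory allocated 6222151343",
    "02": "Dictionary cache 5662601749 (212499088 + 5450102661)\n"
          "Dictionary memory allocated 5450102661",
}

# Total followed by two parenthesized components; \s+ tolerates the
# tab/space padding seen in raw status output (assumed format).
CACHE_RE = re.compile(r"Dictionary cache\s+(\d+)\s+\((\d+)\s*\+\s*(\d+)\)")
ALLOC_RE = re.compile(r"Dictionary memory allocated\s+(\d+)")


def parse_dict_cache(text):
    """Return (total, first_component, allocated) parsed from a status excerpt."""
    cache = CACHE_RE.search(text)
    alloc = ALLOC_RE.search(text)
    if cache is None or alloc is None:
        raise ValueError("dictionary lines not found")
    total, first, second = (int(g) for g in cache.groups())
    allocated = int(alloc.group(1))
    # The total should equal the sum of its parts, and the second part
    # should repeat the "Dictionary memory allocated" figure.
    if total != first + second or second != allocated:
        raise ValueError("dictionary figures are inconsistent")
    return total, first, allocated


for host, text in STATUS.items():
    total, first, allocated = parse_dict_cache(text)
    print(f"{host}: total={total:,} first={first:,} allocated={allocated:,}")
```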

I began by setting the dictionary size limit to 5GB.

# 01 and 02
mysql> SET GLOBAL innodb_dict_size_limit=5368709120;
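The value passed here is 5GB expressed in bytes: 5 × 1024³ = 5368709120. This comment does not say which counter the limit is checked against. The sketch below assumes it is "Dictionary memory allocated", and that is an assumption.

Under that reading, both hosts started above the limit: 01 by 853442223 bytes and 02 by 81393541 bytes. Measured against the "Dictionary cache" totals instead, they were over by 1065941311 and 293892629 bytes respectively. Either way, 02 would also be expected to evict entries. The helper excess_over_limit is hypothetical.

```python
# 5GB as used in the SET GLOBAL statement: 5 * 1024**3 bytes.
LIMIT = 5 * 1024 ** 3
assert LIMIT == 5368709120

# "Dictionary memory allocated" values captured before the limit was set.
# Assumption: the limit is compared against this counter.
ALLOCATED = {"01": 6222151343, "02": 5450102661}


def excess_over_limit(allocated, limit=LIMIT):
    """Bytes that would have to be evicted to get under the limit (0 if already under)."""
    return max(0, allocated - limit)


for host, allocated in ALLOCATED.items():
    print(f"{host}: {excess_over_limit(allocated):,} bytes over the limit")
```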

The cache on 01 showed the expected behavior and slowly (ever so slowly...) began to evict entries.

# 01
mysql> SHOW ENGINE INNODB STATUS;
Dictionary cache 6432416970 (212499088 + 6219917882)
Dictionary memory allocated 6219917882
1 row in set (0.00 sec)

mysql> SHOW ENGINE INNODB STATUS;
Dictionary cache 6431708659 (212499088 + 6219209571)
Dictionary memory allocated 6219209571
1 row in set (0.00 sec)

mysql> SHOW ENGINE INNODB STATUS;
Dictionary cache 6429897331 (212499088 + 6217398243)
Dictionary memory allocated 6217398243
1 row in set (0.00 sec)
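The three samples from 01 can be turned into a rough progress figure. The sketch below again assumes the limit applies to "Dictionary memory allocated". The first value in the list is the pre-limit reading. The comment gives no timestamps, so no eviction rate per second can be derived from these numbers.

```python
LIMIT = 5 * 1024 ** 3  # 5368709120, the value passed to SET GLOBAL

# "Dictionary memory allocated" on 01: the reading before the limit was set,
# followed by the three samples taken afterwards.
samples = [6222151343, 6219917882, 6219209571, 6217398243]

# Bytes freed between consecutive readings.
freed = [prev - cur for prev, cur in zip(samples, samples[1:])]
total_freed = samples[0] - samples[-1]
remaining = samples[-1] - LIMIT  # still above the (assumed) target counter

print("freed per interval:", freed)
print(f"freed so far: {total_freed:,} bytes; still over limit by {remaining:,} bytes")
print(f"progress: {total_freed / (samples[0] - LIMIT):.2%}")
```

By these numbers, about 4.75 million bytes had been freed across the samples. That is roughly 0.56% of the initial excess, which is consistent with the "ever so slowly" description.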

The cache on 02 is a different story: it did not change at all. Please note that table_open_cache was initially 50k but was reduced to 20k to help the algorithm identify candidates for dictionary cache eviction. Of course, while this was happening, all activity on the database was suspended and no tables were in active use. Also, although 50k might seem low for this number of tables, the tables are in a single tablespace rather than in file-per-table mode. At this scale, using multiple buffer pool instances was found to perform best in that regard, but thoughts on that are certainly appreciated.

While I was prodding 02 to determine why innodb_dict_size_limit was not forcing dictionary cache eviction, 01 crashed.

I can provide much more detail, but I wanted to get this onto the bug report to show that although innodb_dict_size_limit was set well above 100M, mysqld still crashed in one case and, interestingly, showed unexpected behavior in the other. To be clear, though, 02 (the server whose dictionary cache did not change) did not crash.

Corruption was found on both instances upon restart.
