PBXT table intermittently hanging after 22M rows entered

Asked by Thorn Roby on 2009-08-31

Running PBXT 1.0.08 on MySQL 5.1.35, RHEL 5.4 x86_64, dual quad-core Xeon, 16GB RAM. I have a single database on the system with about 16 small MyISAM tables and a single PBXT table currently holding about 22M records. The data directory is on an OCFS2 filesystem on an EMC FC SAN. I/O rates are quite low (rarely over 5MB/sec, and I have tested the filesystem to deliver over 200MB/sec). I've been loading data for about 6 weeks at roughly 20000 records per hour (recently increased from about half that).

About 10 days ago I began to see some strange behavior: the system load average would go very high (I didn't see it myself, but it was reported to be over 100) and lots of queries would queue up with no progress. At the time I thought it might be due to adding a daily mysqldump of the PBXT table, although in fact the dump was succeeding every day after about 7 hours (it generates a 20GB dump file).

The table had been loading OK and responding well until this morning, when the same problems with query response came back. In this instance, however, the system load average was low and there was no sign of increased I/O or CPU usage. The time the problem started doesn't seem to correlate with any known activity (specifically, it was about halfway through the mysqldump process).

This time I stopped the Java application that loads the data and tried to restart the DB. I waited 5 minutes, but the DB would not shut down, so I killed it by hand and restarted. The PBXT recovery process started up, but I misinterpreted the messages in the error log to mean that recovery was complete, and when I couldn't see the tables I did another manual restart. The second time I let it complete and waited for the "data sweeper" phase to finish (it took about 30 minutes). At this point everything looks OK. I had about 50 records reporting

DB-RECOVERY-data int xt_tab_load_ext_data(table_xt.cc:2491) -50: Extended record part does not match reference

which is not too surprising given the unclean shutdown. I'm not concerned about the data loss, and the second restart reported that recovery was required but had no errors.

Here's some xtstat output (10 second intervals). It's not too representative as the DB has only been up for a few minutes and I've currently throttled the load down to about 1/3 of normal, but maybe someone will see something wrong with the current table parameters. I'll update tomorrow after the application has been running for a while, and probably after turning up the load to normal.

-- PBXT System Variables --
pbxt_auto_increment_mode = 0
pbxt_checkpoint_frequency = 28MB
pbxt_data_file_grow_size = 16MB
pbxt_data_log_threshold = 128MB
pbxt_garbage_threshold = 50
pbxt_index_cache_size = 1000M
pbxt_log_buffer_size = 256K
pbxt_log_cache_size = 64MB
pbxt_log_file_count = 3
pbxt_log_file_threshold = 32MB
pbxt_max_threads = 207
pbxt_offline_log_function = 0
pbxt_record_cache_size = 2000M
pbxt_row_file_grow_size = 2MB
pbxt_sweeper_priority = 0
pbxt_transaction_buffer_size = 1MB

ilog  ilog  ilog     xlog xlog xlog  xlog xlog xlog
in    out   syncs/ms in   out  syncs msec hits miss
3329M 3313M 2t/10.1t 401M 223M 46.0t 379t 233t 22.1t
0     0     0/0      0    185K 171   287  200  0
0     0     0/0      0    315K 289   463  308  0

rec   rec  rec      rec   rec  rec   rec  data  data data  data
in    out  syncs/ms hits  miss frees %use in    out  syncs msec
2824M 136M <t/202t  4728t 220t 165t  86.7 2316M 185M 47.7t 517t
29.0M 320K 0/0      10.7t 922  0     88.1 4095K 292K 261   683
41.9M 0    0/0      15.4t 1334 0     90.2 5753K 331K 321   839
23.7M 0    0/0      9825  756  0     91.4 3531K 265K 262   631

row   row  row   row stat stat  ind   ind   ind      ind   ind   ind
sel   ins  upd   del read write in    out   syncs/ms hits  miss  %use
1133t 6556 43.4t 0   107t 50.0t 1051M 3397M <t/514t  12.2m 67.3t 99.9
3282  33   150   0   355  182   352K  0     0/0      6277  22    100
7073  21   236   0   564  261   192K  0     0/0      13.3t 12    100
6426  38   329   0   788  367   240K  0     0/0      15.9t 15    100

Question information

Language: English
Status: Answered
For: PBXT
Assignee: No assignee
Last query: 2009-08-31
Last reply: 2009-09-01

Hi Thorn,

We have found a couple of bugs that cause the error "Extended record part does not match reference" after recovery. These have been fixed in version 1.0.08c, which is the most recent version in the lp:pbxt/rc2 release branch.

I am not sure if it is possible, but I would recommend restoring from the dump due to this problem.

The slowdown may be due to the long running time of the dump. Basically if the dump is done using one SELECT, then PBXT is generating a snapshot. If this snapshot is held for as long as 7 hours, then PBXT must hold all changes to the database during this time. You can see this by running xtstat --display=xact. Here you will see "xact dirty" growing.

This would also explain the long recovery time, in which the sweeper takes a long time to complete.

Ideally, you would dump the table in chunks (for example, in blocks of 10000 rows ordered by the PK). Of course, the result would then not be a consistent snapshot of the table.
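A minimal sketch of such a chunked dump, in Python, assuming any DB-API connection. The function name `dump_in_chunks`, the CSV output format, and the `placeholder` argument are illustrative choices, not part of PBXT or mysqldump. Each chunk is a separate short query keyed on the last PK value seen, and the transaction is ended after every chunk so that no snapshot is held for the length of the whole dump.

```python
import csv


def dump_in_chunks(conn, table, pk, out, chunk_size=10000, placeholder="?"):
    """Write every row of `table` to `out` as CSV, one short query per chunk.

    `table` and `pk` are interpolated into the SQL text, so they must come
    from a trusted source. `placeholder` is the driver's parameter marker
    ("?" for sqlite3, "%s" for most MySQL drivers).
    Returns the number of rows written.
    """
    writer = csv.writer(out)
    limit = int(chunk_size)
    last_pk = None
    total = 0
    while True:
        cur = conn.cursor()
        if last_pk is None:
            # First chunk: start from the lowest PK value.
            cur.execute(f"SELECT * FROM {table} ORDER BY {pk} LIMIT {limit}")
        else:
            # Later chunks: continue strictly after the last PK written.
            cur.execute(
                f"SELECT * FROM {table} WHERE {pk} > {placeholder} "
                f"ORDER BY {pk} LIMIT {limit}",
                (last_pk,),
            )
        rows = cur.fetchall()
        names = [d[0] for d in cur.description]
        cur.close()
        conn.commit()  # end the transaction before fetching the next chunk
        if not rows:
            break
        writer.writerows(rows)
        total += len(rows)
        last_pk = rows[-1][names.index(pk)]
    return total
```

Rows inserted or updated while the dump is running may or may not appear in the output, which is the trade-off mentioned above.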

Your system parameters look OK. pbxt_checkpoint_frequency could be set to, say, 512MB, and pbxt_log_file_threshold to 128MB. The other xtstat output looks OK. xlog syncs are occurring, so transactions are being committed.
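As a configuration fragment, the two suggested changes might look like the following. This assumes the PBXT variables are set in the [mysqld] section of my.cnf, in the same name = value form as the listing in the question; the remaining settings stay as they are.

```
[mysqld]
# Values suggested above; previously 28MB and 32MB respectively.
pbxt_checkpoint_frequency = 512MB
pbxt_log_file_threshold   = 128MB
```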

Otherwise, it would be great if we could get the output of xtstat --display=all during the actual slowdown. This should show us where the problem is, although I am not sure this is possible without running xtstat constantly, which would result in a lot of data.
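One way to keep the amount of data bounded is to run the monitor continuously but retain only the most recent lines. The Python sketch below is an assumption about how that could be done, not a PBXT tool: the `capture` function is hypothetical, and the xtstat command line passed to it (including any connection options, which are omitted here) is up to the caller. It timestamps each output line, keeps the last `max_lines` in memory, and writes them to a file when the command exits or the script is interrupted with Ctrl-C.

```python
import collections
import subprocess
import time


def capture(cmd, max_lines, out_path, clock=time.time):
    """Run `cmd`, keep its last `max_lines` output lines, write them to `out_path`."""
    recent = collections.deque(maxlen=max_lines)  # older lines are dropped
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    try:
        for line in proc.stdout:
            # Prefix a Unix timestamp so samples can be matched to the slowdown.
            recent.append(f"{clock():.0f} {line.rstrip()}")
    except KeyboardInterrupt:
        pass  # operator noticed the slowdown and stopped the capture
    finally:
        proc.terminate()
        proc.wait()
        with open(out_path, "w") as f:
            for line in recent:
                f.write(line + "\n")


if __name__ == "__main__":
    # Hypothetical invocation: keep roughly the last hour of 10-second samples.
    capture(["xtstat", "--display=all"], 360, "xtstat-recent.txt")
```

With 10-second intervals, 360 lines cover about an hour, so the file written at the moment the hang is noticed should include the period leading up to it.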

Also, please e-mail me your MySQL error log, and I will check whether I can see any other problems.

Best regards,

Paul
