PBXT table intermittently hanging after 22M rows entered
Running PBXT 1.0.08 on MySQL 5.1.35, RHEL 5.4 x86_64, dual quad-core Xeon, 16GB RAM. The server has a single database with about 16 small MyISAM tables and one PBXT table currently holding about 22M records. The data directory is on an OCFS2 filesystem on an EMC FC SAN. I/O rates are quite low (rarely over 5 MB/sec, and I have tested the filesystem to deliver over 200 MB/sec).

I've been loading data for about 6 weeks at roughly 20,000 records per hour (recently increased from about half that). About 10 days ago I began to see some strange behavior: the system load average would go very high (I didn't see it myself, but it was reported to be over 100) and lots of queries would queue up with no progress. At the time I suspected the daily mysqldump of the PBXT table, although in fact that was succeeding every day, taking about 7 hours and generating a 20GB dump file.

Loading and query response were fine until this morning, when the same response problems came back. In this instance, however, the system load average was low and there was no sign of increased I/O or CPU usage. The time the problem started doesn't correlate with any known activity (specifically, it was about halfway through the mysqldump run).

This time I stopped the Java application that loads the data and tried to restart the DB, but after waiting 5 minutes it still had not shut down, so I killed it by hand and restarted. The PBXT recovery process started up, but I misinterpreted the messages in the error log to mean that recovery was complete, and when I couldn't see the tables I did another manual restart. The second time I let it complete and waited for the "data sweeper" phase to finish (it took about 30 minutes). At this point everything looks OK. I had about 50 records reporting errors like

DB-RECOVERY-data int xt_tab_

which is not too surprising given the unclean shutdown. I'm not concerned about the data loss, and the second restart reported that recovery was required but completed with no errors.
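Since I misread the recovery messages the first time around, it may be worth scripting the error-log check instead of eyeballing it. A minimal sketch follows; the marker substrings are illustrative assumptions, not the literal PBXT wording, and the sample log lines are made up:

```python
# Sketch: scan a MySQL error log for PBXT recovery/sweeper markers.
# The marker substrings below are assumptions; adjust them to the
# literal messages your PBXT build actually prints.
RECOVERY_MARKERS = ("DB-RECOVERY", "Sweeper")

def recovery_lines(log_text: str) -> list:
    """Return error-log lines that mention recovery or the sweeper."""
    return [line for line in log_text.splitlines()
            if any(marker in line for marker in RECOVERY_MARKERS)]

# Illustrative sample log excerpt (not real output):
sample = (
    "090101 10:00:01 [Note] PBXT: DB-RECOVERY begin\n"
    "090101 10:00:02 [Note] unrelated line\n"
    "090101 10:30:00 [Note] PBXT: Sweeper done\n"
)
for line in recovery_lines(sample):
    print(line)
```

Tailing the live error log through a filter like this would make it obvious whether the sweeper phase has actually finished before attempting another restart.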
Here's some xtstat output (10-second intervals). It's not very representative, as the DB has only been up for a few minutes and I've currently throttled the load to about 1/3 of normal, but maybe someone will spot something wrong with the current table parameters. I'll update tomorrow after the application has been running for a while, and probably after turning the load back up to normal.
-- PBXT System Variables --
pbxt_auto_
pbxt_checkpoint
pbxt_data_
pbxt_data_
pbxt_garbage_
pbxt_index_
pbxt_log_
pbxt_log_cache_size = 64MB
pbxt_log_file_count = 3
pbxt_log_
pbxt_max_threads = 207
pbxt_offline_
pbxt_record_
pbxt_row_
pbxt_sweeper_
pbxt_transactio
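Several of the variable names above were cut off when pasting; the full list can be reproduced on the server with `SHOW VARIABLES LIKE 'pbxt%'`. As a small sketch, tab-separated batch output from `mysql -B -e` can be parsed like this (the sample rows are illustrative, not my actual values):

```python
# Sketch: parse `mysql -B -e "SHOW VARIABLES LIKE 'pbxt%'"` output.
def parse_variables(output: str) -> dict:
    """Parse tab-separated Variable_name/Value rows into a dict."""
    result = {}
    for line in output.strip().splitlines()[1:]:  # skip header row
        name, _, value = line.partition("\t")
        result[name] = value
    return result

# Illustrative sample output (values here are placeholders):
sample = ("Variable_name\tValue\n"
          "pbxt_log_cache_size\t67108864\n"
          "pbxt_log_file_count\t3\n")
pbxt_vars = parse_variables(sample)
print(pbxt_vars["pbxt_log_file_count"])  # → 3
```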
 ilog   ilog  ilog       xlog  xlog  xlog   xlog  xlog  xlog
   in    out  syncs/ms     in   out  syncs  msec  hits  miss
3329M  3313M  2t/10.1t   401M  223M  46.0t  379t  233t  22.1t
    0      0  0/0           0  185K    171   287   200      0
    0      0  0/0           0  315K    289   463   308      0

  rec   rec  rec       rec    rec   rec    rec   data   data  data   data
   in   out  syncs/ms  hits   miss  frees  %use    in    out  syncs  msec
2824M  136M  <t/202t   4728t  220t   165t  86.7  2316M  185M  47.7t  517t
29.0M  320K  0/0       10.7t   922      0  88.1  4095K  292K    261   683
41.9M     0  0/0       15.4t  1334      0  90.2  5753K  331K    321   839
23.7M     0  0/0        9825   756      0  91.4  3531K  265K    262   631

  row   row    row  row  stat   stat    ind    ind  ind       ind    ind    ind
  sel   ins    upd  del  read  write     in    out  syncs/ms  hits   miss   %use
1133t  6556  43.4t    0  107t  50.0t  1051M  3397M  <t/514t   12.2m  67.3t  99.9
 3282    33    150    0   355    182   352K      0  0/0        6277     22   100
 7073    21    236    0   564    261   192K      0  0/0       13.3t     12   100
 6426    38    329    0   788    367   240K      0  0/0       15.9t     15   100
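As a quick sanity check on the caches, the hits/miss columns above can be turned into hit ratios. A minimal sketch for parsing xtstat-style counters follows; I'm assuming K/M/G scale by 10^3/10^6/10^9 and that the lowercase "t"/"m" suffixes in this output mean thousands/millions, which is an assumption about xtstat's formatting, not something confirmed here:

```python
# Sketch: parse xtstat-style counters and compute a cache hit ratio.
# Assumption: suffix meanings below (especially 't' and 'm') are
# guesses at xtstat's abbreviations, not verified against its docs.
SUFFIX = {"K": 1e3, "M": 1e6, "G": 1e9, "t": 1e3, "m": 1e6}

def parse_count(s: str) -> float:
    """Convert a suffixed counter like '185K' or '12.2m' to a float."""
    s = s.strip()
    if s and s[-1] in SUFFIX:
        return float(s[:-1]) * SUFFIX[s[-1]]
    return float(s)

def hit_ratio(hits: str, misses: str) -> float:
    """Fraction of cache lookups served from the cache."""
    h, m = parse_count(hits), parse_count(misses)
    return h / (h + m) if h + m else 0.0

# Index cache over the whole uptime: hits = 12.2m, miss = 67.3t
print(f"{hit_ratio('12.2m', '67.3t'):.4f}")  # → 0.9945
```

Under those suffix assumptions, the index cache hit ratio since startup comes out around 99.5%, so the index cache at least does not look starved.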
Question information
- Language: English
- Status: Answered
- For: PBXT
- Assignee: No assignee