Prod mysql hangs after conflict

Asked by Bogdan Kanivets

Hi,
We are using MySQL 5.5.31 and percona-xtradb-cluster 5.5. After the following conflict, one of our production MySQL databases became unresponsive for about 30 minutes. It then recovered without a restart. There was nothing in the MySQL error log. Here is the Percona log:

131010 13:04:57 [Note] WSREP: cluster conflict due to high priority abort for threads:
131010 13:04:57 [Note] WSREP: Winning thread:
   THD: 2, mode: applier, state: executing, conflict: no conflict, seqno: 35606
   SQL: update Mapping set archived = true where mapping_id = 75565
131010 13:04:57 [Note] WSREP: Victim thread:
   THD: 87250, mode: local, state: committing, conflict: no conflict, seqno: -1
   SQL: commit
131010 13:04:57 [Note] WSREP: BF kill (1, seqno: 35606), victim: (87250) trx: 6179794
131010 13:04:57 [Note] WSREP: Aborting query: commit
131010 13:04:57 [Note] WSREP: kill query for: 87250
131010 13:04:57 [Note] WSREP: kill trx QUERY_COMMITTING for 6179794
131010 13:04:57 [Note] WSREP: waiting for BF, trx order: 35606 35606

131010 13:04:57 [Note] WSREP: replaying increased: 1, thd: 87250
131010 13:04:57 [Note] WSREP: commit failed for reason: 4
131010 13:04:57 [Note] WSREP: conflict state: 4
131010 13:04:57 [Note] WSREP: replay trx: commit -1
131010 13:04:57 [Note] WSREP: trx_replay successful for: 87250 140434669659904
131010 13:04:57 [Note] WSREP: replaying decreased: 0, thd: 87250

This is the first time we've seen conflict state: 4.

Do you have any ideas about what happened? What logging should we enable to see more information? Other recommendations are welcome.

Thanks

Question information

Language:
English
Status:
Solved
For:
MySQL patches by Codership
Assignee:
No assignee
Solved by:
Bogdan Kanivets
Seppo Jaakola (seppo-jaakola) said:
#1

I wonder why an SQL statement is logged for the applier (slave) thread (update Mapping set archived = true where mapping_id = 75565).
=> Check whether you have configured STATEMENT format for replication. binlog_format should be set to ROW for replication to work correctly.
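On a running node, the effective value can be checked with `SHOW VARIABLES LIKE 'binlog_format';`, and the setting itself goes under `[mysqld]` as `binlog_format=ROW`. To audit config files across nodes, here is a minimal Python sketch, assuming the option sits in a plain `[mysqld]` section. The helper name `binlog_format_of` is hypothetical, and the sketch does not follow `!include`/`!includedir` directives, so it only sees the file you pass it.

```python
import configparser
from typing import Optional


def binlog_format_of(cnf_text: str) -> Optional[str]:
    """Return binlog_format from the [mysqld] section of a my.cnf text, or None if unset."""
    # my.cnf allows bare flags (e.g. skip-name-resolve) and repeated options,
    # so allow valueless keys and disable duplicate checking.
    parser = configparser.ConfigParser(allow_no_value=True, strict=False, interpolation=None)
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return None
    result = None
    for key, value in parser.items("mysqld"):
        # MySQL treats '-' and '_' in option names as equivalent;
        # keep the last occurrence, since later settings override earlier ones.
        if key.replace("-", "_") == "binlog_format" and value is not None:
            result = value.strip().upper()
    return result


if __name__ == "__main__":
    SAMPLE = """
[mysqld]
binlog_format = STATEMENT
skip-name-resolve
"""
    fmt = binlog_format_of(SAMPLE)
    if fmt != "ROW":
        print(f"binlog_format is {fmt!r}; Galera expects ROW")
```

Run against the sample above, this prints `binlog_format is 'STATEMENT'; Galera expects ROW`, which is the configuration described in this thread.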

Bogdan Kanivets (bkanivets-h) said:
#2

Seppo,

Thanks for the reply. You are right, we have STATEMENT binlog_format. We are a bit concerned about making this change on a production cluster. It seems replication works for us with STATEMENT format 99% of the time. Can you explain why ROW format is required? Is it because of side effects of STATEMENT format?

Alex Yurchenko (ayurchen) said:
#3

1) The very reasons ROW format was introduced in native MySQL replication: statement-based events are unsafe for nondeterministic statements.
2) STATEMENT events can't be applied in parallel.
3) STATEMENT format hasn't been tested with Galera replication at all and is known to crash nodes under some circumstances, i.e. STATEMENT replication with Galera is of very alpha quality.
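To make point 1 concrete: MySQL classifies statements such as `UPDATE ... LIMIT` without `ORDER BY` as unsafe for statement-based logging, because each server may pick a different row. The following is a toy Python model of that, not Galera's actual write-set format; the `Node` class, its `scan_order` field, and the `replicate` helper are hypothetical. Re-executing the SQL text on a replica whose storage engine returns rows in a different order updates a different row, while shipping the changed row itself keeps both nodes identical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Toy table: mapping_id -> archived flag, plus the order this node scans rows in."""
    rows: dict[int, bool]
    scan_order: list[int]  # physical order the storage engine happens to return rows

    def apply_statement_update_limit_1(self) -> int:
        # Models "UPDATE Mapping SET archived = true WHERE archived = false LIMIT 1"
        # with no ORDER BY: whichever matching row is scanned first gets updated.
        for mapping_id in self.scan_order:
            if not self.rows[mapping_id]:
                self.rows[mapping_id] = True
                return mapping_id
        raise LookupError("no unarchived rows")

    def apply_row_event(self, mapping_id: int, archived: bool) -> None:
        # Row-based event: the exact row and its new value are shipped.
        self.rows[mapping_id] = archived


def replicate(mode: str) -> tuple[dict[int, bool], dict[int, bool]]:
    """Run one update on an origin node and replicate it in the given mode."""
    origin = Node({1: False, 2: False}, scan_order=[1, 2])
    replica = Node({1: False, 2: False}, scan_order=[2, 1])
    changed = origin.apply_statement_update_limit_1()
    if mode == "STATEMENT":
        replica.apply_statement_update_limit_1()  # re-executes the SQL text
    else:  # "ROW"
        replica.apply_row_event(changed, origin.rows[changed])
    return origin.rows, replica.rows


if __name__ == "__main__":
    for mode in ("STATEMENT", "ROW"):
        origin_rows, replica_rows = replicate(mode)
        print(mode, "consistent" if origin_rows == replica_rows else "diverged")
    # prints:
    # STATEMENT diverged
    # ROW consistent
```

The divergence needs no bug on either node, only a different row order, which is why a setup can appear to work "99% of the time" with STATEMENT format.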

Bogdan Kanivets (bkanivets-h) said:
#4

We've changed the binlog format to ROW and the problem went away. We haven't seen it for two months. Thanks