myloader freezes/hangs

Asked by Daniel Kaelin on 2011-10-05

I am currently in the process of testing mydumper and myloader on a set of test MySQL servers, hoping they can be a replacement for standard mysqldumps.

I have been able to get the dumping process to work just fine and have no real issues there. The problem I am running into is with myloader. This is the exact command I am running in this case:

time myloader -u root -p *server password* -d /export-20111005-081950/

I tried to increase the verbosity of this command, but adding -v or --verbose anywhere in the statement results in an error, whether I put it at the front right after "myloader" or at the end. Where should that flag land in this command?

It appears the database has been restored after I run this command: the database is accessible, I am able to browse through it, and the data seems to be complete. The problem is that the myloader command never finishes. After I enter the above command and run it, the command line just hangs there and never comes back to the prompt.

I was running the command below and finally hit CTRL-C to exit after letting it run for 51 minutes; it should be taking nowhere near this long to restore. Am I doing something wrong with this command, or SHOULD it be taking this long to restore?

[root@localhost /]# time myloader -u root -p *server password* -d /export-20111005-081950/

real 51m9.282s
user 0m8.731s
sys 0m3.284s
[root@localhost /]#

Question information

Language: English
Status: Answered
For: MySQL Data Dumper
Assignee: No assignee
Last query: 2011-10-24
Last reply: 2011-11-04
Andrew Hutchings (linuxjedi) said : #1

Hi Daniel,

What version of mydumper are you using?

myloader takes --verbose=? where ? is a number; --help will give you details.
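
For example (assuming the levels listed by --help, typically 0 = silent, 1 = errors, 2 = warnings, 3 = info), the flag can sit anywhere among the options:

```shell
# Restore with info-level messages; --verbose= can go anywhere:
myloader --verbose=3 -u root -p *server password* -d /export-20111005-081950/
myloader -u root -p *server password* -d /export-20111005-081950/ --verbose=3
```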

You may find that if you have a particularly large table and it wasn't broken up into pieces by mydumper (i.e. --rows wasn't used, or the table has a varchar pkey) then that table will be restored single-threaded and may take a while to complete.

MySQL's SHOW PROCESSLIST would give a better idea of what is going on.
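
For example, from a second terminal while myloader appears hung (using the same root credentials as the restore):

```shell
# List every connection and the full statement it is running;
# long-running INSERTs here mean myloader is still working, not hung.
mysql -u root -p -e 'SHOW FULL PROCESSLIST'
```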

Daniel Kaelin (dkaelin) said : #2

I am actually using version 0.51 of the dumper. The latest iteration released 8/12/2011.

So for verbosity I have to specify the level then. That would make sense; I guess I just missed that aspect when reading through the documentation. Can the verbosity option appear anywhere in the myloader command?

If I am breaking the dump into pieces, do you have a recommendation on how many rows I should use when grouping the data? The databases are from Moodle installations, so the number of rows is usually under 10,000 in any given table.

Can you elaborate on the varchar pkey? I am not familiar with that particular terminology.

I appreciate the response.

Thanks

Daniel

Daniel Kaelin (dkaelin) said : #3

The installation is being run on CentOS as well.

If you feel like adding any details to your current documentation, these were the dependencies I had to install before I was able to compile the files and get mydumper and myloader working properly on the server.

All of the below were installed through yum install (insert module name):

Dependencies:
cmake
make
gcc
gcc-c++
zlib
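
For reference, the list above boils down to one command on CentOS (package names exactly as listed; depending on the release you may need zlib-devel for the headers rather than plain zlib — treat that as an assumption):

```shell
yum install cmake make gcc gcc-c++ zlib
```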

Andrew Hutchings (linuxjedi) said : #4

Hi Daniel,

Yes, --verbose= can appear anywhere in the command parameters.

The number of rows is entirely dependent on your judgement; maybe 2500 or 5000 to start with (note that the figure is a rough guide and mydumper will make adjustments based on InnoDB statistics).
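
As a sketch of the dump side with chunking enabled, reusing the credentials from the question (--rows sets the approximate chunk size; the output directory flag -o is an assumption here, not taken from the thread):

```shell
# Split each table into ~5000-row chunks so myloader can restore
# a single large table across several threads:
mydumper -u root -p *server password* --rows=5000 -o /export-20111005-081950/
```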

By varchar pkey I mean a primary key that is not an integer-based column; mydumper cannot yet estimate how to break such a table up, so it is dumped as one large dump file. That is a feature we plan to implement in the future.

If you have an apparent hang again please run SHOW FULL PROCESSLIST in the MySQL command line and paste it here so we can see what myloader is doing.

Daniel Kaelin (dkaelin) said : #5

It appears your suggestions are working perfectly for the myloader. Thank you for the input on that.

I was watching the restoration of one of the databases using the myloader command and I was curious how the module handles sequential files when it is loading them into the server.

The mdl_log was the largest table in the restore, with 1.3 million rows. I noticed the threads were jumping all over the place as far as which parts were being loaded back in.

Starts with 409, then 511, 355, 276, 115, etc.

** Message: Thread 5 restoring `mdl_bryan`.`mdl_log` part 409
** Message: Thread 8 restoring `mdl_bryan`.`mdl_log` part 511
** Message: Thread 4 restoring `mdl_bryan`.`mdl_log` part 355
** Message: Thread 3 restoring `mdl_bryan`.`mdl_log` part 276
** Message: Thread 7 restoring `mdl_bryan`.`mdl_question_states` part 57
** Message: Thread 7 restoring `mdl_bryan`.`mdl_log` part 115
** Message: Thread 6 restoring `mdl_bryan`.`mdl_log` part 81
** Message: Thread 1 restoring `mdl_bryan`.`mdl_log` part 152
** Message: Thread 2 restoring `mdl_bryan`.`mdl_log` part 287
** Message: Thread 5 restoring `mdl_bryan`.`mdl_question_states` part 10
** Message: Thread 5 restoring `mdl_bryan`.`mdl_log` part 59
** Message: Thread 8 restoring `mdl_bryan`.`mdl_role_allow_override` part 0
** Message: Thread 8 restoring `mdl_bryan`.`mdl_questionnaire_resp_single` part 1
** Message: Thread 8 restoring `mdl_bryan`.`mdl_message_read` part 3
** Message: Thread 8 restoring `mdl_bryan`.`mdl_log` part 45
** Message: Thread 4 restoring `mdl_bryan`.`mdl_log` part 139
** Message: Thread 3 restoring `mdl_bryan`.`mdl_log` part 127
** Message: Thread 7 restoring `mdl_bryan`.`mdl_log` part 451
** Message: Thread 6 restoring `mdl_bryan`.`mdl_question_states` part 152
** Message: Thread 6 restoring `mdl_bryan`.`mdl_question_states` part 34
** Message: Thread 6 restoring `mdl_bryan`.`mdl_log` part 343

Also, would you be able to provide any additional information regarding daemon mode? Aside from the documentation below, I didn't see much else on it.

*** From Your Documentation ***

Mydumper has a daemon mode which will snapshot the dump data every so often whilst continuously retrieving the binary log files. This gives a continuous consistent backup right up to the point where the database server fails. To use this you simply need to use the --daemon option.

In the following example mydumper will use daemon mode, creating a snapshot every half an hour and log to an output file:

mydumper --daemon --snapshot-interval=30 --logfile=dump.log

*** end ***

A few questions I had about the daemon mode.

What information gets put in the log file if you have that enabled?

Are you supposed to specify a database in this mode or is it designed to cover all the databases on a server?

If it does cover all of the databases how does it know where to place the various parts / deltas / updates for each database without the directory option being used?

Is the daemon only designed to periodically update the dumps? I didn't see anything about a daemon for myloader if this were to be run on a backup database server. Is that best handled by a server script run through cron, periodically running myloader to continuously update the database while mydumper updates the dumps through the daemon?

Once again I appreciate your responsiveness and your help.

Thanks,

Daniel

Launchpad Janitor (janitor) said : #6

This question was expired because it remained in the 'Open' state without activity for the last 15 days.

Daniel Kaelin (dkaelin) said : #7

Any input on the daemon from my post above?

Andrew Hutchings (linuxjedi) said : #8

Hi Daniel,

I apologise for the delay, I have been on vacation.

We should probably do something about the random order issue, I'll file a bug on this. On InnoDB at least we should in theory get higher performance for sequential inserts.

> What information gets put in the log file if you have that enabled?

Just time and date info of snapshots IIRC, basically what is normally sent to the terminal.

> Are you supposed to specify a database in this mode or is it designed to cover all the databases on a server?

Both are possible.

> If it does cover all of the databases how does it know where to place the various parts / deltas / updates for each database without the directory option being used?

It stores two entire snapshots at any time at the moment, in the future I would like it to store deltas or something similar instead. It will automatically create subdirectories for these snapshots as well as a symlink pointing to the last good snapshot.

> Is the daemon only designed to periodically update the dumps? I didn't see anything about a daemon for myloader if this was to be ran on a backup database server. Is that best handled in a server script through a CRON to periodically run the myloader to continuously update the database while mydumper is updating the dumps through the daemon?

It is designed to periodically create a dump of the data as well as continuously retrieve the binary logs. This way you have an up-to-the-second backup of your data. Unfortunately I have not completed the code for myloader to process the binlogs yet.
