tar takes a long time, 100% CPU, with no output

Asked by B.J. Herbison

This morning my backups are taking a long time. How can I tell what tar is doing?

Previous incremental backups have taken up to 10 minutes, and full backups up to 20 minutes (it's a fairly new system and I haven't moved most of my files onto it yet).

This morning the first incremental ran for almost an hour until I killed it. The tar process was using 100% of a CPU, and the files.tar.gz was still zero bytes long. I restarted it, and the tar process has 24 minutes of CPU time with no recorded output.

Question 1: Should I expect to see the tar file have zero bytes until the end, or should I see the file size increase as it goes?

Question 2: How can I debug this? Is there a way to get debug output from tar as run from nssbackup 0.2.1?

Question information

Language: English
Status: Solved
For: nssbackup
Assignee: No assignee
Solved by: Jean-Peer Lorenz
Revision history for this message
B.J. Herbison (bj-herbison) said :
#1

Additional information: The target device is a mounted network drive, but it is available, with over 1 TB free. The backup created nine files in the target directory, so it is writable. Nothing looks unusual in the log file; the last line talks about invoking tar.

Anton (feenstra) said :
#2

On 01/07/10 12:49, B.J. Herbison wrote:
> Question #116342 on NSsbackup changed:
> https://answers.launchpad.net/nssbackup/+question/116342

To answer your previous questions: yes, on most systems you should expect
the file size to grow as tar produces its output. In some cases, however,
the file size may not be updated promptly, e.g. over NFS on some systems.
Do you have direct access to the server on which your network drive
resides? There, certainly, the file size must be up to date and thus
should be growing.
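That check can be scripted. Below is a minimal sketch (the path in the usage comment is a placeholder, not anything nssbackup defines, and `stat -c` is the GNU coreutils form): sample the archive's size twice and report whether it grew between samples.

```shell
# Sample a file's size twice, a few seconds apart, and report
# whether it grew. Arguments: file path, optional interval in seconds.
check_growth() {
    file=$1
    interval=${2:-5}
    s1=$(stat -c '%s' "$file") || return 1
    sleep "$interval"
    s2=$(stat -c '%s' "$file") || return 1
    if [ "$s2" -gt "$s1" ]; then
        echo "growing: $s1 -> $s2 bytes"
    else
        echo "stalled at $s1 bytes"
    fi
}

# Example (illustrative path): check_growth /mnt/nas/backup/files.tar.gz 10
```

Run directly on the server, this sidesteps any size-caching the network filesystem might do.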

Another way to monitor tar is with 'top': you should normally see
alternating activity from tar (collecting and reading files) and gzip
(compressing tar's output on the fly). In addition, your network monitor
should register outgoing data. Finally, using 'iotop', you should be
able to monitor the disk I/O of the tar process as it reads files.
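For a one-shot snapshot without an interactive tool, `ps` can show accumulated CPU time per process. A small sketch (it assumes nssbackup runs plain `tar` and `gzip` processes under those names):

```shell
# Print pid, accumulated CPU time, and current %CPU for all processes
# with the given name; say so if none is running.
snapshot() {
    # ps -C matches by exact command name; trailing '=' suppresses headers
    ps -C "$1" -o pid=,time=,%cpu= || echo "no $1 process running"
}

snapshot tar
snapshot gzip
```

A tar process showing minutes of TIME while its gzip partner stays at 00:00:00 matches the "stuck before producing output" pattern described above.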

Collecting this information should give more clues about where the error
originates. Could you report back to us with that?

--
Groetjes,

Anton

B.J. Herbison (bj-herbison) said :
#3

Thank you for the response. I appreciate that you did not write and do not maintain tar, but I hope you have insight into how nssbackup is using tar.

I don't have insight into the NAS device, but a simple two-window experiment with cat and ls -l showed that writes to a file show up in the file size before the file is closed, so it appears that the backup process is hanging before gzip has enough data to start writing to disk.
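That two-window experiment can be collapsed into one script, as a sketch (the target path is a stand-in for the NAS mount): write in several bursts and confirm the reported size is already non-zero before the writer finishes.

```shell
# Write 3000 bytes in three one-second bursts in the background,
# and sample the file size mid-write to show it grows before close.
TARGET=${TARGET:-./nas_growth_test}
( for i in 1 2 3; do printf '%1000s' ' '; sleep 1; done > "$TARGET" ) &
writer=$!
sleep 1.5
midsize=$(stat -c '%s' "$TARGET")   # sampled while the writer is still running
wait "$writer"
endsize=$(stat -c '%s' "$TARGET")
echo "mid-write: $midsize bytes, final: $endsize bytes"
```

If the mid-write size is non-zero on the NAS mount, a zero-byte files.tar.gz really does mean gzip has written nothing yet.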

Looking at top, tar is taking 100% of a CPU and gzip 0%. Looking at ps, tar has over nine minutes of CPU time in ten minutes, while gzip still shows up as "0:00". There was a burst of network traffic at the start of the backup (writing the nine files relating to the backup), but network traffic has been under 4 KiB/s since then. I did not catch the start of the backup with iotop, but I never saw the tar process on the list. (I should have read the man page and used "iotop -o", or pressed "o" before the backup started.)

Do you have any suggestions for what I could try next?

I remember tar having some problems with long file names quite a while ago (over a decade) but I don't remember the symptoms -- and I don't remember doing anything strange that would add unusually long file names between when the backups worked and when they stopped.

(By the way, do the old nssback.<date>.log files left by a failed run get cleaned up automatically at some point or will I need to clean them up manually when this issue gets resolved?)

Thank you for your help with this problem.

Best Jean-Peer Lorenz (peer.loz) said :
#4

Hello,
do backups work for you again? You wrote 'the first incremental backup this morning'; how long do the backups take now?

I guess you are seeing problematic behavior of tar here: in tests on my machine I've seen similar things. Tar creates the incremental snapshot file (snar file) while using 100% CPU, and the files.tar.* file does not increase in size in the meantime. On my machine I experienced 2 hours of preparation followed by 1 hour of data transfer. I don't know exactly what's causing this, but I think it is quite undesirable. IMO tar's incremental backup feature has some serious drawbacks.
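For reference, here is GNU tar's listed-incremental mode in miniature (whether nssbackup passes exactly these flags is an assumption, and the paths are made up for the demo). The incremental run first scans the whole tree against the .snar file, which is the phase where the archive can sit at zero bytes while tar burns CPU:

```shell
mkdir -p demo_src demo_backup
echo one > demo_src/a.txt

# Level-0 (full) run: archives everything and creates the snapshot file.
tar --create --gzip --listed-incremental=demo_backup/snapshot.snar \
    --file=demo_backup/full.tar.gz demo_src

# Incremental run: tar compares the tree against snapshot.snar first,
# then archives only what changed (here, the new b.txt).
echo two > demo_src/b.txt
tar --create --gzip --listed-incremental=demo_backup/snapshot.snar \
    --file=demo_backup/incr.tar.gz demo_src

tar --list --gzip --file=demo_backup/incr.tar.gz
```

On a large tree, that pre-scan against the snapshot file is where the hours of 100% CPU with no output would be spent.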

HTH.
Jean-Peer

Jean-Peer Lorenz (peer.loz) said :
#5

> By the way, do the old nssback.<date>.log files left by a failed run get cleaned up automatically

No, they don't, and this is intended: if we removed them you'd have a hard time tracking down some problems. What we could do is remove them when the package is uninstalled (good point).

B.J. Herbison (bj-herbison) said :
#6

Several updates:

Sorry for the imprecise wording. At the time I wrote my last update no backups had worked for several days.

After Jean-Peer mentioned "2 hours of preparation" I left my machine on overnight. The incremental backup accumulated 420 minutes of CPU time before I killed it this morning, so if the problem is the same as what Jean-Peer saw, I'm not patient enough to find out.

After I killed the process I changed my backup schedule so that a full backup was due and started another backup. The full backup completed. I then started another run, and an incremental backup completed successfully.

Lessons learned:

o There are bugs in tar's incremental backup as invoked by nssbackup. Sometimes it will take hours to get started, sometimes over seven hours (and possibly some runs will never finish).

o If there is a problem, run a full backup. This will at least clear the problem temporarily.
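A related workaround, if the tar-level snapshot file can be located: moving it aside forces the next run back to a full (level-0) backup. This is only a sketch; where (or whether) nssbackup stores its .snar file is an assumption, and the path in the test is purely illustrative.

```shell
# Move a tar incremental snapshot file aside so the next
# --listed-incremental run starts from scratch (level 0).
force_full() {
    snar=$1
    if [ -f "$snar" ]; then
        mv "$snar" "$snar.stale"
        echo "moved $snar aside; next run will be level 0"
    else
        echo "no snapshot file at $snar"
    fi
}

# Example (illustrative path): force_full /var/lib/nssbackup/snapshot.snar
```

Keeping the stale copy (rather than deleting it) leaves the old state available if anything needs to be diagnosed later.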

Thank you for your assistance with this issue. I'll try to remember to let you know if I see this problem again, so we know whether this is a one-time fluke or something that I regularly trigger because of the types/names of files I create and back up.

(One possible source of the issue: I dual-boot this machine with Windows 7 and include some of the Windows files in my backup. The problematic file may have been created when Windows booted, so the file name may be something that wouldn't normally be created by a Linux program. That could explain why I saw the problem but most people don't.)

I created three bug reports (one trivial UI bug and two suggestions for enhancing the configuration) based on my experiences.

B.J. Herbison (bj-herbison) said :
#7

Thanks Jean-Peer Lorenz, that solved my question.