Can the "verify" option be used to ensure the entire backup could restore without error?

Asked by Aaron Whitehouse on 2010-07-04

I am looking for a way to verify that the backup could restore without error.

I now use Duplicity as my only backup mechanism. It would be excellent if, once every so often, I could run something that:
(1) did a backup (so that we knew the remote location *should* match the local files);
(2) pulled down all necessary files from the remote backup location (roughly file by file, so that the minimum necessary had to be stored in the local temporary location);
(3) extracted the most recent versions of each file from the backup archives;
(4) compared each file from the backup archives to the local versions; and
(5) reported any errors or differences.
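Steps (2) to (5) can be approximated outside duplicity by restoring the latest backup into a scratch directory (for example with `duplicity restore <url> <scratch_dir>`) and then comparing that tree with the live one. The Python sketch below covers only the comparison step. `compare_trees` is a hypothetical helper, not part of duplicity, and this approach needs enough temporary space for a full restore, which is exactly what step (2) hopes to avoid.

```python
import filecmp
import os
from typing import List, Set


def compare_trees(original: str, restored: str) -> List[str]:
    """List differences between a live tree and a restored copy of it.

    Regular files are compared byte for byte (shallow=False), so metadata
    such as mtime is ignored. Hypothetical helper, not part of duplicity.
    """
    problems: List[str] = []
    seen: Set[str] = set()

    # Every file in the live tree should exist in the restore with identical bytes.
    for root, _dirs, files in os.walk(original):
        for name in files:
            live = os.path.join(root, name)
            rel = os.path.relpath(live, original)
            seen.add(rel)
            copy = os.path.join(restored, rel)
            if not os.path.isfile(copy):
                problems.append(f"missing from restore: {rel}")
            elif not filecmp.cmp(live, copy, shallow=False):
                problems.append(f"contents differ: {rel}")

    # Files present only in the restore (e.g. deleted locally since the backup).
    for root, _dirs, files in os.walk(restored):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), restored)
            if rel not in seen:
                problems.append(f"only in restore: {rel}")

    return sorted(problems)
```

An empty list means every file restored with the same bytes as the live copy.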

Does the "verify" option in duplicity provide this functionality?

I've read the man page:
http://duplicity.nongnu.org/duplicity.1.html
"verify
    Enter verify mode instead of restore. If the --file-to-restore option is given, restrict verify to that file or directory. duplicity will exit with a non-zero error level if any files are different. On verbosity level 4 or higher, log a message for each file that has changed."
but that wasn't very clear to me.
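For scheduled checks, the exit status described in that man page entry is what a script would test. A minimal Python sketch, assuming `duplicity` is on `PATH` and that any GPG passphrase is supplied by the environment. `verify_ok` is a hypothetical helper name, and whether a zero status really means the files would restore is exactly what the replies below discuss.

```python
import subprocess
from typing import Callable, Sequence


def verify_ok(url: str, local_dir: str, extra_args: Sequence[str] = (),
              run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> bool:
    """Run `duplicity verify` and return True if it exited with status 0.

    Per the man page, duplicity exits non-zero if any files are different.
    `run` is injectable so the function can be exercised without duplicity.
    """
    cmd = ["duplicity", "verify", *extra_args, url, local_dir]
    result = run(cmd)
    return result.returncode == 0
```

For example, `verify_ok("file:///mnt/backup", "/", ["--verbosity=4"])` would also ask for a log message per changed file, per the man page text quoted above.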

Question information

Language:
English
Status:
Solved
For:
Duplicity
Assignee:
No assignee
Solved by:
Kenneth Loafman
Solved:
2010-09-21
Last query:
2010-09-21
Last reply:
2010-09-20
Peter Schuller (scode) said : #1

> I am looking for a way to verify that the backup could restore without error.
>
> I now use Duplicity as my only backup mechanism. It would be excellent if, once every so often, I could run something that:
> (1) did a backup (so that we knew the remote location *should* match the local files);
> (2) pulled down all necessary files from the remote backup location (roughly file by file, so that the minimum necessary had to be stored in the local temporary location);
> (3) extracted the most recent versions of each file from the backup archives;
> (4) compared each file from the backup archives to the local versions; and
> (5) reported any errors or differences.
>
> Does the "verify" option in duplicity provide this functionality?

You may wish to wait for Kenneth to weigh in, but I believe verify is
intended to do just that. However, it occurs to me that I am not sure how
this interacts with the caching of manifest information. Indeed,
testing it, it appears that the cache is used: modifying a file in the
cache directory locally makes the verify fail. This implies that verify
will also succeed (i.e., fail to detect the problem) if the server-side
manifest is corrupted and the local cached version is used during
verify. I'll file a bug on that.

I do think, though, that if you eliminate the use of cached data, verify
should do what you list; or at least that should be the intent.

It occurs to me that there should also be a simple method of disabling
the use of the cache (and perhaps simply have verify imply that).

--
/ Peter Schuller

Aaron Whitehouse said : #2

Thanks a lot, Peter.

To confirm: aside from your bug, does "verify" test the actual archive files, not just the remote manifests?

So, if my duplicity line is:
duplicity --exclude=$DEST_DIR --include-globbing-filelist=$FILELIST --volsize=10 --sign-key=C512B7B0 --encrypt-key=F0C0AD14 --encrypt-key=D59A4E71 / file://$DEST_DIR --verbosity=8

would my verify line simply be:
duplicity verify --sign-key=C512B7B0 file://$DEST_DIR /
?

And if that does not give any errors, I can sleep easy knowing that every file could restore perfectly from the remote archive?

If that is what verify does, perhaps the man page could be clearer:
"The following command compares the files we backed up, so see what has changed since then:"
To me, that description sounds more like a --dry-run, or similar.

Thanks again!

Peter Schuller (scode) said : #3

> To confirm, (aside from your bug) "verify" tests the actual archive
> files, not just the remote manifests?

I love that you're asking, because this prompted me to test this - and
it seems it does NOT, which should be very clearly documented. I'll
file bugs.

When I've done verification I've actually restored my backups rather
than use verify.

> So, if my duplicity line is:
> duplicity --exclude=$DEST_DIR --include-globbing-filelist=$FILELIST --volsize=10 --sign-key=C512B7B0 --encrypt-key=F0C0AD14 --encrypt-key=D59A4E71 / file://$DEST_DIR --verbosity=8
>
> would my verify line simply be:
> duplicity verify --sign-key=C512B7B0 file://$DEST_DIR /
> ?

I don't use include/exclude, but I would presume you'd have to pass the
same include/exclude patterns to verify, or else it will report
differences caused purely by the mismatch.
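One way to keep the two command lines from drifting apart is to build both from a single list of selection options, kept in the same order (duplicity's file selection is order-sensitive: the first matching condition wins). A Python sketch: the `DEST_DIR` and `FILELIST` values are hypothetical placeholders for the shell variables in the backup line above, the key IDs are copied from that line, `--verbosity=4` is added because the man page says that level logs each changed file, and whether verify needs any further GPG options has not been checked here.

```python
from typing import List

# Hypothetical stand-ins for $DEST_DIR and $FILELIST in the backup line above.
DEST_DIR = "/mnt/backup"
FILELIST = "/root/duplicity-filelist"
URL = f"file://{DEST_DIR}"

# File selection is order-sensitive, so both commands share one list.
SELECTION = [f"--exclude={DEST_DIR}", f"--include-globbing-filelist={FILELIST}"]


def backup_cmd() -> List[str]:
    """The backup line from the question, as an argument list."""
    return ["duplicity", *SELECTION, "--volsize=10",
            "--sign-key=C512B7B0",
            "--encrypt-key=F0C0AD14", "--encrypt-key=D59A4E71",
            "/", URL, "--verbosity=8"]


def verify_cmd() -> List[str]:
    """The proposed verify line, with the same selection options added."""
    return ["duplicity", "verify", *SELECTION, "--sign-key=C512B7B0",
            "--verbosity=4", URL, "/"]
```

Either list can be handed to `subprocess.run`; editing `SELECTION` then changes both commands at once.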

> And if that does not give any errors, I can sleep easy knowing that
> every file could restore perfectly from the remote archive?

Apparently not....

> If that is what verify does, perhaps the man page could be clearer:
> "The following command compares the files we backed up, so see what has changed since then:"
> To me, that description sounds more like a --dry-run, or similar.

Agreed, though modified to account for the fact that the contents are
seemingly not compared.

--
/ Peter Schuller

Peter Schuller (scode) said : #4

>> If that is what verify does, perhaps the man page could be clearer:
>> "The following command compares the files we backed up, so see what has changed since then:"
>> To me, that description sounds more like a --dry-run, or similar.
>
> Agreed, though modified to account for the fact that the contents are
> seemingly not compared.

Well, in filing the bug I realized that the above phrasing is part of
an example command, and that the documentation of the 'verify' command
itself is a bit more on the mark. Do you think the description of the
verify command is unclear (other than being wrong, given the bug I just
filed)?

       verify Enter verify mode instead of restore. If the --file-to-restore
              option is given, restrict verify to that file or directory.
              duplicity will exit with a non-zero error level if any files are
              different. On verbosity level 4 or higher, log a message for
              each file that has changed.

--
/ Peter Schuller

Aaron Whitehouse said : #5

Peter,

Thanks again for all of your help.

Yes, I knew that was an example; I provided the other text in my initial post. To me, that text wasn't clear, and that is why I posted this question. It seems to focus too much on things that have changed since the last backup, which I would have thought would be better found by running your normal backup command with the --dry-run option. What I understand "verify" does over and above that is test the integrity of the remote archives, which isn't really mentioned in the man page. It also uses "verify" to define verify, which is circular. Maybe something like the below would be better:
"
verify Enter verify mode instead of restore, to test the integrity of your remote archive
            files without replacing any local files. Duplicity will download the archive files
            from the remote location, decompress and decrypt them, and compare them
            to the local copy of the files. This will both inform you if any files have changed
            in either location since the last backup (on verbosity level 4 or higher, duplicity
            will log a message for each file that has changed) and will alert you to any
            problems in restoring your remote backup (for example, if one of your remote archives
            has become corrupted). If the --file-to-restore option is given, duplicity will restrict verify
            to that file or directory. Duplicity will exit with a non-zero error level if any files are
            different or if the remote archive(s) could not be successfully compared.

The following command compares the files backed up to the remote location to the local version of the files. This will tell us what, if anything, has changed since the last backup and will confirm that the remote files could be successfully restored:

    duplicity verify scp://<email address hidden>/some_dir /home/me
"

If you agree with these changes, please let me know and I will file another bug suggesting the changes to the man page.

As you have already made clear, there are two issues here. The first is what duplicity is meant to do. It sounds like we agree that it should provide a means of testing that the remote archive files can be successfully restored, without actually restoring them over your local files. If that is what the command is meant to do, I would suggest that the man page make that clearer, as set out above.

The second issue is that it sounds as though it isn't currently doing that properly. A tool that purports to verify everything but doesn't is worse than no tool at all, so thank you very much for filing the bugs. Presumably I could write a script that ran through each file that was backed up, did a "--file-to-restore" into a temp file, compared that restored file to the original, deleted it and moved on to the next one. That seems silly when the functionality could live, and mostly already does, in duplicity itself. As far as you know, Peter, would duplicity's "verify" command (with your bugs fixed) give me the same level of comfort as this imaginary script, or are there other issues with the "verify" option?
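The imaginary script could look roughly like this Python sketch. It assumes `duplicity` is on `PATH`, that GPG keys or a passphrase are already available to it, that each entry in `rel_paths` is a regular file given relative to the backup root (the form `--file-to-restore` expects), and that restoring a single file writes it to the target path. `restore_and_compare` and `duplicity_restore_one` are hypothetical names.

```python
import filecmp
import os
import subprocess
import tempfile
from typing import Callable, Iterable, List

RestoreFn = Callable[[str, str, str], None]


def duplicity_restore_one(url: str, rel_path: str, target: str) -> None:
    """Restore one file from the latest backup at `url` to `target`."""
    subprocess.run(
        ["duplicity", "restore", "--file-to-restore", rel_path, url, target],
        check=True,
    )


def restore_and_compare(url: str, source_root: str, rel_paths: Iterable[str],
                        restore: RestoreFn = duplicity_restore_one) -> List[str]:
    """Restore each file into a scratch directory, compare it, then discard it."""
    problems: List[str] = []
    for rel in rel_paths:
        # A fresh scratch directory per file: this script holds at most one
        # restored file at a time (duplicity's own temp files aside).
        with tempfile.TemporaryDirectory() as scratch:
            target = os.path.join(scratch, "restored")
            try:
                restore(url, rel, target)
            except (subprocess.CalledProcessError, OSError):
                problems.append(f"{rel}: restore failed")
                continue
            try:
                same = filecmp.cmp(os.path.join(source_root, rel), target,
                                   shallow=False)
            except OSError:  # original or restored file missing
                same = False
            if not same:
                problems.append(f"{rel}: contents differ")
    return problems
```

Each call starts a separate duplicity process, which probably re-reads the manifests and fetches the volumes holding that file, so this could be slow over a network; restoring once into a scratch directory trades disk space for speed.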

I hope that I don't come across as critical. I love duplicity, and that is why I'm keen to see it be the default backup program for everything (with the Déjà Dup frontend or similar for desktop distributions). Unlike simply copying files to an external HDD, where you can just check the copy, duplicity's archive files look like random chunks of data. So I get a little nervous that, despite my regular duplicity backups, I'll get caught out when I finally need to rely on them. There are enough bugs filed about failed restores to give me some slight hesitation.

So hopefully we can make the verify option the perfect complement to the backup and restore options, allowing everyone to easily test that they are safe.

Keep up the great work everyone!

Peter Schuller (scode) said : #6

> If you agree with these changes, please let me know and I will file
> another bug suggesting the changes to the man page.

They sound good to me, but Kenneth will be the final judge ;)

> As you have already made clear, there are two issues here. The first is
> what duplicity is meant to do. It sounds like we agree that it should
> provide a means of testing that the remote archive files can be
> successfully restored, without actually restoring them over your local
> files. If that is what the command is meant to do, I would suggest that
> the man page make that clearer, as set out above.

Agreed.

> The second issue is that it sounds as though it isn't currently doing
> that properly. A tool that purports to verify everything but doesn't is
> worse than no tool at all, so thank you very much for filing the bugs.
> Presumably I could write a script that ran through each file that was
> backed up, did a "--file-to-restore" into a temp file, compared that
> restored file to the original, deleted it and moved on to the next one.
> That seems silly when the functionality could live, and mostly already
> does, in duplicity itself. As far as you know, Peter, would duplicity's
> "verify" command (with your bugs fixed) give me the same level of
> comfort as this imaginary script, or are there other issues with the
> "verify" option?

My understanding has always been that what you describe is the intent
of verify. Whether it gives you the same level of comfort is a
slightly different question. As I previously indicated, I have tended
to do spot checks by actually restoring files rather than using verify
(though I've used verify too).

But I'd say that the *intent* certainly is that 'verify' should allow
you to rest easy.

In reality, in addition to fixing the bugs about cache bypass and
content checking, there should probably be tests that ensure all
these things are checked, now and in the future.

Disclaimer: My knowledge of the code is spotty, depending on which
parts of it I've worked on in the past. In the case of verify, I'm
essentially making assumptions based on my general understanding of the
man page, coupled with a vague recollection of the code.

But again, my expectation was certainly that it verified all
contents. It could be that this expectation was wrong; we still haven't
heard from anyone else in this thread. Either way, I think that
expectation matches what a 'verify' command *should* do, regardless
of original intent.

> I hope that I don't come across as critical. I love duplicity, and that
> is why I'm keen to see it be the default backup program for everything
> (with the Déjà Dup frontend or similar for desktop distributions).
> Unlike simply copying files to an external HDD, where you can just
> check the copy, duplicity's archive files look like random chunks of
> data. So I get a little nervous that, despite my regular duplicity
> backups, I'll get caught out when I finally need to rely on them. There
> are enough bugs filed about failed restores to give me some slight
> hesitation.

Skepticism when it comes to backups is a good thing IMO, and something
we need more of in this world. Please don't apologize for that ;)

--
/ Peter Schuller

edso (ed.so) said : #7

I am confused... when exactly does verify not verify? When timestamps match?

..ede

Peter Schuller (scode) said : #8

> edso requested for more information:
> i am confused ... when exactly does verify not verify? When timestamps
> match?

The actual contents of files do not seem to be compared. For
example, if you keep it from bailing out on metadata by ensuring
mtime/permissions/etc. are correct, but introduce corruption in
file contents, verify still succeeds.

The other part of the problem is that verify does not bypass the
archive cache, so the verification is run based on information that
is not confirmed to match what is in the remote backup.

--
/ Peter Schuller

Peter Schuller (scode) said : #9

And btw, I think Launchpad is auto-answering the question every time I
respond... I'm not doing it manually.

--
/ Peter Schuller

edso (ed.so) said : #10

OK... one would expect a per-file restore to a temporary location and a comparison between it and the original...

Regarding the cache: Isn't the archive cache brought up to date on every run?

..ede/duply.net

Peter Schuller (scode) said : #11

> Regarding the cache: Isn't the archive cache brought up to date on every
> run?

Yes, but not in the sense implied by a verify. If the file set matches
(not sure whether sizes are checked), that's enough. Since the point
of the cache is to not download the data, clearly we cannot, in the
general case, confirm that the content matches.

So for that reason I view it as a problem that verify does not
bypass said cache, rather than a problem with the cache.

--
/ Peter Schuller

edso (ed.so) said : #12

So verify has to either ignore the archive cache or update it? If it updates it, would it be a problem if the two are out of sync (local more recent than remote)?

..ede

Kenneth Loafman said : #13

[resend with complete CC list]

Going to jump in here rather than later in the thread where it's messy.

Duplicity does verify the contents of the archives *as they were*; it
does not compare them with the contents on the filesystem. Verify is
done by comparing the archive contents with the stored signatures, i.e.
the original file with its hash value.

The assumption is that the filesystem will probably change shortly after
backup. What you look for in a verify is a check to see if the backup
is stored properly and can be restored. If you want a comparison
function, you'll need to restore and compare the original with the
restored files, or provide a direct comparison function for us to
integrate into duplicity.

If you want to test verify, backup to a local file system, hexedit one
of the archives and try to verify. It will fail to verify. You can
modify the original files at will, and verify will succeed, as it is
designed to do.

...Ken
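Ken's description can be illustrated with a toy model in Python. This is not duplicity's code or archive format: the dict standing in for the archive, SHA-256 as the hash, and the names `toy_backup`/`toy_verify` are all assumptions for illustration. What it shows is that this kind of verify needs only the archive and the hashes recorded at backup time, so editing the originals changes nothing, while corrupting the archive is caught.

```python
import hashlib
from typing import Dict, List, Tuple

Archive = Dict[str, bytes]
Signatures = Dict[str, str]


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def toy_backup(files: Dict[str, bytes]) -> Tuple[Archive, Signatures]:
    """Store file contents 'as they were' plus a hash of each one."""
    archive = dict(files)
    signatures = {path: digest(data) for path, data in files.items()}
    return archive, signatures


def toy_verify(archive: Archive, signatures: Signatures) -> List[str]:
    """Return paths whose archived bytes no longer match the stored hash.

    The live files are never consulted.
    """
    return sorted(path for path, expected in signatures.items()
                  if path not in archive or digest(archive[path]) != expected)


live = {"notes.txt": b"version 1"}
archive, signatures = toy_backup(live)

live["notes.txt"] = b"version 2"                       # edit the original file
assert toy_verify(archive, signatures) == []           # verify still succeeds

archive["notes.txt"] = b"versioN 1"                    # "hexedit" the archive
assert toy_verify(archive, signatures) == ["notes.txt"]  # verify fails
```

This mirrors the hexedit experiment above; it deliberately leaves out the metadata comparison that edso reports seeing in real verify output.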

edso (ed.so) said : #14

Works for me. We should document this more clearly in the man page. Something like:

"
Verify is done by comparing the archive contents with the stored signatures, i.e. the archived file with its hash value at archival time.
If the --file-to-restore option is given, restrict verify to that file or directory. duplicity will exit with a non-zero error level if any files are different. On verbosity level 4 or higher, log a message for each file that has changed.
"

ede

Peter Schuller (scode) said : #15

> So verify has to either ignore the archive cache or update it? If it
> updates it, would it be a problem if the two are out of sync (local
> more recent than remote)?

Ignore seems better to me (ignoring implementation difficulties).

--
/ Peter Schuller

Peter Schuller (scode) said : #16

[Kenneth responded, but the question didn't end up on the CC list; quoting him in full]

> Going to jump in here rather than later in the thread where it's messy.
>
> Duplicity does verify the contents of the archives *as they were*; it
> does not compare them with the contents on the filesystem. Verify is
> done by comparing the archive contents with the stored signatures, i.e.
> the original file with its hash value.
>
> The assumption is that the filesystem will probably change shortly after
> backup.  What you look for in a verify is a check to see if the backup
> is stored properly and can be restored.  If you want a comparison
> function, you'll need to restore and compare the original with the
> restored files, or provide a direct comparison function for us to
> integrate into duplicity.
>
> If you want to test verify, backup to a local file system, hexedit one
> of the archives and try to verify.  It will fail to verify.  You can
> modify the original files at will, and verify will succeed, as it is
> designed to do.

OK. The fact that it reported discrepancies at all (in metadata)
always gave me the impression it was intended to actually compare.
And such a comparison would not be at all useless, since a good backup
procedure will typically involve, where possible, backing up data from
a snapshot of a file system rather than a live file system, in which
case that behavior would indeed be useful.

But ok - so verify does, we still believe, verify that the backup has
not been corrupted along the way, but makes no claims to compare
against actual file system contents (even when that contents may be
matching exactly).

In short then, 'verify' verifies that whatever was backed up still
seems to be unmodified, but does not really try to verify the
correctness of that backup against the file system?

I wonder, though, whether many more people share my mistaken
impression, since the metadata discrepancies are clearly reported. If
the intent of verify is just to verify internal integrity, why is a
file system even involved in the process (i.e., why compare against a
file system hierarchy at all)?

Oh well.

--
/ Peter Schuller

edso (ed.so) said : #17

Question:

If verify is supposed to check a backup's integrity, why do I receive messages like

"Difference found: File test.log has mtime Wed Jul 7 02:02:38 2010, expected Wed Jul 7 01:36:05 2010"

Obviously the timestamps are compared.

..ede

Thanks Kenneth Loafman, that solved my question.

Ken, does that mean that a verify over the internet needs to download all the remote archives to the local machine so that the local machine can decrypt/decompress and compare them?

Exactly.


FYI, I have filed a bug report requesting a "--test-restore" option (Bug #643973) and a bug requesting that the manpage entries for "verify" be clarified (Bug #644816).