reverse diffs

Asked by ceg on 2011-03-29

Could duplicity keep reverse diffs, so the latest backup does not depend on an old backup plus a chain of diffs?

Question information

Language: English
Status: Solved
For: Duplicity
Assignee: No assignee
Solved by: ceg
Solved: 2011-04-01
Last query: 2011-04-01
Last reply: 2011-04-01

ceg (ceg) said : #1

I think this could make sense at least for local destinations, and also in cases where executing code on the destination (backup) machine, as rdiff-backup does, is considered safe but the physical disks are afterwards kept in, for example, a shared storage room.

edso (ed.so) said : #2

On 29.03.2011 17:14, ceg wrote:
> New question #150909 on Duplicity:
> https://answers.launchpad.net/duplicity/+question/150909
>
> Could duplicity keep reverse diffs, so the latest backup does not depend on an old backup plus a chain of diffs?
>

if you mean whether duplicity can merge older fulls/diffs into a new full: no.

ede/duply.net

ceg (ceg) said : #3

> merge older full/diffs into a new full

I guess that is only part of what rdiff-backup does (on the remote site).
I'd like duplicity to keep a current full plus a chain of reverse diffs, like rdiff-backup, where this makes sense (local filesystem, trusted local machine).

edso (ed.so) said : #4

On 29.03.2011 23:10, ceg wrote:
> Question #150909 on Duplicity changed:
> https://answers.launchpad.net/duplicity/+question/150909
>
> Status: Answered => Open
>
> ceg is still having a problem:
>> merge older full/diffs into a new full
>
> I guess that is only part of what rdiff-backup does (on the remote site).
> I'd like duplicity to keep a current full plus reverse diffs like rdiff-backup (in a chain) where this makes sense (local filesystem, trusted local machine)
>

could you please elaborate what you mean by 'reverse diff'?

to compare rdiff-backup with duplicity is not possible. the first keeps usable data in the repository, the second saves encrypted volumes (full and incr), which have to be restored to be usable again.

ede/duply.net

ceg (ceg) said : #5

Thanks for your reply edso!

From man rdiff-backup (just for the casual reader):

"The target directory ends up a copy (mirror) of the
source directory, but extra reverse diffs are stored in a special
subdirectory of that target directory, so you can still recover files lost
some time ago."

Since the current data is not stored as an old full backup plus a chain of diffs, a corrupted diff in the chain won't affect your ability to recover your current data. Instead, you always have a copy (mirror) of your current data; only the older versions have to be reconstructed by applying the (reverse, i.e. going back) diffs, and only they carry the increased risk of file corruption in the chain.

Since duplicity keeps volumes, it could rename any volume that has not changed so that it becomes part of the current state. If a file in a volume has changed, it could produce a new volume and then calculate a reverse diff on the volume.

Hopefully, with this, duplicity could also skip the very long and slow full backups (of large, slowly changing datasets) that are otherwise needed every x backups and when purging older versions.
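
To make the "current mirror plus reverse diffs" idea above concrete, here is a minimal toy sketch in Python. It is not how rdiff-backup or duplicity is implemented: the names `make_delta`, `apply_delta` and `ReverseDiffRepo` are made up for illustration, versions are plain lists of lines, and the delta format is a simplification built on the standard `difflib` module. It only shows that a damaged old delta leaves the latest version untouched.

```python
import difflib


def make_delta(src_lines, dst_lines):
    """Return a toy delta that turns src_lines into dst_lines."""
    matcher = difflib.SequenceMatcher(a=src_lines, b=dst_lines, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # For unchanged ranges we only remember the range, not the data.
        new_lines = [] if tag == "equal" else dst_lines[j1:j2]
        delta.append((tag, i1, i2, new_lines))
    return delta


def apply_delta(src_lines, delta):
    """Rebuild the target version from src_lines and a delta from make_delta."""
    out = []
    for tag, i1, i2, new_lines in delta:
        if tag == "equal":
            out.extend(src_lines[i1:i2])   # copy unchanged lines from the source
        else:
            out.extend(new_lines)          # replace/insert/delete: use recorded lines
    return out


class ReverseDiffRepo:
    """Hypothetical repository: latest version stored whole, history as reverse deltas."""

    def __init__(self, initial):
        self.mirror = list(initial)    # always the newest version, directly usable
        self.reverse_deltas = []       # newest first; each goes one version back

    def backup(self, new_version):
        # Record how to get from the NEW version back to the previous one.
        self.reverse_deltas.insert(0, make_delta(new_version, self.mirror))
        self.mirror = list(new_version)

    def restore(self, steps_back=0):
        data = self.mirror
        for delta in self.reverse_deltas[:steps_back]:
            if delta is None:          # None stands in for a corrupted delta
                raise ValueError("corrupted reverse diff")
            data = apply_delta(data, delta)
        return data


repo = ReverseDiffRepo(["a", "b"])
repo.backup(["a", "b", "c"])
repo.backup(["a", "x", "c"])
repo.reverse_deltas[-1] = None         # simulate corruption of the oldest delta
latest = repo.restore(0)               # still works: the mirror needs no delta
```

With a forward chain (old full plus incrementals) the same corruption would sit between the full backup and the latest state instead of behind it.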

edso (ed.so) said : #6

On 30.03.2011 10:08, ceg wrote:
> Question #150909 on Duplicity changed:
> https://answers.launchpad.net/duplicity/+question/150909
>
> Status: Answered => Open
>
> ceg is still having a problem:
> Thanks for your reply edso!

no problem

> Since duplicity keeps volumes, it could rename any volume that has not
> changed to be part of the current state. If a file in the volume has
> changed, it could produce a new volume and then calculate a reverse diff
> on the volume.
>
> Hopefully, with this duplicity should also be able to omit very long and
> slow full backups (with large slowly changing datasets) every x backups
> and when purging older versions.
>

duplicity diffs are actually a tgz/gpg'd sum of all rsync detected changes in the backup source. They are only split into volumes to get handy uniform sized chunks for backup repositories that might need it (e.g. gmail).

so actually there is no possibility of renaming/reusing anything. duplicity has to go through the chain and reassemble your data. if corruption occurred, then at least some data is definitely lost. there are loose plans to integrate redundancy checksumming, but that is far in the future.

for now the suggested way to deal with this weakness is to do periodic fulls to keep chains short and the probability of data loss small.

ede/duply.net
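
As a rough illustration of why short chains help, the arithmetic can be sketched as follows. The model is an assumption, not something measured on duplicity: every backup set in the chain (the full and each incremental) is taken to be damaged independently with the same probability, and the latest state counts as fully restorable only if every set in the chain is intact. The function name `chain_survival` and the 1% figure are invented for the example.

```python
def chain_survival(p_set_damaged: float, chain_length: int) -> float:
    """Probability that all backup sets in a chain are intact.

    Assumes independent damage with probability p_set_damaged per set;
    chain_length counts the full backup plus all incrementals after it.
    """
    if not 0.0 <= p_set_damaged <= 1.0:
        raise ValueError("p_set_damaged must be between 0 and 1")
    if chain_length < 1:
        raise ValueError("a chain contains at least the full backup")
    return (1.0 - p_set_damaged) ** chain_length


# Compare a weekly full (1 full + 6 incrementals) with a monthly one
# (1 full + 29 incrementals) under an invented 1% damage rate per set.
weekly = chain_survival(0.01, 7)
monthly = chain_survival(0.01, 30)
print(f"weekly chain intact:  {weekly:.3f}")
print(f"monthly chain intact: {monthly:.3f}")
```

Under this model the chance of an intact chain shrinks geometrically with its length, which is the trade-off against the cost of running fulls more often.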

ceg (ceg) said : #7

uh, so duplicity is no chunky bacon. ;-)

Doing unnecessary full backups of hundreds of GB if only a couple MB have changed is just too unfortunate.
But having all data stored in a single data stream that causes everything behind a single corruption to be lost, well, many thanks edso for pointing that out!

Shouldn't duplicity chunk the files to volumes first and then compress and gpg them?

Well, probably just switch from using .tar to .dar for various reasons.

Dar seems much better than .tar and already supports redundancy checksumming etc.

edso (ed.so) said : #8

On 01.04.2011 16:40, ceg wrote:
> Question #150909 on Duplicity changed:
> https://answers.launchpad.net/duplicity/+question/150909
>
> Status: Answered => Open
>
> ceg is still having a problem:
> uh, so duplicity is no chunky bacon. ;-)
>
> Doing unnecessary full backups of hundreds of GB if only a couple MB have changed is just too unfortunate.
> But having all data stored in a single data stream that causes everything behind a single corruption to be lost,

well, not everything. just the changes that ended up in this volume. Although there is some hassle to deal with, because duplicity will hiccup if gpg dies because of a corrupted file.

eventually this is a legacy from the earliest duplicity, and the maintainer is still looking for a new concept as well as the time to implement it. see
http://duplicity.nongnu.org/new_format.html

>well, many thanks edso for pointing that out!

np

>
> Shouldn't duplicity chunk the files to volumes first and then compress
> and gpg them?

that's what it does, while gpg also does the compression

>
> Well, probably just switch from using .tar to .dar for various reasons.
>
> Dar seems much better than .tar and already supports redundancy
> checksumming etc.
>

will look into it. still, it won't help, because we need redundancy checksumming for the encrypted files, not their contents.

..ede/duply.net
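
The point about checksumming the encrypted files rather than their contents can be sketched like this: digests are taken over the final volume bytes exactly as they sit in the repository, so damage can be found without decrypting anything. This is a toy example and not duplicity's actual manifest handling; the volume names, the in-memory dictionaries and the helpers `build_manifest` and `find_damaged` are assumptions for illustration, and it only detects damage, it cannot repair it.

```python
import hashlib


def digest(volume_bytes: bytes) -> str:
    """SHA-256 over the stored (already encrypted) volume bytes."""
    return hashlib.sha256(volume_bytes).hexdigest()


def build_manifest(volumes: dict) -> dict:
    """Map each volume name to the digest of its stored bytes."""
    return {name: digest(data) for name, data in volumes.items()}


def find_damaged(volumes: dict, manifest: dict) -> list:
    """Return the names of volumes that are missing or no longer match."""
    return sorted(
        name
        for name, expected in manifest.items()
        if name not in volumes or digest(volumes[name]) != expected
    )


# Hypothetical volume names; the byte strings stand in for encrypted data.
stored = {"vol1.gpg": b"\x01\x02\x03", "vol2.gpg": b"\x04\x05\x06"}
manifest = build_manifest(stored)

stored["vol2.gpg"] = b"\x04\xff\x06"    # simulate a flipped byte on disk
damaged = find_damaged(stored, manifest)
```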

ceg (ceg) said : #9

From the wanted features you linked, it really looks like dar is a good match.

Dar may provide means for checksumming the final archives. It provides hooks, and includes some scripts it can execute after a slice (volume) has been created. One such script seems to run parchive for adding parity information for the slice (volume).

Duplicity could use dar's encryption mechanism, or have it call gpg before the parchive script adds parity information.

http://dar.linux.free.fr/doc/samples/index.html
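
To show what parity information for slices buys you, here is a deliberately simplified sketch. It uses plain XOR parity over equal-sized blocks, which can rebuild exactly one lost block; parchive itself uses stronger codes, so this is a swapped-in technique for illustration only. The function names `xor_parity` and `recover_missing` are made up and have nothing to do with dar's or parchive's interfaces.

```python
def xor_parity(blocks):
    """XOR equal-length byte blocks together into one parity block."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must have the same length")
    parity = bytearray(size)
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)


def recover_missing(surviving_blocks, parity):
    """Rebuild the single missing block from the survivors plus the parity block."""
    return xor_parity(list(surviving_blocks) + [parity])


# Three stand-in slices; in practice these would be the encrypted volumes.
slices = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_parity(slices)

# Lose the middle slice, then rebuild it from the other two and the parity.
rebuilt = recover_missing([slices[0], slices[2]], parity)
```

Because the parity is computed over the finished slices, it works the same whether they are encrypted or not, which matches the requirement raised in reply #8.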