Would it be a problem if some files were falsely assumed to be unchanged in one scan?

Asked by Daan W.

I am currently pondering how to do /fast/ backups ( http://ubuntuforums.org/showthread.php?t=2142201 )
One possibility that comes to mind is using filesystem snapshot diffs, such as those btrfs provides, to obtain a list of changed files before I even start the backup.

The general idea goes like this:
- Take an initial file system snapshot S1.
- Every time I want to do an incremental backup:
    - create a new snapshot S2
    - use some file system voodoo to get a list of changes or at least of the changed files between S1 and S2
    - use my backup script/application to only perform a backup of those files since no other files changed between the snapshots
    - reassign S2 to be the new 'current state', i.e. S1 <-- S2
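For illustration, the "file system voodoo" step could be sketched roughly as follows in Python. This is a minimal sketch, not a tested tool: it assumes the change list comes from the text output of `btrfs subvolume find-new <subvolume> <generation>`, and that each extent line has the shape shown in the comment. The function name `changed_paths` is made up for this example. As far as I know, find-new only reports files that gained new data extents, so deletions and pure metadata changes would need a different source.

```python
import re

# Assumed line shape of `btrfs subvolume find-new` output (verify against your btrfs-progs):
#   inode 257 file offset 0 len 4096 disk start 0 offset 0 gen 8 flags NONE docs/a.txt
_EXTENT_LINE = re.compile(
    r"^inode \d+ file offset \d+ len \d+ disk start \d+ "
    r"offset \d+ gen \d+ flags \S+ (?P<path>.+)$"
)


def changed_paths(find_new_output):
    """Return the sorted, de-duplicated paths named in find-new style output."""
    paths = set()
    for line in find_new_output.splitlines():
        match = _EXTENT_LINE.match(line.strip())
        if match:  # non-matching lines, e.g. "transid marker was N", are skipped
            paths.add(match.group("path"))
    return sorted(paths)
```

A file usually shows up once per changed extent, which is why the paths are collected in a set before sorting.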

As far as I can tell, it would not be a problem to pass such a list to rsync or the like.
But duplicity doesn't support this so far, right?

Now I am wondering if passing such a list and making a mistake in the process would be a problem for duplicity.
I.e. if I hypothetically implemented a feature to only scan for changed files that are listed in a given file and this list was not complete, would duplicity pick up the other changed files on the next full scan without problems?
I am wondering because obviously speed should not come at the expense of completeness...

Question information

Language: English
Status: Solved
For: Duplicity
Assignee: No assignee
Solved by: edso

Best edso (ed.so) said :
#1

On 08.05.2013 02:36, Daan W. wrote:
> New question #228382 on Duplicity:
> https://answers.launchpad.net/duplicity/+question/228382
>
> I am currently pondering how to do /fast/ backups ( http://ubuntuforums.org/showthread.php?t=2142201 )
> One possibility that comes to mind is using filesystem snapshot diffs, such as those btrfs provides, to obtain a list of changed files before I even start the backup.

nifty idea. i like it.

> The general idea goes like this:
> - Take an initial file system snapshot S1.
> - Every time I want to do an incremental backup:
> - create a new snapshot S2
> - use some file system voodoo to get a list of changes or at least of the changed files between S1 and S2
> - use my backup script/application to only perform a backup of those files since no other files changed between the snapshots
> - reassign S2 to be the new 'current state', i.e. S1 <-- S2
>
> As far as I can tell, it would not be a problem to pass such a list to rsync or the like.
> But duplicity doesn't support this so far, right?

use the in/exclude feature. convert the list into something like

+ folder1/file
+ folder2/subfolder/file
- **

and use that as a globbing include list via '--include-globbing-filelist'. see the manpage
http://duplicity.nongnu.org/duplicity.1.html#sect10
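the conversion itself is only a few lines. a minimal python sketch, assuming the changed paths are already available as a list of strings in the form duplicity expects; the function name `to_globbing_filelist` is made up for this example, and paths containing glob characters such as `*`, `?` or `[` are not escaped here:

```python
def to_globbing_filelist(changed_paths):
    """Build include-filelist text: one '+ path' line per file, then '- **'."""
    # De-duplicate and sort so the generated file is stable between runs.
    lines = ["+ " + path for path in sorted(set(changed_paths))]
    # The final catch-all pattern excludes everything that was not listed above.
    lines.append("- **")
    return "\n".join(lines) + "\n"
```

write the returned text to a file and pass that file name to '--include-globbing-filelist'.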

> Now I am wondering if passing such a list and making a mistake in the process would be a problem for duplicity.
> I.e. if I hypothetically implemented a feature to only scan for changed files that are listed in a given file and this list was not complete, would duplicity pick up the other changed files on the next full scan without problems?
> I am wondering because obviously speed should not come at the expense of completeness...
>

duplicity will probably treat all excluded files as deleted. so although the above will work, you will probably lose the benefit of incremental backups: a file that was treated as deleted goes into the backup set in full again once it is provided later via the list.

hacking this feature in would probably be quite easy. you could e.g. add a duplicity switch such as '--filelist-only' that keeps the entries of files not in the list instead of marking them deleted.
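to make the difference concrete, here is a toy model in python. it is NOT duplicity's code and does not reflect its internals: the dictionaries, the function name `next_index` and the `filelist_only` parameter are all invented for illustration. each dictionary maps a path to some content fingerprint.

```python
def next_index(previous, scanned, filelist_only=False):
    """Toy model of what a backup run records as the new state.

    previous: path -> fingerprint recorded by the last backup
    scanned:  path -> fingerprint for the files this run was allowed to see
    """
    if filelist_only:
        # Hypothetical '--filelist-only' behaviour: unlisted entries are kept as-is.
        merged = dict(previous)
        merged.update(scanned)
        return merged
    # Assumed current behaviour: anything not scanned disappears, i.e. looks deleted.
    return dict(scanned)
```

in this model a real deletion is invisible in the filelist-only case unless it is reported separately, so an occasional full scan would still be needed.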

..ede/duply.net

Daan W. (dwynen) said :
#2

Thanks edso, that solved my question.