Duplicity adds the same files again after a full backup

Asked by Rodrigo Alvarez

I run a backup on my photo library and, since it is the first time, duplicity does a full backup. It finishes successfully, adding all the files and creating the duplicity-full* volumes and duplicity-full-signatures.* files.

The next day I run the backup again, and now duplicity starts an incremental backup. However, this time around it also adds a lot of files again. These files were already added in the first full backup and have not changed since 2010.

On the third day I run the backup again, and this time duplicity does not re-add those files (as it should).

I'm backing up a rather large set, so storing the data twice is a sizable nuisance. Any suggestions? How does duplicity determine whether files have changed or not?

This is running on Ubuntu 12.04 and duplicity 0.6.18 (February 29, 2012) with Python 2.7.3.

Your advice would be greatly appreciated.

Question information

Language: English
Status: Answered
For: Duplicity
Assignee: No assignee
Revision history for this message
edso (ed.so) said :
#1

On 03.05.2014 20:56, Rodrigo Alvarez wrote:
> New question #248111 on Duplicity:
> https://answers.launchpad.net/duplicity/+question/248111
>
> I run a backup on my photo library and since it is the first time duplicity does a full backup. It successfully finishes by adding all the files and creating the duplicity-full* volumes and duplicity-full-signatures.* files.
>
> Next day I run the backup again, and now duplicity start an incremental backup. However, this time around, it also adds a lot of files again. These files were already added in the first full backup and have not changed since 2010.
>
> On the third day, I run my backup again and now duplicity is does not add all this files (as it should).
>
> I'm running a rather large set so the double store is a good-sized nuisance. Any suggestions? How is duplicity determining that the files change/or don't?

duplicity checks modification time, size, permissions, owner .. if there is a difference the files are backed up again. backing up means running it against librsync, comparing blocks (rolling checksum) and only saving changed blocks and changed metadata.

so, if your files really didn't change but are added again it shouldn't hurt as no data changed.
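
to illustrate, here is a rough sketch of that metadata check (not duplicity's actual code; the recorded fields are an assumption based on the description above):

import os, stat

def metadata_changed(path, recorded):
    # 'recorded' stands for the metadata stored for this path at the last
    # backup; None means the file was not found in the backup at all --
    # the "Comparing ... and None" case you will see in a -v9 log.
    if recorded is None:
        return True
    st = os.stat(path)
    current = {
        "mtime": int(st.st_mtime),
        "size": st.st_size,
        "mode": stat.S_IMODE(st.st_mode),
        "uid": st.st_uid,
        "gid": st.st_gid,
    }
    return current != recorded

only files where something differs get handed to librsync again.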

anyway run duplicity with max. verbosity '-v9' and it will tell you, a bit cryptic but understandable, why it thinks the file should be backed up again.

>
> This is running on Ubuntu 12.04 and duplicity 0.6.18 (February 29, 2012) with Python 2.7.3.

update to the latest stable.. lots of changes in the last 2 years

what is your source's file system?

..ede/duply.net

Revision history for this message
Rodrigo Alvarez (rodrigo-alvarez-i) said :
#2

Hi Edso,

Thanks for your reply. Indeed the behavior you describe is what I expect, so I took your advice and ran with -v9 in an attempt to reproduce the anomaly with greater visibility. Overall, it seems that although on the 1st pass duplicity states that it added a set of files (~53000), during the second pass it compares them against "None", as if it had not really added them. Details of what I did and what I observed follow.

I clear the backend of all duplicity volumes and signature files

1st Pass
I start a new backup of my ~500GB photo library. Duplicity recognizes there are no previous backups and starts adding all files. Duplicity runs for about ~14 hrs and finishes with the following statistics:
--------------[ Backup Statistics ]--------------
StartTime 1399264740.37 (Sun May 4 21:39:00 2014)
EndTime 1399315243.63 (Mon May 5 11:40:43 2014)
ElapsedTime 50503.26 (14 hours 1 minute 43.26 seconds)
SourceFiles 146166
SourceFileSize 534825699131 (498 GB)
NewFiles 146166
NewFileSize 534825699131 (498 GB)
DeletedFiles 0
ChangedFiles 0
ChangedFileSize 0 (0 bytes)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 146166
RawDeltaSize 534812567355 (498 GB)
TotalDestinationSizeChange 517724175942 (482 GB)
Errors 0
-------------------------------------------------

2nd Pass
I rerun duplicity, now expecting no incremental data to be added because the source files have not changed, yet in the end duplicity returns the following statistics indicating a new ~183 GB. I see this as a set of incremental volumes on the backend.

--------------[ Backup Statistics ]--------------
StartTime 1399362663.27 (Tue May 6 00:51:03 2014)
EndTime 1399378166.41 (Tue May 6 05:09:26 2014)
ElapsedTime 15503.14 (4 hours 18 minutes 23.14 seconds)
SourceFiles 146166
SourceFileSize 534825699131 (498 GB)
NewFiles 53631
NewFileSize 196565375491 (183 GB)
DeletedFiles 0
ChangedFiles 4
ChangedFileSize 20463616 (19.5 MB)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 53635
RawDeltaSize 196559616928 (183 GB)
TotalDestinationSizeChange 189125153141 (176 GB)
Errors 0

3rd pass
I rerun duplicity a 3rd time, and this time I see the behavior I expected on the second pass, which is not to add any incremental data.
--------------[ Backup Statistics ]--------------
StartTime 1399381492.80 (Tue May 6 06:04:52 2014)
EndTime 1399381602.38 (Tue May 6 06:06:42 2014)
ElapsedTime 109.58 (1 minute 49.58 seconds)
SourceFiles 146166
SourceFileSize 534825699131 (498 GB)
NewFiles 0
NewFileSize 0 (0 bytes)
DeletedFiles 0
ChangedFiles 4
ChangedFileSize 20463616 (19.5 MB)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 4
RawDeltaSize 72022 (70.3 KB)
TotalDestinationSizeChange 4978 (4.86 KB)
Errors 0

Probe 2nd pass logs to see what was added
Now I analyze the log files (written with -v9) to see which files were added on the 2nd pass.
$ pass1log=duplicity_home_to_primary_bakup_2014_05_04-213859.log
$ pass2log=duplicity_home_to_primary_bakup_2014_05_06-005101.log
$ pass3log=duplicity_home_to_primary_bakup_2014_05_06-060450.log
$ cat ${pass2log} | grep "^A " | head
A Masters/Rolls/2010/10/_MG_0246.CR2
A Masters/Rolls/2010/10/_MG_0247.CR2
A Masters/Rolls/2010/10/_MG_0248.CR2
A Masters/Rolls/2010/10/_MG_0249.CR2
… and some 53631 more files …

Probe 1st pass log to see if the re-added files were there
I probe the 1st pass log, focusing on the first file that was added on the second pass (even though it was there on the 1st pass). On the first pass I see that this file was Selected, Compared to None, Got Delta, Added and that the volume was written.
$ fileOfInterest="Masters/Rolls/2010/10/_MG_0246.CR2"
$ grep -A 400 "${fileOfInterest}" ${pass1log}
Selecting /home/Photos/Masters/Rolls/2010/10/_MG_0246.CR2
Comparing ('Masters', 'Rolls', '2010', '10', '_MG_0246.CR2') and None
Getting delta of (('Masters', 'Rolls', '2010', '10', '_MG_0246.CR2') /home/Photos/Masters/Rolls/2010/10/_MG_0246.CR2 reg) and None
A Masters/Rolls/2010/10/_MG_0246.CR2
…. A few lines lower I see the volume was sent to be written.
AsyncScheduler: scheduling task for asynchronous execution
AsyncScheduler: active workers = 1
Processed volume 1254
Writing /media/primary_backup/raid_bkup/Photos/duplicity-full.20140505T043900Z.vol1254.difftar.gpg
Registering (mktemp) temporary file /home/Backups/.duplicity_tmp/duplicity-ATyk1O-tempdir/mktemp-wHBzYl-1256
… and a few lines later I see the write succeeded.
Deleting /home/Backups/.duplicity_tmp/duplicity-ATyk1O-tempdir/mktemp-cK6Ccl-1255
Forgetting temporary file /home/Backups/.duplicity_tmp/duplicity-ATyk1O-tempdir/mktemp-cK6Ccl-1255
AsyncScheduler: task execution done (success: True)
AsyncScheduler: active workers = 0

Probe 2nd pass log to see why files were re-added
I probe the 2nd pass log the same way and see that the same file was Selected, Compared to None (<- this doesn't make sense!), Added, etc. It seems that for some reason duplicity thinks this file was NOT added on the 1st pass.
$ grep -A 5 "${fileOfInterest}" ${pass2log}
Selecting /home/Photos/Masters/Rolls/2010/10/_MG_0246.CR2
Comparing ('Masters', 'Rolls', '2010', '10', '_MG_0246.CR2') and None
Getting delta of (('Masters', 'Rolls', '2010', '10', '_MG_0246.CR2') /home/Photos/Masters/Rolls/2010/10/_MG_0246.CR2 reg) and None
A Masters/Rolls/2010/10/_MG_0246.CR2

Probe 3rd pass log to see why files were not re-added (as expected).
Finally, I probe the 3rd pass log for the same file and I see that the file was Selected and Compared to its previous version.
grep -A 5 "${fileOfInterest}" ${pass3log}
Selecting /home/Photos/Masters/Rolls/2010/10/_MG_0246.CR2
Comparing ('Masters', 'Rolls', '2010', '10', '_MG_0246.CR2') and ('Masters', 'Rolls', '2010', '10', '_MG_0246.CR2')

It appears that for some reason duplicity did not add this file (or the other 53000+ files) to its record on the first pass (although it did add data for them). Can you think of any reason why this would be the case? I'm running duplicity as root; could this be a permissions issue?
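
To cross-check this at scale rather than one file at a time, a small script along these lines can list every file the 2nd pass compared against None and confirm the 1st pass claimed to add it (a quick sketch; it relies on the "A " and "Comparing ... and None" line formats shown above and would trip on paths containing commas):

import re

def added_files(logfile):
    with open(logfile) as f:
        return set(line[2:].rstrip() for line in f if line.startswith("A "))

def compared_to_none(logfile):
    pat = re.compile(r"^Comparing \((.*)\) and None$")
    hits = set()
    with open(logfile) as f:
        for line in f:
            m = pat.match(line.rstrip())
            if m:
                # rebuild "Masters/Rolls/..." from "'Masters', 'Rolls', ..."
                hits.add("/".join(part.strip(" '") for part in m.group(1).split(",")))
    return hits

pass1 = added_files("duplicity_home_to_primary_bakup_2014_05_04-213859.log")
pass2 = compared_to_none("duplicity_home_to_primary_bakup_2014_05_06-005101.log")
print("%d files compared to None on pass 2 despite being added on pass 1" % len(pass1 & pass2))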

As you will note, it is somewhat unsatisfying that the 2nd backup consumes an unnecessary ~183GB of space. I've uploaded the full logs for the three runs to http://sira.duckdns.org/temp//duplicity_logs_photos_bakup_2014_05.tar.gz in case you want to probe them deeper.

Thank you for your consideration!

Revision history for this message
edso (ed.so) said :
#3

aren't you polite :)

ok.. here're my thoughts. first check your backend backup file sizes. the second go should not be sized 183GB but around 19.5MB. the second go status states

NewFileSize 196565375491 (183 GB)
...
ChangedFiles 4
ChangedFileSize 20463616 (19.5 MB)

which essentially means that there were 4 changed files that together have a size of 183GB, of which 19.5MB were changed, and only this change is going to be backed up (google "rsync rolling checksum").
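
the principle, as a toy sketch (fixed block size and an ordinary hash here; librsync's real rolling checksum is cleverer and also copes with data shifting within the file):

import hashlib

BLOCK = 64 * 1024  # arbitrary block size for the illustration

def changed_block_numbers(old_path, new_path):
    # compare two versions block by block; only blocks whose checksum
    # differs would need to go into the incremental volume
    changed = []
    with open(old_path, "rb") as old, open(new_path, "rb") as new:
        index = 0
        while True:
            a, b = old.read(BLOCK), new.read(BLOCK)
            if not a and not b:
                break
            if hashlib.sha1(a).digest() != hashlib.sha1(b).digest():
                changed.append(index)
            index += 1
    return changed

i.e. the increment should be in the order of the changed bytes, not the full size of the changed files.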

on the other hand the second log also has lots of entries like

Selecting path/to/file.name
Comparing ('path', 'to', 'file.name') and None
Getting delta of (('path', 'to', 'file.name') /path/to/file.name reg) and None
A path/to/file.name

adding files via A is totally normal and simply means the file was not deleted.
comparing against none however means it couldn't find the file in the backup.

at this point:

update to latest release 0.6.23 or wait for the soonish 0.6.24 and see if the error persists. no point trying to figure out a bug that might have been solved two years ago.

..ede/duply

Revision history for this message
Rodrigo Alvarez (rodrigo-alvarez-i) said :
#4

Hi Edso, Thanks for the follow up.

I looked at the cumulative file size of all the duplicity-inc* volumes from my second pass and `du -hs` reports 177GB; not what we'd expect from rdiff/librsync rolling checksums. I've upgraded to duplicity 0.6.23, cleared all volumes, caches and temp files, and launched a new backup set. I'll report back in a couple of days to see if this fixes the "Comparing to None" issue.

Best,

+Rodrigo

Revision history for this message
edso (ed.so) said :
#5

hmm, usually the changed size is reflected in the incr bkp size. but in your case it also stated lots of new files, so your finding seems consequential.

TIP:
do a verify between the runs! it'll inform you if it thinks there were changes.

..ede/duply.net

Revision history for this message
edso (ed.so) said :
#6

ok, just had another look at my duplicity stats. i confused ChangedFileSize with RawDeltaSize.
it's pretty self explanatory, but i'll add some hints anyway.
Deltas are modified existing block chunks. if there were only chunks added to a file you'll get no Deltas.

--------------[ Backup Statistics ]--------------
StartTime 1399424418.52 (Wed May 7 03:00:18 2014)
EndTime 1399424651.44 (Wed May 7 03:04:11 2014)
ElapsedTime 232.92 (3 minutes 52.92 seconds)
SourceFiles 113455
SourceFileSize 4560812785 (4.25 GB)
NewFiles 281
NewFileSize 9192719 (8.77 MB)
DeletedFiles 8
ChangedFiles 81
ChangedFileSize 46077585 (43.9 MB)
ChangedDeltaSize 0 (0 bytes)
^-- no changes in the changed files only additions
DeltaEntries 370
^-- count of chunks changed in total
RawDeltaSize 9569326 (9.13 MB)
^-- amount of bytes to backup in total
TotalDestinationSizeChange 6696412 (6.39 MB)
^-- not sure why it's less here, compression?
Errors 7
^-- i had some errors ;) there are some files inaccessible due to permissions in my backup which i don't care about
-------------------------------------------------

..ede/duply.net

Revision history for this message
Rodrigo Alvarez (rodrigo-alvarez-i) said :
#7

Dear Edso,

I ran another test after updating to duplicity 0.6.23 and the issue persists.

1. I upgraded duplicity, cleaned out all cache and tmp directories and ran a backup.
2. Ran verify
3. Ran a 2nd backup pass
4. Ran a 3rd backup pass.

On the 2nd backup pass, I still observed that about ~180 GB of data was re-added (as listed in the log and confirmed by looking at the volumes), even though the 1st pass (and now the verify) state that those files were there. The 3rd pass behaved as expected and did not add any additional files or data.

You had previously asked about which file system I'm running. I have ext4 on both source and destination.

I've uploaded a new set of logs (~6MB) to http://sira.duckdns.org/temp/duplicity_logs_photos_bakup_2014_05_08.tar.gz

Poking into the code, I found that the comparison that triggers the 'delta add' happens around line 172 of pyshared/duplicity/diffdir.py, but without much context I'm wary of launching into a hacking spree. I'd be happy to add any debug statements and rerun.
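
For example, purely as an illustration of the kind of debug statement meant here (the names are placeholders, not the real variables in diffdir.py):

def debug_compare(new_path, sig_entry):
    # hypothetical hook near the comparison around diffdir.py line 172;
    # 'sig_entry' stands for whatever the previous-backup entry is called
    if sig_entry is None:
        print("DEBUG: no previous entry found for %s, will be re-added" % (new_path,))
    return sig_entry is None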

Best,

+R

Revision history for this message
Kenneth Loafman (kenneth-loafman) said :
#8

Mind running a verify on the final result from step 4? Post the logs if there are any differences.

Revision history for this message
Kenneth Loafman (kenneth-loafman) said :
#9

After the verify, please do an "ls -l duplicity*signatures* /media/primary_backup/raid_bkup/Photos/"

Is the /media/primary_backup a mounted NAS? What kind? What OS?

Revision history for this message
Rodrigo Alvarez (rodrigo-alvarez-i) said :
#10

Hi Kenneth,

The verify completed and reported zero differences. There was another backup pass before I ran the verify (it triggers automatically every day) but that didn't add any data. I've uploaded the logs to these two runs as well as the results for the following three commands to : http://sira.duckdns.org/temp/duplicity_logs_photos_bakup_2014_05_10.tar.gz

find /home/Photos > find_Photos
ls -lR /home/Photos > ls_photos
ls -l /media/primary_backup/raid_bkup/Photos/ > ls_raid_bkup_photos

Here is the result from ls you asked for:
root@sira:/media/primary_backup/raid_bkup/Photos# ls -l duplicity*signatures*
-rw------- 1 root root 2147523889 May 8 12:35 duplicity-full-signatures.20140508T072903Z.sigtar.gpg
-rw------- 1 root root 1260283891 May 9 16:57 duplicity-new-signatures.20140508T072903Z.to.20140509T183804Z.sigtar.gpg
-rw------- 1 root root 103 May 10 00:19 duplicity-new-signatures.20140509T183804Z.to.20140510T071703Z.sigtar.gpg
-rw------- 1 root root 201 May 10 06:08 duplicity-new-signatures.20140510T071703Z.to.20140510T130628Z.sigtar.gpg

The system is running Ubuntu 12.04
/home/Photos is a local software RAID 5 with ext4
media/primary_backup/ is a local hard drive with ext4. mount reports "/dev/sdd1 on /media/primary_backup type ext4 (rw)"

I've created a smaller mirror of some of the misbehaving directories (/Photos/Masters/Rolls/2010) that is only 20GB and will experiment to see if I can reproduce the issue with this set. If the issue remains, this will allow quicker turnarounds.

Best,

+R

Revision history for this message
Rodrigo Alvarez (rodrigo-alvarez-i) said :
#11

The smaller mirror test (created using cp -laR and then removing directories from 2000 through 2009) showed no issues. This rules out the possibility that the problem was a simple name-comparison issue.

Revision history for this message
edso (ed.so) said :
#12

hmm.. weird. maybe you should stick to the original test case; big as it is, it at least sports the problem.

can you confirm that you
- ran a full
- verify without differences
- ran a second backup that found and added files
?

shot in the dark. maybe you hit a limit with the amount of files somehow?

..ede/duply.net

Revision history for this message
edso (ed.so) said :
#13

Rodrigo,

ignore the previous request, just saw your earlier comment.

here comes a thought. the second backup log suggests that after roughly 92000 files the files get compared to None, meaning they seem to be missing in the backup. the earlier verify though pretty clearly says they are in there.

i remember an issue on Mac where python had a bug that silently capped array lists to some maximum length.

can you try another python? preferably the latest stable 2.6?

..ede/duply.net

Revision history for this message
Rodrigo Alvarez (rodrigo-alvarez-i) said :
#14

Hi Edso,

Nice theory! I hadn't noticed that after the 92K file threshold all other files are added again.

I'm currently running Python 2.7.3--supposedly the latest production version for Ubuntu 12.04 LTS. Are you recommending that I roll back to the retired Python 2.6.6? I could also try upgrading to 2.7.6. Reading around, it seems fiddling with Python version switches is a very good way to ruin your system. Can you think of any test to confirm the theory before going for the down/up-grade?
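
As a trivial sanity check of the list-capping idea (it only exercises a plain Python list, not whatever structure duplicity keeps internally):

xs = list(range(200000))
print(len(xs))  # prints 200000, i.e. well past the ~92K mark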

Best,

+R

Revision history for this message
Rodrigo Alvarez (rodrigo-alvarez-i) said :
#15

Ok, I created a test set with 150K files (very small files) to test the theory.
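
For illustration, a throwaway script along these lines can generate such a set (location and layout are arbitrary; most entries are hard links to a small per-directory seed file):

import os

root = "/tmp/duplicity_test_set"  # arbitrary location for the test set
for d in range(150):
    subdir = os.path.join(root, "dir%03d" % d)
    os.makedirs(subdir)
    seed = os.path.join(subdir, "file0000.txt")
    with open(seed, "w") as f:
        f.write("duplicity test file %d\n" % d)
    # fill the rest of the directory with hard links to the seed file
    for i in range(1, 1000):
        os.link(seed, os.path.join(subdir, "file%04d.txt" % i))

150 directories x 1000 files gives the ~150K entries.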

Revision history for this message
Rodrigo Alvarez (rodrigo-alvarez-i) said :
#16

The test gave a negative result.

I created a set of ~150,000 files, all small text files and mostly hard-linked copies of the same file. Then I ran a duplicity backup, a verify and a 2nd backup. The 2nd backup did not re-add any files. This does not support the theory of a python array being capped at ~92000 entries. Logs at http://sira.duckdns.org/temp/duplicity_logs_test_2014_05_11.tar.gz

I will try creating the same set with longer file names to push python's memory footprint but it seems a shot in the dark.

+R

Revision history for this message
edso (ed.so) said :
#17

well done. yeah it's a shot in the dark and the error was on MacOSX ;( but worth a shot.

when retrying try to account for your long path names in some way.

another idea: try your original dataset again, but with a different volumesize this time. maybe the bug lurks there?

..ede/duply.net

Revision history for this message
Rodrigo Alvarez (rodrigo-alvarez-i) said :
#18

Hi Edso,

A longer test with ~1 million files and long path names did not re-add any files; logs (~24MB) at http://sira.duckdns.org/temp/duplicity_test_2014_05_11.tar.gz

I'm trying your suggestion of using default volume sizes on the original set, and while we are at it, I'm disabling encryption.

+R

Revision history for this message
edso (ed.so) said :
#19

i'd suggest you just change one parameter at a time as you will have to anyway to pinpoint which had an effect in the first place.

also, do _not_ use --no-compression as that is broken currently.

..ede/duply.net

Revision history for this message
Rodrigo Alvarez (rodrigo-alvarez-i) said :
#20

Hi Edso,

The test was already a few hours in by the time I saw your message about changing one parameter, so I let it continue. Even using default volume sizes and no encryption, the test behaved qualitatively the same: the first backup pass added ~500GB, the verify succeeded, and the 2nd backup pass started adding exactly the same files. The only difference was that the 2nd backup pass crashed with error 30. Logs at http://sira.duckdns.org/temp/duplicity_logs_photos_bakup_2014_05_11.tar.gz

Looking at file permissions I noticed some files have the sticky bit set, which is not that common. Do you think this could have anything to do with it? Any other thoughts?

Best,

+R

Revision history for this message
Rodrigo Alvarez (rodrigo-alvarez-i) said :
#21

Hi Edso and Kenneth, this thread seems to have fizzled out.

As a workaround to the problem of files being re-added, I started by clearing both my front-end data and the backup backend, caches, temps, etc. Then I started adding a few files to my photo library, running a duplicity backup, verifying, rerunning (expecting zero size increase), and then adding more files and repeating the backup, verify, backup process.

In doing so, I managed to grow my original data all the way to ~500 GB without duplicity ever double-adding any files. This now gives me a baseline on which to run incremental backups and avoids the nuisance of consuming ~180GB of unnecessary space, but it does not solve the underlying problem.
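
For reference, the cycle repeated for each batch looks roughly like this (a sketch; paths and options are placeholders, and the encryption/passphrase settings are omitted):

import subprocess

SRC = "/home/Photos"                                    # source tree being grown
DEST = "file:///media/primary_backup/raid_bkup/Photos"  # local backup target

def duplicity(*args):
    subprocess.check_call(["duplicity"] + list(args))

# after copying the next batch of photos into SRC:
duplicity("-v9", SRC, DEST)            # incremental backup of the new batch
duplicity("verify", "-v9", DEST, SRC)  # check the backup against the source
duplicity("-v9", SRC, DEST)            # rerun; should add nothing new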

Since no one else has come forth with this problem it will fade into oblivion, but if you ever have a thought about what could cause it and how to solve it, please come back and comment about it.

In case you are curious, the log files for my incremental adds are at http://sira.duckdns.org/temp/duplicity_logs_photos_bakup_2014_05_08.tar.gz

Thanks for all your help!

+Rodrigo

Revision history for this message
Rodrigo Alvarez (rodrigo-alvarez-i) said :
#22

Hi there friends,

The chickens are coming home to roost; I'm getting hit by this issue again.

This time it is with a different backup set (a TimeMachine sparse bundle) and the impact is more severe. The source data is ~700GB and the redundant data (backed up twice) is ~300GB. The issue is now manifesting on Ubuntu 14.04 with Python 2.7.6 and Duplicity 0.7.01, and it happens after backing up about 37K files (previously the threshold was about 92K files).

Has anyone had any thoughts about this since last year?

Best,

+Rodrigo

Revision history for this message
Kenneth Loafman (kenneth-loafman) said :
#23

Please try 0.7.06 from the PPA. If that still fails, please report this as a bug.

Is there anything in your processing that would change the date/time of the files being duplicated?

Revision history for this message
Rodrigo Alvarez (rodrigo-alvarez-i) said :
#24

I upgraded to duplicity 0.7.06, reran, and observed the exact same behavior.

1) I run a fresh full backup of a TimeMachine Sparse bundle set that uses about 700GB. This takes about 14 hrs to complete.
2) Without any changes to the primary set I rerun a second backup, intending to get a zero-sized incremental backup. Instead, duplicity starts backing up about 300 GB.

I've confirmed that the files in the primary set are not changing (by looking at their modification date, which for some files that are being re-backed up is in 2014).
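
For illustration, a minimal way to snapshot the metadata before the first run and diff it after the second (a sketch; the recorded fields mirror what duplicity is said to compare):

import json, os, sys

def snapshot(root):
    snap = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            snap[path] = [int(st.st_mtime), st.st_size, st.st_mode, st.st_uid, st.st_gid]
    return snap

# usage: python snapshot_meta.py /path/to/source > before.json
# run again after the backups and diff the two JSON files
json.dump(snapshot(sys.argv[1]), sys.stdout, indent=0, sort_keys=True)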

This problem had previously manifested with a smaller data set, and a workaround (to avoid consuming unnecessary space in the backend) was to build the primary set progressively. Previously, the backup duplication started after about 92K files. This time it starts at about 49K files.

More specifically, the set consists of 180288 files, about half of which are invisible files (.AppleDouble) which are being excluded. The bulk is a folder named 'bands' which contains 90132 files, all of which are at most 8.0 MBytes (except about 54 of them that are smaller). From this, it seems the problem on the second backup pass starts after duplicity has gone through 49794 bands, or 398,352 MBytes. This number seems consistent with the point at which the previous backup set would also start duplicating (but I'm estimating from memory here).

I'd be very happy to run experiments or provide more information (including logs) to help debug this issue.

Is it my specific system or am I stressing duplicity way beyond what normal people use it for?

A bug about this issue had previously been created (as per the Related Bug above).

Revision history for this message
Rodrigo Alvarez (rodrigo-alvarez-i) said :
#25

The previous workaround of building the backup set incrementally still works. In this case, using excludes, I progressively reduced the exclusions in 4 stages and then ran a final 5th test. As the following results show, by adding the data in smaller batches (in file count and size) there were no repeated backups.

Following are the summaries of the 4 incremental backups as I relaxed the exclusions:
SourceFileSize 32430067124 (30.2 GB)
NewFiles 3890
NewFileSize 32430067124 (30.2 GB)

SourceFileSize 405542682548 (378 GB)
NewFiles 44494
NewFileSize 373112615424 (347 GB)

SourceFileSize 584098928564 (544 GB)
NewFiles 21292
NewFileSize 178556246016 (166 GB)

SourceFileSize 755897555892 (704 GB)
NewFiles 20480
NewFileSize 171798627328 (160 GB)

Next is the final test pass where duplicity did not duplicate any backup.
SourceFileSize 755897555892 (704 GB)
NewFiles 0
NewFileSize 0 (0 bytes)
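
(Cross-check: the four NewFileSize figures sum to 32430067124 + 373112615424 + 178556246016 + 171798627328 = 755897555892 bytes, exactly the final SourceFileSize of 704 GB, so each stage added only the newly included files and nothing was stored twice.)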

Revision history for this message
Rodrigo Alvarez (rodrigo-alvarez-i) said :
#26

Just a hunch here. I notice the signatures and manifest files can get sizable, a few Mbytes and GBytes respectively. Could this issue be due to duplicity not fully loading one of these files after the first backup?

Revision history for this message
edso (ed.so) said :
#27

sounds like duplicity stops adding files properly to the backup, after some max amount of files/size..

ede/duply.net
