Duplicity

Bug #1652410
Comment #23

Comment 23 for bug 1652410

Revision history for this message

Naël (nathanael-naeri) wrote on 2017-01-16:

#23

The file and function I examined are those mentioned in the traceback reported by David for Ubuntu 16.04, deja-dup 34.2-0ubuntu1.1, duplicity 0.7.06-2ubuntu2:

  File "/usr/lib/python2.7/dist-packages/duplicity/collections.py",
  line 105, in add_filename
    (self.volume_name_dict, filename)
  AssertionError: ({1: 'duplicity-full.20161129T015237Z.vol1.difftar'},
    'duplicity-full.20161129T015237Z.vol1.difftar.gz')

For Ubuntu 14.04, deja-dup 30.0-0ubuntu4.1, duplicity 0.6.23-1ubuntu4.1:

  File "/usr/lib/python2.7/dist-packages/duplicity/collections.py",
  line 100, in add_filename
    (self.volume_name_dict, filename)
  AssertionError: ({1: 'duplicity-full.20170115T235149Z.vol1.difftar'},
    'duplicity-full.20170115T235149Z.vol1.difftar.gz')

Function add_filename(self, filename) parses the examined filename (in this case "duplicity-full.$timestring.vol1.difftar.gz") and decides if it should be added to the examined backup set (self, an instance of the BackupSet class, consisting for now of just one backup volume, "duplicity-full.$timestring.vol1.difftar").

The function returns True if it adds the examined filename to the current backup set, False if it rejects it, for instance it rejects files whose parse results don't include "full" or "inc", and files whose parse results indicate a different timestring than the timestring of the current backup set.

The assertion that fails is "not self.volume_name_dict.has_key(pr.volume_number)", meaning that the volume number found in the parse results (in our case 1) is already in the backup set (stored as a volumenumber/filename dictionnary: {1: duplicity-full.$timestring.vol1.difftar}).

A quick way to avoid that bug would be to turn the assertion into an "if volume-number already in backup-set, reject filename" (i.e. return False), which means that when encountering two or more of:

  duplicity-$type.$timestring.vol$n.difftar
  duplicity-$type.$timestring.vol$n.difftar.gz
  duplicity-$type.$timestring.vol$n.difftar.gpg

duplicity would accept into the backup set the first filename it processes, and reject the others.

A probably better way would be to also compare compression flags when grouping the files into sets, in addition to backup type (full/inc) and times. That way an uncompressed file couldn't be added to a backup set made up of compressed files, and vice versa. In the simplest case, this would require:

adding the attribute self.compressed to the class BackupSet

  adding self.compressed = bool(pr.compressed) to the method
  BackupSet.set_info, that initializes the attributes of the
  class instances

  adding something like
    if bool(pr.compressed) != bool(self.compressed):
        return False
  to BackupSet.add_filename after it tests for backup type
  and backup times

But then it gets complicated by the fact that a backup set can be partially encrypted, if I understand correctly the rest of the add_filename function ("if bool(pr.encrypted) != bool(self.encrypted)..."). So perhaps the function should also accommodate backup sets which are partially compressed? Also, assuming the add_filename is changed to group files into sets of homogeneous compression/encryption status, then how would the function that groups sets into chains have to be modified?

I don't know duplicity's source code well enough, and won't any time soon, to take a decision regarding this issue and fix it by myself, sorry.