S3 data transfer out - getObject requests

Asked by Jasper on 2017-10-18

I have Duplicity running nightly backups of 7 servers and I'm seeing very high S3 costs of $400 a month that I didn't expect.

After investigating with Amazon's cost explorer I noticed those costs are almost all due to the large amount of 'data transfer out, getObject requests'.

I'm not restoring any data, so why is Duplicity doing so many getObject requests to S3?

I may have misconfigured the number of backup cycles. Could that be the reason?

My original config was:

MAX_AGE=2M
MAX_FULL_BACKUPS=2
MAX_FULLS_WITH_INCRS=1
MAX_FULLBKP_AGE=1M
VOLSIZE=250

I have now changed it to this:

MAX_AGE=6M
MAX_FULL_BACKUPS=2
MAX_FULLS_WITH_INCRS=1
# Deactivated this
#MAX_FULLBKP_AGE=1M
VOLSIZE=250
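For reference, here are the revised settings with comments. The comments are a best-effort paraphrase of duply's conf template and the duplicity actions each setting maps to, so check them against your own profile's conf file. They cover retention, full-backup frequency and volume size. None of them reads volumes back from S3.

```shell
# Revised duply profile settings from the question, annotated.
# Comments are a best-effort summary, not authoritative documentation.

# Maximum age of backups kept when purging (duplicity remove-older-than).
MAX_AGE=6M
# Number of full backups to keep (duplicity remove-all-but-n-full).
MAX_FULL_BACKUPS=2
# Number of full backups whose incrementals are kept
# (duplicity remove-all-inc-of-but-n-full).
MAX_FULLS_WITH_INCRS=1
# If enabled, forces a new full backup once the last full is older than
# this age (duplicity --full-if-older-than). Disabled here.
#MAX_FULLBKP_AGE=1M
# Size of each backup volume in MB (duplicity --volsize).
VOLSIZE=250
```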

Question information

Language: English
Status: Answered
For: Duplicity
Assignee: No assignee
Last query: 2017-10-20
Last reply: 2017-10-20

edso (ed.so) said : #1

Jasper,

What duply command do you use to run this conf? ..ede/duply.net

Jasper (jasperjorna) said : #2

41 1 * * * nice -n19 ionice -c2 -n7 duply websites backup_verify_purge --force --name websites >> /var/log/duplicity.log 2>&1

edso (ed.so) said : #3

From the man page http://duplicity.nongnu.org/duplicity.1.html:

"
verify [--compare-data] [--time <time>] [--file-to-restore <rel_path>] <url> <local_path>

Restore backup contents temporarily file by file and compare against the local path’s contents. duplicity will exit with a non-zero error level if any files are different. On verbosity level info (4) or higher, a message for each file that has changed will be logged.
The --file-to-restore option restricts verify to that file or folder. The --time option allows to select a backup to verify against. The --compare-data option enables data comparison (see below).
"

Every verify essentially downloads the whole current chain and tries to restore every file. That should account for the amount of incoming traffic on your servers, which S3 bills as 'data transfer out'. ..ede/duply.net
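A rough way to see why a nightly verify adds up: if each run downloads about the whole current chain, monthly transfer out scales with chain size × number of servers × runs per month. The sketch below assumes exactly that. The function name is hypothetical, and every input, including the per-GB price, is a placeholder. Check the S3 pricing page for your region and substitute your own chain sizes.

```shell
#!/bin/sh
# Hypothetical helper: rough monthly S3 transfer-out cost of verify runs,
# ASSUMING each verify downloads the whole current chain exactly once.
# All inputs are placeholders; substitute your own figures.
estimate_verify_cost() {
  chain_gb=$1     # size of the current backup chain per server, in GB
  servers=$2      # number of servers that run a verify
  runs=$3         # verify runs per month (about 30 nightly, about 4 weekly)
  usd_per_gb=$4   # assumed S3 transfer-out price in USD per GB
  awk -v c="$chain_gb" -v s="$servers" -v r="$runs" -v p="$usd_per_gb" \
    'BEGIN { printf "%.2f\n", c * s * r * p }'
}

# Made-up example: 10 GB chains on 7 servers at an assumed 0.09 USD/GB.
# estimate_verify_cost 10 7 30 0.09   -> 189.00 (nightly verify)
# estimate_verify_cost 10 7 4 0.09    -> 25.20  (weekly verify)
```

The estimate scales linearly with the number of verify runs, so moving from a nightly to a weekly verify cuts this part of the bill by roughly a factor of seven.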

Jasper (jasperjorna) said : #4

Ah, that makes sense.

I grabbed it from the Duply docs where it showed it as an example for cron execution:

  a one line batch job on 'humbug' for cron execution:
    duply humbug backup_verify_purge --force

So I should be able to run it as:

duply humbug backup_purge --force?

edso (ed.so) said : #5

You should, but that's dangerous: you should only purge when you are sure that the current chain is not corrupted. How about

daily backups, and
once a week a backup followed by verify_and_purge (note the condition: purge only runs if verify succeeds)?

..ede/duply.net
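The condition matters because purge should only run after a successful verify. The sketch below shows the same idea in plain shell, in case you ever want to run the steps as separate commands. The function name is hypothetical and not part of duply. It only mirrors the intent of the `_and_` in verify_and_purge.

```shell
#!/bin/sh
# Hypothetical helper (not part of duply): run a verify command and,
# only if it exits 0, run a purge command. A failed verify skips the purge.
verify_then_purge() {
  verify_cmd=$1
  purge_cmd=$2
  if sh -c "$verify_cmd"; then
    sh -c "$purge_cmd"
  else
    echo "verify failed; skipping purge" >&2
    return 1
  fi
}
```

Usage would look like `verify_then_purge "duply websites verify" "duply websites purge --force"`. The function returns non-zero when verify fails, so cron mail or a log check can flag it.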

Jasper (jasperjorna) said : #6

Right, thanks for the heads up.

Would it be possible to combine them like this?

0 0 * * 1-6 duply system backup
0 0 * * 0 duply system backup_verify_and_purge --force

So a normal backup every day from Monday to Saturday, and a backup with verify and purge on Sunday?

edso (ed.so) said : #7

Looks good to me. If the weekly verify is still too costly, look around the net: there are nifty conditional calls that let you define things like every 2 weeks or the last Friday of the month.

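E.g. this one scrubs a btrfs volume of mine on the last Friday of every 3rd month, but only if the volume is mounted. The date test passes when the day of the month 7 days from now is below 8, i.e. when today is in the last 7 days of the month. The % is escaped as \% because cron treats a bare % as a newline: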

0 20 * 3,6,9,12 Fri mount | grep -q /mnt/raid && ( [ $(date +\%d -d '+7 days') -lt '8' ] && btrfs scrub start /mnt/raid )

..ede/duply.net
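For the "every 2 weeks" case, one common approach is to test the parity of the ISO week number. This approach does not come from this thread, so treat it as a sketch. The helper name is hypothetical, and it relies on GNU date's -d option, as the scrub example above does.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when the given date, default
# today, falls in an even-numbered ISO 8601 week. Requires GNU date (-d).
is_even_iso_week() {
  week=$(date -d "${1:-today}" +%V)   # zero-padded week number, e.g. "08"
  week=${week#0}                      # drop the leading zero so "08" is not parsed as octal
  [ $((week % 2)) -eq 0 ]
}
```

A wrapper script started every Sunday could run `duply system backup_verify_and_purge --force` when `is_even_iso_week` succeeds and a plain `duply system backup` otherwise. Crontab lines cannot call shell functions directly, so the function belongs in that script. Years with 53 ISO weeks break the alternation once, because week 53 and the following week 1 are both odd.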
