Forced disconnect of my internet connection during backup

Asked by nils on 2016-11-12

Hi there,
I'm facing an interesting situation and I'm sure that I'm not alone:
I have a backup set that comprises way more data than my internet connection can handle within 24 hours (the set is about 200 GB, which should take roughly a week to upload).
Unfortunately, my ISP disconnects my internet connection every 24 hours and assigns me a new IP. The backends that I've tried so far (SSH and Dropbox) cannot handle the closed socket (even though connectivity is back after a few seconds).
I have tried quite a few things but failed in the end. So, I have some questions:
1) Does it somehow harm the quality of the backup if I start the backup process over manually (or via a bash script) 20 times? Resuming the backup that often doesn't feel like a good solution, but currently I see no other option. I would really appreciate your opinion on that.
2) Are there, or will there be, any backends that can handle such a situation? In principle, it's pretty simple: in case of a permanent error, the backend would "only" have to re-authenticate and reconnect completely (at least attempting this would be very useful).
3) Is anybody here encountering the same problem who maybe found a different solution that I did not yet think of?

Thanks in advance
Have a nice weekend!
Nils
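For reference, the "start over 20 times" idea from (1) can be scripted as a simple rerun loop. A minimal sketch; the duplicity command line shown is a hypothetical placeholder, not a tested invocation:

```python
import subprocess
import time

def run_until_success(cmd, max_attempts=20, delay=60):
    """Rerun cmd until it exits 0; return the number of attempts used.

    A fresh duplicity run resumes an interrupted backup, so simply
    relaunching after each forced disconnect is enough to make progress.
    """
    for attempt in range(1, max_attempts + 1):
        if subprocess.run(cmd).returncode == 0:
            return attempt
        time.sleep(delay)  # give the line a moment to come back up
    raise RuntimeError("still failing after %d attempts" % max_attempts)

# hypothetical invocation (source dir and target URL are made up):
# run_until_success(["duplicity", "/home/nils", "sftp://user@backuphost/backup"])
```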

Question information

Language:
English
Status:
Solved
For:
Duplicity
Assignee:
No assignee
Solved by:
nils
Solved:
2016-11-16
Last query:
2016-11-16
Last reply:
2016-11-15

This question was reopened

  • 2016-11-13 by nils
edso (ed.so) said : #1

On 12.11.2016 16:37, nils wrote:
> New question #404018 on Duplicity:
> https://answers.launchpad.net/duplicity/+question/404018
>
> Hi there,

hey Nils, long time no read ;)

> I'm facing an interesting situation and I'm sure that I'm not alone:
> I have a backup set that comprises way more data than my internet connection can handle within 24 hours (the set is about 200 GB, which should take roughly a week to upload).
> Unfortunately, my ISP disconnects my internet connection every 24 hours and assigns me a new IP. The backends that I've tried so far (SSH and Dropbox) cannot handle the closed socket (even though connectivity is back after a few seconds).
> I have tried quite a few things but failed in the end. So, I have some questions:
> 1) Does it somehow harm the quality of the backup if I start the backup process over manually (or via a bash script) 20 times? Resuming the backup that often doesn't feel like a good solution, but currently I see no other option. I would really appreciate your opinion on that.

it shouldn't, although resuming always carries a minimal risk that wouldn't otherwise be there. i suggest you do regular verify runs to make sure that your backups are in good shape.

> 2) Are there, or will there be, any backends that can handle such a situation? In principle, it's pretty simple: in case of a permanent error, the backend would "only" have to re-authenticate and reconnect completely (at least attempting this would be very useful).

not as such. resuming is only done when duplicity detects that the conditions are right to resume on a new backup run.
however, what backends can do is retry, and you may fine-tune the retry behaviour via --num-retries . the delay is currently hardcoded as 30s in http://bazaar.launchpad.net/~duplicity-team/duplicity/0.8-series/view/head:/duplicity/backend.py#L400

> 3) Is anybody here encountering the same problem and maybe found a different solution that I did not yet think of?

probably. the usual workaround mentioned on the mailing list for issues like that is to back up to a local file:// target and use rsync or some other means to sync it to the backend of your choice. this way the backup process does not get interrupted by uplink issues.
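as a sketch, that two-stage workaround could look like the following; the paths are hypothetical, and rsync's --partial flag is what lets a half-sent volume survive the nightly disconnect:

```python
import subprocess

def build_backup_cmds(src, local_target, remote):
    """Stage 1: duplicity writes to a local file:// target, immune to uplink drops.
    Stage 2: rsync pushes the volumes; --partial keeps partially transferred
    files so the upload resumes where it stopped after a forced disconnect."""
    return [
        ["duplicity", src, "file://" + local_target],
        ["rsync", "-av", "--partial", local_target + "/", remote],
    ]

def backup_then_sync(src, local_target, remote):
    for cmd in build_backup_cmds(src, local_target, remote):
        subprocess.run(cmd, check=True)

# hypothetical usage:
# backup_then_sync("/home/nils", "/var/backups/duplicity", "user@backuphost:/backups/")
```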

have fun ..ede/duply.net

nils (dernils) said : #2

> Hi there,

hey Nils, long time no read ;)

Nils: Indeed, currently a huge amount of my spare time somehow ends up with duplicity :-)

> I'm facing an interesting situation and I'm sure that I'm not alone:
> I have a backup set that comprises way more data than my internet connection can handle within 24 hours (the set is about 200 GB, which should take roughly a week to upload).
> Unfortunately, my ISP disconnects my internet connection every 24 hours and assigns me a new IP. The backends that I've tried so far (SSH and Dropbox) cannot handle the closed socket (even though connectivity is back after a few seconds).
> I have tried quite a few things but failed in the end. So, I have some questions:
> 1) Does it somehow harm the quality of the backup if I start the backup process over manually (or via a bash script) 20 times? Resuming the backup that often doesn't feel like a good solution, but currently I see no other option. I would really appreciate your opinion on that.

it shouldn't, although resuming always carries a minimal risk that wouldn't otherwise be there. i suggest you do regular verify runs to make sure that your backups are in good shape.

Nils: OK, I guess, I'll consider going this way then.

> 2) Are there, or will there be, any backends that can handle such a situation? In principle, it's pretty simple: in case of a permanent error, the backend would "only" have to re-authenticate and reconnect completely (at least attempting this would be very useful).

not as such. resuming is only done when duplicity detects that the conditions are right to resume on a new backup run.
however, what backends can do is retry, and you may fine-tune the retry behaviour via --num-retries . the delay is currently hardcoded as 30s in http://bazaar.launchpad.net/~duplicity-team/duplicity/0.8-series/view/head:/duplicity/backend.py#L400

Nils: I have already played around with the timeouts and the number of retries, but that does not help as (with the forced disconnect) the complete socket is gone. The backends would have to go one step further and repeat authentication against the backend server, but they don't do that :-(

> 3) Is anybody here encountering the same problem and maybe found a different solution that I did not yet think of?

probably. the usual workaround mentioned on the mailing list for issues like that is to back up to a local file:// target and use rsync or some other means to sync it to the backend of your choice. this way the backup process does not get interrupted by uplink issues.

Nils: Yes, I read that. For me that does not work as it would consume disk space that I don't have :-(

edso (ed.so) said : #3

On 13.11.2016 16:17, nils wrote:
> Nils: I already played around with the timeouts and the numbers of
> retries but that does not help as (with the forced disconnect) the
> complete socket is gone. The backends would have to go one step further
> and repeat authentication against the backend server but they don't do
> that :-(

pretty sure that there are some that do, e.g. WebDAV or pexpect+ssh

if you need it, consider patching the backends or filing a bug report wrt. this issue.

> Nils: Yes, I read that. For me that does not work as it would consume
> disk space that I don't have :-(
>

makes sense.. ede/duply.net

nils (dernils) said : #4

I'll have a look.
Also, patching the backends for Dropbox and SSH/SCP doesn't look too hard. I'll have a look as soon as I have figured out some other issues that I'm currently facing :-)

nils (dernils) said : #5

I just updated the Dropbox backend. It will need some more testing, but it looks promising so far.
Would you also accept contributions via email or the github clone under https://github.com/henrysher/duplicity ?
I'm not familiar with launchpad/bzr at all and my motivation to get into another VCS is very limited ;-)

nils (dernils) said : #6

The patch on the Dropbox backend is working now. Will test it over the next few weeks in my backup system.
You can find it under https://github.com/henrysher/duplicity/pull/9

nils (dernils) said : #7

I was just about to add some more functionality to the SSH backend for retrying in case of a disconnect, but I would like to clarify one thing from a design perspective first:
From what I understand, it is not the responsibility of the dedicated backend to retry a put/get in case of an error, as this logic seems to be contained in backend.py. Is this correct? If yes, I'll just make sure that the functions will work after a reconnect without adding any additional logic on retries.

edso (ed.so) said : #8

On 15.11.2016 09:23, nils wrote:
> I was just about to add some more functionality for retrying in case of a disconnect to the SSH backend but would like to clarify a thing from a design perspective:
> From what I understand, it is not the responsibility of the dedicated backend to retry a put/get in case of an error, as this logic seems to be contained in backend.py. Is this correct? If yes, I'll just make sure that the functions will work after a reconnect without adding any additional logic on retries.
>

correct, each backend is autowrapped with backend.BackendWrapper, which holds the retry logic itself.

while you are at it, it might be nice to make the retry delay configurable. could you please add a cmd line parameter '--backend-retry-delay' in duplicity/commandline.py, globals.py and use it in backend.BackendWrapper?

..ede/duply.net
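To illustrate the shape of that change, here is a simplified sketch; argparse stands in for duplicity's real option handling in commandline.py/globals.py, and RetryingWrapper is a toy stand-in for backend.BackendWrapper, not the actual code:

```python
import argparse
import time

def parse_args(argv):
    # sketch only: duplicity's real parser lives in duplicity/commandline.py
    p = argparse.ArgumentParser()
    p.add_argument("--num-retries", type=int, default=5)
    p.add_argument("--backend-retry-delay", type=int, default=30,
                   help="seconds between backend retries (previously hardcoded as 30)")
    return p.parse_args(argv)

class RetryingWrapper:
    """Toy stand-in for backend.BackendWrapper's retry loop."""
    def __init__(self, backend, num_retries=5, retry_delay=30):
        self.backend = backend
        self.num_retries = num_retries
        self.retry_delay = retry_delay

    def put(self, source, remote_name):
        last_err = None
        for _ in range(self.num_retries):
            try:
                return self.backend.put(source, remote_name)
            except Exception as err:
                last_err = err
                time.sleep(self.retry_delay)  # the now-configurable delay
        raise last_err
```

The point of the design is that the wrapped backend needs no retry logic of its own; it only has to leave its connection in a usable state for the next attempt.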

edso (ed.so) said : #9

On 14.11.2016 22:02, nils wrote:
> The patch on the Dropbox backend is working now. Will test it over the next few weeks in my backup system.
> You can find it under https://github.com/henrysher/duplicity/pull/9
>

you should drop
  def internet_on():
and the urllib2 import, as they do not seem to be needed.

..ede/duply.net

nils (dernils) said : #10

You are right. That's a relic from my first attempt to solve the issue. I just removed it.
Regarding the configurable delay: that sounds like a very good idea. I'll look into it as soon as I have done some paid work ;-) I also have some loose plans to update the Dropbox backend to the latest Dropbox SDK.

edso (ed.so) said : #11

On 15.11.2016 14:17, nils wrote:
> You are right. That's a relict from my first attempt to solve the issue. I just removed it.

it's not a good idea either. the service (the google web server in that case) might be out of order for reasons other than your uplink being down, so it's not a reliable way to detect anything.

the established approach in other backends is to check the socket/auth and recreate it as needed, as you eventually did as well.
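in sketch form, that recreate-as-needed pattern looks like this; the connect callable and the use of ConnectionError are hypothetical placeholders for whatever SDK the real backend uses (e.g. the Dropbox SDK or paramiko):

```python
class ReconnectingBackend:
    """Lazily (re)creates the authenticated client when the old one dies."""

    def __init__(self, connect):
        self._connect = connect   # callable that authenticates and returns a client
        self._client = None

    def _get_client(self):
        if self._client is None:
            self._client = self._connect()
        return self._client

    def put(self, source, remote_name):
        try:
            return self._get_client().put(source, remote_name)
        except ConnectionError:
            # the socket is gone for good: drop the dead client,
            # re-authenticate, and try once more on a fresh connection
            self._client = None
            return self._get_client().put(source, remote_name)
```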

> Regarding the configurable delay: That sounds like a very good idea. I'll look into it as soon as I did some paid work ;-) I have also some loose plans to update the Dropbox backend to the latest Dropbox SDK.
>

sounds good.. ede

nils (dernils) said : #12

And I finally managed to get the basics of bzr working :-))) Interesting concept.
So, my contributions, current and upcoming, can also be found under https://code.launchpad.net/~dernils/duplicity/robust-dropbox-backend

nils (dernils) said : #13

And now I have also introduced --backend-retry-delay and tested it briefly. It seems to work. Pushed to https://code.launchpad.net/~dernils/duplicity/robust-dropbox-backend and will propose a merge now.