Dropbox extremely slow?

Asked by Henrik Farre

Hi

I'm trying to backup ~600Gb using the dropbox backend, and in 24 hours only 100Gb has been uploaded.

1. Is there anything I can do to make this go faster?
2. And will the next full backup take as long?

Some info about my setup:
- 50/50Mbit internet connection
- i3-4130T CPU @ 2.90GHz
- 8Gb Ram
- Data that is backed up is on a raid5 array
- archive-dir is on a SSD
- tempdir is on tmpfs
- system load is minimal, load average: 0,02, 0,08, 0,12
- Router bandwidth: http://i.imgur.com/QMtGyNz.png
- iotop: http://i.imgur.com/KrbxwJ6.png
- iftop: http://i.imgur.com/V8h6vTJ.png
- vmstat: http://i.imgur.com/XYdn09O.png
- iostat: http://i.imgur.com/jz5lg7d.png
- Duplicity cmd options:
duplicity incr --exclude-device-files --exclude-other-filesystems --encrypt-key XXXXXX --log-file /var/log/duplicity.log --archive-dir /var/cache/duplicity --asynchronous-upload --full-if-older-than 4W --volsize 500 /home dpbx:///home

Backup Statistics from another machine (also to dropbox)
--------------[ Backup Statistics ]--------------
StartTime 1460894623.94 (Sun Apr 17 14:03:43 2016)
EndTime 1460896214.91 (Sun Apr 17 14:30:14 2016)
ElapsedTime 1590.98 (26 minutes 30.98 seconds)
SourceFiles 41089
SourceFileSize 3273785332 (3.05 GB)
NewFiles 41089
NewFileSize 3273785332 (3.05 GB)
DeletedFiles 0
ChangedFiles 0
ChangedFileSize 0 (0 bytes)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 41089
RawDeltaSize 3248060411 (3.02 GB)
TotalDestinationSizeChange 1958214748 (1.82 GB)
Errors 0
-------------------------------------------------

Question information

Language:
English Edit question
Status:
Solved
For:
Duplicity Edit question
Assignee:
No assignee Edit question
Solved by:
Henrik Farre
Solved:
Last query:
Last reply:
Revision history for this message
edso (ed.so) said :
#1

On 18.04.2016 14:02, Henrik Farre wrote:
> New question #291337 on Duplicity:
> https://answers.launchpad.net/duplicity/+question/291337
>
> Hi
>
> I'm trying to backup ~600Gb using the dropbox backend, and in 24 hours only 100Gb has been uploaded.
>
> 1. Is there anything I can do to make this go faster?
> 2. And will the next full backup take as long?
>
> Some info about my setup:
> - 50/50Mbit internet connection
> - i3-4130T CPU @ 2.90GHz
> - 8Gb Ram
> - Data that is backed up is on a raid5 array
> - archive-dir is on a SSD
> - tempdir is on tmpfs
> - system load is minimal, load average: 0,02, 0,08, 0,12
> - Router bandwidth: http://i.imgur.com/QMtGyNz.png
> - iotop: http://i.imgur.com/KrbxwJ6.png
> - iftop: http://i.imgur.com/V8h6vTJ.png
> - vmstat: http://i.imgur.com/XYdn09O.png
> - iostat: http://i.imgur.com/jz5lg7d.png
> - Duplicity cmd options:
> duplicity incr --exclude-device-files --exclude-other-filesystems --encrypt-key XXXXXX --log-file /var/log/duplicity.log --archive-dir /var/cache/duplicity --asynchronous-upload --full-if-older-than 4W --volsize 500 /home dpbx:///home
>
> Backup Statistics from another machine (also to dropbox)
> --------------[ Backup Statistics ]--------------
> StartTime 1460894623.94 (Sun Apr 17 14:03:43 2016)
> EndTime 1460896214.91 (Sun Apr 17 14:30:14 2016)
> ElapsedTime 1590.98 (26 minutes 30.98 seconds)
> SourceFiles 41089
> SourceFileSize 3273785332 (3.05 GB)
> NewFiles 41089
> NewFileSize 3273785332 (3.05 GB)
> DeletedFiles 0
> ChangedFiles 0
> ChangedFileSize 0 (0 bytes)
> ChangedDeltaSize 0 (0 bytes)
> DeltaEntries 41089
> RawDeltaSize 3248060411 (3.02 GB)
> TotalDestinationSizeChange 1958214748 (1.82 GB)
> Errors 0
> -------------------------------------------------
>

what's your duplicity version?

..ede/duply.net

Revision history for this message
Henrik Farre (henrik-rockhopper) said :
#2

Hi

Argh.. forgot it while dumping all the other information :):

duplicity 0.7.07
python2-dropbox 6.1-1
Linux kernel 4.5
Dist: Arch Linux

Revision history for this message
edso (ed.so) said :
#3

On 18.04.2016 14:37, Henrik Farre wrote:
> Question #291337 on Duplicity changed:
> https://answers.launchpad.net/duplicity/+question/291337
>
> Henrik Farre posted a new comment:
> Hi
>
> Argh.. forgot it while dumping all the other information :):
>
> duplicity 0.7.07
> python2-dropbox 6.1-1
> Linux kernel 4.5
> Dist: Arch Linux
>

duplicity is up to date.. https://pypi.python.org/pypi/dropbox suggests that your dropbox module is latest as well. maybe you should ask them about speeds to be expected?

btw. 100GiB in 24h means 100*8/24/60/60 ca. 9Mbit/s upload.. that's not very slow, it's just not fast ;)

..ede/duply.net

Revision history for this message
Henrik Farre (henrik-rockhopper) said :
#4

Yeah, it looks like dropbox just is slow...

If duplicity runs a full backup again, will it upload all ~600Gb again?

Revision history for this message
Henrik Farre (henrik-rockhopper) said :
#5

Hi

Small update:

If I change DPBX_UPLOAD_CHUNK_SIZE in /usr/lib/python2.7/site-packages/duplicity/backends/dpbxbackend.py from 16 MB to 128 MB, duplicitys progress switch reports up to 20MB/s instead of 2-3MB/s, which also changes the ETA from 2-3 days to ~8Hours.

If I increase DPBX_UPLOAD_CHUNK_SIZE to the same size as volsize (500), I get this error:

Exception [('Connection aborted.', error(104, 'Connection reset by peer'))]:
| Traceback (most recent call last):
| File "/usr/lib/python2.7/site-packages/duplicity/backends/dpbxbackend.py", line 79, in wrapper
| return f(self, *args)
| File "/usr/lib/python2.7/site-packages/duplicity/backends/dpbxbackend.py", line 186, in _put
| upload_sid = self.api_client.files_upload_session_start(buf)
| File "/usr/lib/python2.7/site-packages/dropbox/base.py", line 815, in files_upload_session_start
| f,
| File "/usr/lib/python2.7/site-packages/dropbox/dropbox.py", line 220, in request
| request_binary)
| File "/usr/lib/python2.7/site-packages/dropbox/dropbox.py", line 303, in request_json_string_with_retry
| request_binary)
| File "/usr/lib/python2.7/site-packages/dropbox/dropbox.py", line 374, in request_json_string
| verify=True,
| File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 511, in post
| return self.request('POST', url, data=data, json=json, **kwargs)
| File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 468, in request
| resp = self.send(prep, **send_kwargs)
| File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 576, in send
| r = adapter.send(request, **kwargs)
| File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 426, in send
| raise ConnectionError(err, request=request)
| ConnectionError: ('Connection aborted.', error(104, 'Connection reset by peer'))
dpbx code error "('Connection aborted.', error(104, 'Connection reset by peer'))"

Revision history for this message
edso (ed.so) said :
#6

On 19.04.2016 10:47, Henrik Farre wrote:
> Question #291337 on Duplicity changed:
> https://answers.launchpad.net/duplicity/+question/291337
>
> Henrik Farre posted a new comment:
> Hi
>
> Small update:
>
> If I change DPBX_UPLOAD_CHUNK_SIZE in /usr/lib/python2.7/site-
> packages/duplicity/backends/dpbxbackend.py from 16 MB to 128 MB,
> duplicitys progress switch reports up to 20MB/s instead of 2-3MB/s,
> which also changes the ETA from 2-3 days to ~8Hours.

good to know! there are some comments surrounding the code you mentioned. taking these into account and maybe researching what the variables do (python dropbox mailing list?) should shed some light on this.

maybe we should make them tunable, so users with broadband upload can utilize their connection better. can you research and come back with some more detailed infos, links to docs?

..ede/duply.net

> If I increase DPBX_UPLOAD_CHUNK_SIZE to the same size as volsize (500),
> I get this error:
>
> Exception [('Connection aborted.', error(104, 'Connection reset by peer'))]:
> | Traceback (most recent call last):
> | File "/usr/lib/python2.7/site-packages/duplicity/backends/dpbxbackend.py", line 79, in wrapper
> | return f(self, *args)
> | File "/usr/lib/python2.7/site-packages/duplicity/backends/dpbxbackend.py", line 186, in _put
> | upload_sid = self.api_client.files_upload_session_start(buf)
> | File "/usr/lib/python2.7/site-packages/dropbox/base.py", line 815, in files_upload_session_start
> | f,
> | File "/usr/lib/python2.7/site-packages/dropbox/dropbox.py", line 220, in request
> | request_binary)
> | File "/usr/lib/python2.7/site-packages/dropbox/dropbox.py", line 303, in request_json_string_with_retry
> | request_binary)
> | File "/usr/lib/python2.7/site-packages/dropbox/dropbox.py", line 374, in request_json_string
> | verify=True,
> | File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 511, in post
> | return self.request('POST', url, data=data, json=json, **kwargs)
> | File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 468, in request
> | resp = self.send(prep, **send_kwargs)
> | File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 576, in send
> | r = adapter.send(request, **kwargs)
> | File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 426, in send
> | raise ConnectionError(err, request=request)
> | ConnectionError: ('Connection aborted.', error(104, 'Connection reset by peer'))
> dpbx code error "('Connection aborted.', error(104, 'Connection reset by peer'))"
>

Revision history for this message
Henrik Farre (henrik-rockhopper) said :
#7

Hi

Yeah, I will take a look at it.

I'm also wondering why there are gaps in the upload, there is no activity in ~30 sec and the log just says):

AsyncScheduler: scheduling task for asynchronous execution
15.1GB 02:09:41 [0.0B/s] [=> ] 3% ETA Stalled!
15.1GB 02:09:44 [0.0B/s] [=> ] 3% ETA Stalled!
15.1GB 02:09:47 [0.0B/s] [=> ] 3% ETA Stalled!
15.1GB 02:09:50 [0.0B/s] [=> ] 3% ETA Stalled!
15.1GB 02:09:53 [0.0B/s] [=> ] 3% ETA Stalled!
15.1GB 02:09:56 [0.0B/s] [=> ] 3% ETA Stalled!
15.1GB 02:09:59 [0.0B/s] [=> ] 3% ETA Stalled!
15.1GB 02:10:02 [0.0B/s] [=> ] 3% ETA Stalled!
15.1GB 02:10:05 [0.0B/s] [=> ] 3% ETA Stalled!
15.1GB 02:10:08 [0.0B/s] [=> ] 3% ETA Stalled!
dpbx,files_upload_session_start(): UploadSessionStartResult(session_id=u'XXXXXXXXXXXXXXXXXXX')
dpbx,files_upload_sesssion_append([134217728 bytes], offset=134217728)
15.3GB 02:10:11 [22.3MB/s] [=> ] 3% ETA 6h 6min
15.3GB 02:10:14 [15.6MB/s] [=> ] 3% ETA 6h 7min
15.3GB 02:10:17 [0.0B/s] [=> ] 3% ETA Stalled!
15.3GB 02:10:20 [0.0B/s] [=> ] 3% ETA Stalled!
15.3GB 02:10:23 [0.0B/s] [=> ] 3% ETA Stalled!
...

Revision history for this message
edso (ed.so) said :
#8

On 19.04.2016 12:52, Henrik Farre wrote:
> Question #291337 on Duplicity changed:
> https://answers.launchpad.net/duplicity/+question/291337
>
> Henrik Farre posted a new comment:
> Hi
>
> Yeah, I will take a look at it.
>
> I'm also wondering why there are gaps in the upload, there is no
> activity in ~30 sec and the log just says):
>
> AsyncScheduler: scheduling task for asynchronous execution
> 15.1GB 02:09:41 [0.0B/s] [=> ] 3% ETA Stalled!
> 15.1GB 02:09:44 [0.0B/s] [=> ] 3% ETA Stalled!
SNIP
> 15.1GB 02:10:08 [0.0B/s] [=> ] 3% ETA Stalled!
> dpbx,files_upload_session_start(): UploadSessionStartResult(session_id=u'XXXXXXXXXXXXXXXXXXX')
> dpbx,files_upload_sesssion_append([134217728 bytes], offset=134217728)
> 15.3GB 02:10:11 [22.3MB/s] [=> ] 3% ETA 6h 6min
> 15.3GB 02:10:14 [15.6MB/s] [=> ] 3% ETA 6h 7min
> 15.3GB 02:10:17 [0.0B/s] [=> ] 3% ETA Stalled!
> 15.3GB 02:10:20 [0.0B/s] [=> ] 3% ETA Stalled!
> 15.3GB 02:10:23 [0.0B/s] [=> ] 3% ETA Stalled!
> ...
>

can as well be the progress bar code or the async scheduler.. try switching them off.

..ede/duply.net

Revision history for this message
Henrik Farre (henrik-rockhopper) said :
#9

So... I don't know what to do next :)

I have tested different chunk sizes using this example code: https://www.dropboxforum.com/hc/en-us/community/posts/205544836-python-upload-big-file-example using a very unscientific method where I just added datetime.datetime.now() before and after the upload, and then converted the numbers to MB/s:

1Gb in 4Mb chunks: 10min, 44 sec ~ 1.58MB/s
1Gb in 16Mb chunks: 7min, 53 sec ~ 2.15MB/s
1Gb in 150Mb chunks: 5min, 45sec ~ 2.97MB/s
1Gb in 250Mb chunks: 5min, 23sec ~ 3,17MB/s
1Gb in 300Mb chunks: 5min, 15sec ~ 3,25MB/s
1Gb in 350Mb chunks: Connection reset by peer
1Gb in 500Mb chunks: Connection reset by peer

150Mb is max recommended by http://dropbox-sdk-python.readthedocs.org/en/master/moduledoc.html?highlight=files_upload_session_start#dropbox.dropbox.Dropbox.files_upload_session_start

The test script has ~15secs between uploads, e.g. there is no network traffic in that time.

When I run duplicity there is no network traffic in ~30secs, and if apply this patch: https://github.com/dropbox/dropbox-sdk-python/pull/54 I get a timeout:

dpbx code error "HTTPSConnectionPool(host='content.dropboxapi.com', port=443): Read timed out. (read timeout=30.0)"
Attempt 1 failed. ReadTimeout: HTTPSConnectionPool(host='content.dropboxapi.com', port=443): Read timed out. (read timeout=30.0)

So I think I just have to suck it up, change chunk size to 150 and let the backup run until it is done :)

Revision history for this message
Henrik Farre (henrik-rockhopper) said :
#10

Backup has completed:

--------------[ Backup Statistics ]--------------
StartTime 1461220327.34 (Thu Apr 21 08:32:07 2016)
EndTime 1461372829.59 (Sat Apr 23 02:53:49 2016)
ElapsedTime 152502.25 (42 hours 21 minutes 42.25 seconds)
SourceFiles 192485
SourceFileSize 421265398672 (392 GB)
NewFiles 192485
NewFileSize 421265398672 (392 GB)
DeletedFiles 0
ChangedFiles 0
ChangedFileSize 0 (0 bytes)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 192485
RawDeltaSize 421159612481 (392 GB)
TotalDestinationSizeChange 323026310170 (301 GB)
Errors 0
-------------------------------------------------

So about 2.5MB/s

I'm currently running a verify, and there are already a couple of errors: dpbx code error "dpbx: wrong file size: 385801717 (expected: 629179863)"

I will close this question now, and at some point look into the pause between uploads :)