Python segfaults reliably mid-backup on OS X (Duplicity 0.7.06)

Asked by Jonathan Elchison

Hi folks, long-time fan of Duplicity. Thanks for the great product!

Today I've become stumped, as a backup that had been working for years started reliably segfaulting mid-backup this month. Full (sanitized) dump is below.

Sadly, I have no stack trace, and VERBOSE=9 produces nothing more then messages while searching/selecting. No hints as to the segfault.

When running duplicity without duply, same result: no stack trace.

I have duplicity installed via brew, so it's feasible that today was my first attempt running v0.7.06 on OS X.

What can I try/provide to help diagnose?

Thanks!

--

```
$ sudo duply Computer_External bkp+purge+purgeFull_cleanup_status --force
Start duply v1.10.1, time is 2016-02-07 15:27:59.
Using profile '/Users/user/.duply/Computer_External'.
Using installed duplicity version 0.7.06, python 2.7.11, gpg: unsafe (Home: ~/.gnupg), awk 'awk version 20070501', grep 'grep (BSD grep) 2.5.1-FreeBSD', bash '3.2.57(1)-release (x86_64-apple-darwin15)'.
Checking TEMP_DIR '/tmp' is a folder (OK)
Checking TEMP_DIR '/tmp' is writable (OK)
Test - En/Decryption skipped. (Testing disabled)

--- Start running command BKP at 15:28:00.000 ---
Reading globbing filelist /Users/user/.duply/Computer_External/exclude
Local and Remote metadata are synchronized, no sync needed.
Warning, found incomplete backup sets, probably left from aborted session
Last full backup date: Sun Dec 20 00:28:02 2015
Reading globbing filelist /Users/user/.duply/Computer_External/exclude
<snip>
0.0KB 00:13:58 [0.0B/s] [> ] 0% ETA Stalled!
/usr/local/bin/duply: line 1151: 1643 Segmentation fault: 11 TMPDIR='/tmp' PASSPHRASE=‘enckeypass’ SIGN_PASSPHRASE=‘signkeypass’ duplicity --name duply_Computer_External --encrypt-key enckey1 --encrypt-key enckey2 --sign-key signkey1 --verbosity '4' --full-if-older-than 1Y --progress --asynchronous-upload --file-prefix-archive 'archive-' --allow-source-mismatch --force --exclude-filelist '/Users/user/.duply/Computer_External/exclude' '/Users/otheruser’ 'file:///Volumes/External/bak/machines/Computer/'
16:00:24.000 Task 'BKP' failed with exit code '139'.
--- Finished state FAILED 'code 139' at 16:00:24.000 - Runtime 00:32:24.000 ---

--- Skipping command PURGE ! ---

--- Skipping command PURGEFULL ! ---

--- Start running command CLEANUP at 16:00:24.000 ---
Another instance is already running with this archive directory
If you are sure that this is the only instance running you may delete
the following lockfile and run the command again :
 /Users/user/.cache/duplicity/duply_Computer_External/lockfile.lock
16:00:34.000 Task 'CLEANUP' failed with exit code '23'.
--- Finished state FAILED 'code 23' at 16:00:34.000 - Runtime 00:00:10.000 ---

--- Start running command STATUS at 16:00:34.000 ---
Another instance is already running with this archive directory
If you are sure that this is the only instance running you may delete
the following lockfile and run the command again :
 /Users/user/.cache/duplicity/duply_Computer_External/lockfile.lock
16:00:34.000 Task 'STATUS' failed with exit code '23'.
--- Finished state FAILED 'code 23' at 16:00:34.000 - Runtime 00:00:00.000 ---
```

Question information

Language:
English Edit question
Status:
Solved
For:
Duplicity Edit question
Assignee:
No assignee Edit question
Solved by:
Jonathan Elchison
Solved:
Last query:
Last reply:
Revision history for this message
edso (ed.so) said :
#1

On 07.02.2016 22:32, Jonathan Elchison wrote:
> New question #284756 on Duplicity:
> https://answers.launchpad.net/duplicity/+question/284756
>
> Hi folks, long-time fan of Duplicity. Thanks for the great product!
>
> Today I've become stumped, as a backup that had been working for years started reliably segfaulting mid-backup this month. Full (sanitized) dump is below.
>
> Sadly, I have no stack trace, and VERBOSE=9 produces nothing more then messages while searching/selecting. No hints as to the segfault.
>
> When running duplicity without duply, same result: no stack trace.
>
> I have duplicity installed via brew, so it's feasible that today was my first attempt running v0.7.06 on OS X.
>
> What can I try/provide to help diagnose?
>

looks like python is segfaulting

1. try w/o --progress´, it makes weird things sometimes

2. find where OSX logs it's segfaults. they should end up in a kernel log or such.

3. have duply generate the command line for you via --preview switch.. run the command line manually and see if it now print the whole segfault info, for you to take and web search for a solution

4. try to downgrade python, to a version that worked before

..ede/duply.net

Revision history for this message
edso (ed.so) said :
#2

additionally

> gpg: unsafe (Home: ~/.gnupg),

looks wrong.. what is the output of 'gpg --version' on your terminal.

..ede/duply.net

Revision history for this message
Jonathan Elchison (jelchison) said :
#3

thanks for the quick responses!

i will attempt to organize my replies logically, in separate comments.

first, concerning the unsafe gpg: gpg marks unsafe whenever it's not running as the owner of the gpg config file it's using. this is rare, except when using sudo. when backing up files outside of the current user (as in my case),sudo is what causes the unsafe marker to appear.

this has been this way as long as i can remember, and i don't think it's related to my python segfault at hand.

here's the relevant output:

*** no sudo (not marked unsafe) ***

$ gpg --version
gpg (GnuPG) 1.4.19
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Home: ~/.gnupg
Supported algorithms:
Pubkey: RSA, RSA-E, RSA-S, ELG-E, DSA
Cipher: IDEA, 3DES, CAST5, BLOWFISH, AES, AES192, AES256, TWOFISH,
        CAMELLIA128, CAMELLIA192, CAMELLIA256
Hash: MD5, SHA1, RIPEMD160, SHA256, SHA384, SHA512, SHA224
Compression: Uncompressed, ZIP, ZLIB, BZIP2

*** with sudo (marked unsafe) ***

$ sudo gpg --version
Password:
gpg: WARNING: unsafe ownership on configuration file `/Users/xxxxx/.gnupg/gpg.conf'
gpg (GnuPG) 1.4.19
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Home: ~/.gnupg
Supported algorithms:
Pubkey: RSA, RSA-E, RSA-S, ELG-E, DSA
Cipher: IDEA, 3DES, CAST5, BLOWFISH, AES, AES192, AES256, TWOFISH,
        CAMELLIA128, CAMELLIA192, CAMELLIA256
Hash: MD5, SHA1, RIPEMD160, SHA256, SHA384, SHA512, SHA224
Compression: Uncompressed, ZIP, ZLIB, BZIP2

more comments to follow on your other suggestions...

Revision history for this message
Jonathan Elchison (jelchison) said :
#4

you are correct, python itself is crashing, always at the same location inside librsync. here are the interesting parts of the log:

Process: Python [1643]
Path: /usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python
Identifier: Python
Version: 2.7.11 (2.7.11)
Code Type: X86-64 (Native)
Parent Process: bash [1396]
Responsible: Python [1643]
User ID: 0

Date/Time: 2016-02-07 16:00:25.963 -0500
OS Version: Mac OS X 10.11.3 (15D21)
Report Version: 11
Anonymous UUID: xxxxx

Time Awake Since Boot: 2400 seconds

System Integrity Protection: enabled

Crashed Thread: 0 Dispatch queue: com.apple.main-thread

Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x00007fd45914ae04
Exception Note: EXC_CORPSE_NOTIFY

VM Regions Near 0x7fd45914ae04:
    Stack 0000700000807000-0000700000c09000 [ 4104K] rw-/rwx SM=COW thread 2
-->
    MALLOC_TINY 00007fd6d8400000-00007fd6d8800000 [ 4096K] rw-/rwx SM=PRV

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 librsync.2.dylib 0x0000000105545213 rs_search_for_block + 414
1 librsync.2.dylib 0x000000010554252c rs_delta_s_scan + 201
2 librsync.2.dylib 0x000000010554313c rs_job_iter + 87
3 _librsync.so 0x000000010553d737 _librsync_deltamaker_cycle + 161
4 org.python.python 0x0000000104f06b57 PyEval_EvalFrameEx + 26864
5 org.python.python 0x0000000104f0a6b1 fast_function + 264
6 org.python.python 0x0000000104f06a7d PyEval_EvalFrameEx + 26646
7 org.python.python 0x0000000104f00075 PyEval_EvalCodeEx + 1583
8 org.python.python 0x0000000104f0a61e fast_function + 117
9 org.python.python 0x0000000104f06a7d PyEval_EvalFrameEx + 26646
10 org.python.python 0x0000000104f0a6b1 fast_function + 264
11 org.python.python 0x0000000104f06a7d PyEval_EvalFrameEx + 26646
12 org.python.python 0x0000000104f0a6b1 fast_function + 264
13 org.python.python 0x0000000104f06a7d PyEval_EvalFrameEx + 26646
14 org.python.python 0x0000000104f0a6b1 fast_function + 264
15 org.python.python 0x0000000104f06a7d PyEval_EvalFrameEx + 26646
16 org.python.python 0x0000000104f00075 PyEval_EvalCodeEx + 1583
17 org.python.python 0x0000000104f0a61e fast_function + 117
18 org.python.python 0x0000000104f06a7d PyEval_EvalFrameEx + 26646
19 org.python.python 0x0000000104f00075 PyEval_EvalCodeEx + 1583
20 org.python.python 0x0000000104f0a61e fast_function + 117
21 org.python.python 0x0000000104f06a7d PyEval_EvalFrameEx + 26646
22 org.python.python 0x0000000104f0a6b1 fast_function + 264
23 org.python.python 0x0000000104f06a7d PyEval_EvalFrameEx + 26646
24 org.python.python 0x0000000104f0a6b1 fast_function + 264
25 org.python.python 0x0000000104f06a7d PyEval_EvalFrameEx + 26646
26 org.python.python 0x0000000104f0a6b1 fast_function + 264
27 org.python.python 0x0000000104f06a7d PyEval_EvalFrameEx + 26646
28 org.python.python 0x0000000104f0a6b1 fast_function + 264
29 org.python.python 0x0000000104f06a7d PyEval_EvalFrameEx + 26646
30 org.python.python 0x0000000104f00075 PyEval_EvalCodeEx + 1583
31 org.python.python 0x0000000104effa40 PyEval_EvalCode + 54
32 org.python.python 0x0000000104f24041 run_mod + 53
33 org.python.python 0x0000000104f240e4 PyRun_FileExFlags + 133
34 org.python.python 0x0000000104f23c33 PyRun_SimpleFileExFlags + 698
35 org.python.python 0x0000000104f35531 Py_Main + 3137
36 libdyld.dylib 0x00007fff8fe1c5ad start + 1

Revision history for this message
Jonathan Elchison (jelchison) said :
#5

no printouts for segfault when running duplicity directly (sans duply). also, removing `--progress` has no effect.

however, i isolated the issue, it's caused by upgrading librsync from 1.0.0 to 2.0.0.

relevant threads, for posterity:
* https://github.com/librsync/librsync/issues/50
* https://bbs.archlinux.org/viewtopic.php?pid=1593684
* https://bugs.archlinux.org/task/47645
* https://bbs.archlinux.org/viewtopic.php?id=207080

sorry for the hassle, thanks for your help!

Revision history for this message
edso (ed.so) said :
#6

good to know you've found the issue!

can you tell me in which log you discovered the crash's thread stack? ..ede

Revision history for this message
Jonathan Elchison (jelchison) said :
#7

sure thing! i followed the instructions at https://apple.stackexchange.com/a/26483

i found references to Python crashes in the syslog, and they linked to Python crash reports. the crash reports are also available under "System Diagnostic Reports".

Revision history for this message
Artur Bodera (abodera) said :
#8

Confirmed. I am unable to run any backups with duplicity because it crashes constantly :-(

I'm using 0.7.06_01 via homebrew (https://github.com/Homebrew/homebrew/blob/master/Library/Formula/duplicity.rb)

Here's and example crash file (thread stack):
https://gist.github.com/Thinkscape/765ac5eaa1171887eb61

With log level DEBUG it looks like this:
-----------
Comparing Taco.sparsebundle/bands/0 and Taco.sparsebundle/bands/0
Selection: examining path /Volumes/Data/Taco.sparsebundle/bands/1
Selection: result: None from function: Command-line exclude glob: /Volumes/Data/.*
Selection: result: 1 from function: Command-line include glob: /Volumes/Data/*.sparsebundle
Selection: + including file
Selecting /Volumes/Data/Taco.sparsebundle/bands/1
Comparing Taco.sparsebundle/bands/1 and Taco.sparsebundle/bands/1
Getting delta of (Taco.sparsebundle/bands/1 reg) and (Taco.sparsebundle/bands/1 reg)
M Taco.sparsebundle/bands/1
-------------
... it just exits (crashes) after that ..

Revision history for this message
Artur Bodera (abodera) said :
#9

Good news to anyone following this. It seems that the bug has been fixed in librsync:
https://github.com/librsync/librsync/issues/50#issuecomment-183803254

From duplicity side, we'll probably have to wait for them to tag librsync 2.0.1.

Revision history for this message
edso (ed.so) said :
#10

Artur,

> From duplicity side, we'll probably have to wait for them to tag librsync 2.0.1.

duplicity does not enforce any version of librsync. that's the job of package maintainers if they feel inclined to do so.

..ede/duply.net

Revision history for this message
Artur Bodera (abodera) said :
#11

You're right. My bad. I'm pushing this issue downstream to osx homebrew ...

Although people using Duplicity on OSX from other sources will still be affected.
Maybe a version check, or at least a warning message or something?

Revision history for this message
Jonathan Elchison (jelchison) said :
#12

I concur with edso. Loosely speaking, Duplicity is a user of Python, and respects its interfaces. Python's interaction with librsync is the interface under question. Asking Duplicity to perform version checking on an item further down the stack breaks abstraction/interface rules.

Artur, if you're interested in helping push this problem through, I don't think homebrew is the right place to go. I encourage you to pick up the discussion with the librsync folks at https://github.com/librsync/librsync/issues/50. They're currently looking for a portable way to unit test the proposed solution. Perhaps you can lend a hand if you have a reliable and reproducible test case?

Revision history for this message
Artur Bodera (abodera) said :
#13

Thanks @jelchison. Agreed, it's too far into the forest.

As for fix, it needs attention in both places. Fixing the underlying issues is one thing, but homebrew "formulas" control which version of librsync to install/compile on user's OSX, so that's why I mentioned homebrew users can expect this problem to automagically go away in very near future.

I guess other OSX users who compiled librsync via python/pip are out of luck but hopefully they'll find this ticket and navigate to a solution.

to recap:

----~ ----~ ----~ ----~ ----~ ----~ ----~ ----~ ----~ ----~ ----~ ----~ ----~ ----~ ----~ ----~ ----~ ----~ ----~ ----~ ----~
## librsync 2.0.0 IS UNSTABLE UNDER OSX AND CRASHES DUPLICITY -- UPGRADE NOW ###
----~ ----~ ----~ ----~ ----~ ----~ ----~ ----~ ----~ ----~ ----~ ----~ ----~ ----~ ----~ ----~ ----~ ----~ ----~ ----~ ----~

Revision history for this message
Jonathan Elchison (jelchison) said :
#14

Sadly, I don't believe there is anything (yet) to upgrade to, unless you build your own. As of current, librsync 2.0.0 is the latest version tagged/released. https://github.com/librsync/librsync/releases

I don't think Homebrew versions packages out of sync with upstream, so I think the ball is in librsync's court. The recipe still pulls 2.0.0 tagged: https://github.com/Homebrew/homebrew/blob/master/Library/Formula/librsync.rb

As mentioned above, I think librsync is still looking for a portable way to unit test the proposed solution.

FYI, your statement is misleading: librsync 2.0.0 only crashes Duplicity under very specific scenarios, as outlined in the librsync bug report. Many Duplicity users may very well never see this segfault.