gpg-agent poor performance / millions of futex errors

Asked by Andrey Arapov on 2020-11-16

Hello,

after upgrading from Ubuntu 16.04 to Ubuntu 18.04 we've noticed issues that came along with gpg v2.x.

The gpg-agent produces millions of futex syscall errors within a very short time (a second or two) when it is put under load, either by SaltStack's salt-master decrypting the pillars (our main use case) or when it is tested directly with the "parallel" tool from the moreutils package.

```
$ sudo strace -f -p <pidof gpg-agent>
...
...Ctrl+C pressed after a couple of seconds while "gpg -d" commands are running in parallel (see below for details)
...
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 96.35 2800.145231         305   9194103   2009552 futex
  3.63  105.404136      373774       282           pselect6
  0.01    0.431338         102      4246           read
  0.00    0.104701          12      8490           write
  0.00    0.029085         103       283           accept
  0.00    0.016549          58       284           madvise
  0.00    0.012201          22       567           close
  0.00    0.010979           8      1410           getpid
  0.00    0.010405          12       849           access
  0.00    0.006341          22       284       284 wait4
  0.00    0.004350          15       283           openat
  0.00    0.003764          13       283           clone
  0.00    0.002668           9       283           getsockopt
  0.00    0.002568           9       283           fstat
  0.00    0.002564           9       283           set_robust_list
  0.00    0.001941           7       283           lseek
------ ----------- ----------- --------- --------- ----------------
100.00 2906.188821               9212496   2009836 total
```

I'll describe the issue and steps to reproduce it.

First, prepare the "enc" file:

```
cat /usr/share/doc/base-files/README | gpg -ear "some-4K-RSA-public-key" > enc
```

Run parallel decryptions using "time" to measure it:

```
time parallel -j 30 sh -c "cat enc | gpg --no-tty -d -q -o /dev/null" -- $(seq 1 3000)
```

Running "gpg -d" (GPG v2.x, with the gpg-agent) in parallel as described above took:
- 1 minute 18 seconds on big HW (48 cores, *gpg-agent 2.2.4*-1ubuntu1.2)
- 32 seconds on my laptop (4 cores, *gpg-agent 2.2.19*-3ubuntu2)

Running the same commands but with GPG v1.4.20 (no agent) took:
- 9 seconds on big HW (40 cores, *gnupg 1.4.20*-1ubuntu3.3)
- 21 seconds on a VM (1 core, *gnupg 1.4.20*-1ubuntu3.3)

Note: in order to prevent the "command 'PKDECRYPT' failed: Cannot allocate memory <gcrypt>" error, the gpg-agent is running either with the "--auto-expand-secmem 0x30000" flag or with the equivalent "auto-expand-secmem" line in the ~/.gnupg/gpg-agent.conf file.
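
For reference, making the agent-side setting from the note above persistent looks like this (the 0x30000 value is the one from the report):

```
# ~/.gnupg/gpg-agent.conf
auto-expand-secmem 0x30000
```

After editing the file, the running agent can be restarted with "gpgconf --kill gpg-agent" so the new setting takes effect.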

Since our use case is to have SaltStack's salt-master decrypt many pillars for hundreds of servers, the Ubuntu 16.04 => 18.04 upgrade severely degrades SaltStack performance, making it almost unusable: it becomes 10 times slower and forces us into workarounds such as increasing "gather_job_timeout", or possibly even rolling back to gpg v1.x. We are not sure whether Ubuntu Bionic fully supports that without breakage (we haven't tested the gpg 2.x => 1.x downgrade scenario yet -- any insights are highly appreciated!).

Any suggestions?

Kind regards,
Andrey Arapov

Question information

Language: English
Status: Answered
For: Ubuntu
Assignee: No assignee
Last query: 2020-11-18
Last reply: 2020-11-19
Bernard Stafford (bernard010) said : #1

I looked at your bug report. If you run apport in a terminal and include it with the bug report it could be helpful.
ubuntu-bug linux https://wiki.ubuntu.com/Apport - Thank You-

Bernard Stafford (bernard010) said : #2

One thing that you might test is making a new gpg key and check the response time and error rate.
Be sure to keep your old keys.
Consider debugging the gpg-agent.
Manpage: http://manpages.ubuntu.com/manpages/cosmic/man1/gpg-agent.1.html

Bernard Stafford (bernard010) said : #3

Package: salt-master has a recommended package on their package list: python3-pygit2 [bindings for libgit2-Python 3.x]
https://packages.ubuntu.com/bionic/salt-master

Andrey Arapov (andrey-arapov) said : #4

Hi Bernard,

> I looked at your bug report. If you run apport in a terminal and include it with the bug report it could be helpful.

The gpg-agent isn't crashing, and Apport is meant for debugging program crashes by intercepting them.
Could you please elaborate on how Apport would help in this situation?

The distro version and the gpg-agent version are in the initial report. The gpg-agent is running with the default arguments, except for the "auto-expand-secmem" option added as mentioned above.

> One thing that you might test is making a new gpg key and check the response time and error rate.

I've tried this with different keys; it didn't make any difference.
It is worth mentioning that the private gpg key we are using isn't encrypted.

> Package: salt-master has a recommended package on their package list: python3-pygit2 [bindings for libgit2-Python 3.x]

We are using salt-master 2018.3.5+ds-1 and python-pygit2 0.26.2-2.
SaltStack support didn't find any issues with our Salt configuration, and they agreed that gpg is causing this bottleneck.

Werner Koch (the author of GnuPG) shed more light on the situation, saying:
> Note that you actually run 30 independent processes with gpg 1.4 but with gpg-agent there is just one process to handle the private key operations (decrypt). To utilize more cores you need to setup several GNUPGHOME with the same private keys.

Source: https://dev.gnupg.org/T5137#139066

He has also changed this report to a feature request: "Allow several processes to run public key decryption using the same set of private keys." So I think this will solve the issue :-)
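
In the meantime, Werner's GNUPGHOME suggestion can be sketched roughly like this. Everything here is an assumption, not from the thread: the slot count, the /tmp paths, and the exported key file name "private.asc" are all placeholders, and the commented invocation at the end uses GNU parallel's slot replacement {%} rather than the moreutils "parallel" used earlier.

```shell
# Seed one GNUPGHOME per worker slot with the same private key, so that
# several gpg-agent processes (one per home) share the decryption load.
NSLOTS=4
KEYFILE=private.asc   # hypothetical export of the decryption key

for i in $(seq 1 "$NSLOTS"); do
    home="/tmp/gnupghome-$i"
    mkdir -p "$home"
    chmod 700 "$home"   # gpg refuses homes with loose permissions
    # Import the same private key into every home; each home will get
    # its own agent the first time gpg uses it.
    if [ -f "$KEYFILE" ]; then
        gpg --homedir "$home" --batch --import "$KEYFILE"
    fi
done

# Each GNU parallel job then picks a home by its slot number {%}:
#   parallel -j "$NSLOTS" \
#     'GNUPGHOME=/tmp/gnupghome-{%} gpg --no-tty -d -q -o /dev/null enc' \
#     ::: $(seq 1 3000)
```

The idea is simply that the per-agent serialization of PKDECRYPT no longer funnels through a single process; throughput should scale roughly with the number of homes, up to the core count.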

Kind regards,
Andrey Arapov

Manfred Hampl (m-hampl) said : #5

The command that you should use is

apport-collect 1904416

This will collect debugging information about your problem, even if there is no crash file.

Bernard Stafford (bernard010) said : #6

That is good. Even having several processes running public key decryption will help. I do not think they understand the huge volume of hundreds of servers all making call requests, some at the same time. I wonder if they could develop a multi-threaded gpg-agent that utilizes each core as an independent agent, like on your big HW using 48 cores to answer that many call requests. I am glad they might have found a solution for the situation. "gniibe" posted an experiment to allow computation by multiple threads.
-Thank You-
