We suspect this is a kernel race condition in epoll that a recent change in kernel 5.4.0-132-generic (and other 5.4 kernels carrying the same change) makes more likely to trigger.
I have a kernel building with that epoll change reverted at ppa:cascardo/ppa (that should give you kernel version 5.4.0-132.148+epollcascardo1). Would you be able to install that kernel and check whether the problem becomes much harder to reproduce?
We are also working on the real fix for the race condition, given we were able to reproduce it with a synthetic workload even on 5.4.0-131-generic.
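Before re-running the reproducer, it may help to confirm that the machine really booted the test build. The following is only a rough sketch: it assumes the running kernel exposes its package version in /proc/version_signature (as Ubuntu kernels normally do) and that the test build's version string contains "epollcascardo". The helper names and the sample strings are made up for illustration.

```python
from pathlib import Path

# Assumption: the test build's version string contains this marker,
# as in 5.4.0-132.148+epollcascardo1.
MARKER = "epollcascardo"


def is_test_kernel(signature: str, marker: str = MARKER) -> bool:
    """Return True if the version signature names the test build."""
    return marker in signature


def read_signature(path: str = "/proc/version_signature") -> str:
    """Return the kernel's version signature, or "" if the file is absent."""
    p = Path(path)
    return p.read_text().strip() if p.exists() else ""


if __name__ == "__main__":
    sig = read_signature()
    print(sig or "no /proc/version_signature found")
    print("test kernel running:", is_test_kernel(sig))
```

If this reports that the test kernel is not running, the stock 5.4.0-132 kernel was probably selected at boot instead.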
Cascardo.