Comment 416 for bug 131094

Christian Ehrhardt (paelzer) wrote:

It might not be good to stir up such an old bug, but it gets regularly updated with new complaints, so maybe a new approach might help.

So let us make one thing clear: IMHO, if something overloads your machine with disk I/O, it is bound to stall it.
So the solution paths are more like this:
a) beat it with more processing / I/O HW
b) mitigate the effect as far as possible
c) avoid the overload before it starts

The issue is a common one - so I'll keep my explanations general and not specific to trackerd or any other case that was mentioned before.

### a) beat it with more processing / I/O HW ###
There are far more expensive machines out there which can handle way more I/O without being slowed down. The reason is that they have more I/O cards, virtual functions to spread the handling over several CPUs, and at the high end entirely different I/O IRQ designs.
We should agree that on cheap/slow or even medium machines I/O overload just *IS* an issue for responsiveness.
But that isn't the point - the question is what a normal user can do about it, and spending x000000 $ on a machine isn't the solution.

### b) mitigate the effect as far as possible ###
Regarding mitigation, some approaches were already discussed in this bug,
like using ionice and several dirty ratio tunings, but none of these prevent the I/O overload.
E.g. if you submit the overload only in the "best effort" I/O class, the only difference it makes is that "other I/O" might pass faster, but your system is still fairly busy => unresponsive.
Dirty ratios, on the other hand, come down to spending the process's remaining time slice on cleaning up dirty memory as soon as a certain level is reached. You can configure higher ratios (at the price of endangering integrity), but that won't stop the burst of I/O either; instead it allows even more data to dirty the page cache and thereby indirectly creates more I/O, overloading the system again.
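
For reference, these mitigations look roughly like this; the values and the tar command are just placeholders, and the effect is as limited as described above:

# run an I/O heavy task in the idle I/O scheduling class (honoured by the CFQ scheduler)
ionice -c 3 tar czf /tmp/backup.tar.gz /home/paelzer
# or demote an already running process to the lowest best-effort priority (1234 = placeholder PID)
ionice -c 2 -n 7 -p 1234
# allow more dirty page cache before writeback kicks in (trade-off: more unsynced data at risk)
sysctl -w vm.dirty_background_ratio=10
sysctl -w vm.dirty_ratio=40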

### c) avoid the overload before it starts ###
It must be said that, since this bug dates back to 2007 and a lot of the reports are related to I/O+*sync, various filesystem and general kernel improvements have been made just for sync & journaling. Several posts in this bug already confirm this.
Now, what I haven't seen yet is people trying to throttle the processes that overload the system.
Throttling is documented at => https://www.kernel.org/doc/Documentation/cgroups/blkio-controller.txt
As with any approach, this one has certain limitations, but it is a new way to tackle the overall issue.
It also needs certain cgroup and filesystem features (like accounting for writeback through the page cache) which might only be available in modern Ubuntu releases.
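
Before trying it you can check whether your kernel and cgroup setup provide the needed bits; the paths below assume the usual cgroup v1 layout of current Ubuntu releases:

# the throttle policy needs CONFIG_BLK_DEV_THROTTLING in the kernel
grep BLK_DEV_THROTTLING /boot/config-$(uname -r)
# the blkio controller has to be known and mounted (usually done by default)
grep blkio /proc/cgroups
mount | grep /sys/fs/cgroup/blkio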

### Experiment ###
As an experiment to demonstrate the approach I use the tools fio and latencytop to compare:
1. no background load, checking latencytop
2. running a multithreaded random read/write fio job in the background, checking latencytop
3. running the same fio job throttled in the background, checking latencytop

# Background Load #
A fio job file like this:

[global]
# mixed random reads/writes, block sizes split 25% 1k / 50% 4k / 25% 64k
ioengine=libaio
rw=randrw
bssplit=1k/25:4k/50:64k/25
size=512m
directory=/home/paelzer/latencytest
iodepth=8

# 8 jobs doing direct I/O (bypassing the page cache)
[dio]
direct=1
numjobs=8

# 8 jobs doing buffered I/O (through the page cache)
[pgc]
direct=0
numjobs=8
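
Assuming the job file is saved as causelatency.fiojob (the name used in the cgexec call further below), the comparison is simply latencytop in one terminal and fio in another:

# terminal 1: watch where the kernel makes processes wait
sudo latencytop
# terminal 2: the (unthrottled) background load of case 2
fio causelatency.fiojob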

# Case 1 - No background load => almost no latency
Cause Maximum Percentage
Waiting for event (select) 5,0 msec 39,7 %
Waiting for event (poll) 5,0 msec 33,9 %
Userspace lock contention 4,8 msec 25,7 %
[do_wait] 2,7 msec 0,4 %
[ep_poll] 2,4 msec 0,2 %
Reading from file 0,9 msec 0,0 %
Reading EXT3 directory htree 0,2 msec 0,0 %
[hrtimer_nanosleep] 0,1 msec 0,0 %

# Case 2 - Unrestricted background load overloading the I/O subsystem shows massive impact
- ext4 data/log writes
- memory management due to thrashing the page cache
...
=> the fio job itself runs fast:
Jobs: 16 (f=16): [m(16)] [6.7% done] [92482KB/99.50MB/0KB /s] [6302/6483/0 iops] [eta 01m:51s]

Cause Maximum Percentage
[ext4_file_write_iter] 91,8 msec 0,3 %
[wait_transaction_locked] 63,4 msec 0,1 %
Marking inode dirty 61,2 msec 0,9 %
[SyS_io_destroy] 46,3 msec 0,3 %
[lru_add_drain_all] 18,0 msec 0,1 %
[__block_write_begin] 16,8 msec 38,5 %
[__lock_page_killable] 16,2 msec 34,7 %
[read_events] 5,0 msec 21,2 %
Waiting for event (poll) 5,0 msec 1,9 %

# Case 3 - Now the same workload but contained in a blkio throttled cgroup
mkdir /sys/fs/cgroup/blkio/limitbgload
lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 29,3G 0 disk
├─sda1 8:1 0 28,3G 0 part /
├─sda2 8:2 0 1K 0 part
└─sda5 8:5 0 1021M 0 part
# Limit to 4 MB/s write and 8 MB/s read speed
echo 8:0 $((1024*1024*4)) > /sys/fs/cgroup/blkio/limitbgload/blkio.throttle.write_bps_device
echo 8:0 $((1024*1024*8)) > /sys/fs/cgroup/blkio/limitbgload/blkio.throttle.read_bps_device
cgexec -g blkio:limitbgload fio causelatency.fiojob
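
The same group can also be applied to an already running process instead of starting it via cgexec; the PID below is just a placeholder, e.g. for an indexer like trackerd:

# move a running process (and all its threads) into the limited group
echo 1234 > /sys/fs/cgroup/blkio/limitbgload/cgroup.procs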

The workload shows throttling is working:
Jobs: 16 (f=16): [m(16)] [22.0% done] [6724KB/8915KB/0KB /s] [577/598/0 iops] [eta 09m:25s]

But we can also see its desired effect: it avoids overloading the system with I/O.
Cause Maximum Percentage
[__lock_page_killable] 132,2 msec 46,5 %
[__block_write_begin] 131,4 msec 47,9 %
fsync() on a file (type 'F' for details) 30,7 msec 0,0 %
Marking inode dirty 21,5 msec 0,1 %
[ext4_file_write_iter] 5,2 msec 0,0 %
Waiting for event (select) 5,0 msec 1,4 %
Userspace lock contention 5,0 msec 1,0 %
Waiting for event (poll) 5,0 msec 1,7 %
[read_events] 4,9 msec 1,3 %

=> this shows almost only the stalls caused by the throttling itself, which are intended
=> the dirtying and filesystem latencies are way smaller now
=> the system "feels" right regarding responsiveness

### TL;DR ###
- huge machines just beat I/O overload with more HW or a better I/O architecture
- code keeps improving to mitigate the effects, but can never be perfect for *ALL* users at once (especially in the default config)
- try throttling the processes that overload I/O if you do not need their results asap (see the sketch below)
=> Let us discuss if that would be an option, and if so let us close this bug and open a separate one requesting configurable throttling for each applicable component, like trackerd and the many other I/O heavy background tasks.
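
As a rough sketch of what such per-component configuration could look like (assuming the component runs as a systemd service and the legacy blkio controller is in use; unit and device names are placeholders), a drop-in like the following applies the same kind of limit without cgexec, after a systemctl daemon-reload and a restart of the service:

# /etc/systemd/system/example-indexer.service.d/iolimit.conf
[Service]
BlockIOAccounting=true
BlockIOReadBandwidth=/dev/sda 8M
BlockIOWriteBandwidth=/dev/sda 4M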