QEMU guest system randomly gets high I/O wait time

Asked by G. Kooistra

We are running KVM on Ubuntu 16.04. Sometimes a guest machine (Ubuntu 14.04/16.04) shows a very high I/O wait time, even though the running processes do not need much disk activity.
Monitoring the machines with Zabbix doesn't show anything unusual.
For example, the average I/O wait time is 0.5 and spontaneously shoots up to an average of 2.5.

Restarting the virtual machine solves the problem (restarting only the services running on that machine, like Apache and PHP, does not). At the same time, the other guests running on the same hypervisor do not have the problem (so it is just that one virtual machine).

This happens on multiple hypervisor machines and multiple virtual machines.
There is no fixed time at which it happens (it could be two days after restarting the server, or weeks after).

We are unable to reproduce the problem on test-machines and therefore cannot trace what caused it. It just happens sometimes.

Is there some configuration that could cause I/O wait times to rise and not drop until a restart?
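
For anyone wanting to cross-check the I/O wait figure from inside the guest, here is a rough sketch that derives an iowait percentage from two samples of /proc/stat. The function names (read_cpu_times, iowait_percent) are made up for illustration, and this is not necessarily how Zabbix computes its value.

```python
import time


def read_cpu_times(stat_text):
    """Return the first eight counters from the aggregate 'cpu' line.

    Order: user, nice, system, idle, iowait, irq, softirq, steal.
    The guest counters that follow are skipped on purpose, because the
    kernel already includes them in user and nice.
    """
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:9]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Share of CPU time spent in iowait between two samples, in percent."""
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter


def sample_iowait(interval=1.0, path="/proc/stat"):
    """Read the counters twice, `interval` seconds apart."""
    with open(path) as fh:
        before = read_cpu_times(fh.read())
    time.sleep(interval)
    with open(path) as fh:
        after = read_cpu_times(fh.read())
    return iowait_percent(before, after)


if __name__ == "__main__":
    print("iowait: %.1f%%" % sample_iowait())
```

Comparing this value on the affected guest and on a healthy guest on the same hypervisor may help confirm that the rise is local to one virtual machine.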

Question information

Language: English
Status: Solved
For: Ubuntu qemu
Assignee: No assignee
Solved by: G. Kooistra
actionparsnip (andrew-woodhead666) said :
#1

Is there anything in dmesg at the time of the issue?

G. Kooistra (gkooistrago) said :
#2

On the guest:
The only two things are a UFW BLOCK and a PHP segfault.
But those messages also occurred before that time.

On the hypervisor:
drbd r0/0 drbd0: Digest mismatch
But we get that message often.

actionparsnip (andrew-woodhead666) said :
#3

Sounds like you need to resolve your DRBD issue then. Possibly a split brain. (Do you have a cluster setup?)

G. Kooistra (gkooistrago) said :
#4

The problem was found.

It's a bug with mod-security, which creates a file that is too large on the /tmp partition.
After a reboot, that file is deleted and the system works fine again.
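
For anyone hitting something similar, a minimal sketch for spotting an oversized file under /tmp without rebooting is shown here. The helper names (largest_files, usage_percent) are hypothetical, and the script only lists files; it makes no assumption about what mod-security names its file.

```python
import heapq
import os
import shutil
import stat


def largest_files(root, count=5):
    """Return up to `count` (size_bytes, path) pairs under `root`, largest first."""
    sizes = []
    # Directories that cannot be read are skipped silently.
    for dirpath, _dirnames, filenames in os.walk(root, onerror=lambda err: None):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is not accessible
            if stat.S_ISREG(info.st_mode):
                sizes.append((info.st_size, path))
    return heapq.nlargest(count, sizes)


def usage_percent(path):
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    root = "/tmp"
    print("%s is %.1f%% full" % (root, usage_percent(root)))
    for size, path in largest_files(root):
        print("%12d  %s" % (size, path))
```

Run as root on the affected guest so that files owned by other users (for example the web server account) are included.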