Heavy Disk I/O harms desktop responsiveness

Bug #131094 reported by Jamie McCracken
This bug affects 180 people
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Low
Assigned to: Unassigned
Nominated for Hardy by gururise
Nominated for Intrepid by unggnu
Nominated for Jaunty by Jeffery Davis
Nominated for Karmic by daneel
Nominated for Lucid by geek
Nominated for Maverick by Montblanc

Bug Description

Binary package hint: linux-source-2.6.22

When compared with 2.6.15 in feisty, heavy disk I/O causes increased iowait times and harms desktop responsiveness in 2.6.22.

This appears to be a regression from 2.6.15, where iowait is much lower and desktop responsiveness is unaffected under the same I/O load.

Easy to reproduce with tracker: index the same set of files with the 2.6.15 kernel and the 2.6.22 kernel, and the difference in desktop responsiveness is massive.

I have not yet confirmed whether a non-tracker process doing heavy disk I/O (especially writing) replicates this; will do further investigation soon.

Tags: cft-2.6.27
Revision history for this message
Jamie McCracken (jamiemcc-blueyonder) wrote :

Further investigation has led me to conclude that this bug is no longer valid

Slowdown in system can be eliminated by:

1) Clean install of tribe 4. I originally had tribe 3 when the problem occurred, and it persisted through upgrades, but a clean install somehow fixes the desktop responsiveness issues

2) Apps still feel slow, but this is not a kernel issue: disabling esd sound in sound preferences makes gutsy as fast as feisty (see https://bugs.launchpad.net/ubuntu/+source/libgnome/+bug/115652)

Changed in linux-source-2.6.22:
status: New → Invalid
Revision history for this message
Jamie Lokier (jamie-shareable) wrote :

I have esd sound disabled, and performance is still incredibly slow when trackerd is running on a 2.6.22-{7,8,9} kernel. When I want to actually get some work done, I "killall -STOP trackerd".

The effect on desktop performance is weird: it feels exactly like heavy swapping. Menus etc. take seconds to appear. New apps take ages. Dragging a window can even take 10 seconds or more before it responds.

But there is free RAM, and especially there's plenty of reclaimable (i.e. not used by programs) RAM. I have 1GB.

It's not using much CPU either. (I have a Core Duo; neither core sees much usage while trackerd is running).

So it may be in some way dependent on I/O. But this is with the trackerd set to maximum throttling, i.e. slowest scanning.

Interestingly, the disk activity monitoring applet shows very little activity (little spikes every second or two), but the disk light is constantly on.

There's something else fishy: strace -p on the trackerd process shows expected system calls, but sometimes killing the strace prints "Process xxx detached" but then strace doesn't terminate, even with kill -9.

Revision history for this message
Jamie McCracken (jamiemcc-blueyonder) wrote :

I'm reopening this.

Other users have experienced this (see comments in https://bugs.launchpad.net/ubuntu/+source/tracker/+bug/135115) and reported that a fresh install cures the problem.

This indicates there is a bug when upgrading to gutsy that causes the high iowait times and can only be fixed by a clean install.

I can't say whether this bug only occurs when upgrading from older gutsy versions or from feisty...

All I can say is that it started with a clean install of tribe 3, persisted when upgraded, and did not go away until a clean install of tribe 4.

Changed in linux-source-2.6.22:
status: Invalid → Confirmed
Revision history for this message
Tom Badran (tom-badran) wrote :

I've marked the bug I filed against trackerd as a duplicate of this bug.

Like I say, a fresh install has made a substantial difference (completely unusable machine with trackerd -> usable). I do, however, still hear my disk being hit fairly often. It's not impacting interactivity as severely as it used to, but there are still noticeable short stalls doing fairly trivial things such as opening menus.

Revision history for this message
Miguel Martinez (el-quark) wrote :

I'm also experiencing the slowdowns during large dist-upgrades involving several packages. This is a dist-upgraded Gutsy. Furthermore, I've seen Firefox crashing pretty often during those heavy I/O periods. Sometimes it has taken Thunderbird with it.

Revision history for this message
Michael Vogt (mvo) wrote :

I'm milestoning this bug, as it is important to get this fixed if we use tracker by default.

Changed in linux-source-2.6.22:
importance: Undecided → High
Revision history for this message
Ben Collins (ben-collins) wrote :

Please try booting with elevator=deadline and tell me if that helps any.

Changed in linux-source-2.6.22:
assignee: nobody → ben-collins
status: Confirmed → In Progress
Revision history for this message
Jamie McCracken (jamiemcc-blueyonder) wrote :

if anything, elevator=deadline seems to cause higher iowait for longer periods (I even saw 100% with that setting) when running trackerd

average iowait values when tracker is flushing to disk during heavy indexing of same files:

for feisty 2.6.20-15 : 90-95%
for 2.6.22-9 : 90-99%
for 2.6.22-9 with elevator=deadline: 95-100%

Revision history for this message
Miguel Martinez (el-quark) wrote : Re: [Bug 131094] Re: Heavy Disk I/O harms desktop responsiveness

Same here. elevator=deadline doesn't seem to help, although I don't have
any objective data to complement Jamie's.


--
----------------------------------------
Miguel Martínez Canales
    Dto. Física de la Materia Condensada
    UPV/EHU
    Facultad de Ciencia y Tecnología
    Apdo. 644
    48080 Bilbao (Spain)
Fax: +34 94 601 3500
Tlf: +34 94 601 5437
----------------------------------------

  "If you have an apple and I have an apple and
  we exchange these apples then you and I will
  still each have one apple. But if you have an
  idea and I have an idea and we exchange these
  ideas, then each of us will have two ideas."

  George Bernard Shaw

Revision history for this message
Ben Collins (ben-collins) wrote :

Ok, for the fun of it, please also try elevator=anticipatory

Revision history for this message
Jeff Schroeder (sejeff) wrote :

The latest gutsy kernels have the right settings to use blktrace. Try these commands:
sudo apt-get install blktrace
sudo mount -t debugfs debugfs /sys/kernel/debug/

# If /dev/sda is the disk that / is located on
sudo btrace /dev/sda

# Let it run for a few seconds and then kill it with Ctrl-C.

That will show the top processes using your disk.

Revision history for this message
Jeff Schroeder (sejeff) wrote :

Make that:
sudo btrace -s /dev/sda

It gives a summary of the disk usage of each process.

Revision history for this message
Jeff Schroeder (sejeff) wrote :

Also note that gutsy has an 'ionice' command that you can use to slow down I/O for a process like trackerd. See man ionice.

Revision history for this message
Julien Olivier (julo) wrote :

Hi,

I have upgraded from feisty to gutsy and also noticed that my GNOME desktop felt way slower than on feisty. I tried disabling esd, but it didn't help. The thing is, I also tried disabling trackerd, but the slowness remains when I open F-Spot or when I use Firefox. Is there a way to know if the problem really comes from the kernel? Is it safe to re-install linux-image-2.6.20 from feisty? If so, are there any other packages I should downgrade too?

Revision history for this message
Tobias Oetiker (tobi-oetiker) wrote :

Julien,

it seems that kernels 2.6.18 to 2.6.21 have some serious issues with
heavy disk I/O, especially when multiple processes are fighting over
I/O and reads and writes are going on in parallel ...

for us the upgrade to 2.6.22 helped a lot ...

there were changes to the I/O schedulers and massive changes to the
default values of the /proc/sys/vm/dirty_* tunables ...

we also found that the problems were more pronounced when using LVM
... unfortunately this is all anecdotal and inconclusive.

so if you have the chance, you might want to try 2.6.22 ...

cheers
tobi

--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten
http://it.oetiker.ch <email address hidden> ++41 62 213 9902

Revision history for this message
Julien Olivier (julo) wrote :

Tobias,

as I said, I upgraded to gutsy recently, so I do have kernel 2.6.22, and I still have speed problems. Whether or not the kernel is the culprit is still a mystery to me, though.

Someone said that the problems seem to persist when you upgrade from feisty (versus a fresh install), so maybe I have inherited wrong values in /proc/sys/vm/dirty_*?

I would be really pleased to help, so if there is anything I can test, I'm ready.

PS: I installed kernel 2.6.20 from feisty and booted into it, and it didn't change anything.
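For anyone wanting to compare the live vm.dirty_* tunables between an upgraded system and a fresh install, a quick way is to dump them (a small convenience sketch; `sysctl -a | grep vm.dirty` from a shell shows the same information):

```python
# Print the current vm.dirty_* tunables so an upgraded system can be
# compared against a fresh install. All of these files hold integers.
from pathlib import Path

def dirty_tunables():
    return {p.name: int(p.read_text())
            for p in Path("/proc/sys/vm").glob("dirty_*")}

if __name__ == "__main__":
    for name, value in sorted(dirty_tunables().items()):
        print(f"{name} = {value}")
```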

Revision history for this message
Tobias Oetiker (tobi-oetiker) wrote :

Today Julien Olivier wrote:

> Tobias,
>
> as I said, I have upgraded to gutsy recently, so I do have kernel
> 2.6.22, and I still have speed problems. Whether or not the kernel is
> the culprit is still a mystery to me though.
>
> Someone said that the problems seem to persist when you upgrade from
> feisty (versus a fresh install), so maybe I have inherited wrong values
> in /proc/sys/vm/dirty_* ?

this is highly unlikely ... check /etc/sysctl.conf to see if there
are any explicit settings

> I would be really pleased to help, so if there is anything I can test,
> I'm ready to help.
>
> PS: I installed kernel 2.6.20 from feisty and booted on it, and it
> didn't change anything.

in that case I am fresh out of ideas unfortunately.

cheers
tobi


Revision history for this message
Jamie McCracken (jamiemcc-blueyonder) wrote :

I think there are two separate issues here

1) Something in old tribes affects disk access (HAL or udev?), and on some occasions it persists when upgraded; only a fresh install cures the problem. This is what affected me: all disk I/O, reads and writes, was affected very badly even without tracker running. This appears to happen rarely, as only a few people have hit it...

2) Ext3 write performance is very poor on both feisty and gutsy: as soon as pdflush starts, it tends to hog the disk. Putting $HOME/.cache/tracker on a different FS like XFS improves things a lot (I only did this on feisty, not gutsy).

If default pdflush params have changed in the gutsy kernel, that could also affect write performance negatively.

Another thing: my hard disk is whisper quiet on feisty but extremely noisy on gutsy; I had to use hdparm to lower the noise. Would be nice to make it quiet by default too, especially as tracker makes it very noisy at times.

Revision history for this message
Julien Olivier (julo) wrote :

Jamie,

about #1: any idea what exactly went wrong, and is there a chance that it might still be unfixed for some users ?

Revision history for this message
Jamie McCracken (jamiemcc-blueyonder) wrote :

julien: I've no idea what caused it, but the effect was very noticeable even with light disk access. Only two people (myself included) have done a fresh install to solve the issue, so I think it's quite rare.

I'm not sure whether it's recommended to dist-upgrade from feisty or not (I've read a few cases on OSNews where it did not work properly).

Revision history for this message
Julien Olivier (julo) wrote :

OK, I will try to re-install everything from scratch then.

Revision history for this message
Martin (martin615) wrote :

I disabled Tracker as a result of all the disk thrashing. Yes, Tracker is nice. But I seriously question enabling it by default while this problem is still around (wherever the problem might lie).

Revision history for this message
Jamie McCracken (jamiemcc-blueyonder) wrote :

Martin,

if the disk I/O issues are only tracker related, that's OK, as we have a fix for that in the latest version (not yet in gutsy) which should reduce the problem and prevent tracker from hogging the disk for long periods.

Revision history for this message
Jeff Fortin Tam (kiddo) wrote :

Please don't tell me this will remain unfixed for users who went the dist-upgrade route.
This is not as rare as you think, and a clean install is not something lots of people want to do all the time. Isn't it possible to fix it with upgrades? If some config broke at some point, it should be possible to reverse it for everyone, no?

I actually don't even know what is going on exactly anymore, but the thing I do see is that all my gutsy computers have really horrible performance whenever I do anything that uses the hard drive.

Revision history for this message
Martin (martin615) wrote :

Jamie,

Ok, that sounds great. I'll try enabling it again when the fix hits Gutsy.

Revision history for this message
Alexey Borzenkov (snaury) wrote :

I can confirm strange disk-related performance problems too, and I dist-upgraded to gutsy well after tribe5 was already out (thus I don't think it could be something from previous tribes). I also wonder whether other problems (like the desktop often not appearing after I relogin, so I always have to restart if I log out, not even /etc/init.d/gdm restart helps, and the login sound not playing the first time, even after I installed esound) could be cured by a fresh install, but I won't have time to do it for several weeks... I guess it will be after gutsy is already released.

And somehow I don't believe it's rare... I wonder how many people actually dist-upgraded, as opposed to doing a fresh install of tribe5?

Revision history for this message
Lukas Kolbe (lukas-einfachkaffee) wrote :

I can confirm this problem on the latest Gutsy. It has bothered me a while, but shamefully I haven't yet taken the time to report it, and I forget whether it first appeared in feisty or in gutsy. My system has been upgraded since at least feisty, possibly since dapper; I actually can't remember when I last installed Ubuntu from scratch.

Attached are the outputs of dmesg, hdparm -tT, smartctl -a, lspci -vvn, and a vmstat 2 during my latest dist-upgrade, which made the system heavily unresponsive (again). Also, while tracker is indexing, or Evolution is starting, or any other normal disk I/O is happening, the system becomes unusable. Dist-upgrades of only a few packages take ages.

If there's anything I can do to help identify the root cause, please ask.

Revision history for this message
Lukas Kolbe (lukas-einfachkaffee) wrote :

And as this was mentioned before, I thought it might be important: I'm using LVM. Attached is the complete disk layout of my system.

Revision history for this message
Amit Kucheria (amitk) wrote :

This thread seems to be catching fire :-)

I did some I/O testing of the Feisty and Gutsy kernels on Gutsy userspace. Results are at https://wiki.ubuntu.com/GutsyFeistySchedulerShootout?action=show

If someone can repeat these tests and post the results, it would help drill down into the problem. Currently, it seems like only users doing dist-upgrades are having problems. Unfortunately, my machine was a fresh install.

Revision history for this message
Lukas Kolbe (lukas-einfachkaffee) wrote :

I ran your test; the numbers seem about equal to yours, but during the test my system became hellishly unresponsive. Switching desktops (from web to evolution) took more than 20 seconds (probably due to swapping; I have 768MB RAM), and subsequent switches took up to five seconds. I could watch the redrawing while I tried scrolling in Evolution's folder list. vim took ages to load, etc. All in all, very sluggish.

Revision history for this message
Tobias Oetiker (tobi-oetiker) wrote :

I don't think the problem is entirely Ubuntu-made ... other people
are looking at I/O performance too.

This does look interesting

  http://lkml.org/lkml/2007/8/16/77

and this ... http://lkml.org/lkml/diff/2007/8/23/218/1

cheers
tobi


Revision history for this message
Jamie McCracken (jamiemcc-blueyonder) wrote :

Could this be sata related?

Can everyone who has this problem indicate if this is so?

just wondering if its related to https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/119730

Revision history for this message
Tom Badran (tom-badran) wrote :

I am on a SATA machine; however, I never had a problem with file-copy
throughput speed etc., it's just interactivity.


--
Tom Badran
http://badrunner.net

Revision history for this message
Miguel Martinez (el-quark) wrote :

I don't think it's SATA-related, as I have an "old" Pentium-M (735) that
doesn't support SATA, and my laptop does suffer from the I/O issue.


Revision history for this message
Jamie McCracken (jamiemcc-blueyonder) wrote :

Also, I forgot to mention that tracker 0.6.3 is now in gutsy (it's not in the beta). This version is designed to work around the issues discussed here, as well as being much better optimised as far as disk access goes.

Revision history for this message
Jeff Fortin Tam (kiddo) wrote :

Nope. My desktop only has IDE drives, and so does my laptop, so it's not SATA-related.

Revision history for this message
Tobias Oetiker (tobi-oetiker) wrote :

Jamie,

I run sata with lvm

cheers
tobi


Revision history for this message
Amit Kucheria (amitk) wrote :

As pointed out by Jeff above, can someone having the problems run trackerd with ionice?

e.g. ionice -c3 -p<pid of trackerd>

Revision history for this message
Tom Badran (tom-badran) wrote :

I had already tried ionice in one of the bugs closed off as a duplicate; it
makes absolutely no difference whatsoever.


Revision history for this message
Jamie McCracken (jamiemcc-blueyonder) wrote :

Amit,

trackerd uses the best-effort class at priority 7 for disk I/O by default (it tries the idle class first, but as that needs root it will fail and fall back to best-effort 7)

note that disk writes are not affected by the scheduling class, as they are controlled by pdflush, and heavy writing is where the problem lies (pdflush tends to go crazy)

tracker 0.6.3 mitigates the pdflush problems by intermittently calling fsync when merging indexes, to prevent pdflush from taking over the disk and starving other apps

Revision history for this message
Bill Hand (fxwgbill-gmail) wrote :

I too had this problem two or three kernels ago in gutsy. Finally, as a last resort, I uninstalled tracker for the time being. Response time improved greatly.

I have followed updates since a month into gutsy development, if that helps any... Everything for the most part seems to be working on my system, other than a couple of problems that are already in the bug list. Things seem to go a lot better as far as upgrades go if you do follow along with the process. Doing it this way has gotten me through the last couple of releases without a reinstall (knock on wood). It does get bumpy at times, but I am in a position where it isn't as critical if things 'break' for a bit.

Bill

Revision history for this message
Lukas Kolbe (lukas-einfachkaffee) wrote :

With the latest updates in gutsy this problem seems to be gone for me. I
just did a dist-upgrade and nearly didn't notice it, my laptop just
worked without lagging much.

I'm hooked :)

--
Lukas

Revision history for this message
Bill Hand (fxwgbill-gmail) wrote :

Interesting... I may re-install it and see what the result is. It
would help to know that it's acting better before release. I'll do it...

I'll let yall know how it goes. Working on it now.

Bill


Revision history for this message
Bill Hand (fxwgbill-gmail) wrote :

OK... Installing tracker 0.6.3-0ubuntu2, which I am showing to be the
latest version. Also installing the tracker-search-tool.

and a reboot...

Initially... here we go... trackerd staying around 22 to 24% in the process
monitor, as high as 40 to 45%; almost seems like it's ramping up again.
I'll let it run for a while, see if it ever catches up.

OK... it ain't been 7 to 10 minutes and it's dropped back to 0, with
periodic 'hits'.

6:59.20 CPU time right now @ 23:31 local.

OK... still sittin' there. 23:37 local.

OK... I'll keep a good watch on it.

If anyone needs further info as to my set up... lemme know...

Bill


Revision history for this message
Martin (martin615) wrote :

I'm still seeing an unacceptable amount of I/O.

For instance, I thought timing a git clone with tracker on and then off would be a good test. So I did:

time git clone git://git.kernel.org/pub/scm/git/git.git

only to realize that tracker didn't start indexing until after the clone was done. That's OK. So I thought I'd do:

date; time git clone git://git.kernel.org/pub/scm/git/git.git; date

instead, just to see how long after the clone was done tracker would keep indexing. I didn't get that far, though. I did a "rm -rf git" while tracker was indexing and pounding the disk, and thought I'd wait until it was done. It took several _minutes_ (I don't know... 5? 10?) during which the disk was working frantically the whole time.

:(

Revision history for this message
Martin (martin615) wrote :

(The part of my brain responsible for my English is still sound asleep... ;)

Granted, the git repo is ~6MB of text. But I still find the disk thrashing unacceptable.

Revision history for this message
Jamie McCracken (jamiemcc-blueyonder) wrote :

Martin,

I'm happy to add a performance option to do fast index merges instead of incremental ones (the incremental merge currently uses intermittent fsync calls to stop pdflush hogging the disk, but can cause thrashing)

the problem is that ext3 performance when doing lots of alternating reads/writes is horrifically bad (probably a bug rather than a design fault), so for best results we recommend mounting ~/.cache on ReiserFS or some filesystem other than ext2/3

Revision history for this message
Martin (martin615) wrote :

* An option isn't really an option (pun intended ;). This sort of thing should Just Work (TM).
* If the problem is ext3, fine. That's what's used by default, though, so I don't really see tracker being enabled by default while that's the case.

<ignorant question>
Is there no way to do more work in memory (depending on how much memory is available and needed, of course... there's a balance here) before writing to disk?
</ignorant question>

(Sorry if I come of as a bit harsh, I totally appreciate the work you do on Tracker and I've been looking forward to start using it full time for quite a while.)

Revision history for this message
Jamie McCracken (jamiemcc-blueyonder) wrote :

in 0.6.3 we have a 16MB buffer to do that, but once indexed we need to flush to disk at some point

Martin, can you confirm which version you are using? 0.6.2 was really poor in this regard
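For readers following along, a bounded hit buffer of the sort described can be sketched as follows (purely illustrative Python, not Tracker's code; the class name, size accounting, and flush callback are invented):

```python
# Illustrative sketch of a bounded in-memory hit buffer: unique words
# accumulate in RAM, and the on-disk index is only touched when the
# buffer outgrows its cap (or indexing finishes and flush() is called).
class HitBuffer:
    def __init__(self, max_bytes=16 * 1024 * 1024, flush=None):
        self.max_bytes = max_bytes
        self.flush_cb = flush      # callback that merges hits into the index
        self.hits = {}             # word -> hit count (unique words only)
        self.size = 0

    def add(self, word):
        if word not in self.hits:
            self.size += len(word)
            self.hits[word] = 0
        self.hits[word] += 1
        if self.size >= self.max_bytes:
            self.flush()           # buffer overflow: merge now

    def flush(self):
        if self.flush_cb and self.hits:
            self.flush_cb(self.hits)
        self.hits, self.size = {}, 0
```

Because only unique words are stored, ordinary text compresses into far less buffer space than its raw size, which is consistent with the buffer-to-text ratio Jamie quotes later in the thread.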

Revision history for this message
Martin (martin615) wrote :

I'm using 0.6.3.

<more ignorant questions>
How does the size of the data written to this buffer relate to the size of what's being indexed? Is it written continuously to disk? (What I'm really wondering is how it's used when indexing 6MB of source and text files. I mean, 6MB of text really ought to result in less than 6MB of "indexing data"... right? If it's split up into small parts, maybe they can be merged together?)

How much would increasing this buffer help? If it helps, perhaps one way forward would be to make the size dynamic, so when lots of data needs to be indexed, more memory means less thrashing by ext3. Perhaps its size could even depend on the filesystem being used?
</more ignorant questions>

Revision history for this message
Jamie McCracken (jamiemcc-blueyonder) wrote :

martin,

it does not work like that

the 16MB hit buffer is sufficient for about 64MB of text (as we only store unique and valid words), so your 6MB of text easily fits into it - it only updates the index once all the new stuff is indexed or the buffer overflows

the problem is updating an existing index - each word (and you could have 100,000+ words that need updating) requires a seek and then a write, and ext3 performs really badly with such seek-read-seek-write patterns

if we did it in one shot, pdflush could hog the disk and deny access to other apps, but this would be the fastest way to update, with the least thrashing

we currently do it incrementally, 1000-5000 words at a time followed by an fsync, so it takes longer but should not delay other apps' disk access for more than a few seconds

At the moment we cannot really improve things further until ext3 (or whatever causes the bad performance) is fixed.
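The incremental scheme described above (apply a batch of scattered word updates, then force them to disk before continuing) looks roughly like this in outline (an illustrative Python sketch, not Tracker's actual code; the batch size and the (offset, payload) update format are assumptions):

```python
import os

BATCH = 1000  # assumed; the comment above says 1000-5000 words per batch

def flush_in_batches(fd, updates, batch=BATCH):
    """Apply scattered (offset, payload) index updates, calling
    fdatasync after every `batch` writes so pdflush never builds up
    a large backlog of dirty pages that would starve other apps."""
    for i, (offset, payload) in enumerate(updates, 1):
        os.pwrite(fd, payload, offset)  # seek + write, as in a hash-table index
        if i % batch == 0:
            os.fdatasync(fd)            # bound the amount of unflushed data
    os.fdatasync(fd)                    # flush the final partial batch
```

Dropping the intermediate fdatasync calls gives the "fast merge" variant: quicker in total, but the flush then arrives as one disk-hogging burst.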

Revision history for this message
Martin (martin615) wrote :

There's no way you can reorder the data to better cope with ext3's deficiencies (or whatever it is that's causing problems)?

(Humm... Maybe it's better to spend energy on fixing the real issue. :)

Revision history for this message
Martin (martin615) wrote :

Btw, are there any kernel people looking in to this? (By "this" I mean "fixing ext3".)

Revision history for this message
Jamie McCracken (jamiemcc-blueyonder) wrote :

Martin,

No idea - I read that ext4 will soon go into the kernel, but I don't know if it fixes the issue

Revision history for this message
Martin (martin615) wrote :

Yeah. I thought about ext4 too. Perhaps a post to LKML would be in order?

Revision history for this message
Jamie McCracken (jamiemcc-blueyonder) wrote :

Martin,

Actually, I have done some more testing and discovered that when index merging, no physical reads are done (they are all cache hits), so the ext3 problem is handling a ton of small writes, which it does very badly according to Google (it fragments, destroying performance and preventing contiguous writing to the index).

XFS, in contrast, handles these very well, with a lot of contiguous writing and almost no loss of speed, thanks to its delayed allocation feature (http://en.wikipedia.org/wiki/Delayed_allocation), which tracker benefits from.

The good news is that this feature is under consideration for ext4 - https://ols2006.108.redhat.com/2007/Reprints/sato-Reprint.pdf

so fingers crossed!

Revision history for this message
Martin (martin615) wrote :

Delayed allocation is most definitely going in. :) The only question seems to be whether it should go into ext4 or the VFS layer so it can be shared with XFS and other filesystems. See e.g.

Section 3.2 in https://ols2006.108.redhat.com/2007/Reprints/mathur-Reprint.pdf
http://ext4.wiki.kernel.org/index.php/OLS-bof-2007-minutes_OLS_2007
http://ext4.wiki.kernel.org/index.php/Minutes10-01-2007

Revision history for this message
Jamie McCracken (jamiemcc-blueyonder) wrote :

"Currently Delalloc only works for writeback mode. Implementation for ordered mode would be tricky, because need to use bufferheads."

Let's hope they get it fixed for ordered mode, which is the default for ext3/4.

If they do, this bug should be fixed for hardy.

Revision history for this message
Martin (martin615) wrote :

/me starts lurking around at <email address hidden> :)

Another question: do any of the other indexers out there work around this problem somehow? And if so, how?

I mean, the problem is many small write()s, with blocks being allocated directly during write() instead of at page-flush time, right? Can't you merge several write operations together? Or maybe you already do that, "1000-5000 words at a time"? If not, could the on-disk format be changed to allow merging several write()s together?

Also, is it really necessary to fsync() so often?

Revision history for this message
Jamie McCracken (jamiemcc-blueyonder) wrote :

I don't know about other indexers.

Tracker's indexer is a hash table, so words are written at random locations - it's not possible to write more than one word at a time, nor do we know whether certain words happen to be stored sequentially.

We rely on the kernel to order the writes elevator-fashion so they can be written contiguously - sadly that does not happen on ext3, but it does if ~/.cache/tracker is mounted on XFS.

We call fdatasync after every 1000-5000 words written, to prevent pdflush starving other apps of the disk (this starvation appears to be a recent problem in kernels since 2.6.20).

I will add an option for fast merges, which is approx 50% quicker without any fsyncs but will hog the disk when doing so on ext3.

Revision history for this message
Jamie Lokier (jamie-shareable) wrote :
Download full text (9.0 KiB)

Jamie McCracken wrote:
> I dont know about other indexers

Someone should see what Beagle's like, I guess.

> Trackers indexer is a hash table so words are written at random
> locations - its not possible to write more than one word at a time nor
> do we know whether certain words are stored sequentially as a result.
>
> We rely on the kernel to order the writes elevator fashion so they can
> be written in a contiguous fashion - sadly that does not happen on ext3
> but does if ~/.cache/tracker is mounted on XFS
>
> We call fdatasync after every 1000-5000 words written to prevent pdflush
> starving the disk from other apps (this starvation appears to be a
> recent problem in kernels since 2.6.20)

Ew. Those both look like nasty kernel limitations when writing to a
file in a scattered fashion.

I guess this is also why producing multiple smaller index files, then
having a merge-fest to make the large index file when it's all done,
is faster than writing everything to one hash index as you go. That
would naturally decrease the _size_ of seeks and hence seek time, as
smaller files span less of the disk.

Coming back to this:

> Trackers indexer is a hash table so words are written at random
> locations - its not possible to write more than one word at a time nor
> do we know whether certain words are stored sequentially as a result.

That doesn't seem like a good way to write an index. I'll try to put
together a different idea. (I've thought a lot about indexes for
databases: I'm thinking of writing yet another database engine).

You've found that SQLite doesn't perform too well without
index-merging, and neither does the other db you tried.

But a _good_ database implementation obviously should perform better
writing everything to one big file, instead of to multiple smaller
files in sequence then merging them.

Think about it: a good database would do the optimisation you're doing
automatically, because loading large amounts of data with randomly
ordered index keys is quite a common requirement (and benchmark) in
databases.

The only difference would be that it would hide the "smaller files" as
allocated zones inside the single big file it's apparently using.
That bigger file's size would grow in steps. The disk seeking and I/O
would still have similar properties.

There are quite a few different algorithms to make the index (in a
general database engine) be always accessible while it grows, despite
the internal periodic reorganisation, and to keep the reorganisation
efficient with disk seeks.

One way is to store the index as two tables internally: one B-tree,
and one sequential table (because reads can easily check both), and
write updates (in your case, each new "word") fairly sequentially up
to a certain amount, as write-ahead logging, then periodically merging
the log into the main B-tree using a batched method. If two logs are
permitted, one being merged and one being written, writing doesn't
have to stall during merging.

(Aside:
  => I know for a fact that several databases do this, retain a
     separate sequential index and B-tree index, when there's a
     constant stream of updates.

     You might find PostgreSQL, some MySQL backend, F...

Read more...

Revision history for this message
Miguel Rodríguez (migrax) wrote :

It may not be related, but we were having speed issues with tracker in some computers here, and it was fixed after adding the relatime mount option to ext3 indexed partitions.

HTH.

Revision history for this message
nowshining (nowshining) wrote :

Never had an issue myself; however, I never did a dist-upgrade, I just added the gutsy sources and downloaded from there.

Revision history for this message
Jamie McCracken (jamiemcc-blueyonder) wrote :

Hi Jamie Lokier,

Just to clear up a few issues:

1) All the DBs (B-tree/hash/Berkeley) only have an API for updating one key/value at a time AFAIK

2) In our case, the final index merge is not updating anything, as it's creating a new index. The disk space for hits is therefore contiguous: regardless of what word we start from, space is allocated on a first come, first served basis (i.e. it's appended), so we can choose whatever word order we like. The buckets in the header are random of course, but they are fixed at the first 1MB of the index (256,000 buckets at 32 bits each)

3) All major indexers - Lucene (Beagle/Strigi) and Google - use index merges, as updating a big index is slow, plus merging helps remove deleted entries and fragmentation. Without merges no index would be scalable

4) We don't want to use multiple tables, and SQL DBs are not appropriate as they store the word twice (once in the index and once in the table), bloating things up

5) The high-end Oracle RDBMS supports clustered tables, which allow storing data in key order (normal tables are appended, and only indexes are sorted). These are not practical as they are even more painful to update due to massive relocation (in fact it's far quicker to append records and then copy to a new table in sorted order).

6) The performance problems with the existing merges disappear on XFS (they merge in seconds as opposed to minutes on ext3). If ext4 gets similar delayed allocation then hopefully we will see the same there too

Revision history for this message
Jamie Lokier (jamie-shareable) wrote :

Miguel Rodríguez wrote:
> It may not be related, but we were having speed issues with tracker in
> some computers here, and it was fixed after adding the relatime mount
> option to ext3 indexed partitions.

Back when I first tried it, I added noatime (before I knew about
relatime), and it did indeed help a lot.

Without noatime, the disk seeked heavily and the disk I/O activity
(according to the Gnome System Monitor applet) was always high while
indexing.

With noatime, the disk seeked a lot less, and the disk I/O activity
appeared to be nearly zero.

When I saw such a dramatic change, I thought that would mean the
problems affecting desktop application performance would be fixed.

However, despite the lack of much accounted-for I/O, and less noise
from the disk, for some reason all applications still ran really
slowly.

So, yes: noatime/relatime makes a good and essential difference on ext3
partitions, and probably others. On the latest kernels and newer
Trackers, O_NOATIME is used, I think, so you don't need those mount
options. (But relatime is generally a good choice anyway.)

But: it's not enough by itself, at least on some of our systems.

-- Jamie

Revision history for this message
M (asobi) wrote :

Trackerd consumes a lot of CPU mainly when I'm downloading something. I notice it most obviously when using KTorrent. It's as if trackerd sees the change to some blocks of the file and has to rescan the whole thing, over and over again. Since the file is continually modified, as long as anything is downloading, trackerd will grab 100% of my CPU. This happens on desktops, laptops, even while on battery. It is a major problem.

Revision history for this message
Arthur (moz-liebesgedichte) wrote :

I'm on an old Athlon XP 1700+ with 512 MB RAM and never had such extreme "response blockers" in Feisty before. I upgraded to Gutsy somewhere around beta release time. I'm not using trackerd and beagle is usually deactivated. But when issuing an 'aptitude dist-upgrade', GUI programs sometimes don't react for something like 8 seconds, which never happened on feisty.

Revision history for this message
Alexey Borzenkov (snaury) wrote :

I can confirm that this is unrelated to trackerd (I uninstalled it as soon as I kept running into this problem). While observing memory consumption lately, I can see that although I have 1GB of RAM, system monitor shows only around 600-700MB of it being used, and swap usage keeps rising over time. Right now it shows 700MB of memory and 597MB of swap occupied somehow, and even while update-manager merely downloads packages, my letters appear with a very noticeable delay as I type. The weirdest thing is that system monitor shows only one big consumer, firefox-bin (which it lists as only 254MB), and the others are no bigger than 50MB. Top, on the other hand, shows unrealistically huge numbers for most applications (Xorg: 247m VIRT, 184m SWAP; synaptic: 188m VIRT, 150m SWAP; firefox-bin: 915m VIRT, 661m SWAP; etc.). I don't understand what's going on, but as far as I've seen, as soon as synaptic starts installing updates it's a complete showstopper for me. The disk thrashing and memory usage suggest that my system is actually swapping BADLY (if I don't touch my PC while it's updating and then come back, all applications slowly get unswapped, even after the disk activity is gone).

Could it be a memory leak somewhere? Or what else could it be? It's disk-activity related (copying big files, installing packages and other such activities all trigger slowdowns, even on the -rt kernel), but in my case it also seems to be memory related; at least, the more time passes, the more it looks like a swapping issue.

My current uptime is 4 days 13:39; that's why it's gotten so bad.

Also, I'm using linux-rt on amd64, though I'm thinking of moving back to linux-generic after the next reboot.

Anyone else noticed something strange with memory consumption?

P.S. When I first moved from WinXP to Feisty I was laughing at WinXP, as Feisty took only a little over a quarter of my memory. Why is it suddenly so big now?

Revision history for this message
Miguel Martinez (el-quark) wrote :

I second Alexey's comments.

I've just started my laptop and the only things I've done are install
today's and yesterday's updates (54), create a tarball via Nautilus, check
my e-mail and edit a LaTeX file (I didn't compile it). This is the output of
free:

$ free -m
             total       used       free     shared    buffers     cached
Mem:           503        451         52          0         51        233
-/+ buffers/cache:        166        336
Swap:          517         33        483

And the uptime is...

$ uptime
  10:20:05 up 23 min, 2 users, load average: 0.02, 1.02, 1.37

Fortunately, since my gutsy system was installed from scratch, I don't get
the incredible slowdown in responsiveness I used to get on a dist-upgraded
machine.

Revision history for this message
Jeff Fortin Tam (kiddo) wrote :

There seem to be three different issues here:
- a kernel I/O bug (which is the one I'm interested in, per this thread's title)
- tracker indexing
- memory problems, according to the newest comments?

I really think #2 is unrelated and #1 is the problem that needs to be fixed; otherwise the rest is just band-aids.
There are only 2-3 days left before the 7.10 final release - will we dist-upgraders all be forced to clean-install? And will those dist-upgrading from feisty at release be bitten by this bug? I am worried.

Revision history for this message
b5baxter (robert-vanrenewable) wrote :

I am also experiencing disk thrashing with the following characteristics.
- Mouse becomes very slow to respond
- Keyboard becomes extremely slow or unresponsive
- screen becomes unresponsive
- often requires a power down
- Tracker has been disabled
- Compaq Presario R3000 with AMD 64
- Install of Gutsy 7.10 on a new partition
- Dual boots to Windows XP with no disk problems
- Fedora 7 was previously installed on same machine with no disk problems.
- System Monitor (and htop) shows 100% CPU but all listed tasks are only using a small fraction of the CPU. Swap is not being used. (This may not be accurate, because the screen often freezes when the thrashing starts.)

Revision history for this message
pinepain (pinepain) wrote :

Maybe the HDD is working in one of the PIO modes? PIO uses the CPU for I/O transfers (something like that - see Wikipedia for details). Try setting one of the UDMA modes.

Revision history for this message
unggnu (unggnu) wrote :

I can confirm this issue with Gutsy. It even makes audio and video stop for a second after a short period. An easy workaround was to killall trackerd, but the problem seems to have gotten worse in Hardy. I did a dist-upgrade to the current packages, and during some configuration steps even the mouse lags badly, which seems much worse than in Gutsy.
Btw, I have a PATA controller but sda drives, so libata is used, I guess.

Revision history for this message
Ravindran K (ravindran-k) wrote :

I strongly suspect this is due to another DMA-related issue. I observe that DMA is not enabled on my HDDs and I'm unable to enable it using hdparm. I have no idea how to accomplish the same using sdparm. As a result, the drives are running slower.

Apparently, this is due to the conversion of hdX to sdX (hda to sda) in recent Ubuntu releases. Not sure when this started. There are other bug reports related to this: https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/110636
(I'm trying the temporary fix suggested there.)
Please check whether you all observe something similar.

Some info below on messages I get:
____________________________________________________

 hdparm -i /dev/sda

/dev/sda:

 Model=QUANTUM FIREBALLlct15 20 , FwRev=A01.0F00, SerialNo=613024132415
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
 RawCHS=16383/16/63, TrkSize=32256, SectSize=21298, ECCbytes=4
 BuffType=DualPortCache, BuffSize=418kB, MaxMultSect=16, MultSect=?16?
 CurCHS=17475/15/63, CurSects=16513875, LBA=yes, LBAsects=39876480
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4
 DMA modes: mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 *udma2 udma3 udma4
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: ATA/ATAPI-5 T13 1321D revision 1: ATA/ATAPI-1,2,3,4,5

 * signifies the current active mode

root@homeserver:~# hdparm -i /dev/sdb

/dev/sdb:

 Model=ST380011A , FwRev=8.01 , SerialNo=4JV1QKYD
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=unknown, BuffSize=2048kB, MaxMultSect=16, MultSect=?16?
 CurCHS=17475/15/63, CurSects=16513875, LBA=yes, LBAsects=156301488
 IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4
 DMA modes: mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 *udma2 udma3 udma4 udma5
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: ATA/ATAPI-6 T13 1410D revision 2: ATA/ATAPI-1,2,3,4,5,6

 * signifies the current active mode

root@homeserver:~#
____________________________________________________
root@homeserver:~# hdparm -d1 -X66 /dev/sda

/dev/sda:
 setting using_dma to 1 (on)
 HDIO_SET_DMA failed: Inappropriate ioctl for device
 setting xfermode to 66 (UltraDMA mode2)
SG_IO: bad/missing ATA_16 sense data:: 70 00 05 00 00 00 00 0a 00 00 00 00 24 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 HDIO_DRIVE_CMD(setxfermode) failed: Input/output error
root@homeserver:~# hdparm -d1 -X66 /dev/sdb

/dev/sdb:
 setting using_dma to 1 (on)
 HDIO_SET_DMA failed: Inappropriate ioctl for device
 setting xfermode to 66 (UltraDMA mode2)
SG_IO: bad/missing ATA_16 sense data:: 70 00 05 00 00 00 00 0a 00 00 00 00 24 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 HDIO_DRIVE_CMD(setxfermode) failed: Input/output error

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

I have run some tests on different notebooks: a dist-upgrade of the 32-bit version of Feisty, and fresh installs of the 32-bit and 64-bit versions of Gutsy. The disc performance is bad. It shows up when a program accesses the hard drive heavily; it's not a tracker problem. Resuming a virtual machine with 512MB RAM (VMware Workstation 6) takes about 5 minutes, versus about 1 minute on Feisty. Sometimes the mouse freezes for two seconds during heavy disc access.

The problem occurs on both my SATA and PATA machines.

ThinkPad R50p - Pentium M (1,7GHz) - 2GB – PATA
ThinkPad T61p – Core2 Duo 7700 (2,4GHz) – 4 GB – SATA

On my old machine Gutsy is unusable, which is why I reinstalled Feisty. There are no problems under Feisty so far. Gutsy on the new machine is like using Vista: Gnome needs a long time to start, and I miss the fast response I know from Feisty. On Gutsy I start Firefox and it takes about 8 seconds before the Firefox window pops up. My old Pentium M seems to respond faster than my new machine running Gutsy.

I don't think it's a DMA problem, because I get good results with hdparm -tT /dev/sda.

/dev/sda:
 Timing cached reads: 8604 MB in 1.99 seconds = 4315.53 MB/sec
 Timing buffered disk reads: 136 MB in 3.01 seconds = 45.17 MB/sec

And I get good read and write results on the hard disc at 800MHz (the lowest CPU frequency) when there is no other disc access.

sudo dd if=/dev/sda1 of=/dev/null bs=1M count=1000
1048576000 Bytes (1,0 GB) kopiert, 18,7464 Sekunden, 55,9 MB/s

dd if=/dev/zero of=test bs=1M count=1000
1048576000 Bytes (1,0 GB) kopiert, 17,0126 Sekunden, 61,6 MB/s

The sync command lasts about one second.
I am using ext3 as the file system, and I don't think it's an ext3 problem, because there aren't any problems without concurrent access.

Filesystem features: has_journal resize_inode dir_index filetype needs_recovery sparse_super large_file
Filesystem flags: signed directory hash

I am now using the newest kernel, but the same problems occur on all other kernels.
2.6.22-14-generic #1 SMP Tue Dec 18 05:28:27 UTC 2007 x86_64 GNU/Linux

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

I have run some tests with feisty and gutsy.
I connected my SATA drive to the USB port and tested gutsy on the same machine. Now I get bad read and write performance of about 20MB/s instead of 50MB/s, but the system is faster. It consumes much more CPU power, yet every program starts faster and I get faster response times, even when the hard disk is heavily used and CPU consumption is about 100%.

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

I copied two big files concurrently (~15% bigger than my memory) with dd from one ext3 (xfs) partition to another ext3 (xfs) partition on the same hard disk. I tried block sizes of 10 bytes and 100 bytes for the copy operation. Tracker is disabled. My disc performance is about 50-60 MB/s (USB2 ~20MB/s) when I copy one big file from /dev/zero to disc. CPU consumption is higher in Gutsy and Hardy. Perhaps a different caching algorithm has been implemented since gutsy?
I always used a fresh install with all updates.

Feisty:
bs=100 - xfs=~10MB/s - ext3=~9MB/s
bs=10 - xfs=~6MB/s - ext3=~6MB/s

Gutsy:
bs=100 - xfs ~6MB/s - ext3 ~3MB/s (sometimes I get results about 9MB/s for both file systems)
bs=10 - xfs ~3MB/s - ext3 ~1MB/s

Hardy:
bs=100 - xfs=~7MB/s
bs=10 - xfs=~4MB/s

bs=100 - ext3(usb2)=~6MB/s
bs=10 ext3(usb2)=~2,5MB/s

Revision history for this message
pinepain (pinepain) wrote :

Hi,

Strange - that is too slow even for copying from one partition to another on the same HD. It looks like some hardware isn't configured properly. Maybe you should try to manually set one of the UDMA modes (the highest available, but first try setting the max UDMA via software), not PIO. That helped me on Feisty and also works fine in Gutsy. BTW, it works fine for me on ext2 as well as ext3.

Also note that you can have a USB2 interface but not get true USB2 speed. And copying from /dev/zero takes some CPU resources itself.

Good luck.

Revision history for this message
Alexey Borzenkov (snaury) wrote :

I wonder if Ben Collins can give us some status update on what is being done to fix this bug? Is he still working on this bug?

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

I have tried the vanilla kernel (2.6.22-14) on Gutsy. Now I get better disc performance (ext3 / bs=10 ~4MB/s / bs=100 ~7MB/s), but desktop responsiveness is worse.
I also tried Hardy for a while and noticed that desktop responsiveness is sometimes worse than under gutsy, even with no or only light disc access.
Sometimes the gnome menu needs about 2 seconds before it appears, and there are only logos and metadata to load. These are the same problems as in gutsy, especially gutsy with the 2.6.22 kernel.
Hdparm reports that UDMA is on. And there is only iowait CPU usage at the lowest CPU frequency, with good disc performance, when copying with a normal block size (>= 4k) with dd.
Compiz is disabled on all of my installations. I use the 64-bit versions of gutsy and hardy.
I don't know the internal design of the Linux kernel, so this is only a guess: there must be a bottleneck which is triggered especially by heavy disc access, but also shows up during other activities. Perhaps interrupt handling. Powertop reports ~400 wakeups for the keyboard interrupt while copying two files from one partition to another with a block size of 4k and writing this text - and I am typing at most two or three letters per second. The count is only that high while copying the files; without disc access there are only about 200 wakeups while typing at the same speed.

With two writing and two reading disc access
  38,9% (445,7) <interrupt> : PS/2 keyboard/mouse/touchpad
  17,5% (200,7) <interrupt> : extra timer interrupt
  14,5% (165,8) <interrupt> : libata
   8,8% (100,5) <interrupt> : uhci_hcd:usb1, eth0

Without high io access
  31,2% (213,6) <interrupt> : extra timer interrupt
  25,6% (175,2) <interrupt> : PS/2 keyboard/mouse/touchpad
  14,7% (100,5) <interrupt> : uhci_hcd:usb1, eth0
   8,9% ( 60,8) <interrupt> : uhci_hcd:usb3, ahci, yenta, nvidia

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

Hi Thomas,

Care to comment on which version of the Hardy kernel you are running (cat /proc/version_signature)? Also, are you testing on a fully up-to-date Hardy install, or are you just running the hardy kernel on, say, a Gutsy install?

Note that we'll keep this report open against the actively developed kernel; the task against 2.6.22 will be closed. Thanks.

Changed in linux:
status: New → Incomplete
Changed in linux-source-2.6.22:
status: In Progress → Won't Fix
Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

Hi Leann,

I tried a kernel from kernel.org under Gutsy because I cannot work on my machine anymore.
The interrupt issue seems to be a problem with my kernel build.

With two writing and two reading disc access under Hardy 2.6.24-11-generic:
  27,1% (173,7) <kernel IPI> : Rescheduling interrupts
  20,4% (130,4) USB device 1-1 : BCM2045B (Broadcom Corp)
  15,6% (100,0) <interrupt> : uhci_hcd:usb1
  10,3% ( 66,0) <interrupt> : libata
   9,5% ( 60,9) <interrupt> : uhci_hcd:usb3, yenta, nvidia
   5,7% ( 36,8) dd : blk_plug_device (blk_unplug_timeout)

Hardy is a fresh test installation with all updates.

The problem was there under the 2.6.24-10 kernel.
2.6.24-10-generic #1 SMP Fri Feb 22 18:26:06 UTC 2008 x86_64 GNU/Linux

Now I try the 2.6.24-11 kernel.
2.6.24-11-generic #1 SMP Fri Feb 29 21:26:31 UTC 2008 x86_64 GNU/Linux
(Ubuntu 2.6.24-11.17-generic)

I have only done some testing on this kernel, but there are hang-ups whenever there is disc activity. E.g. switching desktops takes 2 seconds, and selecting icons on the desktop takes effect only after 2-4 seconds. Sometimes the main menu just stays open and freezes for many seconds after a program has been started. Firefox hangs for 20-30 seconds (I think that's a firefox problem).
The problem does not occur regularly, and there are periods when the system works smoothly. But from time to time (every 20s to 10min) the problem recurs every few seconds.

Can I do something to help solve the problem?

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

Hi Leann,

I made a mistake in the kernel versions. I am using the "Ubuntu 2.6.22-14.52-generic" kernel under gutsy and tried the linux-2.6.24.2 kernel from kernel.org, but desktop responsiveness was not good (I think it was a configuration mistake in my kernel build).
I tried linux-2.6.24.2 from kernel.org because I had no desktop responsiveness problems under ArchLinux on the same machine, and they use the 2.6.24.1 kernel (now 2.6.24.3). (http://www.archlinux.org/packages/13318/)

The results in the post from 2008-03-04 were obtained under gutsy "Ubuntu 2.6.22-14.52-generic".

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

I can easily reproduce the problem by executing the following commands on hardy (2.6.24-12.22-generic), fully updated. I have not tested it under gutsy.

dd if=/dev/zero of=test1 bs=4k count=250000 & \
dd if=/dev/zero of=test2 bs=4k count=250000 & \
dd if=/dev/zero of=test3 bs=4k count=250000 & \
dd if=/dev/zero of=test4 bs=4k count=250000 & \
dd if=/dev/zero of=test5 bs=4k count=250000 & \
dd if=/dev/zero of=test6 bs=4k count=250000 & \
dd if=/dev/zero of=test7 bs=4k count=250000 & \
dd if=/dev/zero of=test8 bs=4k count=250000 &

Firefox freezes almost every time. Other applications (evolution) can be stalled too, but it's not as easy as with firefox.
If I try to switch desktops with the keyboard shortcut, sometimes the switch executes only after 10 seconds.
From time to time I can produce a complete freeze of gnome (input events?). All applets (like the system monitor and cpu-frequency applets) keep working correctly, but I cannot move any windows or use the mouse or keyboard. I have to switch to a console (Ctrl+Alt+F1) and kill all dd processes as root. After killing the dd processes, gnome and all input events work again.
The average system load climbs continuously. When it reaches a value of about 8, klogd uses 100% of the CPU (I think the daemon crashes).

Changed in linux:
assignee: nobody → ubuntu-kernel-team
importance: Undecided → Medium
status: Incomplete → Triaged
Revision history for this message
exactt (giesbert) wrote :

this looks like a dup of #43484

Revision history for this message
Francisco Borges (francisco-borges) wrote :

On Sun, Apr 13, 2008 at 8:19 PM, exactt <email address hidden> wrote:
> this looks like a dup of #43484

Perhaps it is.

But FWIW I would just like to point out that:

1. I have the same case as many others here (heavy disk IO -> poor
system responsiveness)

2. However, unlike every note on LP #43484, I am running reiserfs, and not ext3.

[...]

Is anybody experiencing this bug running with "data=writeback"?

Cheers,
--
Francisco

Revision history for this message
Dym (dmarszal) wrote :

Latest Hardy: while copying a large file or during other heavy I/O, the load peaks at 8.0 and the desktop becomes unresponsive. You can see this well in Firefox.

uname -a
Linux dark-laptop 2.6.24-16-generic #1 SMP Thu Apr 10 13:23:42 UTC 2008 i686 GNU/Linux

hdparm -tT /dev/sda

/dev/sda:
 Timing cached reads: 3918 MB in 2.00 seconds = 1962.06 MB/sec
 Timing buffered disk reads: 138 MB in 3.01 seconds = 45.88 MB/sec

Revision history for this message
Dym (dmarszal) wrote :

Booting Hardy with the old 2.6.22 kernel fixes the problem. The load average after copying over 4 GB is no more than 3.3.

uname -a
Linux dark-laptop 2.6.22.7 #4 SMP Fri Oct 19 16:03:46 CEST 2007 i686 GNU/Linux

hdparm -tT /dev/sda

/dev/sda:
 Timing cached reads: 6546 MB in 1.99 seconds = 3286.52 MB/sec
 Timing buffered disk reads: 136 MB in 3.01 seconds = 45.18 MB/sec

My config is HP 6710b laptop.

Revision history for this message
Jamie McCracken (jamiemcc-blueyonder) wrote :

The kernel needs to throttle the process that's doing large heavy writes rather than stalling every other process that's trying to read from or write to the disk.

I'm surprised this has not been fixed in the kernel yet :(

Revision history for this message
Sam Kimbrel (kimbrel) wrote :

I can confirm this on a fresh install of 8.04, kernel 2.6.24-16-generic.

Any sustained disk I/O causes other apps to become unresponsive.

I don't get how this was not caught in beta, because it can render my system unusable.

Revision history for this message
Tim Gardner (timg-tpi) wrote :

Enable Hardy -proposed and install the -17.31 kernel which has SCHED_CGROUPS enabled. I believe it will have an effect on interactivity responsiveness.

Changed in linux:
status: Triaged → Confirmed
Revision history for this message
Francisco Borges (francisco-borges) wrote :

On Mon, May 5, 2008 at 4:02 PM, Tim Gardner <email address hidden> wrote:
> Enable Hardy -proposed and install the -17.31 kernel which has
> SCHED_CGROUPS enabled. I believe it will have an effect on interactivity
> responsiveness.

Just to help other people that perhaps were puzzled, like I was, by
Tim's comments.

The package he was talking about is this one:
https://launchpad.net/ubuntu/hardy/+source/linux/2.6.24-17.31

Relevant bugs that are probably related to this one:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/188226 (see the
description of this one)

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/218516

[...]

@Tim: thanks for the tip, I will be trying the package when I get home.

--
Francisco

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

Kernel 2.6.24-17 does not help; I think it's even worse with the new kernel.
I tried Debian Lenny and Fedora Core 8. The problem exists in those distros too, but it's much less severe than in gutsy. I am using ext3 in journal mode. I tried xfs, but it has the same problem. I never had performance problems with ext3 under feisty.

Hardy is unusable for me. It's fine for surfing, writing documents or reading PDFs. It's awful for working with vmware or other disk-intensive apps.

I just finished a backup (2.6.24-16) with rdiff-backup (pybackpack) to a local usb2 hard disk. I think it's related to this problem. Here is the log.

StartTime 1209878093.00 (Sun May 4 07:14:53 2008)
EndTime 1210019188.36 (Mon May 5 22:26:28 2008)
ElapsedTime 141095.36 (39 hours 11 minutes 35.36 seconds)
SourceFiles 530196
SourceFileSize 171720983690 (160 GB)
MirrorFiles 501408
MirrorFileSize 139162927405 (130 GB)
NewFiles 240477
NewFileSize 86693084491 (80.7 GB)
DeletedFiles 211689
DeletedFileSize 57296675851 (53.4 GB)
ChangedFiles 103582
ChangedSourceSize 33585642382 (31.3 GB)
ChangedMirrorSize 30423994737 (28.3 GB)
IncrementFiles 555748
IncrementFileSize 32637905299 (30.4 GB)
TotalDestinationSizeChange 65195961584 (60.7 GB)
Errors 66

The last backup under gutsy takes about two hours.

StartTime 1205655350.00 (Sun Mar 16 09:15:50 2008)
EndTime 1205663714.77 (Sun Mar 16 11:35:14 2008)
ElapsedTime 8364.77 (2 hours 19 minutes 24.77 seconds)
SourceFiles 501408
SourceFileSize 139162927405 (130 GB)
MirrorFiles 497434
MirrorFileSize 147995632923 (138 GB)
NewFiles 5124
NewFileSize 316728530 (302 MB)
DeletedFiles 1150
DeletedFileSize 9327300673 (8.69 GB)
ChangedFiles 1648
ChangedSourceSize 41242024637 (38.4 GB)
ChangedMirrorSize 41064158012 (38.2 GB)
IncrementFiles 7924
IncrementFileSize 343516700 (328 MB)
TotalDestinationSizeChange -8489188818 (-7.91 GB)
Errors 34

Revision history for this message
Rocko (rockorequin) wrote :

2.6.24-17 doesn't fix it for me either. The desktop still becomes unusable when copying large files, or after some time using vmware or virtualbox. It's unbelievable that this bug survived to release, especially an LTS release.

When's -18 coming out?

Revision history for this message
Gate (gatewarstrekme) wrote :

This is happening to me on Hardy with 2.6.24-16 when copying large numbers of files.

The weird thing is that Compiz remains perfectly responsive (desktop switching), but every other application, including Firefox and the terminal emulator, remains unresponsive for minutes to *hours* after the disk I/O has finished (copying a few thousand files from USB to HDD using cp).

This happens despite processor and RAM usage both staying under 40%.

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

Hi Guys,

If you are willing, just to see if it makes a difference, care to test the upcoming Intrepid Ibex 8.10 kernel? It was recently rebased on the upstream 2.6.25 kernel and is currently available in the following PPA:

https://edge.launchpad.net/~kernel-ppa/+archive

If you are not familiar with how to install packages from a PPA, basically do the following:

Create the file /etc/apt/sources.list.d/kernel-ppa.list to include the following two lines:

deb http://ppa.launchpad.net/kernel-ppa/ubuntu hardy main
deb-src http://ppa.launchpad.net/kernel-ppa/ubuntu hardy main

Then run the command: sudo apt-get update

You should then be able to install the linux-image-2.6.25 kernel package. After you've finished testing you can remove the kernel-ppa.list file and run 'sudo apt-get update' once more. Please let us know your results. Thanks

Revision history for this message
Reeve Yang (reeve-yang) wrote :

Though I'm not an Ubuntu user, I have the same problem after upgrading the vanilla kernel from 2.6.17.4 to 2.6.22.15, so it should not be an Ubuntu-specific problem. Here are bonnie++ test results on the old and new kernel respectively. Disk throughput improved 10%, but CPU utilization shot up from 34% to 95% for the same amount of I/O. Could someone help point to any patches or fixes for this issue?

## ###############Linux 2.6.17 #1 SMP Tue May 6
##################################

Version 1.03c ------Sequential Output------ --Sequential Input- --Random-
                   -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
ib-10-34-68-2 2016M 23989 35 44123 6 16360 1 21823 28 43090 1 172.7 0
                   ------Sequential Create------ --------Random Create--------
                   -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
             files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
                16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
ib-10-34-68-2.infoblox.com,2016M,23989,35,44123,6,16360,1,21823,28,43090,1,172.7,0,16,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++

## ###############Linux 2.6.22.15 #1 SMP Tue May 6
##################################

bonnie++ -d /storage -s 2016M -u root
                   -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
ib-10-34-68-2 2016M 26078 94 52117 17 23596 5 27172 86 56402 4 160.2 0
                   ------Sequential Create------ --------Random Create--------
                   -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
             files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
                16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
ib-10-34-68-2.infoblox.com,2016M,26078,94,52117,17,23596,5,27172,86,56402,4,160.2,0,16,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++
###################################################################################

Revision history for this message
Rocko (rockorequin) wrote :

I booted into kernel 2.6.25.1, copied a 2.8 GB file from one internal partition to another and tried using the desktop during the copy.

I *think* responsiveness is improved. Firefox didn't grey out at all during the copy, and I could open other nautilus windows. But about 2GB into the copy FF started taking a long time to respond.

It's hard to be sure because a lot of new stuff in the new kernel was broken: in particular, USB works but transfers data intermittently, so I couldn't use the USB mouse because it moved too jerkily. I also couldn't test desktop responsiveness while copying large files to a USB drive.

Revision history for this message
Ravindran K (ravindran-k) wrote :

Greetings to all,

I ran the bonnie++ test below and my system responded fine during the tests. Please take a look and let everyone know if the results make sense.

Linux ravi-desktop 2.6.24-17-server #1 SMP Thu May 1 15:05:55 UTC 2008 i686 GNU/Linux

Version 1.03b ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
ravi-desktop 8G 38301 58 38881 8 16309 4 18669 34 31416 4 91.5 0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
                 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
ravi-desktop,8G,38301,58,38881,8,16309,4,18669,34,31416,4,91.5,0,16,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++

Revision history for this message
exactt (giesbert) wrote :

@Rocko
maybe this (http://shaver.off.net/diary/2008/05/25/fsyncers-and-curveballs/ ) explains your firefox behaviour after the 2GB...?

Revision history for this message
Rocko (rockorequin) wrote :

@exactt: Yes, it could have been that. Interesting article.

On another note, after reading bug #188226, I tried the same test using the 2.6.24-17-server kernel and the desktop just flew! No pauses, no windows greying out, and FF3 was responsive all the way through. It's a pity that the server kernel doesn't configure the sound card etc, otherwise I'd just use it instead of the generic one.

And just to be sure there weren't any other changes that had improved things, I retried the test under 2.6.24-17-generic and FF3 still greyed out during the copy.

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

The PPA kernel does not fix the problem; it is only delayed. I have run some tests on a fresh Hardy installation with the kernels Ubuntu 2.6.20-15.27-generic (Feisty repository), Ubuntu 2.6.24-17.31-generic, and Ubuntu 2.6.25-1.2ubuntu6-generic. I always restarted the system and started the tests directly after login. To simulate a disk-intensive application, I copied eight 1GB files from /dev/zero to the hard disk; this behaves similarly to working with vmware and eclipse.
Every application was started only once in each test (there was always a reboot between two gimp tests or two firefox tests). Sometimes it takes a long time to browse to /usr/share/backgrounds and to open an image or select a tool under the 2.6.24 and 2.6.25 kernels, and I have to wait 5s - 10s after every mouse action. But this behavior is not deterministic and occurs only in every second or third test; you can see it in the firefox test.

Has anyone seen the poor desktop responsiveness on a SCSI system?

And I don't think that this problem is caused by the kernel only, because the desktop freezes occur with the Feisty kernel under Hardy too. Perhaps something with xorg or gnome?

test results:

kernel
20 / 24 / 25

start gimp at load avg 6
10s / 30s / 30s

start gimp at load avg 8
10s / 23s / 40s

start firefox at load avg 10
17s / 44s / 44s
load four pages (saved session) after start
7s / 27s / 22s

start firefox as load avg 14
15s / 20s / 60s
load four pages (saved session) after start
7s / 20s / 20s

starting oowriter at load avg 12
15s / 20s / 40s

Revision history for this message
Anil (anilkumar-as) wrote :

Thomas, I ran similar tests with Hardy and had the same results. But a fresh installation works just fine; this happened only when I upgraded to Hardy. I wonder if it is related to libata bug 195221 (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/195221).
See Ravindran's comment (https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.22/+bug/131094/comments/760): it shows udma2 being selected instead of udma4.
Can others confirm this?

Revision history for this message
Rocko (rockorequin) wrote :

hdparm -i shows that Hardy is configuring the drives on both my laptops correctly for udma5 (100 MB/s), so I don't think that is the problem.

Revision history for this message
Francisco Borges (francisco-borges) wrote :

On Wed, May 28, 2008 at 8:42 AM, Rocko <email address hidden> wrote:
> hdparm -i shows that Hardy is configuring the drives on both my laptops
> correctly for udma5 (100 MB/s), so I don't think that is the problem.

Same here. My laptop runs with udma4 but still presents the
"responsiveness" problem.

--
Francisco

Revision history for this message
Anil (anilkumar-as) wrote :

Yes, you are right. I applied one of the patches given in the link, which forced udma4 selection, but the problem still exists.
Can anybody suggest an alternative other than a fresh install? I have a laptop with no CD-ROM and limited internet. Switching to a 2.6.22 kernel on Hardy didn't solve the problem. Is there an older kernel that might work? Does 2.6.20 work fine?
Thomas's tests show 2.6.20 working fine. Is it recommended to use that kernel on Hardy?

Revision history for this message
Ravindran K (ravindran-k) wrote :

Hi ppl.. Pls try the server kernels (eg. 2.6.24-17-server ) and check whether you have such issues.

Revision history for this message
Francisco Borges (francisco-borges) wrote :

On Thu, May 29, 2008 at 6:10 PM, Ravindran K <email address hidden> wrote:
> Hi ppl.. Pls try the server kernels (eg. 2.6.24-17-server ) and check
> whether you have such issues.

I just booted with 2.6.24-17. It appears to solve the problem.

My usual test is to start copying large files (to an external disk) and try to show/hide Yakuake, which, to my surprise, doesn't freeze midway through the screen.

Making multiple threads read from /dev/zero (see below), with atop
reporting disk busy at 99%, I still have a responsive system.

Took this from an earlier email from Thomas Pi:

dd if=/dev/zero of=test1 bs=4k count=250000 & \
dd if=/dev/zero of=test2 bs=4k count=250000 & \
dd if=/dev/zero of=test3 bs=4k count=250000 & \
dd if=/dev/zero of=test4 bs=4k count=250000 & \
dd if=/dev/zero of=test5 bs=4k count=250000 & \
dd if=/dev/zero of=test6 bs=4k count=250000 & \
dd if=/dev/zero of=test7 bs=4k count=250000 & \
dd if=/dev/zero of=test8 bs=4k count=250000 &

--
Francisco
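Thomas Pi's eight-writer load generator quoted above can be written as a loop. The sketch below scales the sizes down (8 x 16kB into a temp directory) so it can be dry-run safely; restore bs=4k count=250000 to reproduce the original ~1GB-per-file load.

```shell
# Loop form of the eight parallel dd writers quoted above.
# Scaled down to 8 x 16kB in a temp directory for a safe dry run;
# use bs=4k count=250000 to recreate the original ~1GB-per-file load.
workdir="$(mktemp -d)"
for i in 1 2 3 4 5 6 7 8; do
    dd if=/dev/zero of="$workdir/test$i" bs=4k count=4 2>/dev/null &
done
wait   # block until all background writers finish
ls "$workdir" | wc -l   # prints 8 (the number of files written)
```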

Revision history for this message
Anil (anilkumar-as) wrote :

It's working fine with 2.6.24-17-server. Even with udma2 selected, the responsiveness is great.
The one difference I saw between the generic and server kernel configs that might be affecting this is:

CONFIG_DEFAULT_IOSCHED="cfq"      (generic)
CONFIG_DEFAULT_IOSCHED="deadline" (server)

Revision history for this message
Tobias Oetiker (tobi-oetiker) wrote :

Anil,

Today Anil wrote:

> It's working fine with 2.6.24-17-server. Even with udma2 selected, the responsiveness is great.
> The one difference is saw in the config of generic and server kernel that might me affecting is this
>
> CONFIG_DEFAULT_IOSCHED="cfq"
> CONFIG_DEFAULT_IOSCHED="deadline"

you can switch the ioscheduler on the fly:

echo cfq >/sys/block/sda/queue/scheduler

echo deadline >/sys/block/sda/queue/scheduler

(instead of sda use the name of your disk devices)

cheers
tobi

--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten
http://it.oetiker.ch <email address hidden> ++41 62 213 9902

Revision history for this message
Anil (anilkumar-as) wrote :

How do you know that the scheduler has changed?
I tried changing it, but it didn't make much difference.

Revision history for this message
Tobias Oetiker (tobi-oetiker) wrote :

Anil

Do a cat on the file; the word in [...] is the active scheduler.

The reason I am interested in this bug is that we are seeing
similar issues on file servers and have not been able to pin
them down reliably. We found tweaks here and there, but nothing
decisive. :-(

cheers
tobi

Today Anil wrote:

> How do you know that scheduler is changed ?
> I tried changing it, but didn't have much difference.
>
>

--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten
http://it.oetiker.ch <email address hidden> ++41 62 213 9902
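The bracketed word Tobias describes can be extracted mechanically. This sketch parses a saved sample of the sysfs format (so it runs anywhere) rather than the real /sys/block/sda/queue/scheduler file:

```shell
# /sys/block/<dev>/queue/scheduler lists all schedulers and brackets
# the active one, e.g. "noop anticipatory deadline [cfq]".
# Extract the bracketed name; a saved sample stands in for the real file.
sample='noop anticipatory deadline [cfq]'
active="$(printf '%s\n' "$sample" | sed 's/.*\[\(.*\)\].*/\1/')"
echo "$active"   # prints: cfq
```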

Revision history for this message
Anil (anilkumar-as) wrote :

OK, so the I/O scheduler makes no difference in either kernel.
And the new kernel made my touchpad useless :(

Revision history for this message
Francisco Borges (francisco-borges) wrote :

On Thu, May 29, 2008 at 10:27 PM, Francisco Borges
<email address hidden> wrote:
> On Thu, May 29, 2008 at 6:10 PM, Ravindran K <email address hidden> wrote:
>> Hi ppl.. Pls try the server kernels (eg. 2.6.24-17-server ) and check
>> whether you have such issues.
>
> I just booted with 2.6.24-17. It appears to solve the problem.

I just saw that I wasn't entirely clear here. FWIW, what I meant to
say is that I had used

                                  2.6.24-17-server (server!)

and that it appeared to solve the problem.

Cheers,
--
Francisco

Revision history for this message
Ravindran K (ravindran-k) wrote :

hii ppl...greetings and TGIF..

My sincere apologies; I forgot to mention one major boot setting which might have an influence. Note the combined_mode= and elevator=deadline options enabled in my kernel command line. The writeback option is for better write performance, but I'm sure it doesn't affect read performance.

title Ubuntu 8.04, kernel 2.6.24-17-server
root (hd0,0)
kernel /boot/vmlinuz-2.6.24-17-server root=UUID=b9f4c570-8b44-413d-bcc8-300f0a0890f9 ro combined_mode=libata clocksource=acpi_pm elevator=deadline rootflags=data=writeback splash vga=795
initrd /boot/initrd.img-2.6.24-17-server

Please try the same. Note that to make the changes permanent for all future kernels, you have to modify:
# kopt=root=UUID=b9f4c570-8b44-413d-bcc8-300f0a0890f9 ro combined_mode=libata clocksource=acpi_pm elevator=deadline rootflags=data=writeback
and run update-grub.

@tobiaz...anil.. thanks for helping me realize the same.
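The kopt change can be scripted with sed. This is a sketch operating on a throwaway copy (the UUID is the one from Ravindran's menu.lst above); on a real system the file is /boot/grub/menu.lst, edited as root and followed by update-grub.

```shell
# Append the scheduler/journal options to the '# kopt=' template line of
# a grub menu.lst. Done on a temp copy here; on a real system edit
# /boot/grub/menu.lst as root and then run update-grub.
menu="$(mktemp)"
cat > "$menu" <<'EOF'
# kopt=root=UUID=b9f4c570-8b44-413d-bcc8-300f0a0890f9 ro
EOF
sed -i 's|^# kopt=.*|& combined_mode=libata elevator=deadline rootflags=data=writeback|' "$menu"
cat "$menu"
```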

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

I tried the server kernel (Ubuntu 2.6.24-17.31-server) with all schedulers.
It's much better, but the performance is only at Gutsy's level. Working with vmware is still awful: mouse freezes, text delays, long application start times. Feisty on my old Pentium M machine is twice as fast as my Core 2 Duo on Gutsy or Hardy (Ubuntu 2.6.24-17.31-server).

I have also realized that this is not just a hard drive issue: I can reproduce the problem with high network I/O too.

Revision history for this message
Rocko (rockorequin) wrote :

I retried my 2.8GB copy tests on the -generic kernel after manually switching to deadline scheduling and back to cfq, and for me the desktop is definitely more responsive under deadline (note: I didn't try writeback or any other settings, just the scheduling).

In one test with deadline scheduling firefox didn't grey out at all, and in another it greyed out twice but only for half a second or so (and nautilus greyed out once for a couple of seconds). With cfq, firefox was greying out for five to ten seconds at a time. Irrespective of which scheduling I choose, the average throughput reported by nautilus is the same.

Thanks to Anil, Tobias, and Ravindran for the info about how to change the scheduling.

@Thomas Pi: I actually don't normally have problems with vmware-server 1.05 (XP runs at a similar speed to natively), but I find that occasionally the VM will get itself into a state where it thrashes the disk whenever I try to do something (even scrolling), and at this point it becomes unusable like you say. Sometimes vmware-server does this when I first boot the VM, in which case it takes forever to boot. Whenever it happens, a power reset from the VMWare menu fixes the problem. So maybe there's a separate bug in vmware that is making it look like this bug?

Revision history for this message
Anil (anilkumar-as) wrote :

Well, is this some kind of problem with kjournald? It seems to have happened in past releases as well; some googling shows it started with the Gutsy upgrade.

Can anybody tell me why this bug is "Medium"? It has made my system unusable, and I think it is the same for you guys.

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

I built the generic kernel without the "Fair group CPU scheduler", "Tickless System (Dynamic Ticks)" and "High Resolution Timer Support". My system seems to be faster; can someone check this?
My application startup speed is now between Feisty and Gutsy/Hardy, but Firefox is still unusable under high I/O load.
I will try a 100Hz version and check some more options from the server kernel too.

Revision history for this message
Milan Bouchet-Valat (nalimilan) wrote :

Please note also that Firefox 3 has a problem with fsync that makes it hang when I/O load is high, and this bug was not present in Gutsy. So be careful when comparing slowness based only on Firefox.

Revision history for this message
Rocko (rockorequin) wrote :

Have there been any beneficial changes in the 2.6.24-18 kernel?

I installed it this morning (still using deadline scheduling) and just tried copying six 2GB files within the same ext3 partition. The desktop apps didn't slow down significantly or grey out at all (I was running FF3 RC1, a vmware VM, and Thunderbird). In fact, the "file operations" window greyed out briefly while I was opening a new nautilus window. In the past, the new nautilus window (and FF3 and Thunderbird) used to grey out instead.

So hopefully the new kernel is working better...

I also recently switched to the 32bit kernel, but I don't think the 32bit 2.6.24-17 kernel was any different from the amd64 one with respect to desktop responsiveness under disk I/O load.

Revision history for this message
Austin Lund (austin-lund) wrote :

Using 2.6.24-19-generic from hardy-proposed with CFQ works fine for me now. Unzipping the kernel tarball and compiling the kernel don't affect responsiveness at all, and FF3 only very slightly.

Revision history for this message
Anil (anilkumar-as) wrote :

I have installed the new kernel, 2.6.24-19-generic. It doesn't seem to make any difference; I still wait a long time when I switch windows :(

Revision history for this message
Rocko (rockorequin) wrote :

I find that the desktop is much more responsive under heavy disk I/O with either 2.6.24-18-generic or 2.6.24-19-generic (64 bit and 32 bit) when you compare it to 2.6.24-17-generic and earlier. Both deadline and cfq scheduling work fine.

My test was to start my 2.8GB test file copy from one sda partition to another and then try opening lots of webpages in different tabs in FF3RC1, opening lots of new nautilus windows from the gnome 'Places' menu, and starting up a number of new apps. The desktop runs slower than if you aren't copying a large file, and once an open Thunderbird window greyed out briefly, but the desktop is definitely usable now, and it wasn't in the original Hardy release.

I do notice the following though: if you start the copy and leave everything alone for a few seconds (eg around 400MB, when the file operations window tells you its throughput) and then try to switch between windows, sometimes there's a delay in the desktop responding. It's just not as long as it used to be, and it only seems to happen once for me.

Revision history for this message
Tobias Oetiker (tobi-oetiker) wrote :

Not that this helps much for a laptop setup, but since I think the
problem is more deeply rooted than this, I tried what happens
when the ext3 journal is kept on a fast external device. It
seems to take the pressure off the VM, so that its fairness bugs
no longer hurt much.

http://insights.oetiker.ch/linux/external-journal-on-ssd.html

cheers
tobi

--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten
http://it.oetiker.ch <email address hidden> ++41 62 213 9902

Revision history for this message
Ravindran K (ravindran-k) wrote :

Greetings..

I'm convinced that I no longer face the issue.
However, after various simple tests, I observe that I get the best disk performance with kernel 2.6.24-18-server. I guess the UI responsiveness should be good with this kernel as well.

With other kernels, either my SATA drives are faster but my IDE drives run a lot slower, or vice versa. Attaching some logs.

Revision history for this message
Ravindran K (ravindran-k) wrote :
Revision history for this message
Ravindran K (ravindran-k) wrote :
Revision history for this message
Ravindran K (ravindran-k) wrote :
Revision history for this message
Ravindran K (ravindran-k) wrote :
Revision history for this message
Ravindran K (ravindran-k) wrote :

I get excellent performance with this custom kernel, but unfortunately I am unable to use VMWare 2.0 under it. Sad :(

Revision history for this message
laga (laga) wrote :

Ravindran K schrieb:
> I get excellent performance in this custom kernel, but unfortunately
> unable to use VMWare 2.0 under this kernel. Sad :(
>
> ** Attachment added: "diskperf_2.6.25-rc8-custom0.txt Custom kernel"
> http://launchpadlibrarian.net/15828991/diskperf_2.6.25-rc8-custom0.txt
>
Can you post the .config? Or at least tell us what you changed in the
kernel?

Revision history for this message
Ravindran K (ravindran-k) wrote :

Yes, sure. I customized the kernel for my motherboard, an Intel DG33TL (plus my older ASUS p2bVT), removing all other unnecessary drivers.
Here it is.

I was trying to enable a 64-bit kernel because I have 4 GB of RAM (older 32-bit kernels show only 3 GB); after I found the 2.6.24-x-server kernels, I stopped trying.
Moreover, as I said, I'm unable to run VMWare with the 2.6.25.x kernel.

Revision history for this message
Martin (martin615) wrote :

FWIW, delayed allocation was added for ext4 in the 2.6.27 merge window.

Revision history for this message
Tom Badran (tom-badran) wrote :

Does this mean this will likely ship with intrepid?

On Fri, Jul 25, 2008 at 5:34 PM, Martin <email address hidden> wrote:

> FWIW, delayed allocation was added for ext4 in the 2.6.27 merge window.
>
>

Revision history for this message
tomaszr (tomasz-rosinski) wrote :

I confirm this; life is very hard when disk I/O is occupied all the time :(
I was copying some files (a 4GB dvd.iso) and could not do anything. Hard life...

What can I change, or when will this be fixed?

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

The Ubuntu Kernel Team is planning to move to the 2.6.27 kernel for the upcoming Intrepid Ibex 8.10 release. As a result, the kernel team would appreciate it if you could please test this newer 2.6.27 Ubuntu kernel. There are two ways you should be able to test:

1) If you are comfortable installing packages on your own, the linux-image-2.6.27-* package is currently available for you to install and test.

--or--

2) The upcoming Alpha5 for Intrepid Ibex 8.10 will contain this newer 2.6.27 Ubuntu kernel. Alpha5 is set to be released Thursday Sept 4. Please watch http://www.ubuntu.com/testing for Alpha5 to be announced. You should then be able to test via a LiveCD.

Please let us know immediately if this newer 2.6.27 kernel resolves the bug reported here or if the issue remains. More importantly, please open a new bug report for each new bug/regression introduced by the 2.6.27 kernel and tag the bug report with 'linux-2.6.27'. Also, please specifically note if the issue does or does not appear in the 2.6.26 kernel. Thanks again, we really appreciate your help and feedback.

Revision history for this message
Ravindran K (ravindran-k) wrote :

I think the desktop responsiveness is OK, at least for me.

OT: old IDE I/O performance has come down again in the new kernel (2.6.27-1-server) [it was up to 45 MB/s in 2.6.26-5-server]

2.6.27-1-server

Date & Time:
Sat Aug 30 07:55:04 IST 2008
----------------------------------------------------------------------------------------------
SATA 250 GB HDD

/dev/sda:
 Timing cached reads: 8118 MB in 2.00 seconds = 4066.33 MB/sec
 Timing buffered disk reads: 252 MB in 3.02 seconds = 83.35 MB/sec
----------------------------------------------------------------------------------------------
IDE 160 GB HDD

/dev/sdb:
 Timing cached reads: 7034 MB in 2.00 seconds = 3523.05 MB/sec
 Timing buffered disk reads: 96 MB in 3.03 seconds = 31.71 MB/sec
----------------------------------------------------------------------------------------------
IDE 250 GB HDD

/dev/sdc:
 Timing cached reads: 6638 MB in 2.00 seconds = 3324.28 MB/sec
 Timing buffered disk reads: 230 MB in 3.02 seconds = 76.04 MB/sec
----------------------------------------------------------------------------------------------
USB 160 GB HDD

/dev/sdd:
 Timing cached reads: 6438 MB in 2.00 seconds = 3223.58 MB/sec
 Timing buffered disk reads: 100 MB in 3.02 seconds = 33.06 MB/sec

*************************************************************
2.6.26-5-server
Date & Time:
Wed Aug 27 20:42:10 IST 2008
----------------------------------------------------------------------------------------------
SATA 250 GB HDD

/dev/sda:
 Timing cached reads: 6746 MB in 2.00 seconds = 3378.94 MB/sec
 Timing buffered disk reads: 244 MB in 3.01 seconds = 80.97 MB/sec
----------------------------------------------------------------------------------------------
IDE 160 GB HDD

/dev/sdb:
 Timing cached reads: 6494 MB in 2.00 seconds = 3252.63 MB/sec
 Timing buffered disk reads: 138 MB in 3.01 seconds = 45.79 MB/sec
----------------------------------------------------------------------------------------------
IDE 250 GB HDD

/dev/sdc:
 Timing cached reads: 6600 MB in 2.00 seconds = 3305.97 MB/sec
 Timing buffered disk reads: 230 MB in 3.00 seconds = 76.57 MB/sec

******************************************************************

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

I was not able to test Alpha 5 on my notebook; I will make another attempt soon.

But I have a workaround for everyone who cannot work on their systems. I am currently using Fedora 9 with the RHEL kernel (CentOS 2.6.18-92.1.10.el5) and see a speedup of 10x or more. It's great to have all the advantages of an up-to-date user interface and tools together with a stable and fast kernel. Everything works fine on my one-year-old T61p: I have no sound problems with vmware, and I can even use firefox at a load average of 12 or more.

I think Hardy users can use the Debian kernel as well; all the needed modules should be available for it too, at least from a third-party repository.

Perhaps the Ubuntu team can offer an older kernel as an option in their repositories as long as the problem is not solved.

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

Now I have run some tests with Intrepid. The I/O wait time is lower and the throughput is higher, with and without concurrent disk access, than in Hardy or Gutsy, but the desktop responsiveness problem still exists.
The overall throughput of concurrent disk access is about 30% lower than with my 2.6.18 kernel.

When writing eight 2GB files to the disk concurrently (all writes started at the same time), the difference in data written between the files during the operation is up to 500 MB. This difference is much lower (~200MB) under the 2.6.18 kernel.

The kernel signature is Ubuntu 2.6.27-2.3-generic.

My test results are from a single run only, because I was not able to use vmware or virtualbox and create a real working environment.

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

I have some new information on this topic. I tried to bypass the problem by using a fast SSD, but the desktop responsiveness became horrible; I think that is because I get a write throughput of only 20MB/s on sequential writes to the block device. After some research, I found more information: the problem is caused by there being no fair scheduling between read and write access.

https://fcp.surfsite.org/modules/newbb/viewtopic.php?viewmode=thread&topic_id=52598&forum=10&post_id=247938

After some more tests. I got these results.

# dd if=/dev/zero of=/dev/sda6 bs=1M count=1500
1572864000 bytes (1.6 GB) copied, 57.5956 s, 27.3 MB/s
And poor desktop responsiveness.

# dd if=/dev/zero of=/dev/sda6 bs=1M count=1500 oflag=direct
1572864000 bytes (1.6 GB) copied, 20.9958 s, 74.9 MB/s
And even firefox does not freeze.
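The MB/s figures are consistent with the byte counts and elapsed times dd printed (dd reports decimal megabytes, i.e. bytes / 1e6 / seconds); a quick cross-check:

```shell
# Recompute the throughput dd reported for the two runs above.
awk 'BEGIN {
    printf "buffered: %.1f MB/s\n", 1572864000 / 57.5956 / 1e6
    printf "direct:   %.1f MB/s\n", 1572864000 / 20.9958 / 1e6
}'
# prints:
# buffered: 27.3 MB/s
# direct:   74.9 MB/s
```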

Revision history for this message
Jeffery Davis (heavensblade23) wrote :

Still very much present in Ibex as of 10-22-2008.

Revision history for this message
Jeffery Davis (heavensblade23) wrote :

Changing to the deadline scheduler appears to alleviate this bug for the most part.

Revision history for this message
Bálint Magyar (balintm) wrote :

I can confirm that running Intrepid on a notebook with 512MB of RAM is much, much more tolerable with elevator=deadline, which mostly gets rid of the long pauses the heavy swapping caused.

Revision history for this message
isecore (isecore) wrote :

Running Intrepid Ibex, same issue as in Hardy: the desktop goes numb when heavy disk I/O occurs. Changing the scheduler to deadline makes it slightly more tolerable at the cost of applications and the desktop feeling slower, which is unacceptable. Changing the scheduler to elevator=as makes the system intolerably sluggish.

Revision history for this message
Jeffery Davis (heavensblade23) wrote :

Things I can reasonably confirm are not the cause:
-Hardware drivers (I've had this problem across several different machines)
-Versions of Ubuntu (I've been having this bug at least since Hardy and I believe before that)
-Different distros (I found a forum thread where people were having the same issue on Fedora)
-Dist-upgrade (I always install fresh)
-Search Indexing (Problem occurs even with indexing completely disabled...if there is an issue, it's a symptom and not a cause)
-Firefox versions (Fsync bug was fixed a long time ago, and people have tried reverting to Firefox 2 without success)
-Filesystem in use (People have reported the problem on ReiserFS as well as ext3)

Reasons I think it's the scheduler:
-It was reported switching schedulers helped this problem on Fedora which also uses CFQ.
-I've tried both deadline and anticipatory schedulers and then copied large numbers of files and the system was 95% more responsive.
-People first started reporting this problem on the Ubuntu forums around the time CFQ was switched to the default scheduler, which I believe was around the Edgy/Feisty timeframe.
-People have reported the problem doesn't exist on the 'server' version of Ubuntu, which uses the deadline scheduler instead of CFQ.

There may be other issues at work that cause similar unresponsiveness, but I think the scheduler thing is it for most people.

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

I don't think it is only a scheduler problem. Switching the scheduler does not result in significant changes; the desktop responsiveness is still bad with all schedulers. The server kernel is less affected, but desktop responsiveness is still bad there too.

Fedora and ArchLinux are less affected, but the problem still exists. It's perhaps like using the server kernel in Ubuntu.

For me the problem first appeared in Gutsy. Feisty had great desktop responsiveness under heavy I/O on my old machine with the cfq scheduler, and CentOS with the 2.6.18 kernel and the cfq scheduler works really well.

I once produced the same issue with a crash of network manager while transferring a big file over a WLAN connection. I think there was no logging I/O at the time, but I am not sure.

Are only a few people affected by this problem? It makes my system nearly unusable while, e.g., updating Intrepid; there are sometimes desktop freezes of more than 20 seconds on my machine.

Revision history for this message
AvitarX (ddwornik) wrote :

I have been looking at this and running informal (read sloppy) tests today.

For people upgrading from older systems, perhaps the problem is the relatime option missing from fstab.

I was testing by running "vi 123" while "dd if=/dev/zero of=~/test2 bs=2M count=2048&"

I tried all 4 schedulers and all took a long while (30 secs+) to an active vi screen.

Now it is below 10.

I tried different schedulers and they were all slow. Now I am using CFQ (the default?); I have not compared whether the less-than-10-seconds time would drop below 5.

The system-cleaner (Applications --> System Tools --> System Cleaner) identified the lack of relatime for me.

It could also simply be the reboot that caused the speedup, though.

It is only speculation that the upgrade missed the relatime option; I may have removed it myself by accident at some point.

To change schedulers without rebooting:

as root (run "sudo bash", or someone correct me on how to redirect while using sudo):

echo "scheduler name" > /sys/block/sdX/queue/scheduler (where X is the drive, e.g. sda)

For a list of schedulers, and to see which is selected, run cat on the same file.

example:
$cat /sys/block/sda/queue/scheduler
noop anticipatory deadline [cfq]

I am not at my desktop, so I can't confirm whether this is simply for the command line and whether I will see problems similar to Jamie's above.

In a few days I can follow up on whether long uptime (for a desktop) is the problem.

Revision history for this message
Irrlicht (irrlicht) wrote :

irrlicht@home:~$ cat /sys/block/sda/queue/scheduler
noop anticipatory deadline [cfq]
irrlicht@home:~$ sudo echo deadline > /sys/block/sda/queue/scheduler
bash: /sys/block/sda/queue/scheduler: Permission denied

I am copying a dvd full of data to my local drive via cp. My mouse and all Desktop apps are lagging like if I would use glx without the proper drivers (slow responsiveness and screen update). Using Kubuntu 8.10 amd64... Can I help in any way? Or someone has an idea how to change the scheduler in my case?

Cheers,
Daniel

Revision history for this message
AvitarX (ddwornik) wrote :

I don't know how to redirect with sudo.

I had to run:
sudo bash

this gave a root prompt, then I could run:
echo deadline > /sys/block/sda/queue/scheduler

without running sudo again (running bash with sudo keeps the prompt as root until you type exit).

I am curious about your fstab though, that was what really changed things for me.

Also, do you have lots of small files or a few big ones? I can try testing. It was primarily running updates and the daily file indexing that killed my machine.

Revision history for this message
Irrlicht (irrlicht) wrote :

No, there are only big files on the DVD: 4.5 GB of files, each ~250 MB. I have changed to all available schedulers now; it doesn't change anything. I noticed this for the first time today, so I searched Google and found this bug.

What did you change in your fstab? Is it fixed for you?

My /etc/fstab:
root@home:~/# cat /etc/fstab
# /etc/fstab: static file system information.
#
# <file system> <mount point> <type> <options> <dump> <pass>
proc /proc proc defaults 0 0
# /dev/sdb2
UUID=3a3e3337-0416-46f4-9336-253bd7dfbeac / ext3 relatime,errors=remount-ro 0 1
# /dev/sdb1
UUID=f3eba35a-325c-41db-bebe-03680a0b1f89 /boot ext3 relatime 0 2
# /dev/sda1
UUID=a665a132-3023-4b70-b1f6-c38120307a6a /home ext3 relatime 0 2
# /dev/sdb3
UUID=27c83b47-7ec7-4781-a2c5-d900291b92d4 none swap sw 0 0
/dev/scd0 /media/cdrom0 udf,iso9660 user,noauto,exec,utf8 0 0

Cheers,
Daniel

Revision history for this message
John (john-m-lang) wrote :

You should always use 'sudo -i' to get a root prompt rather than 'sudo
bash', 'sudo su -', or any other method.

On Mon, Oct 27, 2008 at 4:02 PM, Irrlicht <email address hidden> wrote:

> No there are only big files on the DVD. 4.5 GB of files ~250 MB. I
> changed to all available schedulers now, it doesn't change anything. I
> noticed this the first time today, so I searched Google and found this
> bug.
>
> What did you change in your fstab? Is it fixed for you?
>
> My /etc/fstab:
> root@home:~/# cat /etc/fstab
> # /etc/fstab: static file system information.
> #
> # <file system> <mount point> <type> <options> <dump> <pass>
> proc /proc proc defaults 0 0
> # /dev/sdb2
> UUID=3a3e3337-0416-46f4-9336-253bd7dfbeac / ext3
> relatime,errors=remount-ro 0 1
> # /dev/sdb1
> UUID=f3eba35a-325c-41db-bebe-03680a0b1f89 /boot ext3 relatime
> 0 2
> # /dev/sda1
> UUID=a665a132-3023-4b70-b1f6-c38120307a6a /home ext3 relatime
> 0 2
> # /dev/sdb3
> UUID=27c83b47-7ec7-4781-a2c5-d900291b92d4 none swap sw
> 0 0
> /dev/scd0 /media/cdrom0 udf,iso9660 user,noauto,exec,utf8 0 0
>
> Cheers,
> Daniel
>
> --
> Heavy Disk I/O harms desktop responsiveness
> https://bugs.launchpad.net/bugs/131094
> You received this bug notification because you are a direct subscriber
> of the bug.
>

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

There is a new article on Phoronix, which compares the performance of different ubuntu versions (Feisty - Intrepid).
(see http://www.phoronix.com/scan.php?page=article&item=ubuntu_bench_2008&num=1 )

There is a huge difference between Feisty and the following versions in the "RAM sequential read" test (3100 MB/s for Feisty and about 1800 MB/s for the other versions). Perhaps the poor desktop performance is related to this issue.
(see http://www.phoronix.com/scan.php?page=article&item=ubuntu_bench_2008&num=3 )

Revision history for this message
dr4cul4 (dr4cul4) wrote :

Using the current build of the Intrepid Ibex kernel (at the moment of writing), the problem still exists. I have 3 machines running Ubuntu, and on all of them there is high unresponsiveness during disk activity. Two machines are laptops with SiS and Intel chipsets and IDE drives. The big machine is a VIA with SATA and IDE drives. The last one has issues only during heavy disk activity, but the laptops have this issue all the time. For example, each right click on the desktop takes about 4 seconds to show the menu, with a lot of disk reading (using noatime partly solved that: only the first click takes long, subsequent ones are fast). This issue is driving me crazy; will someone investigate it deeply?

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

I have used three different hard drives on the same machine: a Seagate Momentus 7200.2 with a throughput of about 70 MB/s, a Western Digital Scorpio WD2500BEVE with a throughput of about 60 MB/s, and an OCZ Core Series 64GB with a throughput of about 75 MB/s and a real write performance of 25 MB/s.

The desktop responsiveness with the Seagate is bad, with the WD awful, and with the OCZ unusable.

What's wrong with the current kernel versions? Does anyone have the same problems on a SCSI system?

Revision history for this message
Luka Renko (lure) wrote :

Thomas Pi, first I need to thank you for the very detailed testing you have performed. I can more or less confirm the same on my HP nw8440: Feisty was the last version that worked nicely on my laptop. Even the latest Intrepid release did not help.

I notice that when the machine gets unresponsive, most of the CPU time (and this means both cores here) is spent in io-wait. You can best see this with the "htop" utility - just install it from the repository.
Also, during this "storm of load", the CPU load gets to 8-10, resulting in high heat from my laptop and heavy work for the fan. "acpi -t" shows a temperature around 80°C, which is much more than I want to see on average.
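For a quick look at the same counter without installing anything, the cumulative iowait time is also exposed in /proc/stat (a sketch; field positions follow the standard Linux proc(5) layout):

```shell
# The "cpu" summary line in /proc/stat lists jiffies spent in
# user, nice, system, idle, iowait, ... since boot; iowait is field 6.
awk '/^cpu /{print "iowait jiffies:", $6}' /proc/stat
```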

Since I expect that this is not just scheduler related, but may also be something to do with the HW, I would like to know what kind of IDE/SATA controllers you all have. I have an ICH7.

I agree that priority should be increased and this should get more attention from kernel developers.

Revision history for this message
exactt (giesbert) wrote :

As you are asking about IDE/SATA controllers: I have an
ATI Technologies Inc SB600 Non-Raid-5 SATA controller.
I first experienced the problem when I enabled AHCI mode in BIOS.

I am running Intrepid AMD64

dmesg | grep ahci
[ 2.480262] ahci 0000:00:12.0: version 3.0
[ 2.480288] ahci 0000:00:12.0: PCI INT A -> GSI 22 (level, low) -> IRQ 22
[ 2.480315] ahci 0000:00:12.0: controller can't do 64bit DMA, forcing 32bit
[ 2.480415] ahci 0000:00:12.0: AHCI 0001.0100 32 slots 4 ports 3 Gbps 0xf impl SATA mode
[ 2.480418] ahci 0000:00:12.0: flags: ncq sntf ilck led clo pmp pio

Revision history for this message
Milan Bouchet-Valat (nalimilan) wrote :

If you have some ideas about the cause of this bug, I suggest you file a report directly on bugzilla.kernel.org; you'll get attention from people who have (relatively) more time to work on it. This is not likely to be a bug specific to Ubuntu.

Revision history for this message
Luka Renko (lure) wrote :

Two upstream bug reports that may be related:
1. http://bugzilla.kernel.org/show_bug.cgi?id=7372
2. http://bugzilla.kernel.org/show_bug.cgi?id=12072

First one has very similar pattern of when regression was first detected. Second one has similar symptoms, but not clear if older kernels were ok.

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

I was using an ICH4-M chipset and currently an ICH8-M system. The I/O wait is at 100% when the problem occurs.
This issue occurs on my desktop machine with an AMD 790G chipset too. I know someone who uses a VIA KT800 and is also affected by this bug. But as he uses his computer only for office work, it does not appear as often as on my machines.

At http://bugzilla.kernel.org/show_bug.cgi?id=7372 there are at least three different bugs. The initial bug should be solved (see https://bugzilla.redhat.com/show_bug.cgi?id=444759).

The 64-bit problem cannot be the cause of the desktop responsiveness issue, because I was using a Pentium M when I switched from Feisty to Gutsy.

And it is right that this is not a bug specific to Ubuntu, but Ubuntu seems to be the most affected by it. The desktop responsiveness with Fedora 9 and 10 is much better on my machine.

I have done some research and tests. Here are some interesting results.

As the desktop performance in CentOS became "worse" when updating from kernel-2.6.18-92.1.10.el5 to kernel-2.6.18-92.1.13.el5, I checked the differences. One patch, "linux-2.6-fs-dio-use-kzalloc-to-zero-out-struct-dio.patch", was applied to the Gutsy kernel for the first time. I know that the patch should not have affected performance.
I reverted the patch in fs/direct-io.c in my Hardy test installation (790G chipset, kernel 2.6.24-22-generic) and ran some tests.
Compared to the patched kernel, I mostly see application startup speedups of up to 10x. The boot process takes 45 seconds instead of 60 seconds. During heavy write I/O with a load average of 18, I could sometimes even "use" Firefox (freezes of at most 2-3 seconds). The start of GIMP sometimes takes only 30 seconds instead of 140 seconds. I have run these tests more than five times, using an identical procedure for both kernels, and always with these differences.
But when copying a file, instead of executing eight concurrent dd write operations, Firefox freezes immediately and it takes about a minute to connect over ssh, although the load average is only 2. This should be the problem caused by unfair scheduling between read and write access. It was described in the thread below, but I cannot access it anymore.
https://fcp.surfsite.org/modules/newbb/viewtopic.php?viewmode=thread&topic_id=52598&forum=10&post_id=247938

Then I tried to simulate a slower hard disk (ICH8-M) and installed Hardy on a fully encrypted disk, which reduces the write speed to 40 MB/s. There were no more differences between the patched and unpatched kernels; both were unusable.

The bug is affected by different timings, as there must be a threshold of drive speed at which the system switches from bad to unusable. That must be the reason why Ubuntu kernels are more affected than the Fedora ones.

This bug is annoying. Please help the kernel team to solve this bug.

Revision history for this message
Luka Renko (lure) wrote :

Interesting that you mention it: I started to notice this slowdown much more with Hardy, which is also when I switched to an almost fully encrypted disk on my laptop. It may be that kcryptd is making it worse...

Revision history for this message
yarly (ih8junkmai1) wrote :

Luka,

I've noticed general sluggish performance when using dm-crypt/kcryptd for a fully encrypted disk (minus boot partition).

Similar post...

https://lists.linux-foundation.org/pipermail/bugme-new/2008-May/018830.html

Revision history for this message
Launchpad Janitor (janitor) wrote : Kernel team bugs

Per a decision made by the Ubuntu Kernel Team, bugs will no longer be assigned to the ubuntu-kernel-team in Launchpad as part of the bug triage process. The ubuntu-kernel-team is being unassigned from this bug report. Refer to https://wiki.ubuntu.com/KernelTeamBugPolicies for more information. Thanks.

Revision history for this message
yarly (ih8junkmai1) wrote :

Launchpad Janitor,

How about breaking that down into understandable terms instead of being vague?

If Ubuntu Kernel Team is not going to address this issue then who is?

Revision history for this message
peddy (peddy22) wrote :

While I agree with yarly, it should be noted that Launchpad Janitor is in fact a bot and is not capable of interpreting and acting on comments posted here.

Revision history for this message
Milan Bouchet-Valat (nalimilan) wrote :

Don't worry, this is just a matter of convention, it does not affect the work by the team.

If somebody could provide a kernel package (for old versions like 2.6.20, or for patched versions) usable in Intrepid, we could run further tests to find the root cause of the problem. This would allow checking the nice information provided by Thomas Pi. Otherwise nothing will be done by anybody, I guess.

Revision history for this message
Ben Gamari (bgamari) wrote :

I opened a new kernel.org bug, #12309 ( http://bugzilla.kernel.org/show_bug.cgi?id=12309 ), to replace #7372. Hopefully this one will be a little more productive.

Changed in linux:
status: Unknown → Confirmed
Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

After comparing some kernel code, I have found some really interesting facts. I think the poor desktop responsiveness is caused by the changed process scheduler (e.g. tickless kernel / high-resolution timers ...) and not by the disk scheduler. I have written a test program (sorry for the dirty code) which amplifies the problem and allows it to be measured.

Here are some facts. I executed the tests in recovery mode (kernel parameter "single"), once with 20 processes * 1,000,000 messages and once with 100 processes * 100,000 messages. The result values are: echo time of ~80-90% of the messages / longest echo time / test duration.

CentOS
2.6.18-92.el5 - 20/1M 4µs / 1s / 38,4s - 100/1k 4µs / 1s / 18,7s
Ubuntu 6.04 - 8.10
2.6.15-53 - 20/1M 3-33µs / 1s / 33,6s - 100/1k 3-40µs / 1s / 17,7s
2.6.20-17 - 20/1M 3µs / 1s / 32s - 100/1k 3-9µs / 1s / 16,0s
2.6.22-16 - 20/1M 3-4µs / 7s / 51,5s - 100/1k 4µs / 1s / 25,9s
2.6.24-23 - 20/1M 53µs / 64s / 73ms - 100/1k 77-250µs / 41ms / 32,0s
2.6.27-9 - 20/1M 120-200µs / 120ms / s - 100/1k 500-1000µs / 1s / 84s

While executing the test with 100/1M under Xorg/GNOME, the problem is amplified. There are no problems on CentOS and Feisty. I could not test it on Ubuntu 6.06. I had heavy responsiveness problems with Hardy, Intrepid and Fedora 10. With 2.6.22 (installed in Feisty) the problem sometimes occurs and sometimes does not.

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

Correction to the table above: 2.6.27-9 - 20/1M 120-200µs / 120ms / 159s - 100/1k 500-1000µs / 1s / 84s

Revision history for this message
Milan Bouchet-Valat (nalimilan) wrote :

What can you conclude? What I can see is that 2.6.27 is much worse than previous kernels, and that 2.6.24 is not really good either. 2.6.22 seems to be worse than before too, though it's less visible.

Can you give this information to the kernel developers in the upstream bug report? We need to find which change introduced the problem. What is strange is that the regression is progressive, getting worse with each kernel...

Revision history for this message
Andy Whitcroft (apw) wrote : Re: [Bug 131094] Re: Heavy Disk I/O harms desktop responsiveness

Any chance you could test with the latest Jaunty (2.6.28-based) kernel
as well? You should be able to put that kernel on an Intrepid base for
the purposes of a test. It would be interesting to see if the problem is
still there. If it's truly a CPU scheduler issue, then we can point the
scheduler developers at the stats.

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

Ubuntu 2.6.28-4.9-generic
20/1M 5-7µs / 1s / 81,2s - 100/1k 5-7µs / 1s / 40,2s
The new kernel freezes the system while executing my test in a normal runtime environment.

Revision history for this message
theparanoidone (theparanoidone) wrote :

Greetings~

I am not sure if my team is suffering from the problems described in this thread... but we've been having very strange I/O problems.

We have also found a slight solution:

Compile the kernel with:
CONFIG_HZ_1000=y
CONFIG_HZ=1000

(as opposed to CONFIG_HZ_100=y CONFIG_HZ=100)

I say slight because things run *much* better; however, I don't think it's the complete fix. This has sped things up quite a bit in our test cases. (I have yet to run the ProcessSchedulerTest.cpp attached to this thread, but I will do so ASAP and report back our findings.) Feel free to post your results if you beat me to it.

(For those interested in the scenario we have been facing, you can reference my forum post here, but I think it would be best if people keep their feedback here at bugs.launchpad.net:
http://ubuntuforums.org/showthread.php?t=1039476 )
(I'll post feedback about our sysbench test on a 2.6.15 or earlier kernel as soon as I can.)

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

Can someone check whether clocksource=jiffies as a kernel boot parameter helps?
Not for Intel IGM users, as Xorg does not start.

Revision history for this message
Søren Holm (sgh) wrote :

Hi

I currently have the following running:

2 x "bzip2 -9 -c /dev/urandom >/dev/null", since I have 2 cores,
and one "dd if=/dev/zero of=test.10g bs=1M count=10000"

Only small lockups happened during that time, which was about 9 minutes.
By small lockups I mean a couple of seconds.

After the dd command had finished, the lockups were still occurring, but they
were generally much shorter.

For me it is definitely a fix.

Revision history for this message
Bogdan Gribincea (bogdan-gribincea) wrote :

I'm running a kernel compiled with:
CONFIG_HZ_1000=y
CONFIG_HZ=1000

Also using "elevator=anticipatory" boot param.
Everything is running much smoother now. I still get some random stuttering, but only under very heavy loads, and it lasts only about 500 ms or thereabouts.
Until yesterday I had the default Jaunty kernel, just using the anticipatory elevator, and the desktop was much snappier than with CFQ under heavy I/O loads.
The 1000 Hz option doesn't seem to help as much, but I have yet to test CFQ with 1000 Hz.

Revision history for this message
Bogdan Gribincea (bogdan-gribincea) wrote :

Hmm, I was wrong. Running 4 x "bzip2 -9 -c /dev/urandom >/dev/null" (quad core) and one "dd if=/dev/zero of=test.10g bs=1M count=10000" like the comment above still gives me more stuttering than I would like. With CFQ it is MUCH worse, though.

Revision history for this message
gururise (gururise) wrote :

I can confirm this bug. Running on 64-bit Hardy 8.04 LTS with the latest updates on a Quad Core Q6600 Processor with 4GB of Ram, and 1TB SATA drive.

Un-raring/unzipping large files, or any prolonged disk I/O such as viewing a directory of many thumbnails in Nautilus, will bring my system to a halt, with the mouse either freezing or stuttering until the disk I/O is complete.

I used to run older versions of Ubuntu on a much humbler machine and had no such problems.

Revision history for this message
Vadim Peretokin (vperetokin) wrote :

Confirmed. Deleting 80 MB of files rendered Firefox unusable and gave other programs a delayed response.

Revision history for this message
KhaaL (khaal) wrote :

I'm running 64-bit Jaunty with up-to-date packages and I still have this issue. My computer uses SATA hard drives, if that is of any help.

Revision history for this message
KhaaL (khaal) wrote :

Argh, the lack of an edit button... I forgot to mention that I'm running ext4. I'll have to try the 2.6.29 kernel and see if there is any improvement with that.

Revision history for this message
Gary Trakhman (gary-trakhman) wrote :

I thought I was affected by this bug, but as it turns out, it was another one. My swap file had gone missing due to an invalid UUID, which is another bug. This caused unresponsiveness when ram got filled up. Maybe others have this problem.

Revision history for this message
Vadim Peretokin (vperetokin) wrote :

That's definitely not the case here

Revision history for this message
Peter Hoeg (peterhoeg) wrote :

I am experiencing the exact same problems on my 64-bit Intrepid laptop, but strangely enough not on my 64-bit Intrepid desktop - both use SATA drives.

But can somebody also confirm a similar problem when writing to USB disks? For example, copying a large music collection to an iPod has the same effect as 'normal' disk I/O.

Revision history for this message
Søren Holm (sgh) wrote :

Sure, the problem exists there too, but it also relates to faulty accounting of the number of pages in the write cache, causing huge amounts of memory to be used when the disk is slow.
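One knob often mentioned in this context (an aside, not a fix confirmed in this thread) is the pair of dirty-page thresholds that control how much unwritten data may pile up in the page cache before writers are throttled:

```shell
# Current writeback thresholds, as a percentage of RAM:
cat /proc/sys/vm/dirty_ratio /proc/sys/vm/dirty_background_ratio
# To lower them temporarily (as root), e.g.:
#   echo 5 | sudo tee /proc/sys/vm/dirty_ratio
```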

Revision history for this message
Tobias Oetiker (tobi-oetiker) wrote :

The whole issue finally got the attention of the kernel developers
...

see http://www.gossamer-threads.com/lists/linux/kernel/1053130?do=post_view_threaded#1053130
and related ...

cheers
tobi

--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch <email address hidden> ++41 62 775 9902 / sb: -9900

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

I have tried the two block_write_full_page patches with ext4, but there is still no improvement.

The only "working" patch is the "mm fix page writeback accounting to fix oom condition under heavy I/O" one from Mathieu Desnoyers, which does not fix the problem but makes it tolerable for me.

I am currently using the 2.6.29 kernel, in which (a part of) the fsync bug was fixed. At least Firefox works smoothly for me, without any interruptions.
(see SQLite-Test at http://global.phoronix-test-suite.com/?k=profile&u=ebird-3722-22013-9288 )
I think it's commit 78f707bfc723552e8309b7c38a8d0cc51012e813, as it reverts parts of the 2.6.26 commit 18ce3751ccd488c78d3827e9f6bf54e6322676fb, and that fits with my benchmark results.
(see 2.6.29 Changelog http://www.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.29 )

It should be reverted in the Ubuntu 9.04 kernel too.

Revision history for this message
Jarkko Lietolahti (jarkko-jab) wrote :

I'm also suffering from the same issue (slow HD I/O).

I also ran the SQLite tests. Results are here: http://global.phoronix-test-suite.com/index.php?k=profile&u=jarkko-20379-5630-13562

Later I noticed that I was using the XFS filesystem, so the results are not comparable. But with the deadline scheduler and nobarrier, the results are similar to ext3.

What's more worrying is that with the CFQ scheduler and barriers, the SQLite results are extremely bad: cfq/barrier = 1603.34 seconds vs. deadline/nobarrier = ~155 seconds.

The kernel version didn't seem to matter; 2.6.29-020629-generic with deadline/nobarrier was about the same as 2.6.28-11-generic with deadline/nobarrier.

Revision history for this message
Rocko (rockorequin) wrote :

I experience this bug whenever the system hits swap in Jaunty. I have 4 GB of RAM and 4 GB of swap, and I'm using ext4 with default delayed allocation, so I shouldn't be running into ext3 fsync problems. The desktop normally responds 'reasonably' fast, but if I try loading two 1.8 GB VMs in VirtualBox, for example, once the system hits swap the desktop stops responding *completely*, apart from jerky mouse cursor movements (both times I tried this I gave up after 15 minutes and hard-reset the PC).

Revision history for this message
Tobias Oetiker (tobi-oetiker) wrote :

It could be that the "Give kjournald a IOPRIO_CLASS_RT io priority" patch (http://lwn.net/Articles/301467/) would help. It was integrated in 2.6.29.1, but it is so simple that it should work for other releases too, I guess.

Revision history for this message
unggnu (unggnu) wrote :

Anyone who couldn't confirm this bug: just use dm-crypt with LUKS for your system and home partitions, then copy many big files from one to the other, or to an encrypted USB device. You will have fun.

Music doesn't stop, but sometimes a click is registered three seconds later, while no core has heavy usage.

Revision history for this message
Hendrik van den Boogaard (chasake) wrote :

I just installed a fresh Jaunty and I am experiencing the same problem. When I'm copying large files, the responsiveness of the system goes down dramatically; you practically cannot use the system any more. It takes ages for the top menu to load (its icons), and when you start a simple program like the terminal, it also takes ages for it to appear. I never experienced this on the Intrepid/Hardy/Gutsy/Feisty/Dapper releases.

The only difference from my Intrepid install is that I did a fresh install on a SATA hard disk, whereas my older releases were on an IDE drive.

I can try to install the Intrepid kernel on this Jaunty install and see if the problem persists.

Revision history for this message
Milan Bouchet-Valat (nalimilan) wrote :

Hendrik: your situation could be interesting for debugging SATA problems. Could you explain more about what changed from Intrepid? I mean: did you only change the disk, keeping the same machine? Please try using the Intrepid kernel, and if you still have the problem, it would be nice to report this information upstream, on the bug linked above, where people are tracking down the causes of this problem. Otherwise, if the problem only occurs with Jaunty's kernel, opening a separate bug will help.

Revision history for this message
Tobias Oetiker (tobi-oetiker) wrote :

Hi Hendrik,

try 2.6.29.1 or 2.6.30-rcX; both kernels should have a number of fixes
for this problem, which has been affecting a wide range of systems.

cheers
tobi

Today Hendrik van den Boogaard wrote:

> I just installed a fresh Jaunty and I am experiencing the same problem.
> When I'm copying large files the responsiveness of the system goes down
> dramatically. You practically cannot use the system any more. It takes
> ages for the top menu to load (its icons) and when you start a simple
> program like the terminal it also takes ages for the terminal to appear.
> I have never experienced this on the Intrepid/Hardy/Gutsy/Feisty/Dapper
> releases before.
>
> The only difference from my Intrepid install is that I did a fresh
> install on a SATA harddisk where my older releases were on an IDE drive.
>
> I can try to install the Intrepid kernel on this Jaunty install and see
> if the problem persists.
>
>

--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch <email address hidden> ++41 62 775 9902 / sb: -9900

Revision history for this message
Amit Kucheria (amitk) wrote :

You can find .deb kernel packages for upstream kernels conveniently packaged at http://kernel.ubuntu.com/~kernel-ppa/mainline/

Note, however, that these are not supported kernels. They are meant for users who want to test whether their HW works on newer kernels.

On Sun, Apr 19, 2009 at 05:23:39PM -0000, Tobias Oetiker wrote:
> Hi Hendrick,
>
> try 2.6.29.1 or 2.6.30rcX both kernels should have anumber of fixes
> for this problem which has been affecting a wide range of systems.

Revision history for this message
Hendrik van den Boogaard (chasake) wrote :

Milan: I did a fresh install of Jaunty on an empty SATA hard disk. The rest of the machine is the same, and I kept my PATA hard disk in to copy the old files over. That's when I noticed that the interface was sluggish while copying. That is something I always cursed Windows for, and one of the things I truly believed Linux was capable of: true multitasking, not starving something as important (or unimportant, for that matter) as the interface just because some bytes are being copied to disk (why did they invent DMA in the first place? ;)).

While copying some large files, you can see that the MB/s remains at about the same level when trying to open a console window, whereas in Intrepid the MB/s clearly drops and performance is divided between the copy task and the open task, in favour of showing the drop-down menu and starting the program you want to access.

Some hardware specs of the machine:
* AMD Athlon(tm) 64 X2 Dual Core Processor 5200+
* 6 GB RAM (2x2GB + 2x1GB)
* Asus A8N-VM CSM motherboard running on nVidia GeForce 6150 nForce 430
* 1x PATA Samsung SP1614N
* 1x SATA Seagate ST3500841AS

I was copying from an XFS partition on the PATA disk to an XFS partition on the SATA disk.

I will now restart the machine using the Intrepid kernel and then I can try 'linux-image-2.6.29-02062901-generic_2.6.29-02062901_amd64.deb' from the kernel ppa mentioned above (thanks Tobi/Amit).

Is there some tool to measure responsiveness? I would like to get some objective results instead of 'it feels sluggish'.

Revision history for this message
KhaaL (khaal) wrote :

Hendrik, you should comment on and subscribe to the bug here: http://bugzilla.kernel.org/show_bug.cgi?id=12309, since it's an upstream bug and not something the Ubuntu kernel team is working on. AFAIK there's no objective way to measure desktop responsiveness - yet. If you find an improvement with the .29 or .30 kernel, please let me know, as I have quite a similar setup to yours.

Revision history for this message
Milan Bouchet-Valat (nalimilan) wrote :

Thanks for this detailed information. So you suggest that the I/O scheduler is not giving enough priority to tasks other than the file copy; that's an interesting way of finding the cause of the problem, indeed! I suggest you try the new kernel, and if it's not fixed, go to the upstream report and explain your case. You can find many different scripts there to test the system's responsiveness, and you'll notice that it's really tricky (long thread...).

Now, another test would be interesting: copy the same file from the SATA disk to itself and see if there's any difference from what you've already done (in terms of responsiveness, not speed, since the speed will obviously differ).

Revision history for this message
Jamie Lokier (jamie-shareable) wrote :

Milan wrote:
> Thanks for these detailed informations. So you suggest that the IO
> scheduler is not giving enough priority to tasks other than the file
> copy; that's an interesting way of finding the cause of the problem,
> indeed! I suggest you try the new kernel, and if it's not fixed, go to
> the upstream report and explain them your case. You can find many
> different scripts to test the system's responsiveness there, and you'll
> notice that's really tricky (long thread...).

I suggest it's not just about giving I/O priority to tasks other than
the copy, but also giving them enough "anticipatory" time so there
isn't a pair of head seeks for every I/O operation by the non-copying
tasks.

It has to look like this to be efficient:

    I/O for copy
    I/O for copy
    I/O for copy
    I/O for copy
    I/O for copy
                <head seeks>
                           I/O for other thing
                           I/O for other thing
                           I/O for other thing
                           I/O for other thing
                           I/O for other thing
                <head seeks>
    I/O for copy
    I/O for copy
    I/O for copy
    I/O for copy
    I/O for copy
                <head seeks>
                           I/O for other thing
                           I/O for other thing
                           I/O for other thing
                etc.

And not

    I/O for copy
                <head seeks>
                           I/O for other thing
                <head seeks>
    I/O for copy
                <head seeks>
                           I/O for other thing
                <head seeks>
    I/O for copy
                <head seeks>
                           I/O for other thing
                <head seeks>
                etc.

Revision history for this message
Hendrik van den Boogaard (chasake) wrote :

KhaaL: thanks for pointing to that thread, but there is so much information there that I cannot really separate all the issues at hand. The test suite does not give any satisfactory results that I could interpret. For now my best test is just to copy big files from the PATA to the SATA disk; however, the XFS allocation strategy may scatter the files (or parts of every file) across the disk, which makes I/O performance a bit different every time.

I've tried that using the Intrepid default kernel and the Jaunty default kernel. The Jaunty kernel seems a lot more sluggish. From the PPA I downloaded 2.6.29, which seems to be no good either. Some people have mentioned slow fsync behaviour in 2.6.29, just as in 2.6.28, which is the Jaunty default.

Currently I am running 2.6.30 from http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.30-rc2/linux-image-2.6.30-020630rc2-generic_2.6.30-020630rc2_amd64.deb, in which I don't really notice slow GUI behaviour. While copying the large files I can still open a terminal and do stuff. I will reboot and try this again with the 'mem=512m' boot option, to make sure there is not too much caching going on.

Revision history for this message
Hendrik van den Boogaard (chasake) wrote :

Update: running with 'mem=512m' seems to make performance just as bad as on older/other kernels. Then I recalled some information about native command queueing (thanks for the hint in your picture, Jamie :)) and how it could thrash your disk on some older drive models. I had already disabled it on my file server a couple of months/years ago, and now tried the following:

Copy the large files in one window (PATA -> SATA) and do a 'find /' (SATA drive) in the other window. I found out that the 'find' command is a *lot* more responsive pushing file names to the screen when I put the NCQ buffer on 1 item (effectively disabling NCQ). The mouse and cursor still lock up often for less than a second and performance is still sluggish, but the machine is not completely unusable. While typing this I put the NCQ setting back to 31, as it was before, and now I cannot even finish typing this sentence without waiting for it to appear on the screen. As far as I can remember, somewhere in the 2.6.1x kernels NCQ was added together with all the new libata stuff (that is when a lot of trouble started for me; a bad sector on a disk created kernel oopses on another NVidia-based computer with a 4 TB software RAID 5 array). This might also explain why I have not seen this problem before, as my old 160 GB drive is PATA and has no NCQ.

I am wondering if anyone else tried this or can verify this against my experiences.

Another interesting observation: When I 'cat * > /dev/null' from the large files directory on the SATA disk, performance is sluggish and tabbing through windows is slow to unbearably slow. When I do the 'cat * > /dev/null' on the PATA drive there is no sluggishness AT ALL! The swap was turned off to make sure that alt-tabbing would not load paged-out data from the SATA disk, but even after turning swap over to the PATA disk, performance stayed the same when catting on the PATA disk. Same for xfs_fsr defragmentation on SATA: slow and sluggish (also, but a bit less, with NCQ off), while on PATA my GUI just remains responsive!

So after all, this really does seem to be SATA related!

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

I don't think this bug is SATA related. I noticed this bug while switching from 2.6.20 to 2.6.22 (Feisty to Gutsy) on a PATA drive. People also noticed it while switching from 6.06 to 8.10.

Can you try executing "time cat * > /dev/null" on the SATA drive with and without NCQ, and on the PATA drive?

Revision history for this message
KhaaL (khaal) wrote :

@Hendrik Try to post your information there, as it will catch more attention from those working on this bug. AFAIK this bug appeared somewhere around 2.6.16 or 2.6.17, when a new I/O scheduler was introduced (I believe). I've tried with different schedulers (AS, CFQ and deadline) and they didn't improve anything. Right now I mount my partitions with noatime and writeback in order to minimize I/O operations. I'd like to participate in your testing, but you'll have to tell me what NCQ is and how I can set its values :-)
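For anyone repeating the scheduler comparison described above, the active I/O scheduler can be inspected and switched per drive through sysfs at runtime. A minimal sketch (the sysfs root is a parameter purely so the function can be exercised safely; the device name sda below is only an example):

```shell
# show_sched [sysfs-root]: print the scheduler line for every block device.
# The active scheduler is shown in brackets, e.g. "noop anticipatory deadline [cfq]".
show_sched() {
    root="${1:-/sys}"
    for f in "$root"/block/*/queue/scheduler; do
        [ -r "$f" ] && printf '%s: %s\n' "$f" "$(cat "$f")"
    done
    return 0
}

# Switching (as root) is a plain write, e.g.:
#   echo deadline > /sys/block/sda/queue/scheduler
show_sched
```

The change takes effect immediately but does not survive a reboot; for that, the `elevator=` kernel boot parameter is the usual route.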

@Thomas I think you're right, but I can't confirm since all my connectors are SATA. Most of the people who have reported this problem are on SATA, though.

Thank you both for your engagement in this bug.

Revision history for this message
Milan Bouchet-Valat (nalimilan) wrote :

Hendrik: The fact that work on your SATA drive makes the system sluggish, contrary to the PATA one, is normal since your system files are on that drive. Schedulers only deal with processes competing for the same drive access. If the problem is actually with SATA, the only proof we have is that you only changed your drive to SATA, and nothing else.

Anyway, we must be very prudent here since there may be very different issues that affect all users, or only some hardware models. But please go there and report, that can be useful:
http://bugzilla.kernel.org/show_bug.cgi?id=12309

Revision history for this message
Jamie Lokier (jamie-shareable) wrote :

Hendrik van den Boogaard wrote:
> Copy the large files in one window (PATA -> SATA) and do a 'find /'
> (SATA drive) in the other window. I found out that the 'find' command is
> a *lot* more responsive pushing file names to the screen when I put the
> NCQ buffer on 1 item (effectively disabling NCQ). The mouse and cursor
> still lock up often for less than a second and performance is still
> sluggish but the machine is not completely unusable. While I am typing
> this I put the NCQ setting back to 31 as it was on before but now I
> cannot even type this complete sentence without seeing it appear on the
> screen.

That's quite surprising: NCQ should in theory always make it faster,
unless you have a terrible drive implementation.

> As far as I can remember somewhere in the 2.6.1x kernels NCQ was
> added together with all the new libata stuff (that's for me when a lot
> of trouble started; a bad sector on a disk created kernel-oopses on
> another NVidia based computer with a 4TB software Raid5 Array). This
> might also explain why I have not seen this problem before as my old
> 160G drive is PATA and has no NCQ.

Both of these things (the oopses and the NCQ reduction in performance)
ought to be reported to Linux's libata maintainer...

-- Jamie

Revision history for this message
Hendrik van den Boogaard (chasake) wrote :

@Thomas, I can try the 'time cat..' line later, but I don't think it will reveal anything other than that catting the SATA hard disk is probably faster, because the drive is generally faster (higher capacity per platter, more cache).

@KhaaL, I must say that for the last test I used Anticipatory as the queueing mechanism; I found that out just before rebooting. But both the PATA and SATA hard disks were set to Anticipatory, and both had the same amount of read-ahead set, at that time 4096.

Native Command Queueing can be changed by changing the value of 'queue_depth' for a specific drive. You can find it like this:

cd /sys
find |grep queue_depth

Now the system will report a file like
./devices/pci0000:00/0000:00:0e.0/host2/target2:0:0/2:0:0:0/queue_depth

If you look inside that file you can see the value it is on (just 'cat' it) and you can change the value by something like

echo 1 > ./devices/pci0000:00/0000:00:0e.0/host2/target2:0:0/2:0:0:0/queue_depth

If it is already 1, your drive might not support NCQ.
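The steps above can be collected into a small helper; a sketch, assuming the sysfs layout described in this comment (the sysfs root is a parameter purely for safe testing, and writing to the real /sys requires root):

```shell
# list_ncq [sysfs-root]: print every queue_depth file and its current value.
list_ncq() {
    find "${1:-/sys}/devices" -name queue_depth 2>/dev/null | while read -r f; do
        printf '%s = %s\n' "$f" "$(cat "$f")"
    done
}

# set_ncq DEPTH [sysfs-root]: write DEPTH to every queue_depth file.
# A depth of 1 effectively disables NCQ; 31 was the default here.
set_ncq() {
    find "${2:-/sys}/devices" -name queue_depth 2>/dev/null | while read -r f; do
        echo "$1" > "$f"
    done
}

list_ncq
```

Note that this sets the same depth on every SCSI/SATA device it finds; on a multi-drive machine you may want to write only to the path of the drive under test.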

I will post my findings in the other thread later. What I can also try is to make an exact copy of the contents of the SATA drive to an other SATA or PATA drive laying around and see if the sluggishness is different.

@Milan: you might think that the system on SATA is more sluggish than on PATA because it contains the system files, but even just alt-tabbing through windows is sluggish. If I do this with no I/O load on the drive, tabbing through programs does not touch the disk at all, as all programs are in memory and swap is turned off. When I start the load on the PATA drive, the system remains responsive and windows just appear when I alt-tab, perhaps with a short delay, but that is OK. When I start the load on the SATA drive, however, the alt-tab may take seconds to complete before the selected window appears. That's strange, isn't it? To make sure, I want to do the test mentioned above by 'dd-ing' the full SATA drive to the PATA drive or some other PATA/SATA drive, and do the same test on the drive the system boots from.

@Jamie: NCQ might be horrible on the drive; AFAIK it is one of the first 500 GB 7200.10 drives from Seagate. I can try to update its firmware, but if that removes the problem I cannot help out anymore ;). On my 7200.11 1 TB drives (also among the first 1 TB drives on the market) I also disabled NCQ, because I found some thread saying it might kill the contents of the file system. If you want, I can look up where I found that. I had a problem with my RAID 5 array in a server with these 1 TB disks and contacted the libata maintainers, but I had to restore the bad sector before my array was destroyed by not having a spare disk, so after fixing the bad sector I could not reproduce the problem anymore. In the meantime I switched that server to an older kernel, probably a stock kernel from Gutsy or Feisty.


Revision history for this message
Jamie Lokier (jamie-shareable) wrote :

Hendrik van den Boogaard wrote:
> @Jamie:
> On my 7200.11 1 TB drives (also one of the first 1 TBs on the
> market) I also disabled NCQ because I found some thread that it
> might kill the contents of the file system. If you want I can lookup
> where I found that.

I'm guessing it's barriers not being implemented or enabled properly
in the kernel? (See ext3 "barrier=1" off by default, controversial
threads about it..) Even if the filesystem does barriers, Linux
software RAID does not. If it's not that, I'm interested in why NCQ
should be disabled. In principle, if used right, it should always be
an improvement or about the same, and it would be quite bad firmware
to fail at that.

> This machine however has a SATA disk and I never experienced any
> sluggishness on this machine, so far running Hardy and Intrepid. So the
> sluggishness may even be drive specific?

It might. There are tools, such as blktrace, which can help diagnose
if it's the drive if you know how to read the output. It is quite
outside what I have time for though :-)

-- Jamie

Revision history for this message
Rocko (rockorequin) wrote :

I get the same behaviour as I reported earlier (https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.22/+bug/131094/comments/192) with the vanilla kernel 2.6.30-rc3. Once the system tries to use swap, the disk starts thrashing heavily and X basically freezes.

Is this behaviour related to this particular bug, or is it something else? I'm finding that X becomes unusable and I have to hard reset the PC.

Revision history for this message
Milan Bouchet-Valat (nalimilan) wrote :

That must be the same bug, or at least a member of that family of bugs, because we still don't know how many different issues there are. See http://bugzilla.kernel.org/show_bug.cgi?id=12309#c316 for example.

Changed in linux:
status: Confirmed → Invalid
Revision history for this message
daneel (daneel) wrote :

I found a (very dirty) workaround: install an old kernel (yes, it's possible; I'm using Intrepid with the Feisty kernel).

I installed this one:

Image: http://www.michaelhallquist.com/ubuntu-cfs/linux-image-2.6.20.16-ubuntu-cfs-v20.4_Custom_i386.deb
Headers: http://www.michaelhallquist.com/ubuntu-cfs/linux-headers-2.6.20.16-ubuntu-cfs-v20.4_Custom_i386.deb

I found it here: http://ubuntuforums.org/showthread.php?t=538068
It's working very well, but I have some problems with the nvidia driver (can't use dkms).
If you don't have sound, try:

sudo chmod 777 /dev/snd/*

This nasty bug should be fixed. Thanks for keeping up the hard work, guys!

Revision history for this message
Hendrik van den Boogaard (chasake) wrote :

For me the problem disappeared completely after a fresh install of Jaunty. I think this is very strange, but a few things were different from my original install, which my previous posts were about.

* I did a fresh install of the final version of Jaunty AMD64, not the release candidate
* The first time, I installed Jaunty from inside a virtual machine running under Intrepid with VirtualBox, where I actually installed Jaunty onto a real hard disk, from which I rebooted after installation (I did this because I didn't have a blank CD available at the time, and this way I could install from the ISO image)
* I now formatted the root/boot partition where Jaunty was installed to ext4

Did something in the kernel or kernel settings change between the release candidate and the final version? Is it possible that when installing from within a virtual machine some default settings are different than when installing directly? (I can imagine some timer settings are different in a virtual machine, and in a VirtualBox guest the CPU is recognized as single-core only.)

In Windows I can imagine the system parameters during installation are critical for running the system later, but I thought that when booting Linux everything (all hardware) is recognized during startup, so it does not matter on which host it is running (as long as the architecture is the same).

I used the exact same hardware and installed to the exact same partition as the first time. I don't think changing from ext3 to ext4 is the key here, because the slowdowns appeared when catting files from an xfs partition (however on the same physical disk as the root partition). When I do this 'cat * > /dev/null' on the large files the machine is not slow anymore, and everything just seems normal and works as it does in 8.10.

So on one hand I am a happy user now, because everything is normal again, but on the other hand I would like to know what the cause of all this was.

Revision history for this message
Paulo J. S. Silva (pjssilva) wrote :

On my machine I found a workaround after reading many threads on the subject. If you are using ext3, try changing the data mode. The ext3 filesystem has three modes: the default one is "ordered", and the other two are "writeback" and "journal". They differ basically in the amount of information that is written to the journal before the real write to disk; the more information, the better the recovery from a system crash. The safest mode is journal, followed by ordered and then writeback.

In my machine, if I change the mode from ordered to journal or writeback the slowness under heavy load becomes much more bearable (it is not completely gone, but acceptable). In my case journal mode is the best, even though it is supposed to be the slowest mode (but the safest). I can now use tracker again.

To change the mode of your disk partitions (you need to do it for each partition) use tune2fs. For example

sudo tune2fs -o journal_data /dev/sda6

changes the mode to journal in partition sda6. To change the mode to writeback try

sudo tune2fs -o journal_data_writeback /dev/sda6

and to ordered (the default in Ubuntu)

sudo tune2fs -o journal_data_ordered /dev/sda6

After using tune2fs you need to reboot.

Note: It seems that writeback may become the default mode in future kernels (or maybe they will use a new mode called guarded). The new kernels are supposed to have lots of fixes for this issue.

Revision history for this message
cornbread (corn13read) wrote :

This is happening to me and I have a fresh install of jaunty x64. Is Paulo's solution working for others? Is it safe to try?

I do a lot of large movie transfers. I didn't notice this issue in 8.10 x64 but after 9.04 install performance during large transfer is unbearable. Reminds me of dialup days but for local transfers.

Revision history for this message
KhaaL (khaal) wrote :

Changing to writeback mode is not harmful; however, it did not help in my
case. I got improved performance, but I still have stutters during I/O
activity: the more intense the I/O, the less responsive the GUI.

Revision history for this message
cornbread (corn13read) wrote :

For the first time since coming to Ubuntu with 7.04, I am thinking of switching distros. I do a lot of large file copying, and I might as well go get coffee during large copies. For the entire duration my computer is so slow it is unusable (even for browsing the web).

A Core 2 Duo E8400 at 3.0 GHz with 4 GB RAM is brought to its knees when copying a simple 1 GB+ file.

Revision history for this message
Ben Gamari (bgamari) wrote :

@cornbread

Comments like that really don't help. Moreover, this is a kernel issue that is affecting all distributions across the board; I recently came to Ubuntu from Fedora where it was just as bad.

However, things are looking pretty good for getting this fixed by 2.6.31, which as it stands will be in Karmic. Last month there was a set of patches[1] posted to the LKML reworking the page eviction code to give executable code priority over streaming pages, which should help the thrashing situation quite a bit.

Secondly, there are Jens Axboe's per-bdi flusher threads, which seem to be kicking some serious ass in initial testing[2].

Lastly, there was very recently a breakthrough on the kernel.org incarnation of this bug[3], where Thomas Pilarski's tireless efforts in bisecting the issue finally resulted in some measurably regressing commits. Jens has already looked at the commits in question, and it looks very promising that we'll see at least some improvement come of this.

All in all, don't fret; things are looking up. It's certainly a frustrating bug for users and developers alike, but I think the efforts of the community may be about to pay off.

- Ben

[1] http://thread.gmane.org/gmane.linux.kernel.mm/33818
[2] http://lkml.org/lkml/2009/5/28/164
[3] http://bugzilla.kernel.org/show_bug.cgi?id=12309#c360

Revision history for this message
pinepain (pinepain) wrote :

No solution for about 2 years!!! Wow, this is really cool. But the latest Ubuntu distros hang nicely without any need to copy large data. They just hang (the Windows way?). Sorry.

Hmm... maybe I'm wrong, but this bug appears only (or almost always) in Ubuntu-based distros, doesn't it?

Has anyone tried to reproduce this bug on other distros?

What do you think: could the firmware of the HDD (or other device we copy data from/to) be a factor in the low performance?

Revision history for this message
Bryan Wu (cooloney) wrote :

Unfortunately it seems this bug is still an issue. Can you confirm whether this issue exists with the most recent Jaunty Jackalope 9.04 release - http://www.ubuntu.com/news/ubuntu-9.04-desktop ? If the issue remains in Jaunty, please test the latest upstream kernel build - https://wiki.ubuntu.com/KernelMainlineBuilds . Let us know your results. Thanks.

-Bryan

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
KhaaL (khaal) wrote :

Bryan, this bug is still alive and kicking in Jaunty with the 2.6.30 kernel. I've been following this bug on kernel.org's bugzilla, and there seems to have been a breakthrough lately; see this comment: http://bugzilla.kernel.org/show_bug.cgi?id=12309#c366.

Unfortunately, since I don't know how to compile a kernel, or even how to apply patches, I cannot test it. If someone can guide me through the process or even provide prebuilt kernels, I'd be grateful.

Revision history for this message
Milan Bouchet-Valat (nalimilan) wrote :

Bryan, please check what a report is about before asking stock questions. This one is very hard to work out, and work is going on upstream to find where the regression may have been introduced. Asking people to check if it's in Jaunty doesn't make sense; we're not even sure everybody here is experiencing the same problem.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
qwerty (escalantea) wrote :

Just an idea ... try tuning the pdflush parameters (you must be root):

echo 200 > /proc/sys/vm/dirty_writeback_centisecs
echo 400 > /proc/sys/vm/dirty_expire_centisecs

If it works, make the changes permanent by editing /etc/sysctl.conf
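For reference, the persistent form of the two settings above would be lines like these in /etc/sysctl.conf (the key names are the /proc/sys paths with the prefix dropped and slashes replaced by dots; apply without rebooting via `sudo sysctl -p`):

```conf
# /etc/sysctl.conf additions mirroring the echo commands above
vm.dirty_writeback_centisecs = 200
vm.dirty_expire_centisecs = 400
```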

Revision history for this message
Bryan Wu (cooloney) wrote :

@Milan, I have followed this thread for a long time and am trying to help here. Although I used the stock questions, I want to make sure everyone here knows this issue still remains in all the releases. This is a long story, so we need to add some checkpoints to let people understand this issue. I will try to provide a 2.6.30 kernel + Jens's patches and call for testing.

Thanks

-Bryan

Revision history for this message
Milan Bouchet-Valat (nalimilan) wrote :

Providing a testing kernel package would really be great! Then it would make sense to ask people to confirm the bug is still here. Though we have learned that I/O and responsiveness are very difficult to measure - not sure we'll be able to clearly confirm anything on so large a thread... ;-)

Revision history for this message
Ben Gamari (bgamari) wrote :

@Bryan, @Milan, it is unlikely that Jens' bdi patches will substantially affect the issue. It appears that the problem is in large part due to poor eviction choices on the part of the VM system. There are some patches in mm to fix this; see my previous comment. If you are going to put together a testing kernel, I believe you would be much better off trying these.

Revision history for this message
Jamie Lokier (jamie-shareable) wrote :

Milan Bouchet-Valat wrote:
> Providing a testing kernel package would really be great! Then it would
> make sense to ask people to confirm the bug is still here. Though we
> have learned that I/O and responsiveness are very difficult to measure -
> not sure we'll be able to clearly confirm anything on a so large
> thread... ;-)

There are several reports that copying a large file always makes the
desktop very slow, so that should be simple to test - on those systems.

Revision history for this message
Jared Wiltshire (jaredwiltshire) wrote :

Is this the same bug that is being talked about here?
http://ubuntuforums.org/showthread.php?t=1152176

People in that thread indicate they only have problems when using AHCI mode SATA disks.

Revision history for this message
Igor Lautar (igorl) wrote :

Just changed from AHCI to IDE in BIOS (HP 8530w).

Initial feeling is that it makes a huge difference (for the better).

Revision history for this message
exactt (giesbert) wrote :

For me, the problem also appeared when turning on AHCI.

check out https://bugs.launchpad.net/bugs/343371 .

Revision history for this message
Petter (pettno) wrote :

Jared, Igor, Exactt and others with problems introduced in recent Ubuntu versions: this bug relates to something introduced in Gutsy (version 7.10). I still (!) have problems related to this. Please move along and follow bug #343371, or create your own bug reports.

Revision history for this message
ReneS (mail-03146f06) wrote :

AHCI on/off does not make a difference. When copying a 10 GB file on disk, the machine becomes unresponsive. Top shows I/O wait as high as 80%. Applications do not start until the copy operation has finished.

Running Ubuntu 9.04 with Linux 2.6.30-020630-generic #020630 SMP Wed Jun 10 09:04:38 UTC 2009 x86_64 GNU/Linux

Revision history for this message
Jim Lieb (lieb) wrote :

This regression has been around since about the 2.6.18 timeframe and has eluded a lot of testing to isolate the root cause. The most promising fix is in the VM subsystem (mm), where the LRU scan has been changed to favor keeping executable pages active longer. Most of these symptoms come down to VM thrashing to make room for I/O pages. The key change/commit is 8cab4754d24a0f2e05920170c845bd84472814c6, "vmscan: make mapped executable pages the first class citizen". For those interested in the details and familiar with 'git', the commit changelog entry has a complete description of the problem and the fix. You can find this in either the ubuntu-karmic git repository or on kernel.org.

This change was merged into the 2.6.31-rc1 kernel. The Karmic Alpha 3 snapshot, currently scheduled for the last week of July, will have a 2.6.31 kernel containing this change. Please test this version and report back whether your latency issues have been resolved. There is no guarantee that this change will solve the latency problems in any particular workload, so as much testing as possible across a variety of machines and workloads is important.

Thank you.

NOTE: This new version of the Karmic kernel will also have the new KMS patches to match the upgrade of the Xorg server. Since most of the latency complaints center around GUI latencies, this adds a new set of variables. There are mainline kernel packages available now, for those who cannot wait for the Alpha 3 release, that can run on either Jaunty or Karmic (alpha), but they may have problems with the older Xorg server. If you find X-related problems with these kernels, please wait for the Alpha 3 release and do not bother reporting X problems unless they are also present in the A3 release.

Backporting note:
The commit mentioned above is just one change in the VM subsystem. Backporting it and its associated patches to Jaunty (2.6.28) or earlier kernels would probably not be productive and might create new stability problems of its own, given the amount of change between the two versions.

Changed in linux (Ubuntu):
assignee: nobody → Jim Lieb (lieb)
status: Confirmed → In Progress
Revision history for this message
Ben Gamari (bgamari) wrote :

While there are certainly a lot of considerations here, I fail to see how KMS (kernel mode setting) could ever even _possibly_ affect desktop responsiveness. Most sessions change modes once, if that. Once the mode is set and the framebuffer is set up, KMS is entirely out of the picture. Let's not pretend there are more variables than there really are.

Revision history for this message
Jim Lieb (lieb) wrote :

Sorry, I did not make myself clear. KMS only enters into this picture because there have been some reports that during this transition period to KMS, the kernel and Xorg have not played well together. This is simply a "heads up" until the next Alpha appears. There have been plenty of side issues in the history of this and other reports already. I intended to mention KMS as a side issue up front to keep testing focused on the I/O + Latency issue, not on something that might be broken in pre-release packages. Consider it, "Warning, do not step here."

Revision history for this message
JQ (bazs111) wrote :

can latencytop help with anything?

Revision history for this message
cornbread (corn13read) wrote :

+1 for making this a critical bug.

I have to warn every Ubuntu user I set up that this will cause them issues when doing large data transfers. Quite harmful to user friendliness.

Canonical should make this one of their higher priorities to get fixed.

Revision history for this message
cl333r (cl333r) wrote :

The importance is (already) marked as "high". I wonder: is it difficult to fix, or to reproduce? Some people claim this is a regression since kernel 2.6.15.

Anyway, the bug is real. I (like many others) tested it on my dual-booting computer, and the Ubuntu desktop feels significantly less responsive than WinXP when doing file I/O in the background. It's also a bit annoying that the bug was filed back in 2007 and there still seems to be considerable uncertainty around it.

Revision history for this message
Paulo J. S. Silva (pjssilva) wrote :

Did anyone try Koala alpha 3?

I did, and observed some interesting behavior.

My machine has an Intel motherboard with a G33 chipset and a Core 2 Duo E6600 processor. It has 2 SATA drives under AHCI, and I use a 64-bit kernel.

I can easily reproduce the problem using the fsync-tester program that you can get from kernel bug #12309 (http://bugzilla.kernel.org/show_bug.cgi?id=12309). All I need to do is run the program alongside a dd that creates a very large file, with the command line:

dd if=/dev/zero of=./bigfile bs=1M count=15000 & ./fsync-tester
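For anyone without the fsync-tester binary handy, a rough shell stand-in (a sketch, not the original program; it assumes GNU dd, date and awk) that times repeated 1 MiB appends, each followed by an fsync:

```shell
# probe_fsync [N] [FILE]: append 1 MiB to FILE N times, timing each
# write+fsync pair, in the spirit of fsync-tester's output.
probe_fsync() {
    n="${1:-5}"
    f="${2:-./fsync-probe}"
    i=0
    while [ "$i" -lt "$n" ]; do
        start=$(date +%s.%N)
        # conv=fsync forces an fsync of the file before dd exits;
        # oflag=append + conv=notrunc keeps growing the same file.
        dd if=/dev/zero of="$f" bs=1M count=1 \
           oflag=append conv=notrunc,fsync 2>/dev/null
        end=$(date +%s.%N)
        awk -v s="$start" -v e="$end" 'BEGIN { printf "fsync time: %.4f\n", e - s }'
        i=$((i + 1))
    done
    rm -f "$f"
}

probe_fsync 5
```

The absolute numbers will differ from fsync-tester's, but the relative jumps while a concurrent dd load is running are what matter.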

In jaunty (ext3) the desktop becomes unresponsive right away. The times given by fsync-tester look like:

fsync time: 0.5422
fsync time: 6.3691
fsync time: 8.5983
fsync time: 0.7820
fsync time: 0.7695
fsync time: 4.6577
fsync time: 5.6024
fsync time: 9.4238
fsync time: 10.8609

I tried some variations of koala. Here are my findings:

1) 64bit Koala in default configuration (ext4).

The desktop is more responsive, but not really good yet. The fsync-tester gives:

fsync time: 0.1351
fsync time: 0.9104
fsync time: 9.1311
fsync time: 1.9133
fsync time: 11.9529
fsync time: 1.6751
fsync time: 2.7171
fsync time: 8.4801

So the better responsiveness is not related to the fsync times; it is probably related to other changes in the kernel. There is no clear change if I turn AHCI off in the BIOS.

2) If I change the journal mode to writeback, the fsync times improve a lot (and responsiveness improves):

fsync time: 0.0781
fsync time: 0.0581
fsync time: 1.7040
fsync time: 1.4743
fsync time: 1.5957
fsync time: 1.7751
fsync time: 1.9164
fsync time: 1.4886
fsync time: 1.3991
fsync time: 1.8332

3) If, on the other hand, I use an i386 kernel with the default ordered mode, the times are also much better (as good as writeback + amd64):

fsync time: 0.0677
fsync time: 3.8825
fsync time: 1.4467
fsync time: 2.7759
fsync time: 1.5819
fsync time: 3.2423
fsync time: 3.4318
fsync time: 1.5432
fsync time: 1.3225

4) i386 + writeback gets a little better:

fsync time: 0.0946
fsync time: 1.3528
fsync time: 1.4029
fsync time: 1.3787
fsync time: 1.0880
fsync time: 1.2656
fsync time: 0.9047
fsync time: 0.8842
fsync time: 0.8008
fsync time: 1.4933
fsync time: 1.3645

(There are no 3s delays as above.)

5) Now comes the interesting surprise. If I install amd64 Koala using ext3 with its default mode (which I think is writeback, but I am not sure), I get very good responsiveness and times:

fsync time: 0.0329
fsync time: 0.0156
fsync time: 0.2369
fsync time: 0.1274
fsync time: 0.2285
fsync time: 0.2196
fsync time: 0.2563
fsync time: 0.2147
fsync time: 0.2968
fsync time: 0.2602
fsync time: 0.1131
fsync time: 0.2348

I didn't try i386+ext3. I ran out of time and patience :-)

So, on my computer the problem seems to be related to many factors: amd64 vs i386, writeback vs ordered mode, and ext4 vs ext3.

It would be very interesting if anyone can reproduce my little experiment.

And remember: if you are annoyed by this bug, you may want to stick with ext3 in Koala for now.

Revision history for this message
cornbread (corn13read) wrote :

Bug exists for me with 9.10 Alpha 3 64 bit

Revision history for this message
Ben Gamari (bgamari) wrote :

Can we please stop referring to this as a bug? It may be a problem, it may be
the product of a collection of bugs, but it is almost certainly not one bug.
This report has to date accumulated almost 250 comments, including numerous
incomparable benchmarks, dozens of descriptions of subtly different problems,
countless flawed workarounds, and yet not a single bisection attempt.

In fact, this report is in far worse shape than the kernel.org report which was
closed months ago due to lack of focus. I strongly believe that this report
should see the same end. So far the patchset which was most likely to fix this
has already been merged (8cab4754: vmscan: make mapped executable pages the
first class citizen). Since this clearly hasn't improved things, it is time
that we go back to the drawing board.

At this point, the only responsible course forward is to close this bug and
start from scratch, this time taking greater care to keep independent bugs in
separate reports; otherwise we will end up in the same situation as we
currently find ourselves. In general, I believe that Ubuntu's bug tracker
really isn't an appropriate forum for discussing what is demonstrably a
cross-distribution kernel issue. While we can certainly have a tracker here,
true technical discussion belongs on the kernel.org report.

As has been demonstrated in the past, this bug is quite difficult to pin down.
A responsive desktop is the product of interactions between components in all
layers of the stack, including (perhaps) most importantly the memory management
and block layers. We must avoid complicating things any more than they already
are by tying together matters which are fundamentally independent (no more
driver references; this has been shown to be a largely hardware-independent
bug, so treat it as such).

Anyways, despite all of these considerations, I am hopeful that a solution will
be found. As a first order of business, someone with the proper permissions
must put this bug out of its long-lived misery. Then perhaps we can move
forward to isolating the true cause of this issue.

Revision history for this message
Sam Davies (seivadmas) wrote :

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/427210

^^ Bug report that actually discusses one particular possible solution for this problem with an easily reproducible test.

Revision history for this message
Rocko (rockorequin) wrote :

In case it's relevant for Karmic, this email mentions latency issues reintroduced in kernel 2.6.30/31 but apparently fixed now in 2.6.32: http://lkml.org/lkml/2009/10/5/247

Revision history for this message
LGB [Gábor Lénárt] (lgb) wrote :

Hmm, Rocko, that sounds good ;) I'm wondering if it's possible to backport these patch(es?) into the kernel used by karmic (or is the change too big?), since this seems to be a serious problem for most desktop users. Even my friends and relatives who use ubuntu said that ubuntu became totally unusable because of performance problems (responsiveness even under smaller loads) :( So I think it's a quite serious issue for desktop users, at least.

Revision history for this message
Tobias Oetiker (tobi-oetiker) wrote : FsOpBench shows only data=writeback with cfq works

I have just completed extensive benchmarking on 2.6.31.2 and 2.6.24
with a new benchmark program I have written, to measure real-world
I/O performance in high-load situations where readers and writers
are competing.

I am looking at HW RAID setups as well as normal single hard drive
situations. The bottom line is that on a single hard drive only

  data=writeback with cfq scheduler

has decent performance when readers and writers are in competition,
and even then, there will be huge outliers of many seconds happening
every now and then. See

  http://insights.oetiker.ch/linux/fsopbench/

for details.

cheers
tobi

--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch <email address hidden> ++41 62 775 9902 / sb: -9900

Revision history for this message
Jamie Lokier (jamie-shareable) wrote : Re: [Bug 131094] FsOpBench shows only data=writeback with cfq works

Tobias Oetiker wrote:
> I have just completed extensive benchmarking on 2.6.31.2 and 2.6.24
> with a new benchmark program I have written, to measure real-world
> io performance in high load situations where readers and writers
> are competing.
>
> I am looking at HW RAID setups as well as normal single hard drive
> situations. The bottom line is that on a single hard drive only
>
> data=writeback with cfq scheduler
>
> has decent performance when readers and writers are in competition
> and even then, there will be huge outliers of many seconds happening
> every now and then. See
>
> http://insights.oetiker.ch/linux/fsopbench/
>
> for details.

Is that with ext4 or ext3?

Revision history for this message
Tobias Oetiker (tobi-oetiker) wrote :

Today Jamie Lokier wrote:

> Tobias Oetiker wrote:
> > I have just completed extensive benchmarking on 2.6.31.2 and 2.6.24
> > with a new benchmark program I have written, to measure real-world
> > io performance in high load situations where readers and writers
> > are competing.
> >
> > I am looking at HW RAID setups as well as normal single hard drive
> > situations. The bottom line is that on a single hard drive only
> >
> > data=writeback with cfq scheduler
> >
> > has decent performance when readers and writers are in competition
> > and even then, there will be huge outliers of many seconds happening
> > every now and then. See
> >
> > http://insights.oetiker.ch/linux/fsopbench/
> >
> > for details.
>
> Is that with ext4 or ext3?

I have tested with ext3

cheers
tobi

--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch <email address hidden> ++41 62 775 9902 / sb: -9900

Revision history for this message
Yan Li (yanli) wrote :

Tobi, your testing and results are great and very useful. It would be even better if you could run those tests on ext4. Thank you.

Revision history for this message
Tobias Oetiker (tobi-oetiker) wrote : Re: [Bug 131094] Re: Heavy Disk I/O harms desktop responsiveness

Hi Yan,

Yesterday Yan Li wrote:

> Tobi, your testing and results are great and very useful. It would be
> even better if you could run those tests on ext4. Thank you.

I have now also put ext4 through its paces ... its overall
behaviour seems to be the same as with ext3; the same settings
yield the best performance. Overall, the single-reader scenario
seems to suffer a performance drop of 20% to 30%, while the
three-reader scenario gains about 30%. Large maximum latencies
have become bigger, if anything.

I have updated the report on

http://insights.oetiker.ch/linux/fsopbench/

including the detailed results ...

The cfq scheduler seems to do a pretty good job at being fair. The
main problem in my eyes (which no one seems to be talking about) is the
hangups, where suddenly an mkdir call takes up to 19 seconds to
complete. This makes all the rest seem like minor issues.

cheers
tobi

--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch <email address hidden> ++41 62 775 9902 / sb: -9900

Revision history for this message
Yan Li (yanli) wrote :

Tobias Oetiker:

Thank you very much for the update. I'm a bit surprised to see that the single-reader case on ext4 is worse than on ext3. I'm going to postpone the upgrade of my systems to ext4. I dare not use data=writeback yet.

I'm a bit confused about why you ran this on a RAID6 system. The RAID card/driver might have affected the performance in a way yet to be understood. IMHO, the fewer layers between the Linux kernel and the hard drive, the better we can understand the kernel's I/O scheduler, filesystem, etc.

Again, thank you for the great data.

Revision history for this message
Tobias Oetiker (tobi-oetiker) wrote :

Today Yan Li wrote:

> Tobias Oetiker:
>
> Thank you very much for the update. I'm a bit surprised to see that the
> single-reader case on ext4 is worse than on ext3. I'm going to postpone
> the upgrade of my systems to ext4. I dare not use data=writeback yet.
>
> I'm a bit confused about why you ran this on a RAID6 system. The RAID
> card/driver might have affected the performance in a way yet to be
> understood. IMHO, the fewer layers between the Linux kernel and the hard
> drive, the better we can understand the kernel's I/O scheduler, filesystem, etc.
>
> Again, thank you for the great data.

As you can see from the results, running the test on a RAID6 gives
vastly different results. The fact is that, for reliability, we are
running all our servers on RAID6, so this is the configuration I am
most interested in seeing work well ... good performance on a
single disk does not help me much ... (I am glad to see that it is,
to some extent, even worse than my RAID6 performance).

I think at the heart of the problem lies the fact that benchmarks focus
on single aspects of subsystems, which then get optimized without
looking at the overall impact.

cheers
tobi


--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch <email address hidden> ++41 62 775 9902 / sb: -9900

Revision history for this message
cornbread (corn13read) wrote :

Fixed for me on my desktop with the karmic beta!

Revision history for this message
Rocko (rockorequin) wrote :

@cornbread: I think they've put the 'no new fair sleepers' patch into ubuntu's 2.6.31 kernel, which does help with responsiveness (though it slows down some high-CPU tasks like games). This might explain it.

However, I noticed a massive drop in responsiveness yesterday while copying 14GB to a flash drive, including a frozen mouse cursor for up to thirty seconds. Although atop showed that my internal hard drive (/dev/sda) was only being used very lightly (as you'd expect when copying to flash), it was thrashing away constantly. So there are still some issues.

Revision history for this message
Rocko (rockorequin) wrote :

An update on my comment above: when doing heavy I/O from an external drive (/dev/sdb) to a slow external flash key (/dev/sda), I can hear my internal hard drive (/dev/sda) thrashing away constantly, even though its light indicates no disk read/write operations. So something in the kernel must be making it constantly seek, and this is affecting /dev/sda access and hence desktop responsiveness.

The desktop is now much faster for applications that are already loaded, but anything that has to access the disk still experiences long wait times.

Revision history for this message
Rocko (rockorequin) wrote :

I opened http://bugzilla.kernel.org/show_bug.cgi?id=14491 for this disk thrashing issue. I couldn't get 2.6.32-rc5 to do it, but I can repeat it in 2.6.31.5 when the PC RAM is near full.

Revision history for this message
Geoffrey Pursell (geoffp) wrote :

For me, apps become sporadically unresponsive (one or another will actually freeze solid for a few seconds at a time) during a simple file copy from one hard drive to another. The source drive is an older IDE drive (ext3) and the destination is a newer SATA drive (ext4). It's a music collection, with files varying in size from 3 MB to 20 MB; the transfer goes at about 27 MB/sec.

This is with a stock Ubuntu 2.6.31-14 kernel on a fresh AMD64 Karmic.

Revision history for this message
Hans van den Bogert (hbogert) wrote :

I've noticed something very weird when running tiobench. When it is run on the root filesystem, I'm experiencing the same as everyone else. When the exact same test is run/written from/to something other than the root fs, iowait is much lower, responsiveness is excellent, and the data rate is excellent too.

I can't seem to find any parameters which are different across filesystems; I've ruled out lvm, and it's on the same disk.

Revision history for this message
LGB [Gábor Lénárt] (lgb) wrote :

I've just had a very annoying experience: on my notebook (running up-to-date ubuntu 9.10, 32-bit) I wanted to copy a large file onto a pendrive. Normally, I think, it shouldn't affect I/O performance too much, since that pendrive is quite slow, so I doubt it can make the hdd (which I was copying from) or any other I/O subsystem too busy, other than the pendrive itself. However, it almost killed the notebook: I couldn't change windows, and alt-tab took 10 minutes (!) to react. Without any I/O, of course, I have no usability problems with it. Nothing interesting in the kernel log ...

Revision history for this message
Jonathan Bower (jonathanbower) wrote :

LGB, Yes, and this is why I can't really recommend Ubuntu to my friends. unfortunately.

Revision history for this message
tankdriver (stoneraider-deactivatedaccount) wrote :

I found out something very interesting:
Test case: Ubuntu Karmic, external USB HDD, USB stick
1. Copy a lot of data via nautilus to the external USB drive.
2. During copying, plug in the USB stick.
Nothing happens.
3. When copying is finished (e.g. after 20 minutes), the USB stick suddenly appears on the desktop.

I can confirm this test case with jaunty & karmic, 32- & 64-bit, on HP laptops and Asrock & Gigabyte desktops, with multiple USB sticks and HDDs (and even with 2 USB sticks in the test case).
Can anyone confirm this? Is this a kernel issue (this bug) or possibly a nautilus issue?

Revision history for this message
Rocko (rockorequin) wrote :

@LGB and tankdriver: I've noticed both these problems, but kernel 2.6.32 fixes them for me. The easiest way to try it is to get the header and image deb files (32 or 64 bit as appropriate) from http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.32.2/ and install them with "sudo dpkg -i".

Revision history for this message
tankdriver (stoneraider-deactivatedaccount) wrote :

@Rocko: 2.6.32 does not change anything for me (except nvidia doesn't compile ;-) )
For me, I think it's a non-kernel issue (nautilus?) because in dmesg the sdc1, sdd, ... stuff shows up every time shortly after I plug in a drive, but on the (gnome) desktop, nothing appears.
I will try Kubuntu for testing.

Revision history for this message
Rocko (rockorequin) wrote :

@tankdriver: you are absolutely right; I found I can reproduce this if I copy a large file to a USB flash device (I was using USB hard drives when I tested it before). I've opened a new bug for it (see bug #504113) because I don't think it's relevant to this bug.

However, 2.6.32 does fix (for me) the unresponsive desktop problem when copying a large file to a slow device that LGB reported.

Changed in linux (Ubuntu):
assignee: Jim Lieb (lieb) → Ubuntu Kernel Team (ubuntu-kernel-team)
status: In Progress → Confirmed
Revision history for this message
LGB [Gábor Lénárt] (lgb) wrote :

@Rocko: I have problems with non-slow drives too :) I have another hdd where I usually have big I/O (let's call it the "data hdd"); the system, my home, and swap are all on another one (the "system hdd"). When I have heavy I/O on the data hdd, I/O on the system hdd is horrible too, even though they are identical fast disks and nothing is being copied between them. Anyway, I will try the newer kernel too; I just need nvidia's "binary blob", so I can't use my desktop system if the newer kernel won't play nice with the nvidia driver :(

Revision history for this message
Rocko (rockorequin) wrote :

@LGB: To use nvidia in the latest kernel, I normally download the latest nvidia driver from either:

64 bit: ftp://download.nvidia.com/XFree86/Linux-x86_64/

32 bit: ftp://download.nvidia.com/XFree86/Linux-x86/

You want the file with the highest number (...pkg1.run or ...pkg2.run).

Then install it manually (instructions here are for the 195.30 beta driver, which works fine on my PC):

1. Remove any existing nvidia drivers, eg the restricted Ubuntu modules.

2. Either reboot into recovery mode; or get a tty console (eg CTRL-ATL-F1), log in, and do "sudo stop gdm" to kill X (make sure you save any data first!).

3. Install by executing the file you downloaded, eg "sudo sh NVIDIA-Linux-x86_64-195.30-pkg2.run". If 64 bit, tell it to install the 32 bit libraries as well or 32 bit games won't work. If you already have /etc/X11/xorg.conf set up for nvidia, there's no need to let the installer alter it.

4. Reboot.

An optional last step if it works fine is to install to dkms with (eg) "sudo sh installdkms.sh 195.30" (195.30 is the nvidia version in this example, and the script is attached). Then it recompiles the nvidia module automatically whenever you install a new kernel (and it will compile it for the stock 2.6.31 kernel, too).

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

The heavy I/O problem is partially an Ubuntu problem. While copying some files from one hard disk to another on Ubuntu (Karmic), my system becomes completely unresponsive with both the ppa (32) and the 2.6.31 kernel. The same copy operation with Fedora 12 and kernel 2.6.32 produces no freezes.

Revision history for this message
djr013 (djr013) wrote :

The latest Lucid kernel ("2.6.32-12.16", 64-bit) seems to have fixed this longstanding problem for me. Also, for this reason or another, initial RAM usage appeared to go down a good ~64 MB during the update. Before, this machine was mysteriously less usable than an otherwise older and slower one (even after accounting for this one having half the RAM). Of course, this bug is fairly general and could have a few potential causes, so my case may not apply to all who reported it.

Revision history for this message
LGB [Gábor Lénárt] (lgb) wrote :

I've upgraded to lucid. Otherwise it runs quite well, but this I/O problem is still here, or even worse! Now a single apt dist-upgrade (with 30 packages to update or so) froze my X session (the mouse cannot be moved, the music player just repeats a single second of audio, I guess from some kind of buffer, etc) for long minutes. I managed to switch to a text console (CTRL-ALT-F1), and after login, according to top, the system load was above 11 and almost all CPU time was in the "wait" state.

Linux oxygene 2.6.32-12-generic #17-Ubuntu SMP Fri Feb 5 08:14:39 UTC 2010 i686 GNU/Linux

After apt-get finished, everything returned to normal ...

Revision history for this message
LGB [Gábor Lénárt] (lgb) wrote :
Download full text (14.4 KiB)

During those "cannot-do-anything" times, I even got kernel messages. One example:

[998882.032583] INFO: task gvfsd-trash:2063 blocked for more than 120 seconds.
[998882.032589] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[998882.032594] gvfsd-trash D 0001ae75 0 2063 1 0x00000000
[998882.032602] d59f7da0 00000086 00000000 0001ae75 00000000 c087c6c0 d5825c5c c087c6c0
[998882.032614] 7a06b2df 00038c42 c087c6c0 c087c6c0 d5825c5c c087c6c0 c087c6c0 dd11a200
[998882.032628] 7a04da23 00038c42 d58259b0 cb0f4078 cb0f407c ffffffff d59f7dcc c05a48a6
[998882.032637] Call Trace:
[998882.032650] [<c05a48a6>] __mutex_lock_slowpath+0xc6/0x130
[998882.032654] [<c05a47c5>] mutex_lock+0x25/0x40
[998882.032660] [<c020e1fe>] real_lookup+0x2e/0x110
[998882.032664] [<c020fc05>] do_lookup+0x95/0xc0
[998882.032669] [<c021043d>] __link_path_walk+0x54d/0xb60
[998882.032673] [<c020f246>] ? path_to_nameidata+0x36/0x50
[998882.032677] [<c0210bf6>] path_walk+0x46/0xa0
[998882.032681] [<c0210d59>] do_path_lookup+0x59/0x90
[998882.032685] [<c02118a1>] user_path_at+0x41/0x80
[998882.032691] [<c01304cc>] ? kmap_atomic_prot+0x4c/0xf0
[998882.032696] [<c01e2f60>] ? __do_fault+0x3a0/0x4b0
[998882.032702] [<c02099ca>] vfs_fstatat+0x3a/0x70
[998882.032706] [<c0209a60>] vfs_lstat+0x20/0x30
[998882.032710] [<c0209a89>] sys_lstat64+0x19/0x30
[998882.032715] [<c05a7f2b>] ? do_page_fault+0x19b/0x380
[998882.032721] [<c0237492>] ? sys_inotify_add_watch+0xc2/0x100
[998882.032727] [<c010344c>] syscall_call+0x7/0xb
[999002.032195] INFO: task gvfsd-trash:2063 blocked for more than 120 seconds.
[999002.032202] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[999002.032208] gvfsd-trash D 0001ae75 0 2063 1 0x00000000
[999002.032218] d59f7da0 00000086 00000000 0001ae75 00000000 c087c6c0 d5825c5c c087c6c0
[999002.032232] 7a06b2df 00038c42 c087c6c0 c087c6c0 d5825c5c c087c6c0 c087c6c0 dd11a200
[999002.032245] 7a04da23 00038c42 d58259b0 cb0f4078 cb0f407c ffffffff d59f7dcc c05a48a6
[999002.032258] Call Trace:
[999002.032274] [<c05a48a6>] __mutex_lock_slowpath+0xc6/0x130
[999002.032283] [<c05a47c5>] mutex_lock+0x25/0x40
[999002.032291] [<c020e1fe>] real_lookup+0x2e/0x110
[999002.032298] [<c020fc05>] do_lookup+0x95/0xc0
[999002.032304] [<c021043d>] __link_path_walk+0x54d/0xb60
[999002.032312] [<c020f246>] ? path_to_nameidata+0x36/0x50
[999002.032318] [<c0210bf6>] path_walk+0x46/0xa0
[999002.032324] [<c0210d59>] do_path_lookup+0x59/0x90
[999002.032331] [<c02118a1>] user_path_at+0x41/0x80
[999002.032338] [<c01304cc>] ? kmap_atomic_prot+0x4c/0xf0
[999002.032348] [<c01e2f60>] ? __do_fault+0x3a0/0x4b0
[999002.032354] [<c02099ca>] vfs_fstatat+0x3a/0x70
[999002.032358] [<c0209a60>] vfs_lstat+0x20/0x30
[999002.032362] [<c0209a89>] sys_lstat64+0x19/0x30
[999002.032367] [<c05a7f2b>] ? do_page_fault+0x19b/0x380
[999002.032374] [<c0237492>] ? sys_inotify_add_watch+0xc2/0x100
[999002.032379] [<c010344c>] syscall_call+0x7/0xb
[999122.032597] INFO: task gvfsd-trash:2063 blocked for more than 120 seconds.
[999122.032603] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[99912...

Revision history for this message
psypher (psypher246) wrote :

Please could the severity/importance of this bug be raised? I would consider a bug which has been prevalent for almost 3 years, and which causes extreme slowdown of the entire desktop whenever there is any kind of high disk activity, to be pretty serious. This drastically affects my usage and productivity on the desktop on a daily basis. Up until now I have always thought it's just how linux is, and that the benefit of all the other great features outweighs a bit of slowdown. Well, it seems to be getting worse, and as I start to use more of the potential of my PCs I am getting a little tired of it. I have recently been testing ubuntuone extensively, and I suspect a big portion of my extremely slow index and read issues, across the thousands of files I have in my ubuntuone folder, is actually caused by this bug. There are still some improvements which could be made in that process, which the ubuntuone team are actively working on, and they are doing some great work. At least someone is working on big issues there, but sadly it seems to be only due to the commercial potential of ubuntuone.

This seems to be a kernel issue as per this bug report: https://bugzilla.kernel.org/show_bug.cgi?id=12309
But the status of that bug is confusing as it's marked as closed due to insufficient information.

Does anyone know who is working on this? I have attempted to contact the person assigned, as well as Ben Gamari, who I see is subscribed to this bug on launchpad, for some clarification. I will patiently await their response. What testing needs to be done, and what can be done from a non-developer's perspective to fix this?

IMO this is the worst bug in linux right now, and I think it deserves more attention than just a slashdot article: http://it.slashdot.org/article.pl?sid=09/01/15/049201

I would not recommend Ubuntu or Linux to any new users until this bug is fixed. It is embarrassing, when trying to praise all the benefits of using linux to a Windows user, to have their entire desktop lock up when trying to do simple things like run a backup or unrar a file.

In the past I have attributed slowness issues in several other applications to application-specific problems. There have been bugs logged for:

Unison, Firefox, the downthemall FF plugin, unrar, ubuntuone, flash, gnome, VMWare, Virtualbox, qemu, kvm, etc.

I think all of these issues could be attributed to this one problem. There are too many apps that experience slowness and grey-outs for it not to be related. This happens between disks or on the same disk, and between disk types like usb, ide or sata. So we must not confuse the problem: if there is any kind of disk IO, the machine freezes. As the bug above says, large I/O operations result in slow performance and high iowait times. That is the problem, as far as I can tell.

I offer any help required to fix this bug. Just let me know what I need to do. But please, let's raise the importance. This is a critical issue, not medium.

Thanks guys, keep up the great work, linux still rocks :)

Revision history for this message
LGB [Gábor Lénárt] (lgb) wrote :

I totally agree. It seems to be a long-standing issue, and it seems to be getting worse and worse every time. Now when I want to do anything reasonable which involves disk I/O (even just copying a CD image from one disk to another, or a dist-upgrade), I let the machine do it and go for a coffee, since the machine is simply unusable during it. I noticed it earlier, but as I've mentioned, it seems to be getting worse with every new kernel. I still remember the good old days with debian (maybe with 2.4 kernels, or at least early 2.6 ones) when even the hardest I/O tasks couldn't interrupt even playing audio from the same disk: now I have to pray not to be interrupted even when the machine is idle and gets a very short I/O task from some process ... And btw, I had much weaker hw before ...

Revision history for this message
John Baptist (jepst79) wrote :

I believe that this problem can be alleviated by using Con Kolivas's BFS scheduler instead of the stock scheduler. There are PPAs on Launchpad where you can get kernels with the BFS scheduler for Lucid and Karmic. On my system, it really seems to make the system much more responsive. I hope that the Linux kernel team considers including the BFS scheduler as an option in future kernel releases, and until then I think the Ubuntu team should consider making a BFS kernel the default for the desktop version of Ubuntu.

Revision history for this message
Paulo J. S. Silva (pjssilva) wrote :

Actually, there is this PPA:

https://launchpad.net/~darxus/+archive/bfsbfq

That has the BFS scheduler and the BFQ I/O scheduler, which may also play
an interesting role here. I have not tried it yet, but I should try it
soon.

best,

Paulo

On Thu, Mar 25, 2010 at 8:50 AM, Jeff Epstein <email address hidden> wrote:
> I believe that this problem can be alleviated by using Con Kolivas's BFS
> scheduler instead of the stock scheduler. There are PPAs on Launchpad
> where you can get kernels with the BFS scheduler for Lucid and Karmic.
> On my system, it really seems to make the system much more responsive. I
> hope that the Linux kernel team consider including the BFS scheduler as
> an option in future kernel releases, and until then I think the Ubuntu
> team should consider making a BFS kernel the default for the desktop
> version of Ubuntu.
>

--
Paulo José da Silva e Silva
Professor Associado, Dep. de Ciência da Computação
(Associate Professor, Computer Science Dept.)
Universidade de São Paulo - Brazil

e-mail: <email address hidden> Web: http://www.ime.usp.br/~pjssilva

Revision history for this message
Alessio Igor Bogani (abogani) wrote :

Hi,

Has anyone already tested whether linux-rt mitigates the issue?

Thanks!

Revision history for this message
daneel (daneel) wrote :

I'm using the bfsbfq kernel. It's just a little better than the generic kernel.

2010/3/25 Paulo J. S. Silva <email address hidden>:
> Actually, there is this PPA:
>
> https://launchpad.net/~darxus/+archive/bfsbfq
>
> That has the BFS scheduler and the BFQ I/O scheduler, which may also play
> an interesting role here.  I have not tried it yet but I should try it
> soon.
>
> best,
>
> Paulo
>
>
> On Thu, Mar 25, 2010 at 8:50 AM, Jeff Epstein <email address hidden> wrote:
>> I believe that this problem can be alleviated by using Con Kolivas's BFS
>> scheduler instead of the stock scheduler. There are PPAs on Launchpad
>> where you can get kernels with the BFS scheduler for Lucid and Karmic.
>> On my system, it really seems to make the system much more responsive. I
>> hope that the Linux kernel team consider including the BFS scheduler as
>> an option in future kernel releases, and until then I think the Ubuntu
>> team should consider making a BFS kernel the default for the desktop
>> version of Ubuntu.
>>

Revision history for this message
Brian Takita (brian-takita) wrote :

I filed a possibly related bug at:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/566841

I think this bug is pretty severe, because it makes my powerful laptop (quad core Dell M4400) act like it is > 10 years old for about 30 minutes after the period of heavy I/O.

Revision history for this message
Brian Takita (brian-takita) wrote :

Basically when I run hdparm -T /dev/sda, I get ~ 6500 MB/sec before the heavy I/O process.

After the heavy I/O process, hdparm -T /dev/sda is ~ 350 MB/sec.

What is strange is that during the heavy I/O process it is ~2000 MB/sec. Performance continues to degrade after the process finishes.
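
Note that `hdparm -T` times reads served from the kernel's buffer cache rather than the disk, which is why a drop from ~6500 to ~350 MB/sec points at something other than the drive itself. A rough userspace analogue, offered as a sketch (it reads a regular file instead of the raw device, so no root is needed; the sizes are arbitrary):

```python
import os
import time

def cached_read_rate(path, seconds=1.0, chunk=1024 * 1024):
    """Re-read the first `chunk` bytes of `path` for `seconds` so the data
    is served from the page cache; return apparent throughput in MB/s."""
    total = 0
    with open(path, "rb") as f:
        f.read(chunk)                       # prime the page cache
        deadline = time.time() + seconds
        while time.time() < deadline:
            f.seek(0)
            total += len(f.read(chunk))
    return total / (1024.0 * 1024.0) / seconds

if __name__ == "__main__":
    # Demo against a scratch file; point `path` at any large file instead.
    with open("cache-test.dat", "wb") as f:
        f.write(b"\0" * (1024 * 1024))
    print("cached read rate: %.0f MB/s" % cached_read_rate("cache-test.dat"))
    os.remove("cache-test.dat")
```

Comparing this figure before and after the heavy I/O helps separate a busy memory subsystem from a slow disk.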

Revision history for this message
Brian Takita (brian-takita) wrote :

Also attempting to clear the buffer cache does not help at all.

sync && sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'

This issue is really mysterious. I would also welcome a workaround, such as clearing some sort of buffer.
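
The drop_caches write above discards clean page-cache pages (the value 3 also drops dentries and inodes). A quick way to check whether it did anything is to watch the Cached figure in /proc/meminfo before and after; a small helper, assuming a Linux /proc is available:

```python
def meminfo_kb(field):
    """Return one /proc/meminfo field (e.g. "Cached" or "MemFree") in kB."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith(field + ":"):
                return int(line.split()[1])
    raise KeyError(field)

if __name__ == "__main__":
    # Run before and after `echo 3 > /proc/sys/vm/drop_caches` (as root).
    print("page cache: %d kB, free: %d kB"
          % (meminfo_kb("Cached"), meminfo_kb("MemFree")))
```

If Cached shrinks but performance is unchanged, as Brian reports, the slowdown is unlikely to be a page-cache problem.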

Revision history for this message
Brian Takita (brian-takita) wrote :

I also want to say that for me, this is one of the most severe bugs in Ubuntu (and Linux in general).

Revision history for this message
Brian Takita (brian-takita) wrote :

What is also strange is that the issue still persists even after a reboot, though a bit less severely.

Now running sudo hdparm -T /dev/sda yields:

saturn:~ $ sudo hdparm -T /dev/sda
/dev/sda:
 Timing cached reads: 2152 MB in 2.00 seconds = 1078.08 MB/sec

I am able to get hdparm back to ~6500 MB/sec if I wait around 30 minutes. Locking the screen also seems to help.

I am also able to get it working faster after shutting down gdm and stopping some processes.

Could there be some persistent buffer out there that is blocking the entire system?

Revision history for this message
Brian Takita (brian-takita) wrote :

I ran inotifywatch on all of the top level directories, and got the following results.

I also tried

> sudo rm -rf /tmp/* && sync && sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'

but the performance problem persisted.

Revision history for this message
Alex Wauck (awauck) wrote :

Brian: what happens if instead of simply rebooting, you power the machine down completely, wait a bit, then start it up again?

Revision history for this message
Brian Takita (brian-takita) wrote :

Hey Alex, I'll go ahead and try it in a second.

After the screen locked (after 10 minutes of inactivity) and I logged back in, my performance fully came back (hdparm -T yielded 6500 MB/s).

I ran inotifywatch again and got the following results.

Revision history for this message
Brian Takita (brian-takita) wrote :

Alex: Powering down completely and waiting seems to work too.

I also notice that when performance is compromised, the hard drive seems to be spinning constantly.

Revision history for this message
Brian Takita (brian-takita) wrote :

Another data point. My work machine does not have this issue.

It is a Mac Pro 8-core desktop with a 10k Raptor drive. I wonder if this is a laptop SATA controller (or power-saving related) issue.
I can try this test on my other Dell laptop tonight.

Revision history for this message
KhaaL (khaal) wrote :

I appreciate you digging so deep into this bug, Brian. I've given up personally!

I have it on my desktop machine, where the hard disks are connected through a SATA interface. It still happens in Lucid...

Revision history for this message
Brian Takita (brian-takita) wrote :

Thanks KhaaL. Juan Flynn suggested that I check the temperature to make sure no throttling is taking place. I'll give that a shot tonight.

So far, the possibilities are some driver issue or some sort of hardware throttling issue.

I ran hdparm -i /dev/sda on my work machine (which does not have this issue).

I'll post the results from my laptop next.

Revision history for this message
Brian Takita (brian-takita) wrote :

Here are the results from my laptop.

Revision history for this message
Milan Bouchet-Valat (nalimilan) wrote :

Brian: If you really want to tackle this, you need time and skills: open a new report on bugzilla.kernel.org, and they will help you to find out what's going on. Here you won't find people that work on the precise I/O issues this bug is about. See the example of the upstream task above. But that's not an easy task, be warned! ;-)

Revision history for this message
Brian Takita (brian-takita) wrote :

That sounds good. I signed up for an account and I'll do as you suggest. I'm pretty curious about this anyway. Thanks for the warning, too. Hopefully it will not be insurmountable. :-)

Revision history for this message
Brian Takita (brian-takita) wrote :

I installed GKrellM and noticed a strong correlation between the temperature and performance of the system. As the temperature went up, the value of hdparm -T went down.

The funny thing is that I have a TypeMatrix keyboard, which I placed on top of my laptop. Air flows through the keys of the keyboard, so it seems that the M4400 uses the keyboard as part of its cooling system.

I assume the hardware is protecting itself against overheating by throttling performance, which makes sense. For me the issue may be solved. Thanks for the help, and sorry for the noise. I hope this helps somebody.

Revision history for this message
LGB [Gábor Lénárt] (lgb) wrote :

I am not using tracker and the like, since it almost renders the machine unusable (it generates some I/O)... As far as I remember I had no I/O problems some years ago; since then it has just gotten worse and worse, even with "better" hardware. Back then I had a PATA disk and controller (also SCSI at one point), but I have been using SATA ever since. Could that be the problem, I wonder? Then again, I can see the problems on PATA disks now too, but could that be caused by libata being used instead of the old IDE driver? I have no idea :(

Revision history for this message
Chase Douglas (chasedouglas) wrote :

There's a tunable parameter called 'swappiness' that determines whether pages of memory are swapped out to hard disk in lieu of using the memory for buffers or not. A value of 0 means memory pages will never be swapped out in favor of buffers. A value of 100 is the opposite. The default value in the kernel is 60. I would be interested to know if anyone sees desktop responsiveness improve if the value is lowered.

Theoretically, if you lower the value your desktop applications should respond better as you switch between them or on return after leaving the machine for a period of time. However, if you are doing a large amount of data transfers or other intensive tasks you may see a drop in overall performance. For a desktop system, many users prefer responsiveness over a small drop in performance.

Please see this page for more details, an overview of what you may expect if you decrease or increase the value, and how to change the value: http://www.pythian.com/news/1913/

Thanks
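As a hedged sketch of the tunable described above (the /proc path and sysctl key are standard on Linux; the value 10 is only an illustrative choice):

```shell
# Read the current swappiness value (the kernel default is 60)
swappiness=$(cat /proc/sys/vm/swappiness)
echo "vm.swappiness is currently $swappiness"

# To lower it for the running session (requires root):
#   sudo sysctl vm.swappiness=10
# To persist the change across reboots:
#   echo 'vm.swappiness=10' | sudo tee -a /etc/sysctl.conf
```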

Revision history for this message
Juan Flynn (juan-launchpad) wrote :

Chase Douglas wrote:
> There's a tunable parameter called 'swappiness' that determines whether
> pages of memory are swapped out to hard disk in lieu of using the memory
> for buffers or not. A value of 0 means memory pages will never be
> swapped out in favor of buffers. A value of 100 is the opposite. The
> default value in the kernel is 60. I would be interested to know if
> anyone sees desktop responsiveness improve if the value is lowered.

I would suggest that swap should be turned off completely (and that the
system should have sufficient RAM to operate comfortably) while
isolating the cause of poor responsiveness in a desktop system. If we
can replicate slow responsiveness under heavy IO conditions where the
heavy IO is not caused by swapping it might be easier to pin down the
reason why the problem is occurring as intensive swapping itself is a
cause of heavy IO.

Juan

Revision history for this message
ReneS (mail-03146f06) wrote :
Download full text (3.3 KiB)

Just to add my recent observations.

I turned swap off because I felt that 4 GB of main memory is plenty. The machine was working OK, but during the day, without doing anything special (just typing along, browsing a little, and opening some windows, but not starting programs), the machine started to block. X was unresponsive for up to 15 minutes, atop displayed heavy page scanning, and the load factor went up to 8. The disk was very active. Interestingly, there was always plenty of free physical memory available: up to 1 GB of cache and 512 MB free or even more. So there was no reason to do anything with memory pages at all.

This repeats over and over again. Sometimes just for a minute, sometimes for two. No fixed time periods, no fixed recurrence pattern. But it is always the same picture: load goes up, page scanning starts, I/O is heavy (no swap configured). Maybe the order of events differs, but because all programs seem to halt for a moment, not all information can be seen.

While checking /var/log/messages during this, without swap, I got memory-allocation-failure messages from pulseaudio. Scanning and disk activity seem to start at this point in time. With swap added, these allocation failures are gone, but the overall behavior is the same. So I guess that an allocation failure of some kind (no messages) causes the rescanning of all pages and therefore the heavy disk activity.

I noticed that committed virtual memory is of course bigger than the available real memory. Some programs, such as nautilus, have a virtual size of up to 700 MB while physical memory usage is around 60 MB; Firefox is around 1 GB virtual and 200 MB real; skype 200 MB virtual and 30 MB real.

I understand that the virtual size is bigger, but it seems that the sum of all virtual sizes going over the limit of the physical memory causes very frequent and hefty memory page scans. What is strange is that without swap, the disk is still used heavily. I am not an expert, but I assume that this is related to relocating/dropping program code that was mapped into virtual memory.

Only observations. Now, I updated my BIOS and bought a n...

Read more...

Revision history for this message
KhaaL (khaal) wrote : Khalid Rashid wants to stay in touch on LinkedIn

LinkedIn
------------

Bug,

I'd like to add you to my professional network on LinkedIn.

- Khalid Rashid

Khalid Rashid
Introductionary secretary at Göteborgs Stad
Sweden

Confirm that you know Khalid Rashid
https://www.linkedin.com/e/isd/1334905321/Y36CthRY/

------
(c) 2010, LinkedIn Corporation

Revision history for this message
jhfhlkjlj (fdsuufijjejejejej-deactivatedaccount) wrote :

While I must say that this spam was pretty funny, should it be removed on account of that link?

Revision history for this message
KhaaL (khaal) wrote : Re: [Bug 131094] Re: Heavy Disk I/O harms desktop responsiveness

Yes, please :-)

I have myself to blame for clicking randomly while being on the phone.
-------------------------
Khalid Rashid

- "In the middle of every difficulty lies opportunity", Albert Einstein.

On Wed, May 26, 2010 at 19:31, Chauncellor <email address hidden> wrote:

> While I must say that this spam was pretty funny, should it be removed
> on account of that link?
>
> --
> Heavy Disk I/O harms desktop responsiveness
> https://bugs.launchpad.net/bugs/131094
> You received this bug notification because you are a direct subscriber
> of the bug.
>
> Status in The Linux Kernel: Invalid
> Status in “linux” package in Ubuntu: Confirmed
> Status in “linux-source-2.6.22” package in Ubuntu: Won't Fix
>
> Bug description:
> Binary package hint: linux-source-2.6.22
>
> When compared with 2.6.15 in feisty, heavy disk I/O causes increased iowait
> times and affects desktop responsiveness in 2.6.22
>
> this appears to be a regression from 2.6.15 where iowait is much lower and
> desktop responsiveness is unaffected with the same I/O load
>
> Easy to reproduce with tracker - index the same set of files with 2.6.15
> kernel and 2.6.22 kernel and the difference in desktop responsiveness is
> massive
>
> I have not confirmed if a non-tracker process which does heavy disk i/o
> (especially writing) replicates this yet - will do further investigation
> soon
>
> To unsubscribe from this bug, go to:
> https://bugs.launchpad.net/linux/+bug/131094/+subscribe
>

Revision history for this message
jhfhlkjlj (fdsuufijjejejejej-deactivatedaccount) wrote :

I hear that the very recent 2.6.35 is helpful in reducing the impact of this bug.

https://bugzilla.kernel.org/show_bug.cgi?id=12309

On the flip side....

http://www.phoronix.com/scan.php?page=article&item=linux_2635_fail&num=1

Revision history for this message
Brian Rogers (brian-rogers) wrote :

The Phoronix link can be ignored. It was a regression that has since been fixed, and their coverage was quite sensationalist to begin with.

Revision history for this message
Johannes H. Jensen (joh) wrote :

I experience the same issue on my ThinkPad X61 running Lucid 64-bit on both 2.6.32-23 and 2.6.34 mainline from http://kernel.ubuntu.com/~kernel-ppa/mainline/.

A simple `dd if=/dev/zero of=big.file bs=1M count=1500' reproduces the problem.

What's interesting is that applications are responsive up until the point where memory is filled up with cache, after which applications become unresponsive and the system extremely slow. I've tried reducing the swappiness to 20 without any noticeable results.

I'm going to give 2.6.35-rc1 a try now, to see if there are any improvements.
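To put a number on the stalls the original report attributes to iowait, one can sample the cumulative iowait counter while the dd test above runs. A minimal sketch reading /proc/stat (on Linux, field 6 of the aggregate "cpu" line is iowait):

```shell
# Sample the cumulative iowait counter before and after a one-second
# interval; the value is in jiffies (typically 10 ms each), summed
# across all CPUs, so the delta is the iowait accrued in that second.
read_iowait() { awk '/^cpu /{print $6}' /proc/stat; }

before=$(read_iowait)
sleep 1
after=$(read_iowait)
echo "iowait jiffies during the last second: $((after - before))"
```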

Revision history for this message
Johannes H. Jensen (joh) wrote :

Same issues with 2.6.35, unfortunately. After a few seconds of the dd command, empathy freezes and keyboard input starts to lag heavily. The overall responsiveness is horrible. I wonder if switching to 32-bit might help...?

Revision history for this message
KhaaL (khaal) wrote :

Johannes, I've tried both the 64 and 32-bit versions of the whole ubuntu
distro, and it's not by any means any less present in the 32-bit version.
-------------------------
Khalid Rashid

- "In the middle of every difficulty lies opportunity", Albert Einstein.

Revision history for this message
cornbread (corn13read) wrote :

Is the issue limited to ubuntu? Is debian or mint affected?

"KhaaL" <email address hidden> wrote:

>Johannes, I've tried both the 64 and 32-bit versions of the whole ubuntu
>distro, and it's not by any means any less present in the 32-bit version.
>-------------------------
>Khalid Rashid
>
>- "In the middle of every difficulty lies opportunity", Albert Einstein.

--
Sent from my EVO 4G

Revision history for this message
KhaaL (khaal) wrote :

In my testing I've experienced this issue on openSUSE as well. This is most
likely a kernel bug, probably this one:
https://bugzilla.kernel.org/show_bug.cgi?id=12309
-------------------------
Khalid Rashid

- "In the middle of every difficulty lies opportunity", Albert Einstein.

Revision history for this message
Johannes H. Jensen (joh) wrote :

Yeah, unfortunately kernel bug #12309 is a complete mess of different
symptoms and problems, and thus completely useless. We should really
submit a new upstream bug regarding this exact issue and link this bug
against it.

FWIW, `stress -d 1' also reproduces the issue here.

- Johannes

On Wed, Jun 23, 2010 at 8:12 AM, KhaaL <email address hidden> wrote:
> In my testing i've experienced this issue on opensuse aswell. This is most
> likely a kernel bug, propably this one:
> https://bugzilla.kernel.org/show_bug.cgi?id=12309
> -------------------------
> Khalid Rashid
>
> - "In the middle of every difficulty lies opportunity", Albert Einstein.

Revision history for this message
Johannes H. Jensen (joh) wrote :

I just tested with the anticipatory scheduler on the stock Ubuntu 2.6.32:

# echo anticipatory > /sys/block/sda/queue/scheduler

This did not seem to have any effect - the problem was still very much present.
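For anyone repeating this test, the active scheduler can be inspected per device through the same sysfs path used above (sda is just an example device; cfq was the default scheduler on kernels of this era):

```shell
# The active I/O scheduler for each block device is shown in brackets,
# e.g. "noop anticipatory deadline [cfq]".
for f in /sys/block/*/queue/scheduler; do
    [ -r "$f" ] || continue      # also skips a non-matching glob
    dev=${f#/sys/block/}
    dev=${dev%/queue/scheduler}
    printf '%s: %s\n' "$dev" "$(cat "$f")"
done

# Switching back to the default (requires root):
#   echo cfq | sudo tee /sys/block/sda/queue/scheduler
```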

Revision history for this message
Peter Hoeg (peterhoeg) wrote :

Have you tried mounting the filesystems with writeback instead of ordered?

/peter

On Wed, Jun 23, 2010 at 15:42, Johannes H. Jensen <email address hidden> wrote:
> I just tested with the anticipatory scheduler on the stock Ubuntu
> 2.6.32:
>
> # echo anticipatory > /sys/block/sda/queue/scheduler
>
> This did not seem to have any effect - the problem was still very much
> present.

Revision history for this message
Johannes H. Jensen (joh) wrote :

I haven't tried writeback, no. Is it possible to remount with this
option, or do I need to modify fstab and reboot?

- Johannes

On Wed, Jun 23, 2010 at 10:00 AM, Peter Hoeg <email address hidden> wrote:
> Have you tried mounting the filesystems with writeback instead of
> ordered?
>
> /peter
>
> On Wed, Jun 23, 2010 at 15:42, Johannes H. Jensen <email address hidden> wrote:
>> I just tested with the anticipatory scheduler on the stock Ubuntu
>> 2.6.32:
>>
>> # echo anticipatory > /sys/block/sda/queue/scheduler
>>
>> This did not seem to have any effect - the problem was still very much
>> present.
>>

Revision history for this message
Ritesh Raj Sarraf (rrs) wrote :

On Wednesday 23 Jun 2010 15:22:04 you wrote:
> I haven't tried writeback, no. Is it possible to remount with this
> option, or do I need to modify fstab and reboot?

An on-the-fly remount with a different data= mode was denied. And then, setting
data=writeback in /etc/fstab ended up with a read-only rootfs.

Unless someone confirms, don't do it. My VMs are gone now.

--
Ritesh Raj Sarraf
RESEARCHUT - http://www.researchut.com
"Necessity is the mother of invention."

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

I cannot reproduce this bug in every configuration, especially on my fresh test installation. My assumption: this bug depends on the maximum transactions/s of the setup. Fragmentation of the disk has a negative effect on this bug. In particular, my encrypted LVM volumes drop in performance after a lot of usage. Turning on snapshots for a backup increases the effect as well.
Has anyone tried to reproduce this bug with a Postville G2 or an SSD with a SandForce controller?

Revision history for this message
Peter Hoeg (peterhoeg) wrote :

Ritesh.

ext3 has supported writeback mode since at least 2001 (look here:
http://www.ibm.com/developerworks/library/l-fs8.html), so I hardly
think this could have caused any damage. If you have lost some VMs it
must be because something else is terribly wrong with your setup.

/peter

On Wed, Jun 23, 2010 at 18:15, Ritesh Raj Sarraf <email address hidden> wrote:
> On Wednesday 23 Jun 2010 15:22:04 you wrote:
>> I haven't tried writeback, no. Is it possible to remount with this
>> option, or do I need to modify fstab and reboot?
>
> On the fly remount of the data= mode was denied. And then, setting
> data=writeback into /etc/fstab ended up with a read-only rootfs.
>
> Unless someone confirms, don't do it. My VMs are gone now.

Revision history for this message
Ritesh Raj Sarraf (rrs) wrote :

On Wednesday 23 Jun 2010 21:22:43 Peter Hoeg wrote:
> ext3 has supported writeback mode since at least 2001 (look here:
> http://www.ibm.com/developerworks/library/l-fs8.html), so I hardly
> think this could have caused any damage. If you have lost some VMs it
> must be because something else is terribly wrong with your setup.

Okay! Did data=writeback work for you? Was there a performance change?

--
Ritesh Raj Sarraf
RESEARCHUT - http://www.researchut.com
"Necessity is the mother of invention."

Revision history for this message
Johannes H. Jensen (joh) wrote :

So I just tested writeback on my desktop computer which exhibits the
same problems. I mounted both the root filesystem and /home with
data=writeback (ext3).

So far the difference is *huge*! The system is much more responsive -
I'm writing this while 'stress -d 4' is running in the background. The
same applies to the dd test - all apps respond almost instantly with
writeback, as opposed to sluggish and hanging with ordered.
Applications open much faster as well....

I'll do some more testing to confirm - mainly writeback only on /home
vs root and also on my laptop. Is this a bug in ext3 then, or is
ordered mode supposed to be so slow / problematic on desktop systems?
What problems might occur when using writeback mode? I'm a bit
concerned about the following comment from the mount manual:

It guarantees internal filesystem integrity, however it can
allow old data to appear in files after a crash and journal recovery.

By the way, to use writeback on the root filesystem, setting
data=writeback in fstab only is not sufficient. As 'man mount' states:

To use modes other than ordered on the root filesystem, pass the
mode to the kernel as boot parameter, e.g. rootflags=data=journal.

- Johannes


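The changes Johannes describes above would look roughly like the following sketch; the device names and exact fstab layout are assumptions, and note that the man page's example value is data=journal while the mode under test here is writeback:

```shell
# /etc/fstab -- hypothetical ext3 entries using data=writeback
# /dev/sda1   /       ext3   defaults,data=writeback   0   1
# /dev/sda2   /home   ext3   defaults,data=writeback   0   2

# The root filesystem additionally needs the mode as a boot parameter,
# e.g. appended to the kernel line in /boot/grub/menu.lst:
#   kernel /boot/vmlinuz-2.6.32-23-generic root=/dev/sda1 ro rootflags=data=writeback
```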
Revision history for this message
Ravindran K (ravindran-k) wrote :
Download full text (3.4 KiB)

On Thu, Jun 24, 2010 at 1:48 AM, Johannes H. Jensen
<email address hidden>wrote:

> So I just tested writeback on my desktop computer which exhibits the
> same problems. [...]
Read more...

Revision history for this message
Ritesh Raj Sarraf (rrs) wrote :

On Thursday 24 Jun 2010 11:15:56 Ravindran K wrote:
> I'm using Ext4 and when I try to use data=writeback for my root partiton
> (it was ext3 and converted to ext4), I get a error while booting which
> indicates "unable to change mode from ordered to writeback while
> remounting".. I think it is another bug.. Anyone else seeing this?

Yes, I had seen the same bug. My setup was a fresh install with ext4 file
system.

--
Ritesh Raj Sarraf
RESEARCHUT - http://www.researchut.com
"Necessity is the mother of invention."

Revision history for this message
Peter Hoeg (peterhoeg) wrote :
Download full text (3.9 KiB)

Ravindran,

please see Johannes' comment

> To use modes other than ordered on  the  root filesystem,  pass the
> mode to the kernel as boot parameter, e.g. rootflags=data=journal.

/peter

On Thu, Jun 24, 2010 at 13:45, Ravindran K <email address hidden> wrote:
> [...]

Read more...

Revision history for this message
John Baptist (jepst79) wrote :

My understanding of the writeback option is that it greatly decreases the file system's tolerance for crashes, and could result in the file system being put in an inconsistent state, with resulting data loss. This seems a high price to pay, so this is clearly not a good solution.

I would also like to point out that performance degrades even if I am only doing heavy reads, but not writes. In that case, the writeback option shouldn't improve performance at all.

Revision history for this message
Johannes H. Jensen (joh) wrote :
Download full text (3.8 KiB)

Ravindran,

Did you boot with the kernel parameter rootflags=data=writeback?

- Johannes

On Thu, Jun 24, 2010 at 7:45 AM, Ravindran K <email address hidden> wrote:
> [...]

Read more...

Revision history for this message
Johannes H. Jensen (joh) wrote :

Unfortunately, this does not seem to be the case on my ThinkPad X61. I
did not see any noticeable difference between writeback and ordered
mode. With writeback, interactivity is still sluggish during disk
writes. Applications hang, interfaces slow to respond etc. So clearly
this cannot be the main issue...

- Johannes

On Wed, Jun 23, 2010 at 10:18 PM, Johannes H. Jensen
<email address hidden> wrote:
> So I just tested writeback on my desktop computer which exhibits the
> same problems. I mounted both the root filesystem and /home with
> data=writeback (ext3).
>
> So far the difference is *huge*! The system is much more responsive -
> I'm writing this while 'stress -d 4' is running in the background. The
> same applies to the dd test - all apps respond almost instantly with
> writeback, as opposed to sluggish and hanging with ordered.
> Applications open much faster as well....
>
> I'll do some more testing to confirm - mainly writeback only on /home
> vs root and also on my laptop. Is this a bug in ext3 then, or is
> ordered mode supposed to be so slow / problematic on desktop systems?
> What problems might occur when using writeback mode? I'm a bit
> concerned about the following comment from the mount manual:
>
> It  guarantees  internal  filesystem integrity,  however  it  can
> allow old data to appear in files after a crash and journal recovery.
>
> By the way, to use writeback on the root filesystem, setting
> data=writeback in fstab only is not sufficient. As 'man mount' states:
>
> To use modes other than ordered on  the  root filesystem,  pass the
> mode to the kernel as boot parameter, e.g. rootflags=data=journal.
>
> - Johannes
>
>
> On Wed, Jun 23, 2010 at 11:52 AM, Johannes H. Jensen
> <email address hidden> wrote:
>> I haven't tried writeback, no. Is it possible to remount with this
>> option, or do I need to modify fstab and reboot?
>>
>> - Johannes
>>
>>
>> On Wed, Jun 23, 2010 at 10:00 AM, Peter Hoeg <email address hidden> wrote:
>>> Have you tried mounting the filesystems with writeback instead of
>>> ordered?
>>>
>>> /peter
>>>
>>> On Wed, Jun 23, 2010 at 15:42, Johannes H. Jensen <email address hidden> wrote:
>>>> I just tested with the anticipatory scheduler on the stock Ubuntu
>>>> 2.6.32:
>>>>
>>>> # echo anticipatory > /sys/block/sda/queue/scheduler
>>>>
>>>> This did not seem to have any effect - the problem was still very much
>>>> present.
>>>>
>>
>
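
For anyone who wants to repeat the same comparison, the writeback switch looks like this (a sketch only; the device name and mount point are examples, and man mount warns that writeback can expose stale data after a crash):

```shell
# Remount a non-root ext3 filesystem with data=writeback (example device/mount)
sudo umount /home
sudo mount -t ext3 -o data=writeback /dev/sda3 /home

# To keep it across reboots, change the options column in /etc/fstab:
#   /dev/sda3  /home  ext3  defaults,data=writeback  0  2

# The root filesystem needs the mode passed at boot instead (see man mount):
#   rootflags=data=writeback   on the kernel command line
```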

Revision history for this message
Olivier Gagnon (procule) wrote :

Why is this bug only at "Medium" importance? It should be critical. It makes Ubuntu almost unusable. When doing an apt-get upgrade, I can't do anything else since the system freezes and windows grey out. I suggest we change this bug to critical.

Revision history for this message
psypher (psypher246) wrote :

I have also tried writeback and journal mode. Writeback provides a very minimal improvement, not enough to make it worth running permanently. Changing between ATA and AHCI mode makes no difference, nor does changing the scheduler from cfq to anticipatory or deadline.

I am testing this on a Dell Precision M6300 laptop with a SATA drive, but I have experienced this issue on all my various PCs since at least Gutsy or Intrepid.

If this thread has become too large to be of any use, what is the best way to proceed? If this is a collection of bugs, can we at least make a list of the separate bugs to track and get upstream focus on?

Thanks

Revision history for this message
psypher (psypher246) wrote :

Launching a virtual machine and having my PC hang again has prompted me to come back here. Can someone please suggest what the next steps are? There is no activity on this thread; have new bugs been logged? What's the story?

Thanks

Revision history for this message
Exquisite Dead Guy (ben-forlent) wrote :

I feel your pain. This bug affects me just about every day. My computer, which is a fast system with plenty of RAM, starts out lightning fast and over the course of the day gets slower and slower until it's so unusable I have to reboot. When it slows down, 'top' doesn't really show anything except a really high iowait (usually about 80%). I always have plenty of free RAM. I thought it was a bad hard drive, so I bought a new one, and the problem is still there.

I found if I create a very small swap file (like 32 megabytes) just so the system can see some swap space it will freeze slightly less often (I can sometimes go two days on this configuration), but the problem is still there.

I'm having to reboot daily like I'm running Windows ME or something :/

Revision history for this message
Jeremy Nickurak (nickurak) wrote :

I'm having this issue too, with x86-64 Lucid.

I'm on a 1.83Ghz Core2 Duo with 1.5gigs of ram, 2 gigs of swap, and a fast SATA hard drive.

This feels very much what would happen with an old computer when DMA was disabled... but of course this is a SATA hard drive, and I don't know how to confirm if it's configured properly.

It's plenty fast after a reboot, but at some point it becomes barely usable. An apt-get upgrade will generally trigger it. Once it's there, it seems like a little hard drive I/O and CPU just don't mix any more (as if DMA were disabled).

Revision history for this message
Charles Cazabon (charlesc-web-register-launchpad-net) wrote :

I used to be affected quite badly by this problem, through all of Intrepid and into Jaunty. My system's no longer affected.

What eventually appeared to resolve the problem for me was a combination of newer kernels (somewhere in Jaunty's updates, sorry I don't have a version to reference) and a motherboard BIOS update - everything related to SATA disk access got noticeably more stable with newer BIOSes. Jaunty became quite usable, and I haven't had any problems in Karmic.

So to anyone currently still experiencing this problem: see if there is a BIOS update available for your motherboard, and ensure you're running the latest kernel in Jaunty/Karmic. If you're still pre-Jaunty, consider upgrading.

Just my $0.02.

Revision history for this message
Olivier Gagnon (procule) wrote :

For me too, every time I do an apt-get upgrade I have to leave the
machine alone for a while because it becomes unusable. The mouse is
always freezing and everything is lagging, with iowait at around
80%.

I have an AMD Athlon 64 with 2 GB of RAM and a SATA drive. I changed
the hard drive too and the problem is still there.

Very frustrating.

On Thu, Jul 22, 2010 at 11:01 AM, Jeremy Nickurak
<email address hidden> wrote:
> I'm having this issue too, with x86-64 Lucid.
>
> I'm on a 1.83Ghz Core2 Duo with 1.5gigs of ram, 2 gigs of swap, and a
> fast SATA hard drive.
>
> This feels very much what would happen with an old computer when DMA was
> disabled... but of course this is a SATA hard drive, and I don't know
> how to confirm if it's configured properly.
>
> It's plenty fast after a reboot, but at some point, it just gets barely
> usable. At apt-get upgrade will generally trigger it. Once it's there,
> it seems like a little hard drive IO and CPU just don't mix any more (as
> if DMA was disabled)
>

Revision history for this message
AvitarX (ddwornik) wrote :

I just had to restore some data from a bad HD (read errors on my laptop).

When copying from the NTFS partition to my new NTFS partition (files, not image, and in Linux), I got about 7-8 MB/sec of throughput, and a responsive system.

When copying from ext4 to ext4, I was getting about 20 MB/sec and terrible responsiveness.

I will say that overall it is less problematic than it has been, though: no total lockup, and I can run my updates and surf the web fairly effectively.

This is using 10.10, so it looks like the bug is still around.

Revision history for this message
psypher (psypher246) wrote :

Hi All,

It seems there is quite a bit of life again on https://bugzilla.kernel.org/show_bug.cgi?id=12309 , which appears to be the right bug for this issue.

Some guys are getting good results when turning off swap completely.

Please try the following and report back if this improves your system responsiveness:

sudo apt-get install stress
sudo swapoff -a
stress -d 1

Now go use your machine.

swapoff will turn off all swap until reboot. I have 3 GB of RAM so this is not a problem; if you have less than 1 GB you might experience more of a slowdown if you use a lot of RAM.
stress makes the hard drive read and write continuously, so it simulates heavy disk I/O.

If you reboot, swap will be turned on again. You would have to comment out the swap line in your fstab to stop that from happening.
Setting swappiness to a low value or 0 does not make a difference; you have to turn swap off entirely.

Thanks
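
To check that swap really is off and to watch how much time the CPU spends waiting on I/O during the stress run, the standard tools are enough (a sketch; vmstat comes from the procps package):

```shell
# /proc/swaps lists active swap areas; after 'swapoff -a' only the header remains
cat /proc/swaps

# Sample system counters once per second, ten times; the 'wa' column is the
# percentage of CPU time spent waiting for I/O, the symptom this bug is about
vmstat 1 10

# Current swappiness (0-100); per the comment above, lowering it doesn't help
cat /proc/sys/vm/swappiness
```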

Revision history for this message
Ronan Jouchet (ronj) wrote :

Hello!

Phoronix reports good results from patches by Wu Fengguang and Kosaki Motohiro (article: http://www.phoronix.com/scan.php?page=news_item&px=ODQ3OQ , original lkml post: http://lkml.org/lkml/2010/8/1/40 ). This seems like great news to me; maybe it could help close this bug.

1. Was anyone here able to test the patches and confirm the impact?
2. Any chance to see the patches incorporated into Maverick's kernel sauce?
3. Any chance to see the patches backported into Lucid's kernel sauce?

Revision history for this message
psypher (psypher246) wrote :

I don't think we can rejoice yet; there are some mixed results on the above-mentioned kernel bug. I haven't had a chance to test yet. I think it might be a step in the right direction, but I would not mark this as fixed quite yet.

Revision history for this message
Søren Holm (sgh) wrote :

I would say: pull the patch into Maverick and see if it makes a difference for the people running Maverick now. If performance degrades because of it, remove the patches. It would also help upstream if more people tested it before it is included in 2.6.36.

Changed in linux (Ubuntu):
assignee: Ubuntu Kernel Team (ubuntu-kernel-team) → nobody
Changed in linux-source-2.6.22 (Ubuntu):
assignee: Ben Collins (ben-collins) → nobody
Revision history for this message
Martin Meyer (elreydetodo) wrote :

If we can't pull the patch fully into Maverick for testing, can we possibly have someone set up a PPA containing a normal kernel for Lucid and Maverick, except with this patch applied? I would love to see if this patch helps the responsiveness of my desktop at work. I am always under memory pressure because I keep a VM running, and I frequently have heavy disk I/O situations due to log parsing. I think I'm a great tester for this.

The problem I'm already foreseeing here is that there isn't really a quantitative test for success. All I can say is whether or not my desktop "feels" more responsive. How would I actually measure responsiveness? Those types of issues are nearly impossible to reproduce reliably IMO.

Revision history for this message
Brian Rogers (brian-rogers) wrote :

I've set up a PPA here: https://launchpad.net/~brian-rogers/+archive/io-kernel

A Maverick kernel is building right now. The patch didn't cleanly apply to Lucid's kernel. Is there a version of the patch that's already been backported to 2.6.32?

Revision history for this message
Olivier Gagnon (procule) wrote :

I've tried a version of the Maverick kernel ported to Lucid, version
2.6.35-14.20~lucid2. It was supposed to clear the problem, but no. I
still have issues every time there is moderate to high I/O on the
filesystem.

Tomorrow I will try another filesystem than ext4.

O. Gagnon

On Wed, Aug 18, 2010 at 1:56 AM, Brian Rogers <email address hidden> wrote:
> I've set up a PPA here: https://launchpad.net/~brian-rogers/+archive/io-
> kernel
>
> A Maverick kernel is building right now. The patch didn't cleanly apply
> to Lucid's kernel. Is there a version of the patch that's already been
> backported to 2.6.32?
>

Revision history for this message
ReneS (mail-03146f06) wrote :

I tried Brian's kernel (64-bit, ext3). It feels sluggish and the overall I/O impression is slow. I created memory stress and the desktop became unresponsive again, though it seems to happen later. Overall, individual applications are less responsive now.

Sorry, I cannot quantify it in any way.

Revision history for this message
psypher (psypher246) wrote :

Some more news on this issue. Some new patches have arisen:

http://www.phoronix.com/scan.php?page=news_item&px=ODU0OQ

That link claims the first patches made a difference, although not a lot; the new patches claim a big difference.

Revision history for this message
Brian Rogers (brian-rogers) wrote :

I've updated my PPA to include the new scheduler patches with version 2.6.35-iofix+19.28. For Lucid, I've provided both a patched and unpatched backport of Maverick's 2.6.35 kernel so they can be compared with each other to see the effect of just the patches.

Revision history for this message
Trey Blancher (ectospasm) wrote :

I've installed the iofix+19.28 kernel for Lucid from Brian Rogers. So far, it seems to work. While the machine is booting up, it still seems to have the problem, however. iotop (which now works properly) reports the [kdmflush] kernel thread as consuming 99.99% I/O when the system is unresponsive, but I haven't noticed it in the past 24 hours or so (my uptime is currently less than 48 hours). I will continue to monitor, and I'll post back with results.

Revision history for this message
Brian Rogers (brian-rogers) wrote :

That's a 2.6.35 kernel, and Lucid has a 2.6.32 kernel by default. So you can't tell whether 2.6.35 or the patches I added on top of it solved the problem, unless you also test the unpatched 2.6.35.

Revision history for this message
Trey Blancher (ectospasm) wrote :

OK, I'll test the unpatched 2.6.35 and report back.

Revision history for this message
Trey Blancher (ectospasm) wrote :

OK, it looks like the problem is fixed for me in both the stock 2.6.35 and +iofix provided by Brian Rogers. There's still [kdmflush] and a bunch of other programs causing a lot of I/O wait (according to top and iotop), but the system is MUCH more responsive. Usually when the problem occurred, the HDD light would go solid for several moments before I could use my system again. Now, the pause is brief, much less than two seconds if it ceases being responsive at all. I mentioned earlier that the problem still occurred shortly after boot, but that was the HDD light I was referring to, not the perceived responsiveness of the system. So for me, the solution is to upgrade to 2.6.35.

Revision history for this message
psypher (psypher246) wrote :

I have been running the 2.6.35 + iofix kernel for more than a week and unfortunately I am unable to see any difference. For example, upon boot and logging into the desktop, my Ubuntu One account will start syncing. I have about 20 GB in the U1 folder and it takes about 5-10 minutes every boot to scan all the files, check for changes, sync, etc. During that time the hard drive thrashes like crazy, and when monitoring iotop the ubuntuone processes are reading and writing to the disk at about 400 KB/s. During this process my PC is extremely slow and unresponsive. The default test is to boot up, start Firefox and try to click a bookmark folder icon on my toolbar, which drops down a list of bookmarks. Firefox starts up OK, but it takes about 5 minutes for the drop-down list to open once I click on it. No, really: 5 minutes.

Another default test is to boot up, let U1 do its thing and quiet down, then open Firefox. Then I start "stress -d 1" to stress the disk and try to browse using Firefox. I open Google Reader and try to browse through my RSS feeds. While stress is running it is practically impossible to use or browse in Firefox. Note that stress is now reading and writing at 4-10 MB/s. There seems to be no difference in responsiveness between the disk writing at 400 KB/s or 4 MB/s, and there seems to be no difference between the default kernel and this patched one.

Very sad :(

Are the guys who are seeing a difference doing anything else? Turning off swap? Changing the default scheduler? Why do some people see an improvement, even though the improvements are still not good enough?

Revision history for this message
Virgil Brummond (uraharakisuke153) wrote :

This sounds like something hard to pin down. I did a bit of testing using audio as the main indicator. With the generic kernel, audio would stutter when the system went into swap under any CPU load. The server kernel allows audio to play solidly, and generally things seem responsive.

Revision history for this message
Virgil Brummond (uraharakisuke153) wrote :

The iofix kernel does seem to help responsiveness. The problem is that when anything starts to page to swap, the system goes to pieces and nothing works much. I think the problem might be with the CFQ scheduler.

Revision history for this message
psypher (psypher246) wrote :

I have tried different schedulers and it made no difference. That was suggested on the kernel bug page, as was turning off swap. Some had better experiences; the 2.6.32 kernel makes no difference for me.

Revision history for this message
Exquisite Dead Guy (ben-forlent) wrote :

psypher: Turning off swap definitely doesn't help for me, as I've tried it both with and without swap. It's actually a little better with swap turned on. It seems to get bad when I've been running for a while and free memory drops below 500 MB or so. For some reason, using the Firefox All-in-One Gestures add-on and middle-clicking to scroll down the page really aggravates the problem.

I really wish someone could find the cause of this. Sub-daily reboots are reminding me of my Windows ME days.

Revision history for this message
Peter Hoeg (peterhoeg) wrote :

For whatever it's worth, I'm seeing a separate issue where X leaks memory like crazy, which has the interesting effect that I see impressive disk thrashing, and this is with swap turned off. As soon as free memory drops to a few hundred MB, my HDD light is pretty much lit up solid and everything slows to a complete crawl. This is with the .35-22 kernel (the standard Maverick generic kernel).

So something generates a lot of I/O (what, I still don't know) and when that happens, nothing works except the three-finger salute.

I'm willing to try pretty much anything if somebody can tell me what I should do or what information to provide.

Revision history for this message
Exquisite Dead Guy (ben-forlent) wrote :

Peter Hoeg: Mine behaves almost exactly as you're describing. I've found that if I create a 512 MB swapfile, then every time it starts getting a bit laggy, like it's about to freeze up, I tab over to a terminal window and run:

sudo swapon /path/to/swapfile
sudo swapoff -a

Something about turning the swapfile on and back off usually buys me about an hour before it starts locking up again. I know it's not great, but it's better than rebooting several times a day. I really wish someone would fix this problem; it's been consistent for me and a daily struggle/annoyance across 3 releases now. Until this problem is fixed, Ubuntu = Windows ME :(
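
For reference, the small swapfile itself can be created like this (a sketch; the 512 MB size and the /swapfile path are just examples):

```shell
# Create and activate a 512 MB swap file
sudo dd if=/dev/zero of=/swapfile bs=1M count=512
sudo chmod 600 /swapfile     # swap contents should not be world-readable
sudo mkswap /swapfile
sudo swapon /swapfile

# The workaround from the comment above, when things start to lag:
sudo swapoff -a              # ...then 'swapon /swapfile' again afterwards
```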

Revision history for this message
Peter Hoeg (peterhoeg) wrote :

I'll try that on the box tomorrow.

The other odd thing is that turning off swap is extremely slow. As an example, if I have about 60% memory used, the system will have swapped out a few hundred MB. If I then do a "swapoff -a", the box obviously starts swapping back in, but it happens at approximately 500 KB/s.

Revision history for this message
daneel (daneel) wrote :

Have you tried swappiness = 0?

2010/10/21 Peter Hoeg <email address hidden>:
> I'll try that on the box tomorrow.
>
> The other odd thing is that turning off swap is extremely slow. As an
> example if I have about 60% memory used then it will start swapping a
> few 100 MBs. If I then do a "swapoff -a", then the box obviously starts
> swapping in, but it happens at approximately 500KB/s.
>

Revision history for this message
Peter Hoeg (peterhoeg) wrote :

I haven't, no, but what effect would swappiness have if there is no swap anyway?

/Peter

On Thu, Oct 21, 2010 at 23:54, daneel <email address hidden> wrote:
> Have try swappiness = 0 ?
>
> 2010/10/21 Peter Hoeg <email address hidden>:
>> I'll try that on the box tomorrow.
>>
>> The other odd thing is that turning off swap is extremely slow. As an
>> example if I have about 60% memory used then it will start swapping a
>> few 100 MBs. If I then do a "swapoff -a", then the box obviously starts
>> swapping in, but it happens at approximately 500KB/s.
>>

Revision history for this message
exactt (giesbert) wrote :
Revision history for this message
psypher (psypher246) wrote :

This actually hit Slashdot quite a while ago; you might have missed my comments:

http://it.slashdot.org/article.pl?sid=09/01/15/049201

The latest comment on the bugzilla report ( https://bugzilla.kernel.org/show_bug.cgi?id=12309 ) states that new patches have fixed the issue.

I won't jump for joy until I see it for myself. Can anyone do a PPA for the newer 2.6.36 kernel to test?

FYI turning off swap, setting swappiness or changing the scheduler makes no difference to me. Always slow.

If messing with the swap works for you, try: swapoff -a && swapon -a as well

Revision history for this message
Peter Hoeg (peterhoeg) wrote :

Regarding the PPA, you can always get the new kernel from here:

http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.36-maverick/

/Peter

Revision history for this message
KhaaL (khaal) wrote :

Regarding the kernel PPA, is the newest kernel there patched with the desktop responsiveness fix?

Revision history for this message
Peter Hoeg (peterhoeg) wrote :

I haven't looked through the 3 patch files in that directory, but
according to this guy:
https://bugzilla.kernel.org/show_bug.cgi?id=12309#c510 stock .36
fixes the problem.

/Peter

Revision history for this message
Rocko (rockorequin) wrote :

Yes, the stock 2.6.36 kernel (which is in the weekly builds linked to in comment #358) has a patch to improve responsiveness (this is from http://kernelnewbies.org/Linux_2_6_36):

1.7. Improve VM-related desktop responsiveness

There are some cases where a desktop system could be really unresponsive while doing things such as writing to a very slow USB storage device and some memory pressure. This release includes a small patch that improves the VM heuristics to solve this problem.

I.e., it helps improve responsiveness for a particular case.

FWIW, I haven't noticed any major desktop slowdowns on my system with 2.6.36 over the last 6 days.

Revision history for this message
exactt (giesbert) wrote :

maybe we have been waiting for this patch: http://www.phoronix.com/scan.php?page=article&item=linux_2637_video&num=1

could it be back-ported to 10.10?

Revision history for this message
Ofer Chen (oferchen) wrote :

I wish! This looks like a major improvement; I'm tempted to compile the kernel myself...
Is there a bleeding-edge kernel PPA for Maverick?

Revision history for this message
Milan Bouchet-Valat (nalimilan) wrote :

Do people actually read what others write before asking and commenting? PPA is at:
http://kernel.ubuntu.com/~kernel-ppa/mainline/

Revision history for this message
Ofer Chen (oferchen) wrote :

*** Mainline kernels do not include Ubuntu-specific drivers.

I ended up installing Natty Narwhal; performance is much better. I'll stay with Natty's kernel for now... ;)

Revision history for this message
god (humper) wrote :

Is there a PPA with an Ubuntu-specific kernel plus a backported fix for 10.10?

Revision history for this message
Søren Holm (sgh) wrote :
Revision history for this message
Søren Holm (sgh) wrote :

The performance is amazing. On my 1.6 GHz dual-core system I tried compiling a kernel with -j64. 2.6.37-rc2 without the patch crawled; switching windows was a pain. With the patch the system runs smoothly.

Revision history for this message
psypher (psypher246) wrote :

Hi Soren,

Any chance of 64bit pkgs?

Thanks

Revision history for this message
daneel (daneel) wrote :

I was trying this patch in Arch Linux and reading some clarifications
in the forum (https://bbs.archlinux.org/viewtopic.php?id=108516).
Apparently, this patch is not about I/O performance, but only about the
scheduling of processes in different ttys. So, if you launch everything
in the same tty, it is not going to help at all.

2010/11/18 psypher <email address hidden>:
> Hi Soren,
>
> Any chance of 64bit pkgs?
>
> Thanks
>

Revision history for this message
Jamie Lokier (jamie-shareable) wrote :

daneel wrote:
> I was trying this patch in Arch Linux and reading some clarifications
> in the forum (https://bbs.archlinux.org/viewtopic.php?id=108516).
> Apparently, this patch is not about IO performance, but only the
> scheduling of process in different tty. So, if you launch all in the
> same tty is not going to help at all.

True. But as it produces such a difference for tasks which are in
different ttys (i.e. inside terminal windows, and initial daemons),
perhaps it would be good to create scheduling groups for other things
too, such as daemons (do they go in their own group when they detach
from a tty with the patch, or keep the group they had when they were
created?), different classes of background process (especially I/O
kernel tasks), and maybe different X applications?

I presume it's possible to create new scheduling groups manually, to
test the effect on desktop responsiveness? If it's shown to make a
good difference, then it'd be possible to look into whether the kernel
should do so automatically.

Also, there are I/O CFQ cgroups (CONFIG_CFQ_GROUP_IOSCHED) in current
kernels. That may be worth looking into in a similar way.
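
Creating a scheduling group by hand, as suggested above, can be done through the cgroup filesystem that these kernels already ship (a sketch; the /dev/cgroup mount point and the trackerd example are assumptions):

```shell
# Mount the CPU-controller cgroup hierarchy (cgroup v1 interface)
sudo mkdir -p /dev/cgroup/cpu
sudo mount -t cgroup -o cpu none /dev/cgroup/cpu

# Make a group for background work and move a noisy process into it
# (the tasks file accepts one PID per write)
sudo mkdir /dev/cgroup/cpu/background
for p in $(pidof trackerd); do
    echo "$p" | sudo tee /dev/cgroup/cpu/background/tasks
done

# Give the group a small CPU share relative to the default of 1024
echo 128 | sudo tee /dev/cgroup/cpu/background/cpu.shares
```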

Revision history for this message
mattismyname (mattismyname) wrote :

Anyone pushing for the TTY grouping patch, please read: http://ck-hack.blogspot.com/2010/11/create-task-groups-by-tty-comment.html

Revision history for this message
Exquisite Dead Guy (ben-forlent) wrote :

I've installed the "alternative" patch ( http://www.webupd8.org/2010/11/alternative-to-200-lines-kernel-patch.html ) and it's made zero difference: I still have all the lagging issues requiring a daily reboot. No change.

Revision history for this message
John Doe (b2109455) wrote :

I have a Latitude D430 and responsiveness is horrible whenever there is disk I/O. With the scheduler changed from "cfq" to "deadline" for /dev/sda, everything is A LOT better. I would say this solved the problem for me.

Revision history for this message
John Doe (b2109455) wrote :

Ubuntu 10.10, that is
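
The cfq-to-deadline change John Doe describes is per device and can be tried at runtime (the sda device name is an example):

```shell
# Show the available schedulers; the active one appears in [brackets]
cat /sys/block/sda/queue/scheduler

# Switch to deadline immediately (reverts on reboot)
echo deadline | sudo tee /sys/block/sda/queue/scheduler

# To make it permanent, add elevator=deadline to the kernel command line,
# e.g. in GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then run update-grub
```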

Changed in linux:
status: Invalid → Confirmed
Changed in linux:
importance: Unknown → High
Revision history for this message
god (humper) wrote :

It could be a good idea to use ulatencyd once it is mature enough.
https://github.com/poelzi/ulatencyd/

Revision history for this message
god (humper) wrote :

Are there plans to enable CONFIG_SCHED_AUTOGROUP for Ubuntu kernels in some PPA?
At least until Ubuntu switches to systemd initialization?

Revision history for this message
AvitarX (ddwornik) wrote :

Is ubuntu going to throw out upstart?
On Mar 15, 2011 8:47 AM, "MSU" <email address hidden> wrote:
> Are there plans to enable CONFIG_SCHED_AUTOGROUP for ubuntu kernels in
some ppa?
> At least until ubuntu switch to systemd initialization?
>
> --
> You received this bug notification because you are a direct subscriber
> of the bug.
> https://bugs.launchpad.net/bugs/131094
>
> Title:
> Heavy Disk I/O harms desktop responsiveness
>
> To unsubscribe from this bug, go to:
> https://bugs.launchpad.net/linux/+bug/131094/+subscribe

Revision history for this message
Omer Akram (om26er) wrote :

>
> Is ubuntu going to throw out upstart?
>
>
Simple answer: no.

Revision history for this message
AvitarX (ddwornik) wrote :

That's what I assumed, but the previous post tricked me.
On Mar 15, 2011 12:34 PM, "Omer Akram" <email address hidden> wrote:
>>
>> Is ubuntu going to throw out upstart?
>>
>>
> Simple answer: no.
>

Revision history for this message
tankdriver (stoneraider-deactivatedaccount) wrote :

Testing oneiric beta + updates,
Under high I/O load the mouse pointer now feels very choppy (e.g. it freezes for a second at a time).
Can someone confirm this regression from Natty to Oneiric?

Revision history for this message
cometdog (ericctharley) wrote :

Incredibly bad responsiveness under heavy I/O for me on Oneiric. My only recent point of comparison is Lucid; unfortunately that's not a completely fair comparison, since I had a different HDD setup then. But in any case, the desktop becomes nearly unusable when starting up a program, etc. It freezes for multiple seconds at a time.

Revision history for this message
Vadim Peretokin (vperetokin) wrote :

Yeah. Anytime a system has to swap, you know it because your desktop
freezes.

Revision history for this message
Ofer Chen (oferchen) wrote :

I switched to zramswap-enabler instead of a real swap partition; it makes things a lot better if you have the RAM.

sudo add-apt-repository ppa:shnatsel/zram && sudo apt-get update && sudo apt-get install zramswap-enabler

Revision history for this message
DAF (dfiguero) wrote : AUTO: Diego Figueroa is out of the office

I am out of the office from Fri 01/21/2011 until Sun 01/08/2012.

Hi,

I will be out of the office from Wednesday December 21 until Monday January
8. If you need urgent assistance with any of my projects please contact my
manager Miguel Marques at extension 22684.

Thank you,

Diego.

Note: This is an automated response to your message "[Bug 131094] Re:
Heavy Disk I/O harms desktop responsiveness" sent on 11/18/2011 4:01:09 PM.

This is the only notification you will receive while this person is away.

Changed in linux:
status: Confirmed → Fix Released
Revision history for this message
Francisco J. Yáñez (fjyaniez) wrote :

5 years later... too late :(

I had to change to another OS after 8 years using linux... I won't get back now.

Revision history for this message
Vadim Peretokin (vperetokin) wrote : Re: [Bug 131094] Re: Heavy Disk I/O harms desktop responsiveness

I don't think it was actually fixed, if you look at the upstream report.
On Jun 11, 2012 5:06 PM, "Francisco J. Yáñez" <email address hidden> wrote:

> 5 years later... too late :(
>
> I had to change to another OS after 8 years using linux... I won't get
> back now.
>

Revision history for this message
LGB [Gábor Lénárt] (lgb) wrote :

Well, yes: many years ago, on far less powerful machines, I could play MP3s, and we even had a little competition among friends to see who could interrupt the music by doing I/O. It was quite hard. Since then, as far as I can tell, the situation has become worse and worse, which is especially odd given that I've been using more and more powerful machines in the meantime. Now, almost anything I do that generates some I/O stalls the whole desktop: gnome-terminal windows stay white (not updated) for long seconds (sometimes even a minute!), and sometimes even the mouse can't be moved. And no, it can't be a hardware problem, as I've noticed it on many different machines with totally different hardware (SCSI, "normal" IDE/PATA, SATA, both 32- and 64-bit kernels/systems, AMD/Intel CPUs, etc.) and very different kernels and even distributions (Ubuntu and Debian, to be precise) over the years. It is true, though, that the worst of it came in the last 1-2 years, as far as I can remember, although I noticed things getting worse even before that.

Revision history for this message
Mike Mestnik (cheako) wrote :

I had this issue; I've always had this issue. It gets really bad if your disk is doing bad sector relocation(s); then the desktop/GUI and mouse can freeze for 15 minutes.

Revision history for this message
laksdjfaasdf (laksdjfaasdf) wrote :

@Canonical: Why don't you make the lowlatency kernel the default instead of the generic kernel? That should address the poor responsiveness of the graphical user interface.

Even if throughput doesn't improve with the lowlatency kernel, the system feels much faster if the mouse pointer moves without dropouts and menus pop up instantly under heavy disk I/O.

On graphical desktops it's not always raw throughput that makes a system feel fast, but responsiveness. Even if copying a big file takes a second longer, the system "feels" much faster if the mouse pointer still moves without dropouts and menus still pop up instantly.

Revision history for this message
Ronan Jouchet (ronj) wrote :

Interesting proposal. Are you sure about that claim, Felix? Do you
have data to support it?

Now that linux-lowlatency is in universe and is just a build with
different option of the same kernel, it might not be risky at all, and
if that's a real win for responsiveness (which is definitely an
important metric), using -lowlatency by default can be something to
suggest to the kernel team.

Revision history for this message
Jakob Lenfers (jakob-drss) wrote :

Thanks a lot, Felix. Just as an FYI for others: this helped me a lot. I'm writing this from an old (originally 8.04, IIRC) and frequently upgraded Ubuntu Server 12.04 installation, and I switched from the server kernel to the lowlatency one. Now I can run updatedb and start Thunderbird while music is playing. I'm embarrassed to say it, but I haven't been able to do that (without a lot of ionice -c3) for quite some time. This makes this computer usable for me again. I just hope that the nvidia driver stops making problems with my onboard card soon, and my old server & desktop will be golden again. :)

Revision history for this message
LGB [Gábor Lénárt] (lgb) wrote :

OK, but the odd thing is that in the old days everything was much, much better even with the regular kernel (no special low-latency one, etc.) on far weaker hardware than now :(

Revision history for this message
yarly (ih8junkmai1) wrote :

I agree with the comments by Vorname Nachname (post #390). The low-latency kernel provides a much more responsive desktop. The differences between linux-meta-lowlatency and linux-meta-generic are profound when running in a LUKS environment with FDE.

Revision history for this message
Adam Porter (alphapapa) wrote :

It's very true that years ago I/O latency was much less of a problem
with Linux. When I first started using Debian full-time about ten
years ago, I never had problems with music skipping or anything like
that. I guess in the kernel development since then, throughput has
been prioritized over latency. Nowadays with 3.8 kernels and the same
hardware, it's trivial to make my music player skip under load, even
when its buffer is set to 30000 ms.

I haven't thought of trying the lowlatency kernel, so thanks for that
idea. I will be trying that!

Besides that, I wish Ubuntu would make BFQ the default I/O scheduler
(or at least build it in by default so we can easily switch to it,
instead of having to build kernels or install from third-party repos).
 Check out this video from a year ago:

http://youtu.be/J-e7LnJblm8

Seems obvious to me that BFQ is the way to go for desktops.

I have noticed lately that Deadline seems to result in less music
skipping than CFQ, so I can see why Deadline is the default now. But
Deadline doesn't support ionice, so I can't do things like run backups
or upgrades in the background at minimum I/O priority.
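For anyone unfamiliar with it, this is roughly what that looks like under CFQ (deadline accepts the class but silently ignores it; the backup path and PID below are made-up examples):

```shell
# Start a backup at idle I/O priority: it only gets disk time
# when no other process is issuing I/O (CFQ only)
ionice -c3 tar czf /tmp/backup.tar.gz "$HOME/Documents"
# Or demote an already-running process by PID
ionice -c3 -p 1234
```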

Revision history for this message
penalvch (penalvch) wrote :

Jamie McCracken, this bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? If so, could you please test for this with the latest development release of Ubuntu? ISO images are available from http://cdimage.ubuntu.com/daily-live/current/ .

If it remains an issue, could you please run the following command in the development release from a Terminal (Applications->Accessories->Terminal), as it will automatically gather and attach updated debug information to this report:

apport-collect -p linux <replace-with-bug-number>

Also, could you please test the latest upstream kernel available (not the daily folder, but the one at the top) following https://wiki.ubuntu.com/KernelMainlineBuilds ? It will allow additional upstream developers to examine the issue. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this bug is fixed in the mainline kernel, please add the following tags:
kernel-fixed-upstream
kernel-fixed-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested. For example:
kernel-fixed-upstream-v3.13-rc4

This can be done by clicking on the yellow circle with a black pencil icon next to the word Tags located at the bottom of the bug description. As well, please remove the tag:
needs-upstream-testing

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-VERSION-NUMBER

As well, please remove the tag:
needs-upstream-testing

Once testing of the upstream kernel is complete, please mark this bug's Status as Confirmed. Please let us know your results. Thank you for your understanding.

no longer affects: linux-source-2.6.22 (Ubuntu)
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
Vadim Peretokin (vperetokin) wrote :

IO is still an issue on every Ubuntu machine I've used - whenever it
becomes heavily used, everything else slows down, sometimes drastically.
What is there to test - has anything been done to address it?

Revision history for this message
penalvch (penalvch) wrote :

Vadim Peretokin, so your hardware may be tracked, could you please file a new report by executing the following in a terminal while booted into a Ubuntu repository kernel (not a mainline one) via:
ubuntu-bug linux

For more on this, please read the official Ubuntu documentation:
Ubuntu Bug Control and Ubuntu Bug Squad: https://wiki.ubuntu.com/Bugs/BestPractices#X.2BAC8-Reporting.Focus_on_One_Issue
Ubuntu Kernel Team: https://wiki.ubuntu.com/KernelTeam/KernelTeamBugPolicies#Filing_Kernel_Bug_reports
Ubuntu Community: https://help.ubuntu.com/community/ReportingBugs#Bug_reporting_etiquette

When opening up the new report, please feel free to subscribe me to it.

Thank you for your understanding.

Revision history for this message
vsuarez (vsuarez) wrote :

Can this be related with this issue?

http://lwn.net/Articles/572911/

Revision history for this message
penalvch (penalvch) wrote :

vsuarez, so your hardware may be tracked, could you please file a new report by executing the following in a terminal while booted into a Ubuntu repository kernel (not a mainline one) via:
ubuntu-bug linux

For more on this, please read the official Ubuntu documentation:
Ubuntu Bug Control and Ubuntu Bug Squad: https://wiki.ubuntu.com/Bugs/BestPractices#X.2BAC8-Reporting.Focus_on_One_Issue
Ubuntu Kernel Team: https://wiki.ubuntu.com/KernelTeam/KernelTeamBugPolicies#Filing_Kernel_Bug_reports
Ubuntu Community: https://help.ubuntu.com/community/ReportingBugs#Bug_reporting_etiquette

When opening up the new report, please feel free to subscribe me to it.

Thank you for your understanding.

Revision history for this message
Vadim Peretokin (vperetokin) wrote :

I don't think it is related to http://lwn.net/Articles/572911/ because it
is a 32bit machine.

I'll file the report later when I've got access to the said machine.

Revision history for this message
Adam Niedling (krychek) wrote :

Christopher M. Penalver: are you going to tell all 165 people who are affected by this bug to open a new bug report for the same issue, which is not even hardware-related?

If you just took a minute, you could test this bug yourself instead of requiring us to do all that work to test the latest mainline kernel.

I think you are just mass-closing Linux kernel related bugs that are still valid and affect many people. Some of them have upstream bug reports which indicate that no actual work has been done to address those issues. So why do testing? Even if someone does the testing, most likely no work will be done downstream to fix the issue. So what's the point? I think what you're doing does more harm than good.

Revision history for this message
penalvch (penalvch) wrote :

Adam Niedling, thank you for your comments regarding them:
"...are you going to tell all the 165 people that are affected by this bug to open a new bug report..."

Given that the Bug Description ("heavy disk I/O causes increased iowait times") is so vague as to be largely useless, if one has a performance problem, then for hardware-tracking purposes one would want to file a new report. For more on this, please read the official Ubuntu documentation:
Ubuntu Bug Control and Ubuntu Bug Squad: https://wiki.ubuntu.com/Bugs/BestPractices#X.2BAC8-Reporting.Focus_on_One_Issue
Ubuntu Kernel Team: https://wiki.ubuntu.com/KernelTeam/KernelTeamBugPolicies#Filing_Kernel_Bug_reports
Ubuntu Community: https://help.ubuntu.com/community/ReportingBugs#Bug_reporting_etiquette

"...for the same issue which is not even hardware related?"

This is speculation at best.

"If you just took a minute you could test this bug yourself instead of require us to do all that work to test the latest mainline kernel."

I've never had heavy disk I/O affect desktop responsiveness with my hardware, both with an HDD and 3 GB RAM, and now an SSD with 8 GB.

"I think you are just mass closing linux kernel related bugs that are still valid and affect many people."

This is also speculation at best, and incorrect. I've never mass closed any bugs anywhere, and your baseless accusations are not appreciated.

"Some of them have upstream bug reports which indicate that no actual work has been done to address those issues."

One having filed an upstream bug report, on a tracker which has no permission restrictions on who can file, is largely irrelevant if the full hardware isn't known, it hasn't been tested in the latest mainline kernel, it hasn't been bisected if a regression, and doesn't have specific, objective metrics demonstrating the issue.

"So why do testing?"

Testing gets a bug report one step closer to a fix. The best question is why do the complaining, which gets you nowhere? ;)

"Even if someone does the testing most likely no work will be done by downstream to fix the issue."

More incorrect speculation. Downstream has the same information requirements as upstream, as previously noted. No developer is going to take a strong interest in working on any problem, up or down, without it.

"So what's the point? I think doing what you're doing is just making more harm than good."

Wasting time arguing about things previously documented and discussed ad nauseam would be considered doing more harm than good, with the time better spent actually doing the testing and bug report filing previously requested.

If you have further comments, please refrain from making them in this report, as you are not the original reporter, and it already has quite enough "Me too!" and "Why isn't this fixed already?" comments. Instead, you are welcome to contact me directly, and/or redirect them to the appropriate mailing list or forum. http://www.ubuntu.com/support/community/mailinglists might be a good start for determining which mailing list to use.

Thank you for your understanding.

Revision history for this message
Adam Niedling (krychek) wrote :

Thanks for analysing each and every sentence of mine one by one.
Who says only the original reporter can comment on bugs? I'm not the original reporter, I'm just somebody who is affected by this bug which you are trying to close in a very crafty way. It's not a speculation that you're doing this all the time, you did this to 2 or 3 of my own bugs. I'm getting tired of you pasting the same text everywhere. Maybe you're pasting it to hundreds of bugs. There is no effort in pasting some text. However you are asking people to do a lot of work which takes huge effort. Most of the time it's completely unnecessary cause no one has made anything to fix the issue.

"Hey! No developer has ever touched this bug but let's ask the poor user who is suffering from it a ton of questions and half day of working and testing the latest mainline kernel maybe he won't be able to do it or just simply has no idea how to do it so we can close this completely valid bug! And let's just ignore the bug even if the poor user does all that work ha ha ha..... Oh yeah and make sure to paste lots of links about etiquette and what not so I will look official even though I'm not working for Canonical I'm just messing around with people's bugs."

Revision history for this message
Ronan Jouchet (ronj) wrote :

Adam Niedling wrote:
  "I'm just somebody who is affected by this bug which you are trying to close in a very crafty way. It's not a speculation that you're doing this all the time, you did this to 2 or 3 of my own bugs. I'm getting tired of you pasting the same text everywhere. Maybe you're pasting it to hundreds of bugs. There is no effort in pasting some text. However you are asking people to do a lot of work which takes huge effort. Most of the time it's completely unnecessary cause no one has made anything to fix the issue.
  "Hey! No developer has ever touched this bug but let's ask the poor user who is suffering from it a ton of questions and half day of working and testing the latest mainline kernel maybe he won't be able to do it or just simply has no idea how to do it so we can close this completely valid bug! And let's just ignore the bug even if the poor user does all that work ha ha ha..... Oh yeah and make sure to paste lots of links about etiquette and what not so I will look official even though I'm not working for Canonical I'm just messing around with people's bugs."

>> I can definitely recognize some of the behavior described here by Adam, and also suffered from it in https://bugs.launchpad.net/ubuntu/+source/linux/+bug/908691 . In my case I could even pinpoint a specific mainline commit, but my inability to do the non-mainline git bisect requested by M. Penalver meant my request fell on deaf ears. I closed my own bug diplomatically, but it was an extremely disappointing experience to see so little response for all the effort I put in.

I understand Canonical must have lots of bug triage to do, but I'd too love a little more humanity in processing them. Canned answers and strict protocol don't show a lot of empathy, and don't echo into much user love.

Revision history for this message
penalvch (penalvch) wrote :

Quoting from https://bugs.launchpad.net/ubuntu/+source/linux/+bug/336652/comments/15 :
"this is a serious issue but only affects limited hardware..."

Revision history for this message
Adam Niedling (krychek) wrote :

And who is to say that comment #15 is not mere speculation at best? What does he mean by limited hardware? Every computer that has an HDD and not an SSD?

You really had someone's absolutely valid bug report closed because he wasn't able to do a git bisect? Just how many times did you do that? Who gave you the authority? How do you benefit from these kinds of things?

Just as Ronan has said: please show a little more empathy and stop talking to people like a robot with your canned comments.

Revision history for this message
Vadim Peretokin (vperetokin) wrote :

I'm surprised this is being debated. Look at Google:
https://www.google.com.au/search?q=linux+high+io+desktop&oq=linux+high+&aqs=chrome.0.69i59j69i57j69i64l2.1936j0j1&sourceid=chrome&ie=UTF-8

You will clearly see that high enough IO will harm desktop responsiveness.
Surely all of these people aren't making it up?

Revision history for this message
Adam Niedling (krychek) wrote :

Now Christopher is onto me. He started vandalizing another of my bug reports. Bug #1247189.

Changed in linuxmint:
status: New → Invalid
Revision history for this message
Davide Depau (depau) wrote :

This issue is not getting enough attention. I don't know if you all have SSDs, but most people don't. On hard disk drives this is a huge issue. System responsiveness drops when tracker is running, and pretty much nothing else can run smoothly while it's running, even on computers with fast CPUs and large amounts of RAM. I/O is often the cause of system slowdown, and this needs to be reduced as much as possible.
I'm sure this issue can be fixed: a background daemon doesn't need to run at full speed; it can be niced to 19, and internal fixes can be made.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
penalvch (penalvch) wrote :

Davide Depau, it would help immensely if you filed a new report via a terminal:
ubuntu-bug linux

Please feel free to subscribe me to it.

no longer affects: linux (Ubuntu)
affects: linuxmint → linux (Ubuntu)
no longer affects: linux (Ubuntu)
affects: linux → linux (Ubuntu)
Changed in linux (Ubuntu):
importance: High → Undecided
status: Fix Released → New
importance: Undecided → Low
status: New → Incomplete
information type: Public → Public Security
information type: Public Security → Public
Revision history for this message
god (humper) wrote :

I can observe this even on an SSD, with both Ubuntu and mainline kernels, especially when some background task like updatedb.mlocate, which walks the entire filesystem, is triggered.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
penalvch (penalvch) wrote :

god (humper), please file a new report via a terminal:
ubuntu-bug linux

Feel free to subscribe me to it.

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
god (humper) wrote :

done.

Revision history for this message
AZ (m-dev) wrote :

@Christopher: This is not incomplete. Thanks.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Download full text (7.1 KiB)

It might not be good to stir up such an old bug, but it regularly gets updates and new complaints, so maybe a new approach might help.

So let us make one thing clear, IMHO if something overloads your machine with disk I/O it has to stall it.
So the solution paths are more like these:
a) beat it with more processing / I/O HW
b) mitigate the effect as far as possible
c) avoid the overload before it starts

The issue is a common one - so I'll keep my explanations general and not specific to trackerd or any other case that was mentioned before.

### a) beat it with more processing / I/O HW ###
There are far more expensive machines out there which can handle far more I/O without being slowed down. The reason is that they have more I/O cards, virtual functions to spread the handling over CPUs, and, at the high end, servers with totally different I/O IRQ designs.
We should agree that on cheap/slow or even medium machines, I/O overload just *IS* an issue for responsiveness.
But that isn't the important part; the question is what a normal user can do about it, and spending x000000 $ on a machine isn't the solution.

### b) mitigate the effect as far as possible ###
So regarding mitigation, there were already some approaches in this bug discussion,
like using ionice and several dirty-ratio tunings, but none of these prevent the I/O overload.
E.g. if you issue the overloading I/O only in the "best effort" I/O class, the only difference it makes is that "other I/O" might pass faster, but your system is still fairly busy => unresponsive.
Also, dirty ratios come down to spending the process's remaining time slice cleaning up dirty memory as soon as a certain level is reached. While you can configure higher ratios (at the price of endangering integrity), that also won't stop the burst of I/O. Instead it will allow more data to be submitted to dirty the page cache, and thereby indirectly more I/O overloading the system again.

### c) avoid the overload before it starts ###
It must be said that, since this bug goes back to 2007 and a lot of the reports are related to I/O + *sync, various filesystem and general kernel improvements have been made just for sync & journaling. Several posts in this bug confirm this already.
Now, what I didn't see is people trying to throttle the processes that overload the system.
Throttling => https://www.kernel.org/doc/Documentation/cgroups/blkio-controller.txt
Like any approach, this one has certain limitations, but it is a new way to tackle the overall issue.
It also needs certain cgroup and filesystem features (like accounting writeback through the page cache), which might only be available in modern Ubuntu releases.
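As a rough illustration of the idea (cgroup v1 paths; "8:0" is the major:minor of /dev/sda as shown by lsblk, and the 10 MB/s cap and PID are arbitrary examples):

```shell
# Create a throttled blkio group
sudo mkdir /sys/fs/cgroup/blkio/throttled
# Cap reads and writes on device 8:0 (/dev/sda) at 10 MB/s each
echo "8:0 10485760" | sudo tee /sys/fs/cgroup/blkio/throttled/blkio.throttle.read_bps_device
echo "8:0 10485760" | sudo tee /sys/fs/cgroup/blkio/throttled/blkio.throttle.write_bps_device
# Move the offending process into the group
echo 1234 | sudo tee /sys/fs/cgroup/blkio/throttled/tasks
```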

### Experiment ###
As an experiment to prove the solution I use the tools fio and latencytop to compare:
1. no background load, checking latencytop
2. running a random read/write multithreaded fio in the background, checking latencytop
3. running a throttled random read/write multithreaded fio in the background, checking latencytop

# Background Load #
A fio job file like this:

[global]
ioengine=libaio
rw=randrw
bssplit=1k/25:4k/50:64k/25
size=512m
directory=/home/paelzer/latencytest
iodepth=8

[dio]
direct=1
numjobs=8

[pgc]
direct=0
numjobs=8

# Case 1 - No backgroun...

Read more...

Revision history for this message
AZ (m-dev) wrote :

Thanks for driving this forward.

You argue from
> So let us make one thing clear, IMHO if something overloads your machine with disk I/O it has to stall it.

This is a bit tricky, because overload means that the machine will not be able to complete all tasks in the time given, i.e. tasks will accumulate until the resources are exhausted. Though, we usually do not have this situation on desktop machines. There we have tasks to do and want them to complete as fast as possible, though some tasks may take longer than others. For example, copying a 5 GB DVD will take some minutes, but refreshing the browser window or switching windows never should. Overload here would mean the user turns off the machine and buys a Windows licence.

So this bug is mostly about having too big delays in applications using only a small bit of the available resources (like when switching back to a libreoffice window) when some other applications (like background file indexing) are asking for the remaining disk io resource capacities.

> Code improves to mitigate effects but can never be perfect for *ALL* users at once (especially in the default config)

I do not agree. Desktop responsiveness was achieved with older Ubuntu versions on the given hardware, and is achieved with other operating systems (Windows) on a broad range of hardware. I believe desktop responsiveness is something sufficiently specific that a CPU and I/O scheduler can be tuned for it. Using cgroups and the like might help, but should be configured by Ubuntu by default if necessary.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Download full text (4.5 KiB)

Hi AZ,
thanks for your feedback.

>> IMHO if something overloads your machine with disk I/O it has to stall it.
> This is a bit tricky, because overload means that the machine will be able not complete all task in the time given, i.e. tasks will accumulate until the resources are exhausted. Though, we usually do not have this situation on desktop machines.

Excuse me - I didn't want to phrase it too hard - it is surely ok to assume that a system stays responsive :-)
But when you add a background indexer like in the initial example, you add some serious load.
The system might add a few other things, and at some point it is overloaded.
Would you agree to modify your "Though, we usually do not have this situation on desktop machines." to "Though, we usually *should* not have this situation on desktop machines."?

Because that is the point where my suggestion of "throttling the few loads that cause these situations" kicks in.

> So this bug is mostly about having too big delays in applications using only a small bit of the available resources (like when switching back to a libreoffice window) when some other applications (like background file indexing) are asking for the remaining disk io resource capacities.

When I think of an overload case where e.g. a Process submits requests as fast as it can (especially with asynchronous I/O a process can quickly fill up hundreds of I/Os to the block device layer).
Now what should a process scheduler or I/O scheduler do?
1. handle them asap -> achieve good throughput, but might add some stalls to the system
2. throttle them -> improves responsiveness by avoiding overload, but this comes at certain prices
2a) if the process scheduler stalls it people start to ask "there is nothing else on the runqueue, why isn't it running? I want to get all I can from my HW".
2b) if the I/O scheduler stalls it people start to ask "hey my device could handle way more, why isn't it fully utilized with the request queue being filled" (remember all the "fun" people had with anticipatory scheduler)

Both 2a and 2b existed in various patches/tunings and almost every time the decision was that "the default" can not be to stall too much because there are different demands.

That was the reason why I personally didn't think about a cool new piece of code (which surely someone could write), but instead about a good mitigation of the most frequent cases with tools that are already there (like the cgroup I/O throttling I suggested).

>> Code improves to mitigate effects but can never be perfect for *ALL* users at once (especially in the default config)
>I do not agree.

Long story short - a default configuration has to be a tradeoff trying to make everyone happy but no one sad (hard job).

> Desktop responsiveness was achieved with older ubuntu versions on the given hw and is achieved with other operating systems (windows) on a broad range of hardware.

I'm coming from the server world, and there I/O throughput, I/O latency, and even process latency and fairness are clearly superior compared to older releases, as well as compared to other OSes.
But that doesn't negate your experience - it is just a different one.

> I believe desktop ...

Read more...

Revision history for this message
god (humper) wrote :

In my case ( see https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1460985 ) the culprit generating huge I/O throughput was in /etc/cron.daily/man-db

It's such a long-standing and persistent bug that the default advice I give nowadays to people complaining that their Ubuntu "got stuck again" is to run "sudo killall -9 find".

That's really a shame:
- it's not some random IO spike coming from nowhere
- it's not 3rd-party, it's in default install
- it's reproducible

Yet we still don't even have a workaround, let alone proper policing of the I/O of all the background tasks shipped in the default Ubuntu install.

Hopefully the migration to systemd timer units will help tackle this.
