Heavy Disk I/O harms desktop responsiveness

Bug #131094 reported by Jamie McCracken
This bug affects 180 people
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Low
Assigned to: Unassigned
Nominated for Hardy by gururise
Nominated for Intrepid by unggnu
Nominated for Jaunty by Jeffery Davis
Nominated for Karmic by daneel
Nominated for Lucid by geek
Nominated for Maverick by Montblanc

Bug Description

Binary package hint: linux-source-2.6.22

When compared with 2.6.15 in feisty, heavy disk I/O causes increased iowait times and harms desktop responsiveness in 2.6.22.

This appears to be a regression from 2.6.15, where iowait is much lower and desktop responsiveness is unaffected under the same I/O load.

Easy to reproduce with tracker: index the same set of files with the 2.6.15 kernel and the 2.6.22 kernel, and the difference in desktop responsiveness is massive.

I have not yet confirmed whether a non-tracker process doing heavy disk I/O (especially writing) replicates this; will do further investigation soon.

Tags: cft-2.6.27
Revision history for this message
Jamie McCracken (jamiemcc-blueyonder) wrote :

Further investigation has led me to conclude that this bug is no longer valid

Slowdown in system can be eliminated by:

1) Clean install of tribe 4. I originally had tribe 3 when the problem occurred, and it persisted through upgrades, but a clean install somehow fixes the desktop responsiveness issues

2) Apps still feel slow, but this is not a kernel issue: disabling esd sound in sound preferences makes gutsy as fast as feisty (see https://bugs.launchpad.net/ubuntu/+source/libgnome/+bug/115652)

Changed in linux-source-2.6.22:
status: New → Invalid
Revision history for this message
Jamie Lokier (jamie-shareable) wrote :

I have esd sound disabled, and performance is still incredibly slow when trackerd is running on a 2.6.22-{7,8,9} kernel. When I want to actually get some work done, I "killall -STOP trackerd".

The effect on desktop performance is weird: it feels exactly like heavy swapping. Menus etc. take seconds to appear. New apps take ages. Dragging a window can even take 10 seconds or more before it responds.

But there is free RAM, and especially there's plenty of reclaimable (i.e. not used by programs) RAM. I have 1GB.

It's not using much CPU either. (I have a Core Duo; neither core sees much usage while trackerd is running).

So it may be in some way dependent on I/O. But this is with the trackerd set to maximum throttling, i.e. slowest scanning.

Interestingly, the disk activity monitoring applet shows very little activity (little spikes every second or two), but the disk light is constantly on.

There's something else fishy: strace -p on the trackerd process shows expected system calls, but sometimes killing the strace prints "Process xxx detached" but then strace doesn't terminate, even with kill -9.

Revision history for this message
Jamie McCracken (jamiemcc-blueyonder) wrote :

I'm reopening this.

Other users have experienced this (see comments in https://bugs.launchpad.net/ubuntu/+source/tracker/+bug/135115) and reported that a fresh install cures the problem.

This indicates there is a bug when upgrading to gutsy that causes the high iowait times and can only be fixed by a clean install.

I can't say whether this bug only occurs when upgrading from older gutsy versions or from feisty...

All I can say is that it started with a clean install of tribe 3, persisted when upgraded, and did not go away until a clean install of tribe 4.

Changed in linux-source-2.6.22:
status: Invalid → Confirmed
Revision history for this message
Tom Badran (tom-badran) wrote :

I've marked the bug I filed against trackerd as a duplicate of this bug.

Like I say, a fresh install has made a substantial difference (completely unusable machine with trackerd -> usable). I do, however, still hear my disk being hit fairly often. It's not impacting interactivity as severely as it used to, but there are still noticeable short stalls doing fairly trivial things such as opening menus.

Revision history for this message
Miguel Martinez (el-quark) wrote :

I'm also experiencing the slowdowns during large dist-upgrades involving several packages. This is a dist-upgraded Gutsy. Furthermore, I've seen Firefox crashing pretty often during those heavy I/O periods. Sometimes it has taken Thunderbird with it.

Revision history for this message
Michael Vogt (mvo) wrote :

I'm milestoning this bug, as it is important to get this fixed if we use tracker by default.

Changed in linux-source-2.6.22:
importance: Undecided → High
Revision history for this message
Ben Collins (ben-collins) wrote :

Please try booting with elevator=deadline and tell me if that helps any.

Changed in linux-source-2.6.22:
assignee: nobody → ben-collins
status: Confirmed → In Progress
Revision history for this message
Jamie McCracken (jamiemcc-blueyonder) wrote :

if anything, elevator=deadline seems to cause higher iowait for longer periods (I even saw 100% with that setting) when running trackerd

average iowait values when tracker is flushing to disk during heavy indexing of same files:

for feisty 2.6.20-15 : 90-95%
for 2.6.22-9 : 90-99%
for 2.6.22-9 with elevator=deadline: 95-100%

Revision history for this message
Miguel Martinez (el-quark) wrote : Re: [Bug 131094] Re: Heavy Disk I/O harms desktop responsiveness

Same here. elevator=deadline doesn't seem to help, although I don't have
any objective data to complement Jamie's.


--
----------------------------------------
Miguel Martínez Canales
    Dto. Física de la Materia Condensada
    UPV/EHU
    Facultad de Ciencia y Tecnología
    Apdo. 644
    48080 Bilbao (Spain)
Fax: +34 94 601 3500
Tlf: +34 94 601 5437
----------------------------------------

  "If you have an apple and I have an apple and
  we exchange these apples then you and I will
  still each have one apple. But if you have an
  idea and I have an idea and we exchange these
  ideas, then each of us will have two ideas."

  George Bernard Shaw

Revision history for this message
Ben Collins (ben-collins) wrote :

Ok, for the fun of it, please also try elevator=anticipatory

Revision history for this message
Jeff Schroeder (sejeff) wrote :

The latest gutsy kernels have the right settings to use blktrace. Try these commands:
sudo apt-get install blktrace
sudo mount -t debugfs debugfs /sys/kernel/debug/

# If /dev/sda is the disk that / is located on
sudo btrace /dev/sda

# Let it run for a few seconds and then kill it with Ctrl-C.

That will show the top processes using your disk.

Revision history for this message
Jeff Schroeder (sejeff) wrote :

Make that:
sudo btrace -s /dev/sda

It gives a summary of the disk usage of each process.

Revision history for this message
Jeff Schroeder (sejeff) wrote :

Also note that gutsy has an 'ionice' command that you can use to slow down I/O for a process like trackerd. See man ionice.

Revision history for this message
Julien Olivier (julo) wrote :

Hi,

I have upgraded from feisty to gutsy and also noticed that my GNOME desktop felt way slower than on feisty. I tried disabling esd, but it didn't help. The thing is, I also tried disabling trackerd, but the slowness remains when I open F-Spot or when I use Firefox. Is there a way to know if the problem really comes from the kernel? Is it safe to re-install linux-image-2.6.20 from feisty? If so, are there any other packages I should downgrade too?

Revision history for this message
Tobias Oetiker (tobi-oetiker) wrote :

Julien,

it seems that kernels 2.6.18 to 2.6.21 have some serious issues with
heavy disk I/O, especially when multiple processes are fighting over
I/O and reads and writes are going on in parallel ...

for us the upgrade to 2.6.22 helped a lot ...

there were changes to the I/O schedulers and massive changes to the
default values of the /proc/sys/vm/dirty_* tunables ...

we also found that the problems were more pronounced when using LVM
... unfortunately this is all anecdotal and inconclusive.

so if you have the chance, you might want to try 2.6.22 ...

cheers
tobi

--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten
http://it.oetiker.ch <email address hidden> ++41 62 213 9902

Revision history for this message
Julien Olivier (julo) wrote :

Tobias,

as I said, I upgraded to gutsy recently, so I do have kernel 2.6.22, and I still have speed problems. Whether or not the kernel is the culprit is still a mystery to me, though.

Someone said that the problems seem to persist when you upgrade from feisty (versus a fresh install), so maybe I have inherited wrong values in /proc/sys/vm/dirty_*?

I would be really pleased to help, so if there is anything I can test, I'm ready.

PS: I installed kernel 2.6.20 from feisty and booted into it, and it didn't change anything.
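For anyone wanting to compare the live vm.dirty_* tunables between an upgraded system and a fresh install, a quick way is to dump them (a small convenience sketch; `sysctl -a | grep vm.dirty` from a shell shows the same information):

```python
# Print the current vm.dirty_* tunables so an upgraded system can be
# compared against a fresh install. All of these files hold integers.
from pathlib import Path

def dirty_tunables():
    return {p.name: int(p.read_text())
            for p in Path("/proc/sys/vm").glob("dirty_*")}

if __name__ == "__main__":
    for name, value in sorted(dirty_tunables().items()):
        print(f"{name} = {value}")
```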

Revision history for this message
Tobias Oetiker (tobi-oetiker) wrote :

Today Julien Olivier wrote:

> Tobias,
>
> as I said, I have upgraded to gutsy recently, so I do have kernel
> 2.6.22, and I still have speed problems. Whether or not the kernel is
> the culprit is still a mystery to me though.
>
> Someone said that the problems seem to persist when you upgrade from
> feisty (versus a fresh install), so maybe I have inherited wrong values
> in /proc/sys/vm/dirty_* ?

this is highly unlikely ... check /etc/sysctl.conf to see if there
are any explicit settings

> I would be really pleased to help, so if there is anything I can test,
> I'm ready to help.
>
> PS: I installed kernel 2.6.20 from feisty and booted on it, and it
> didn't change anything.

in that case I am fresh out of ideas unfortunately.

cheers
tobi


Revision history for this message
Jamie McCracken (jamiemcc-blueyonder) wrote :

I think there are two separate issues here

1) Something in old tribes affects disk access (HAL or udev?), and on some occasions it persists when upgraded; only a fresh install cures the problem. This is what affected me: all disk I/O, reads and writes, was affected very badly even without tracker running. This appears to happen rarely, as only a few people have hit it...

2) Ext3 write performance is very poor on both feisty and gutsy: as soon as pdflush starts, it tends to hog the disk. Putting $HOME/.cache/tracker on a different FS like XFS improves things a lot (I only did this on feisty, not gutsy).

If default pdflush params have changed in the gutsy kernel, that could also affect write performance negatively.

Another thing: my hard disk is whisper quiet on feisty but extremely noisy on gutsy; I had to use hdparm to lower the noise. Would be nice to make it quiet by default too, especially as tracker makes it very noisy at times.

Revision history for this message
Julien Olivier (julo) wrote :

Jamie,

about #1: any idea what exactly went wrong, and is there a chance that it might still be unfixed for some users ?

Revision history for this message
Jamie McCracken (jamiemcc-blueyonder) wrote :

julien: I've no idea what caused it, but the effect was very noticeable even with light disk access. Only two people (myself included) have done a fresh install to solve the issue, so I think it's quite rare.

I'm not sure whether it's recommended to dist-upgrade from feisty or not (I've read a few cases on OSNews where it did not work properly).

Revision history for this message
Julien Olivier (julo) wrote :

OK, I will try to re-install everything from scratch then.

Revision history for this message
Martin (martin615) wrote :

I disabled Tracker as a result of all the disk thrashing. Yes, Tracker is nice. But I seriously question enabling it by default while this problem is still around (wherever the problem might lie).

Revision history for this message
Jamie McCracken (jamiemcc-blueyonder) wrote :

Martin,

if the disk I/O issues are only tracker related, that's OK, as we have a fix for that in the latest version (not yet in gutsy) which should reduce the problem and prevent tracker from hogging the disk for long periods.

Revision history for this message
Jeff Fortin Tam (kiddo) wrote :

Please don't tell me this will remain unfixed for users who went the dist-upgrade route.
This is not as rare as you think, and a clean install is not something lots of people want to do all the time. Isn't it possible to fix it with upgrades? If some config broke at some point, it should be possible to reverse it for everyone, no?

I actually don't even know what is going on exactly anymore, but the thing I do see is that all my gutsy computers have really horrible performance whenever I do anything that uses the hard drive.

Revision history for this message
Martin (martin615) wrote :

Jamie,

Ok, that sounds great. I'll try enabling it again when the fix hits Gutsy.

Revision history for this message
Alexey Borzenkov (snaury) wrote :

I can confirm strange disk-related performance problems too, and I dist-upgraded to gutsy well after tribe5 was already out (thus I don't think it could be something from previous tribes). I also wonder whether other problems (like the desktop often not appearing after I relogin, so I always have to restart if I log out, not even /etc/init.d/gdm restart helps, and the login sound not playing the first time, even after I installed esound) could be cured by a fresh install, but I won't have time to do it for several weeks... I guess it will be after gutsy is already released.

And somehow I don't believe it's rare... I wonder how many people actually dist-upgraded, as opposed to doing a fresh install of tribe5?

Revision history for this message
Lukas Kolbe (lukas-einfachkaffee) wrote :

I can confirm this problem on the latest Gutsy. It has bothered me a while, but shamefully I haven't yet taken the time to report it, and I forget whether it first appeared in feisty or in gutsy. My system has been upgraded since at least feisty, possibly since dapper; I actually can't remember when I last installed Ubuntu from scratch.

Attached are the outputs of dmesg, hdparm -tT, smartctl -a, lspci -vvn, and a vmstat 2 during my latest dist-upgrade, which made the system heavily unresponsive (again). Also, while tracker is indexing, or Evolution is starting, or any other normal disk I/O is happening, the system becomes unusable. Dist-upgrades of only a few packages take ages.

If there's anything I can do to help identify the root cause, please ask.

Revision history for this message
Lukas Kolbe (lukas-einfachkaffee) wrote :

And as this was mentioned before, I thought it might be important: I'm using LVM. Attached is the complete disk layout of my system.

Revision history for this message
Amit Kucheria (amitk) wrote :

This thread seems to be catching fire :-)

I did some I/O testing of the Feisty and Gutsy kernels on Gutsy userspace. Results are at https://wiki.ubuntu.com/GutsyFeistySchedulerShootout?action=show

If someone can repeat these tests and post the results, it would help drill down into the problem. Currently, it seems like only users doing dist-upgrades are having problems. Unfortunately, my machine was a fresh install.

Revision history for this message
Lukas Kolbe (lukas-einfachkaffee) wrote :

I ran your test; the numbers seem about equal to yours, but during the test my system became hellishly unresponsive. Switching desktops (from web to evolution) took more than 20 seconds (probably due to swapping; I have 768MB RAM), and subsequent switches took up to five seconds. I could watch the redrawing while I tried scrolling in Evolution's folder list. vim took ages to load, etc. All in all, very sluggish.

Revision history for this message
Tobias Oetiker (tobi-oetiker) wrote :

I don't think the problem is entirely Ubuntu-made ... other people
are looking at I/O performance too.

This does look interesting

  http://lkml.org/lkml/2007/8/16/77

and this ... http://lkml.org/lkml/diff/2007/8/23/218/1

cheers
tobi


Revision history for this message
Jamie McCracken (jamiemcc-blueyonder) wrote :

Could this be sata related?

Can everyone who has this problem indicate if this is so?

just wondering if its related to https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/119730

Revision history for this message
Tom Badran (tom-badran) wrote :

I am on a SATA machine; however, I never had a problem with file-copy
throughput speed etc., it's just interactivity.


--
Tom Badran
http://badrunner.net

Revision history for this message
Miguel Martinez (el-quark) wrote :

I don't think it's SATA-related, as I have an "old" Pentium-M (735) that
doesn't support SATA, and my laptop does suffer from the I/O issue.


Revision history for this message
Jamie McCracken (jamiemcc-blueyonder) wrote :

Also, I forgot to mention that tracker 0.6.3 is now in gutsy (it's not in the beta). This version is designed to work around the issues discussed here, as well as being much better optimised as far as disk access goes.

Revision history for this message
Jeff Fortin Tam (kiddo) wrote :

Nope. My desktop only has IDE drives, and so does my laptop, so it's not SATA-related.

Revision history for this message
Tobias Oetiker (tobi-oetiker) wrote :

Jamie,

I run sata with lvm

cheers
tobi


Revision history for this message
Amit Kucheria (amitk) wrote :

As pointed out by Jeff above, can someone having the problems run trackerd with ionice?

e.g. ionice -c3 -p<pid of trackerd>

Revision history for this message
Tom Badran (tom-badran) wrote :

I had already tried ionice in one of the bugs closed off as a duplicate; it
makes absolutely no difference whatsoever.


Revision history for this message
Jamie McCracken (jamiemcc-blueyonder) wrote :

Amit,

trackerd uses the best-effort class at priority 7 for disk I/O by default (it tries the idle class first, but as that needs root it will fail and fall back to best-effort 7)

note that disk writes are not affected by the scheduling class, as they are controlled by pdflush, and heavy writing is where the problem lies (pdflush tends to go crazy)

tracker 0.6.3 mitigates the pdflush problems by intermittently calling fsync when merging indexes, to prevent pdflush from taking over the disk and starving other apps

Revision history for this message
Bill Hand (fxwgbill-gmail) wrote :

I too had this problem two or three kernels ago in gutsy. Finally, as a last resort, I uninstalled tracker for the time being. Response time improved greatly.

I have followed updates since a month into gutsy development, if that helps any... Everything for the most part seems to be working on my system, other than a couple of problems that are already in the bug list. Things seem to go a lot better as far as upgrades go if you do follow along with the process. Doing it this way has gotten me through the last couple of releases without a reinstall (knock on wood). It does get bumpy at times, but I am in a position where it isn't as critical if things 'break' for a bit.

Bill

Revision history for this message
Lukas Kolbe (lukas-einfachkaffee) wrote :

With the latest updates in gutsy this problem seems to be gone for me. I
just did a dist-upgrade and nearly didn't notice it, my laptop just
worked without lagging much.

I'm hooked :)

--
Lukas

Revision history for this message
Bill Hand (fxwgbill-gmail) wrote :

Interesting... I may re-install it and see what the result is. It
would help to know that it's acting better before release. I'll do it...

I'll let yall know how it goes. Working on it now.

Bill


Revision history for this message
Bill Hand (fxwgbill-gmail) wrote :

OK... Installing tracker 0.6.3-0ubuntu2, which I am showing to be the
latest version. Also installing the tracker-search-tool.

and a reboot...

Initially... here we go... trackerd staying around 22 to 24% in the process
monitor, as high as 40 to 45%; almost seems like it's ramping up again.
I'll let it run for a while, see if it ever catches up.

OK... it ain't been 7 to 10 minutes and it's dropped back to 0, with
periodic 'hits'.

6:59.20 CPU time right now @ 23:31 local.

OK... still sittin' there. 23:37 local.

OK... I'll keep a good watch on it.

If anyone needs further info as to my set up... lemme know...

Bill


Revision history for this message
Martin (martin615) wrote :

I'm still seeing an unacceptable amount of I/O.

For instance, I thought timing a git clone with tracker on and then off would be a good test. So I did:

time git clone git://git.kernel.org/pub/scm/git/git.git

only to realize that tracker didn't start indexing until after the clone was done. That's OK. So I thought I'd do:

date; time git clone git://git.kernel.org/pub/scm/git/git.git; date

instead, just to see how long after the clone was done tracker would keep indexing. I didn't get that far, though. I did a "rm -rf git" while tracker was indexing and pounding the disk, and thought I'd wait until it was done. It took several _minutes_ (I don't know... 5? 10?) during which the disk was working frantically the whole time.

:(

Revision history for this message
Martin (martin615) wrote :

(The part of my brain responsible for my English is still sound asleep... ;)

Granted, the git repo is ~6MB of text. But I still find the disk thrashing unacceptable.

Revision history for this message
Jamie McCracken (jamiemcc-blueyonder) wrote :

Martin,

I'm happy to add a performance option to do fast index merges instead of incremental ones (the incremental merge currently uses intermittent fsync calls to stop pdflush hogging the disk, but can cause thrashing)

the problem is that ext3 performance when doing lots of alternating reads/writes is horrifically bad (probably a bug rather than a design fault), so for best results we recommend mounting ~/.cache on ReiserFS or some filesystem other than ext2/3

Revision history for this message
Martin (martin615) wrote :

* An option isn't really an option (pun intended ;). This sort of thing should Just Work (TM).
* If the problem is ext3, fine. That's what's used by default, though, so I don't really see tracker being enabled by default while that's the case.

<ignorant question>
Is there no way to do more work in memory (depending on how much memory is available and needed, of course... there's a balance here) before writing to disk?
</ignorant question>

(Sorry if I come of as a bit harsh, I totally appreciate the work you do on Tracker and I've been looking forward to start using it full time for quite a while.)

Revision history for this message
Jamie McCracken (jamiemcc-blueyonder) wrote :

in 0.6.3 we have a 16MB buffer to do that, but once indexed we need to flush to disk at some point

Martin, can you confirm which version you are using? 0.6.2 was really poor in this regard
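For readers following along, a bounded hit buffer of the sort described can be sketched as follows (purely illustrative Python, not Tracker's code; the class name, size accounting, and flush callback are invented):

```python
# Illustrative sketch of a bounded in-memory hit buffer: unique words
# accumulate in RAM, and the on-disk index is only touched when the
# buffer outgrows its cap (or indexing finishes and flush() is called).
class HitBuffer:
    def __init__(self, max_bytes=16 * 1024 * 1024, flush=None):
        self.max_bytes = max_bytes
        self.flush_cb = flush      # callback that merges hits into the index
        self.hits = {}             # word -> hit count (unique words only)
        self.size = 0

    def add(self, word):
        if word not in self.hits:
            self.size += len(word)
            self.hits[word] = 0
        self.hits[word] += 1
        if self.size >= self.max_bytes:
            self.flush()           # buffer overflow: merge now

    def flush(self):
        if self.flush_cb and self.hits:
            self.flush_cb(self.hits)
        self.hits, self.size = {}, 0
```

Because only unique words are stored, ordinary text compresses into far less buffer space than its raw size, which is consistent with the buffer-to-text ratio Jamie quotes later in the thread.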

Revision history for this message
Martin (martin615) wrote :

I'm using 0.6.3.

<more ignorant questions>
How does the size of the data written to this buffer relate to the size of what's being indexed? Is it written continuously to disk? (What I'm really wondering is how it's used when indexing 6MB of source and text files. I mean, 6MB of text really ought to result in less than 6MB of "indexing data"... right? If it's split up into small parts, maybe they can be merged together?)

How much would increasing this buffer help? If it helps, perhaps one way forward would be to make the size dynamic, so when lots of data needs to be indexed, more memory means less thrashing by ext3. Perhaps its size could even depend on the filesystem being used?
</more ignorant questions>

Revision history for this message
Jamie McCracken (jamiemcc-blueyonder) wrote :

martin,

it does not work like that

the 16MB hit buffer is sufficient for about 64MB of text (as we only store unique and valid words), so your 6MB of text easily fits into it - it only updates the index once all the new stuff is indexed or the buffer overflows

the problem is updating an existing index - each word (and you could have 100,000+ words that need updating) requires a seek and then a write, and ext3 performs really badly with such seek-read-seek-write patterns

if we did it in one shot, pdflush could hog the disk and deny access to other apps, but this would be the fastest way to update, with the least thrashing

we currently do it incrementally, 1000-5000 words at a time followed by an fsync, so it takes longer but should not delay other apps' disk access for more than a few seconds

At the moment we cannot really improve things further until ext3 (or whatever causes the bad performance) is fixed.
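The incremental scheme described above (apply a batch of scattered word updates, then force them to disk before continuing) looks roughly like this in outline (an illustrative Python sketch, not Tracker's actual code; the batch size and the (offset, payload) update format are assumptions):

```python
import os

BATCH = 1000  # assumed; the comment above says 1000-5000 words per batch

def flush_in_batches(fd, updates, batch=BATCH):
    """Apply scattered (offset, payload) index updates, calling
    fdatasync after every `batch` writes so pdflush never builds up
    a large backlog of dirty pages that would starve other apps."""
    for i, (offset, payload) in enumerate(updates, 1):
        os.pwrite(fd, payload, offset)  # seek + write, as in a hash-table index
        if i % batch == 0:
            os.fdatasync(fd)            # bound the amount of unflushed data
    os.fdatasync(fd)                    # flush the final partial batch
```

Dropping the intermediate fdatasync calls gives the "fast merge" variant: quicker in total, but the flush then arrives as one disk-hogging burst.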

Revision history for this message
Martin (martin615) wrote :

There's no way you can reorder the data to better cope with ext3's deficiencies (or whatever it is that's causing problems)?

(Humm... Maybe it's better to spend energy on fixing the real issue. :)

Revision history for this message
Martin (martin615) wrote :

Btw, are there any kernel people looking in to this? (By "this" I mean "fixing ext3".)

Revision history for this message
Jamie McCracken (jamiemcc-blueyonder) wrote :

Martin,

No idea - I read that ext4 will soon go into the kernel, but I don't know if it fixes the issue

Revision history for this message
Martin (martin615) wrote :

Yeah. I thought about ext4 too. Perhaps a post to LKML would be in order?

Revision history for this message
Jamie McCracken (jamiemcc-blueyonder) wrote :

Martin,

Actually, I have done some more testing and discovered that when index merging, no physical reads are done (they are all cache hits), so the ext3 problem is handling a ton of small writes, which it does very badly according to Google (it fragments, destroying performance and preventing contiguous writing to the index).

XFS, in contrast, handles these very well, with a lot of contiguous writing and almost no loss of speed, thanks to its delayed allocation feature (http://en.wikipedia.org/wiki/Delayed_allocation), which tracker benefits from.

The good news is that this feature is under consideration for ext4 - https://ols2006.108.redhat.com/2007/Reprints/sato-Reprint.pdf

so fingers crossed!

Revision history for this message
Martin (martin615) wrote :

Delayed allocation is most definitely going in. :) The only question seems to be whether it should go into ext4 or the VFS layer so it can be shared with XFS and other filesystems. See e.g.

Section 3.2 in https://ols2006.108.redhat.com/2007/Reprints/mathur-Reprint.pdf
http://ext4.wiki.kernel.org/index.php/OLS-bof-2007-minutes_OLS_2007
http://ext4.wiki.kernel.org/index.php/Minutes10-01-2007

Revision history for this message
Jamie McCracken (jamiemcc-blueyonder) wrote :

"Currently Delalloc only works for writeback mode. Implementation for ordered mode would be tricky, because need to use bufferheads."

Let's hope they get it fixed for ordered mode, which is the default for ext3/4.

If they do, this bug should be fixed for hardy.

Revision history for this message
Martin (martin615) wrote :

/me starts lurking around at <email address hidden> :)

Another question: do any of the other indexers out there work around this problem somehow? And if so, how?

I mean, the problem is many small write()s, with blocks being allocated directly during write() instead of at page-flush time, right? Can't you merge several write operations together? Or maybe you already do that, "1000-5000 words at a time"? If not, could the on-disk format be changed to allow merging several write()s together?

Also, is it really necessary to fsync() so often?

Revision history for this message
Jamie McCracken (jamiemcc-blueyonder) wrote :

I don't know about other indexers.

Tracker's indexer is a hash table, so words are written at random locations - it's not possible to write more than one word at a time, nor do we know whether certain words happen to be stored sequentially.

We rely on the kernel to order the writes elevator-fashion so they can be written contiguously - sadly that does not happen on ext3, but it does if ~/.cache/tracker is mounted on XFS.

We call fdatasync after every 1000-5000 words written, to prevent pdflush starving other apps of the disk (this starvation appears to be a recent problem in kernels since 2.6.20).

I will add an option for fast merges, which is approx 50% quicker without any fsyncs but will hog the disk when doing so on ext3.

Revision history for this message
Jamie Lokier (jamie-shareable) wrote :
Download full text (9.0 KiB)

Jamie McCracken wrote:
> I dont know about other indexers

Someone should see what Beagle's like, I guess.

> Trackers indexer is a hash table so words are written at random
> locations - its not possible to write more than one word at a time nor
> do we know whether certain words are stored sequentially as a result.
>
> We rely on the kernel to order the writes elevator fashion so they can
> be written in a contiguous fashion - sadly that does not happen on ext3
> but does if ~/.cache/tracker is mounted on XFS
>
> We call fdatasync after every 1000-5000 words written to prevent pdflush
> starving the disk from other apps (this starvation appears to be a
> recent problem in kernels since 2.6.20)

Ew. Those both look like nasty kernel limitations when writing to a
file in a scattered fashion.

I guess this is also why producing multiple smaller index files, then
having a merge-fest to make the large index file when it's all done,
is faster than writing everything to one hash index as you go. That
would naturally decrease the _size_ of seeks and hence seek time, as
smaller files span less of the disk.

Coming back to this:

> Trackers indexer is a hash table so words are written at random
> locations - its not possible to write more than one word at a time nor
> do we know whether certain words are stored sequentially as a result.

That doesn't seem like a good way to write an index. I'll try to put
together a different idea. (I've thought a lot about indexes for
databases: I'm thinking of writing yet another database engine).

You've found that SQLite doesn't perform too well without
index-merging, and neither does the other db you tried.

But a _good_ database implementation obviously should perform better
writing everything to one big file, instead of to multiple smaller
files in sequence then merging them.

Think about it: a good database would do the optimisation you're doing
automatically, because loading large amounts of data with randomly
ordered index keys is quite a common requirement (and benchmark) in
databases.

The only difference would be that it would hide the "smaller files" as
allocated zones inside the single big file it's apparently using.
That bigger file's size would grow in steps. The disk seeking and I/O
would still have similar properties.

There are quite a few different algorithms to make the index (in a
general database engine) be always accessible while it grows, despite
the internal periodic reorganisation, and to keep the reorganisation
efficient with disk seeks.

One way is to store the index as two tables internally: one B-tree,
and one sequential table (because reads can easily check both), and
write updates (in your case, each new "word") fairly sequentially up
to a certain amount, as write-ahead logging, then periodically merging
the log into the main B-tree using a batched method. If two logs are
permitted, one being merged and one being written, writing doesn't
have to stall during merging.

(Aside:
  => I know for a fact that several databases do this, retain a
     separate sequential index and B-tree index, when there's a
     constant stream of updates.

     You might find PostgreSQL, some MySQL backend, F...

Read more...

Revision history for this message
Miguel Rodríguez (migrax) wrote :

It may not be related, but we were having speed issues with tracker in some computers here, and it was fixed after adding the relatime mount option to ext3 indexed partitions.

HTH.

Revision history for this message
nowshining (nowshining) wrote :

Never had an issue myself; however, I never did a dist-upgrade, I just added the gutsy sources and downloaded from there.

Revision history for this message
Jamie McCracken (jamiemcc-blueyonder) wrote :

Hi Jamie Lokier,

Just to clear up a few issues:

1) All the DBs (B-tree/hash/Berkeley) only have an API for updating one key/value at a time AFAIK

2) In our case, the final index merge is not updating anything, as it's creating a new index. The disk space for hits is therefore contiguous: regardless of what word we start from, space is allocated on a first come, first served basis (i.e. it's appended), so we can choose whatever word order we like. The buckets in the header are random of course, but they are fixed at the first 1MB of the index (256,000 buckets at 32 bits each)

3) All major indexers - Lucene (Beagle/Strigi) and Google - use index merges, as updating a big index is slow, plus merging helps remove deleted entries and fragmentation. Without merges no index would be scalable

4) We don't want to use multiple tables, and SQL DBs are not appropriate as they store the word twice (once in the index and once in the table), bloating things up

5) The high-end Oracle RDBMS supports clustered tables, which allow storing data in key order (normal tables are appended, and only indexes are sorted). These are not practical as they are even more painful to update due to massive relocation (in fact it's far quicker to append records and then copy to a new table in sorted order).

6) The performance problems with the existing merges disappear on XFS (they merge in seconds as opposed to minutes on ext3). If ext4 gets similar delayed allocation then hopefully we will see the same there too

Revision history for this message
Jamie Lokier (jamie-shareable) wrote :

Miguel Rodríguez wrote:
> It may not be related, but we were having speed issues with tracker in
> some computers here, and it was fixed after adding the relatime mount
> option to ext3 indexed partitions.

Back when I first tried it, I added noatime (before I knew about
relatime), and it did indeed help a lot.

Without noatime, the disk seeked heavily and the disk I/O activity
(according to the Gnome System Monitor applet) was always high while
indexing.

With noatime, the disk seeked a lot less, and the disk I/O activity
appeared to be nearly zero.

When I saw such a dramatic change, I thought that would mean the
problems affecting desktop application performance would be fixed.

However, despite the lack of much accounted-for I/O, and less noise
from the disk, for some reason all applications still ran really
slowly.

So, yes: noatime/relatime makes a good and essential difference on ext3
partitions, and probably others. On the latest kernels and newer
Trackers, O_NOATIME is used, I think, so you don't need those mount
options. (But relatime is generally a good choice anyway.)

But: it's not enough by itself, at least on some of our systems.

-- Jamie

Revision history for this message
M (asobi) wrote :

Trackerd consumes a lot of CPU mainly when I'm downloading something. I notice it most obviously when using KTorrent. It's as if trackerd sees the change to some blocks of the file and has to rescan the whole thing, over and over again. Since the file is continually modified, as long as anything is downloading, trackerd will grab 100% of my CPU. This happens on desktops, laptops, even while on battery. It is a major problem.

Revision history for this message
Arthur (moz-liebesgedichte) wrote :

I'm on an old Athlon XP 1700+ with 512 MB RAM and never had such extreme "response blockers" in Feisty before. I upgraded to Gutsy somewhere around beta release time. I'm not using trackerd and beagle is usually deactivated. But when issuing an 'aptitude dist-upgrade', GUI programs sometimes don't react for something like 8 seconds, which never happened on feisty.

Revision history for this message
Alexey Borzenkov (snaury) wrote :

I can confirm that this is unrelated to trackerd (I uninstalled it as soon as I kept running into this problem). While observing memory consumption lately, I can see that although I have 1GB of RAM, system monitor shows only around 600-700MB of it being used, and swap usage keeps rising over time. Right now it shows 700MB of memory and 597MB of swap occupied somehow, and even while update-manager merely downloads packages, my letters appear with a very noticeable delay as I type. The weirdest thing is that system monitor shows only one big consumer, firefox-bin (which it lists as only 254MB), and the others are no bigger than 50MB. Top, on the other hand, shows unrealistically huge numbers for most applications (Xorg: 247m VIRT, 184m SWAP; synaptic: 188m VIRT, 150m SWAP; firefox-bin: 915m VIRT, 661m SWAP; etc.). I don't understand what's going on, but as far as I've seen, as soon as synaptic starts installing updates it's a complete showstopper for me. The disk thrashing and memory usage suggest that my system is actually swapping BADLY (if I don't touch my PC while it's updating and then come back, all applications slowly get unswapped, even after the disk activity is gone).

Could it be a memory leak somewhere? Or what else could it be? It's disk-activity related (copying big files, installing packages and other such activities all trigger slowdowns, even on the -rt kernel), but in my case it also seems to be memory related; at least, the more time passes, the more it looks like a swapping issue.

My current uptime is 4 days 13:39; that's why it's gotten so bad.

Also, I'm using linux-rt on amd64, though I'm thinking of moving back to linux-generic after the next reboot.

Anyone else noticed something strange with memory consumption?

P.S. When I first moved from WinXP to Feisty I was laughing at WinXP, as Feisty took only a little over a quarter of my memory. Why is it suddenly so big now?

Revision history for this message
Miguel Martinez (el-quark) wrote :

I second Alexey's comments.

I've just started my laptop and the only things I've done are install
today's and yesterday's updates (54), create a tarball via Nautilus, check
my e-mail and edit a LaTeX file (I didn't compile it). This is the output of
free:

$ free -m
             total       used       free     shared    buffers     cached
Mem:           503        451         52          0         51        233
-/+ buffers/cache:        166        336
Swap:          517         33        483

And the uptime is...

$ uptime
  10:20:05 up 23 min, 2 users, load average: 0.02, 1.02, 1.37

Fortunately, since my gutsy system was installed from scratch, I don't get
the incredible slowdown in responsiveness I used to get on a dist-upgraded
machine.

Revision history for this message
Jeff Fortin Tam (kiddo) wrote :

There seem to be three different issues here:
- a kernel I/O bug (which is the one I'm interested in, per this thread's title)
- tracker indexing
- memory problems, according to the newest comments?

I really think #2 is unrelated and #1 is the problem that needs to be fixed; otherwise the rest is just band-aids.
There are only 2-3 days left before the 7.10 final release - will we dist-upgraders all be forced to clean-install? And will those dist-upgrading from feisty at release be bitten by this bug? I am worried.

Revision history for this message
b5baxter (robert-vanrenewable) wrote :

I am also experiencing disk thrashing with the following characteristics.
- Mouse becomes very slow to respond
- Keyboard becomes extremely slow or unresponsive
- screen becomes unresponsive
- often requires a power down
- Tracker has been disabled
- Compaq Presario R3000 with AMD 64
- Install of Gutsy 7.10 on a new partition
- Dual boots to Windows XP with no disk problems
- Fedora 7 was previously installed on same machine with no disk problems.
- System Monitor (and htop) shows 100% CPU but all listed tasks are only using a small fraction of the CPU. Swap is not being used. (This may not be accurate, because the screen often freezes when the thrashing starts.)

Revision history for this message
pinepain (pinepain) wrote :

Maybe the HDD is working in one of the PIO modes? PIO uses the CPU for I/O transfers (something like that - see Wikipedia for details). Try setting one of the UDMA modes.

Revision history for this message
unggnu (unggnu) wrote :

I can confirm this issue with Gutsy. It even makes audio and video stop for a second after a short period. An easy workaround was to killall trackerd, but the problem seems to have gotten worse in Hardy. I did a dist-upgrade to the current packages, and during some configuration steps even the mouse lags badly, which seems much worse than in Gutsy.
Btw, I have a PATA controller but sda drives, so libata is used, I guess.

Revision history for this message
Ravindran K (ravindran-k) wrote :

I strongly suspect this is due to another DMA-related issue. I observe that DMA is not enabled on my HDDs and I'm unable to enable it using hdparm. I have no idea how to accomplish the same using sdparm. As a result, the drives are running slower.

Apparently, this is due to the conversion of hdX to sdX (hda to sda) in recent Ubuntu releases. Not sure when this started. There are other bug reports related to this: https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/110636
(I'm trying the temporary fix suggested there.)
Please check whether you all observe something similar.

Some info below on messages I get:
____________________________________________________

 hdparm -i /dev/sda

/dev/sda:

 Model=QUANTUM FIREBALLlct15 20 , FwRev=A01.0F00, SerialNo=613024132415
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
 RawCHS=16383/16/63, TrkSize=32256, SectSize=21298, ECCbytes=4
 BuffType=DualPortCache, BuffSize=418kB, MaxMultSect=16, MultSect=?16?
 CurCHS=17475/15/63, CurSects=16513875, LBA=yes, LBAsects=39876480
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4
 DMA modes: mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 *udma2 udma3 udma4
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: ATA/ATAPI-5 T13 1321D revision 1: ATA/ATAPI-1,2,3,4,5

 * signifies the current active mode

root@homeserver:~# hdparm -i /dev/sdb

/dev/sdb:

 Model=ST380011A , FwRev=8.01 , SerialNo=4JV1QKYD
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=unknown, BuffSize=2048kB, MaxMultSect=16, MultSect=?16?
 CurCHS=17475/15/63, CurSects=16513875, LBA=yes, LBAsects=156301488
 IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4
 DMA modes: mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 *udma2 udma3 udma4 udma5
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: ATA/ATAPI-6 T13 1410D revision 2: ATA/ATAPI-1,2,3,4,5,6

 * signifies the current active mode

root@homeserver:~#
____________________________________________________
root@homeserver:~# hdparm -d1 -X66 /dev/sda

/dev/sda:
 setting using_dma to 1 (on)
 HDIO_SET_DMA failed: Inappropriate ioctl for device
 setting xfermode to 66 (UltraDMA mode2)
SG_IO: bad/missing ATA_16 sense data:: 70 00 05 00 00 00 00 0a 00 00 00 00 24 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 HDIO_DRIVE_CMD(setxfermode) failed: Input/output error
root@homeserver:~# hdparm -d1 -X66 /dev/sdb

/dev/sdb:
 setting using_dma to 1 (on)
 HDIO_SET_DMA failed: Inappropriate ioctl for device
 setting xfermode to 66 (UltraDMA mode2)
SG_IO: bad/missing ATA_16 sense data:: 70 00 05 00 00 00 00 0a 00 00 00 00 24 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 HDIO_DRIVE_CMD(setxfermode) failed: Input/output error

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

I have run some tests on different notebooks: a dist-upgrade of the 32-bit version of Feisty, and fresh installs of the 32-bit and 64-bit versions of Gutsy. The disc performance is bad. It shows up when a program accesses the hard drive heavily; it's not a tracker problem. Resuming a virtual machine with 512MB RAM (VMware Workstation 6) takes about 5 minutes, versus about 1 minute on Feisty. Sometimes the mouse freezes for two seconds during heavy disc access.

The problem occurs on both my SATA and PATA machines.

ThinkPad R50p - Pentium M (1,7GHz) - 2GB – PATA
ThinkPad T61p – Core2 Duo 7700 (2,4GHz) – 4 GB – SATA

On my old machine Gutsy is unusable, which is why I reinstalled Feisty. There are no problems under Feisty so far. Gutsy on the new machine is like using Vista: Gnome needs a long time to start, and I miss the fast response I know from Feisty. On Gutsy I start Firefox and it takes about 8 seconds before the Firefox window pops up. My old Pentium M seems to respond faster than my new machine running Gutsy.

I don't think it's a DMA problem, because I get good results with hdparm -tT /dev/sda.

/dev/sda:
 Timing cached reads: 8604 MB in 1.99 seconds = 4315.53 MB/sec
 Timing buffered disk reads: 136 MB in 3.01 seconds = 45.17 MB/sec

And I get good read and write results on the hard disc at 800MHz (the lowest CPU frequency) when there is no other disc access.

sudo dd if=/dev/sda1 of=/dev/null bs=1M count=1000
1048576000 Bytes (1,0 GB) kopiert, 18,7464 Sekunden, 55,9 MB/s

dd if=/dev/zero of=test bs=1M count=1000
1048576000 Bytes (1,0 GB) kopiert, 17,0126 Sekunden, 61,6 MB/s

The sync command lasts about one second.
I am using ext3 as the file system, and I don't think it's an ext3 problem, because there aren't any problems without concurrent access.

Filesystem features: has_journal resize_inode dir_index filetype needs_recovery sparse_super large_file
Filesystem flags: signed directory hash

I am now using the newest kernel, but the same problems occur on all other kernels.
2.6.22-14-generic #1 SMP Tue Dec 18 05:28:27 UTC 2007 x86_64 GNU/Linux

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

I have run some tests with feisty and gutsy.
I connected my SATA drive to the USB port and tested gutsy on the same machine. Now I get bad read and write performance of about 20MB/s instead of 50MB/s, but the system is faster. It consumes much more CPU power, yet every program starts faster and I get faster response times, even when the hard disk is heavily used and CPU consumption is about 100%.

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

I copied two big files concurrently (~15% bigger than my memory) with dd from one ext3 (xfs) partition to another ext3 (xfs) partition on the same hard disk. I tried block sizes of 10 bytes and 100 bytes for the copy operation. Tracker is disabled. My disc performance is about 50-60 MB/s (USB2 ~20MB/s) when I copy one big file from /dev/zero to disc. CPU consumption is higher in Gutsy and Hardy. Perhaps a different caching algorithm has been implemented since gutsy?
I always used a fresh install with all updates.

Feisty:
bs=100 - xfs=~10MB/s - ext3=~9MB/s
bs=10 - xfs=~6MB/s - ext3=~6MB/s

Gutsy:
bs=100 - xfs ~6MB/s - ext3 ~3MB/s (sometimes I get results about 9MB/s for both file systems)
bs=10 - xfs ~3MB/s - ext3 ~1MB/s

Hardy:
bs=100 - xfs=~7MB/s
bs=10 - xfs=~4MB/s

bs=100 - ext3(usb2)=~6MB/s
bs=10 ext3(usb2)=~2,5MB/s

Revision history for this message
pinepain (pinepain) wrote :

Hi,

Strange - that is too slow even for copying from one partition to another on the same HD. It looks like some hardware isn't configured properly. Maybe you should try to manually set one of the UDMA modes (the highest available, but first try setting the max UDMA via software), not PIO. That helped me on Feisty and also works fine in Gutsy. BTW, it works fine for me on ext2 as well as ext3.

Also note that you can have a USB2 interface but not get true USB2 speed. And copying from /dev/zero takes some CPU resources itself.

Good luck.

Revision history for this message
Alexey Borzenkov (snaury) wrote :

I wonder if Ben Collins can give us some status update on what is being done to fix this bug? Is he still working on this bug?

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

I have tried the vanilla kernel (2.6.22-14) on Gutsy. Now I get better disc performance (ext3 / bs=10 ~4MB/s / bs=100 ~7MB/s), but desktop responsiveness is worse.
I also tried Hardy for a while and noticed that desktop responsiveness is sometimes worse than under gutsy, even with no or only light disc access.
Sometimes the gnome menu needs about 2 seconds before it appears, and there are only logos and metadata to load. These are the same problems as in gutsy, especially gutsy with the 2.6.22 kernel.
Hdparm reports that UDMA is on. And there is only iowait CPU usage at the lowest CPU frequency, with good disc performance, when copying with a normal block size (>= 4k) with dd.
Compiz is disabled on all of my installations. I use the 64-bit versions of gutsy and hardy.
I don't know the internal design of the Linux kernel, so this is only a guess: there must be a bottleneck which is triggered especially by heavy disc access, but also shows up during other activities. Perhaps interrupt handling. Powertop reports ~400 wakeups for the keyboard interrupt while copying two files from one partition to another with a block size of 4k and writing this text - and I am typing at most two or three letters per second. The count is only that high while copying the files; without disc access there are only about 200 wakeups while typing at the same speed.

With two writing and two reading disc access
  38,9% (445,7) <interrupt> : PS/2 keyboard/mouse/touchpad
  17,5% (200,7) <interrupt> : extra timer interrupt
  14,5% (165,8) <interrupt> : libata
   8,8% (100,5) <interrupt> : uhci_hcd:usb1, eth0

Without high io access
  31,2% (213,6) <interrupt> : extra timer interrupt
  25,6% (175,2) <interrupt> : PS/2 keyboard/mouse/touchpad
  14,7% (100,5) <interrupt> : uhci_hcd:usb1, eth0
   8,9% ( 60,8) <interrupt> : uhci_hcd:usb3, ahci, yenta, nvidia

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

Hi Thomas,

Care to comment on which version of the Hardy kernel you are running (cat /proc/version_signature)? Also, are you testing on a fully up-to-date Hardy install, or are you just running the hardy kernel on, say, a Gutsy install?

Note that we'll keep this report open against the actively developed kernel; the task against 2.6.22 will be closed. Thanks.

Changed in linux:
status: New → Incomplete
Changed in linux-source-2.6.22:
status: In Progress → Won't Fix
Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

Hi Leann,

I tried a kernel from kernel.org under Gutsy because I cannot work on my machine anymore.
The interrupt issue seems to be a problem with my kernel build.

With two writing and two reading disc access under Hardy 2.6.24-11-generic:
  27,1% (173,7) <kernel IPI> : Rescheduling interrupts
  20,4% (130,4) USB device 1-1 : BCM2045B (Broadcom Corp)
  15,6% (100,0) <interrupt> : uhci_hcd:usb1
  10,3% ( 66,0) <interrupt> : libata
   9,5% ( 60,9) <interrupt> : uhci_hcd:usb3, yenta, nvidia
   5,7% ( 36,8) dd : blk_plug_device (blk_unplug_timeout)

Hardy is a fresh test installation with all updates.

The problem was there under the 2.6.24-10 kernel.
2.6.24-10-generic #1 SMP Fri Feb 22 18:26:06 UTC 2008 x86_64 GNU/Linux

Now I try the 2.6.24-11 kernel.
2.6.24-11-generic #1 SMP Fri Feb 29 21:26:31 UTC 2008 x86_64 GNU/Linux
(Ubuntu 2.6.24-11.17-generic)

I have only done some testing on this kernel, but there are hang-ups whenever there is disc activity. E.g. switching desktops takes 2 seconds, and selecting icons on the desktop takes effect only after 2-4 seconds. Sometimes the main menu just stays open and freezes for many seconds after a program has been started. Firefox hangs for 20-30 seconds (I think that's a firefox problem).
The problem does not occur regularly, and there are periods when the system works smoothly. But from time to time (every 20s to 10min) the problem recurs every few seconds.

Can I do something to help solve the problem?

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

Hi Leann,

I made a mistake in the kernel versions. I am using the "Ubuntu 2.6.22-14.52-generic" kernel under gutsy and tried the linux-2.6.24.2 kernel from kernel.org, but desktop responsiveness was not good (I think it was a configuration mistake in my kernel build).
I tried linux-2.6.24.2 from kernel.org because I had no desktop responsiveness problems under ArchLinux on the same machine, and they use the 2.6.24.1 kernel (now 2.6.24.3). (http://www.archlinux.org/packages/13318/)

The results in the post from 2008-03-04 were obtained under gutsy "Ubuntu 2.6.22-14.52-generic".

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

I can easily reproduce the problem by executing the following commands on hardy (2.6.24-12.22-generic), fully updated. I have not tested it under gutsy.

dd if=/dev/zero of=test1 bs=4k count=250000 & \
dd if=/dev/zero of=test2 bs=4k count=250000 & \
dd if=/dev/zero of=test3 bs=4k count=250000 & \
dd if=/dev/zero of=test4 bs=4k count=250000 & \
dd if=/dev/zero of=test5 bs=4k count=250000 & \
dd if=/dev/zero of=test6 bs=4k count=250000 & \
dd if=/dev/zero of=test7 bs=4k count=250000 & \
dd if=/dev/zero of=test8 bs=4k count=250000 &

Firefox freezes almost every time. Other applications (evolution) can be stalled too, but it's not as easy as with firefox.
If I try to switch desktops with the keyboard shortcut, sometimes the switch executes only after 10 seconds.
From time to time I can produce a complete freeze of gnome (input events?). All applets (like the system monitor and cpu-frequency applets) keep working correctly, but I cannot move any windows or use the mouse or keyboard. I have to switch to a console (Ctrl+Alt+F1) and kill all dd processes as root. After killing the dd processes, gnome and all input events work again.
The average system load climbs continuously. When it reaches a value of about 8, klogd uses 100% of the CPU (I think the daemon crashes).

Changed in linux:
assignee: nobody → ubuntu-kernel-team
importance: Undecided → Medium
status: Incomplete → Triaged
Revision history for this message
exactt (giesbert) wrote :

this looks like a dup of #43484

Revision history for this message
Francisco Borges (francisco-borges) wrote :

On Sun, Apr 13, 2008 at 8:19 PM, exactt <email address hidden> wrote:
> this looks like a dup of #43484

Perhaps it is.

But FWIW I would just like to point out that:

1. I have the same case as many others here (heavy disk IO -> poor
system responsiveness)

2. However, unlike every note on LP #43484, I am running reiserfs, and not ext3.

[...]

Is anybody experiencing this bug running with "data=writeback"?

Cheers,
--
Francisco

Revision history for this message
Dym (dmarszal) wrote :

Latest Hardy: while copying a large file or during other heavy I/O, the load peaks at 8.0 and the desktop becomes unresponsive. You can see this well in Firefox.

uname -a
Linux dark-laptop 2.6.24-16-generic #1 SMP Thu Apr 10 13:23:42 UTC 2008 i686 GNU/Linux

hdparm -tT /dev/sda

/dev/sda:
 Timing cached reads: 3918 MB in 2.00 seconds = 1962.06 MB/sec
 Timing buffered disk reads: 138 MB in 3.01 seconds = 45.88 MB/sec

Revision history for this message
Dym (dmarszal) wrote :

Booting Hardy with the old 2.6.22 kernel fixes the problem. The load average after copying over 4 GB is no more than 3.3.

uname -a
Linux dark-laptop 2.6.22.7 #4 SMP Fri Oct 19 16:03:46 CEST 2007 i686 GNU/Linux

hdparm -tT /dev/sda

/dev/sda:
 Timing cached reads: 6546 MB in 1.99 seconds = 3286.52 MB/sec
 Timing buffered disk reads: 136 MB in 3.01 seconds = 45.18 MB/sec

My config is HP 6710b laptop.

Revision history for this message
Jamie McCracken (jamiemcc-blueyonder) wrote :

The kernel needs to throttle the process that's doing large heavy writes rather than stalling every other process that's trying to read from or write to the disk.

I'm surprised this has not been fixed in the kernel yet :(

Revision history for this message
Sam Kimbrel (kimbrel) wrote :

I can confirm this on a fresh install of 8.04, kernel 2.6.24-16-generic.

Any sustained disk I/O causes other apps to become unresponsive.

I don't get how this was not caught in beta, because it can render my system unusable.

Revision history for this message
Tim Gardner (timg-tpi) wrote :

Enable Hardy -proposed and install the -17.31 kernel which has SCHED_CGROUPS enabled. I believe it will have an effect on interactivity responsiveness.

Changed in linux:
status: Triaged → Confirmed
Revision history for this message
Francisco Borges (francisco-borges) wrote :

On Mon, May 5, 2008 at 4:02 PM, Tim Gardner <email address hidden> wrote:
> Enable Hardy -proposed and install the -17.31 kernel which has
> SCHED_CGROUPS enabled. I believe it will have an effect on interactivity
> responsiveness.

Just to help other people that perhaps were puzzled, like I was, by
Tim's comments.

The package he was talking about is this one:
https://launchpad.net/ubuntu/hardy/+source/linux/2.6.24-17.31

Relevant bugs that are probably related to this one:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/188226 (see the
description of this one)

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/218516

[...]

@Tim: thanks for the tip, I will be trying the package when I get home.

--
Francisco

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

Kernel 2.6.24-17 does not help; I think it's even worse with the new kernel.
I tried Debian Lenny and Fedora Core 8. The problem exists in those distros too, but it's much less severe than in gutsy. I am using ext3 in journal mode. I tried xfs, but it has the same problem. I never had performance problems with ext3 under feisty.

Hardy is unusable for me. It's fine for surfing, writing documents or reading PDFs. It's awful for working with vmware or other disk-intensive apps.

I just finished a backup (2.6.24-16) with rdiff-backup (pybackpack) to a local usb2 hard disk. I think it's related to this problem. Here is the log.

StartTime 1209878093.00 (Sun May 4 07:14:53 2008)
EndTime 1210019188.36 (Mon May 5 22:26:28 2008)
ElapsedTime 141095.36 (39 hours 11 minutes 35.36 seconds)
SourceFiles 530196
SourceFileSize 171720983690 (160 GB)
MirrorFiles 501408
MirrorFileSize 139162927405 (130 GB)
NewFiles 240477
NewFileSize 86693084491 (80.7 GB)
DeletedFiles 211689
DeletedFileSize 57296675851 (53.4 GB)
ChangedFiles 103582
ChangedSourceSize 33585642382 (31.3 GB)
ChangedMirrorSize 30423994737 (28.3 GB)
IncrementFiles 555748
IncrementFileSize 32637905299 (30.4 GB)
TotalDestinationSizeChange 65195961584 (60.7 GB)
Errors 66

The last backup under gutsy takes about two hours.

StartTime 1205655350.00 (Sun Mar 16 09:15:50 2008)
EndTime 1205663714.77 (Sun Mar 16 11:35:14 2008)
ElapsedTime 8364.77 (2 hours 19 minutes 24.77 seconds)
SourceFiles 501408
SourceFileSize 139162927405 (130 GB)
MirrorFiles 497434
MirrorFileSize 147995632923 (138 GB)
NewFiles 5124
NewFileSize 316728530 (302 MB)
DeletedFiles 1150
DeletedFileSize 9327300673 (8.69 GB)
ChangedFiles 1648
ChangedSourceSize 41242024637 (38.4 GB)
ChangedMirrorSize 41064158012 (38.2 GB)
IncrementFiles 7924
IncrementFileSize 343516700 (328 MB)
TotalDestinationSizeChange -8489188818 (-7.91 GB)
Errors 34

Revision history for this message
Rocko (rockorequin) wrote :

2.6.24-17 doesn't fix it for me either. The desktop still becomes unusable when copying large files, or after some time using vmware or virtualbox. It's unbelievable that this bug survived to release, especially an LTS release.

When's -18 coming out?

Revision history for this message
Gate (gatewarstrekme) wrote :

This is happening to me on Hardy with 2.6.24-16 when copying large numbers of files.

The weird thing is that Compiz remains perfectly responsive (desktop switching), but every other application, including Firefox and the terminal emulator, remains unresponsive for minutes to *hours* after the disk I/O has finished (copying a few thousand files from USB to HDD using cp).

This happens despite processor and RAM usage both staying under 40%.

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

Hi Guys,

If you are willing, just to see if it makes a difference, care to test the upcoming Intrepid Ibex 8.10 kernel? It was recently rebased on the upstream 2.6.25 kernel and is currently available in the following PPA:

https://edge.launchpad.net/~kernel-ppa/+archive

If you are not familiar with how to install packages from a PPA, basically do the following:

Create the file /etc/apt/sources.list.d/kernel-ppa.list to include the following two lines:

deb http://ppa.launchpad.net/kernel-ppa/ubuntu hardy main
deb-src http://ppa.launchpad.net/kernel-ppa/ubuntu hardy main

Then run the command: sudo apt-get update

You should then be able to install the linux-image-2.6.25 kernel package. After you've finished testing you can remove the kernel-ppa.list file and run 'sudo apt-get update' once more. Please let us know your results. Thanks

Revision history for this message
Reeve Yang (reeve-yang) wrote :

Though I'm not an Ubuntu user, I have the same problem after upgrading the vanilla kernel from 2.6.17.4 to 2.6.22.15, so it should not be an Ubuntu-specific problem. Here are bonnie++ test results on the old and new kernel respectively. Disk throughput improved 10%, but CPU utilization shot up from 34% to 95% for the same amount of I/O. Could someone help point to any patches or fixes for this issue?

## ###############Linux 2.6.17 #1 SMP Tue May 6
##################################

Version 1.03c ------Sequential Output------ --Sequential Input- --Random-
                   -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
ib-10-34-68-2 2016M 23989 35 44123 6 16360 1 21823 28 43090 1 172.7 0
                   ------Sequential Create------ --------Random Create--------
                   -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
             files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
                16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
ib-10-34-68-2.infoblox.com,2016M,23989,35,44123,6,16360,1,21823,28,43090,1,172.7,0,16,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++

## ###############Linux 2.6.22.15 #1 SMP Tue May 6
##################################

bonnie++ -d /storage -s 2016M -u root
                   -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
ib-10-34-68-2 2016M 26078 94 52117 17 23596 5 27172 86 56402 4 160.2 0
                   ------Sequential Create------ --------Random Create--------
                   -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
             files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
                16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
ib-10-34-68-2.infoblox.com,2016M,26078,94,52117,17,23596,5,27172,86,56402,4,160.2,0,16,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++
###################################################################################

Revision history for this message
Rocko (rockorequin) wrote :

I booted into kernel 2.6.25.1, copied a 2.8 GB file from one internal partition to another and tried using the desktop during the copy.

I *think* responsiveness is improved. Firefox didn't grey out at all during the copy, and I could open other nautilus windows. But about 2GB into the copy FF started taking a long time to respond.

It's hard to be sure because a lot of new stuff in the new kernel was broken: in particular, USB works but transfers data intermittently, so I couldn't use the USB mouse because it moved too jerkily. I also couldn't test desktop responsiveness while copying large files to a USB drive.

Revision history for this message
Ravindran K (ravindran-k) wrote :

Greetings to all,

I ran the bonnie++ test below and my system responded fine during the tests. Please take a look and let everyone know if the results make sense.

Linux ravi-desktop 2.6.24-17-server #1 SMP Thu May 1 15:05:55 UTC 2008 i686 GNU/Linux

Version 1.03b ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
ravi-desktop 8G 38301 58 38881 8 16309 4 18669 34 31416 4 91.5 0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
                 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
ravi-desktop,8G,38301,58,38881,8,16309,4,18669,34,31416,4,91.5,0,16,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++

Revision history for this message
exactt (giesbert) wrote :

@Rocko
maybe this (http://shaver.off.net/diary/2008/05/25/fsyncers-and-curveballs/ ) explains your firefox behaviour after the 2GB...?

Revision history for this message
Rocko (rockorequin) wrote :

@exactt: Yes, it could have been that. Interesting article.

On another note, after reading bug #188226, I tried the same test using the 2.6.24-17-server kernel and the desktop just flew! No pauses, no windows greying out, and FF3 was responsive all the way through. It's a pity that the server kernel doesn't configure the sound card etc, otherwise I'd just use it instead of the generic one.

And just to be sure there weren't any other changes that had improved things, I retried the test under 2.6.24-17-generic and FF3 still greyed out during the copy.

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

The PPA kernel does not fix the problem; it is only delayed. I have run some tests on a fresh Hardy installation with the kernels Ubuntu 2.6.20-15.27-generic (Feisty repository), Ubuntu 2.6.24-17.31-generic, and Ubuntu 2.6.25-1.2ubuntu6-generic. I always restarted the system and started the tests directly after login. To simulate a disk-intensive application, I copied eight 1GB files from /dev/zero to the hard disk; this behaves similarly to working with vmware and eclipse.
Every application was started only once in each test (there was always a reboot between two gimp tests or two firefox tests). Sometimes it takes a long time to browse to /usr/share/backgrounds and to open an image or select a tool under the 2.6.24 and 2.6.25 kernels, and I have to wait 5s - 10s after every mouse action. But this behavior is not deterministic and occurs only in every second or third test; you can see it in the firefox test.

Has anyone seen the poor desktop responsiveness on a SCSI system?

And I don't think that this problem is caused by the kernel only, because the desktop freezes occur with the Feisty kernel under Hardy too. Perhaps something with xorg or gnome?

test results:

kernel
20 / 24 / 25

start gimp at load avg 6
10s / 30s / 30s

start gimp at load avg 8
10s / 23s / 40s

start firefox at load avg 10
17s / 44s / 44s
load four pages (saved session) after start
7s / 27s / 22s

start firefox as load avg 14
15s / 20s / 60s
load four pages (saved session) after start
7s / 20s / 20s

starting oowriter at load avg 12
15s / 20s / 40s

Revision history for this message
Anil (anilkumar-as) wrote :

Thomas, I ran similar tests with Hardy and had the same results. But a fresh installation works just fine; this happened only when I upgraded to Hardy. I wonder if it is related to libata bug 195221 (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/195221).
See Ravindran's comment (https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.22/+bug/131094/comments/760): it shows udma2 being selected instead of udma4.
Can others confirm this?

Revision history for this message
Rocko (rockorequin) wrote :

hdparm -i shows that Hardy is configuring the drives on both my laptops correctly for udma5 (100 MB/s), so I don't think that is the problem.

Revision history for this message
Francisco Borges (francisco-borges) wrote :

On Wed, May 28, 2008 at 8:42 AM, Rocko <email address hidden> wrote:
> hdparm -i shows that Hardy is configuring the drives on both my laptops
> correctly for udma5 (100 MB/s), so I don't think that is the problem.

Same here. My laptop runs with udma4 but still presents the
"responsiveness" problem.

--
Francisco

Revision history for this message
Anil (anilkumar-as) wrote :

Yes, you are right. I applied one of the patches given in the link, which forced udma4 selection, but the problem still exists.
Can anybody suggest an alternative other than a fresh install? I have a laptop with no CD-ROM and limited internet. Switching to a 2.6.22 kernel on Hardy didn't solve the problem. Is there an older kernel that might work? Does 2.6.20 work fine?
Thomas's tests show 2.6.20 working fine. Is it recommended to use that kernel on Hardy?

Revision history for this message
Ravindran K (ravindran-k) wrote :

Hi ppl.. Pls try the server kernels (eg. 2.6.24-17-server ) and check whether you have such issues.

Revision history for this message
Francisco Borges (francisco-borges) wrote :

On Thu, May 29, 2008 at 6:10 PM, Ravindran K <email address hidden> wrote:
> Hi ppl.. Pls try the server kernels (eg. 2.6.24-17-server ) and check
> whether you have such issues.

I just booted with 2.6.24-17. It appears to solve the problem.

My usual test is to start copying large files (to an external disk) and try to show/hide Yakuake, which, to my surprise, doesn't freeze midway through the screen.

Making multiple threads read from /dev/zero (see below), with atop
reporting disk busy at 99%, I still have a responsive system.

Took this from an earlier email from Thomas Pi:

dd if=/dev/zero of=test1 bs=4k count=250000 & \
dd if=/dev/zero of=test2 bs=4k count=250000 & \
dd if=/dev/zero of=test3 bs=4k count=250000 & \
dd if=/dev/zero of=test4 bs=4k count=250000 & \
dd if=/dev/zero of=test5 bs=4k count=250000 & \
dd if=/dev/zero of=test6 bs=4k count=250000 & \
dd if=/dev/zero of=test7 bs=4k count=250000 & \
dd if=/dev/zero of=test8 bs=4k count=250000 &

--
Francisco
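Thomas Pi's eight-writer load generator quoted above can be written as a loop. The sketch below scales the sizes down (8 x 16kB into a temp directory) so it can be dry-run safely; restore bs=4k count=250000 to reproduce the original ~1GB-per-file load.

```shell
# Loop form of the eight parallel dd writers quoted above.
# Scaled down to 8 x 16kB in a temp directory for a safe dry run;
# use bs=4k count=250000 to recreate the original ~1GB-per-file load.
workdir="$(mktemp -d)"
for i in 1 2 3 4 5 6 7 8; do
    dd if=/dev/zero of="$workdir/test$i" bs=4k count=4 2>/dev/null &
done
wait   # block until all background writers finish
ls "$workdir" | wc -l   # prints 8 (the number of files written)
```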

Revision history for this message
Anil (anilkumar-as) wrote :

It's working fine with 2.6.24-17-server. Even with udma2 selected, the responsiveness is great.
The one difference I saw between the generic and server kernel configs that might be affecting this is:

CONFIG_DEFAULT_IOSCHED="cfq"      (generic)
CONFIG_DEFAULT_IOSCHED="deadline" (server)

Revision history for this message
Tobias Oetiker (tobi-oetiker) wrote :

Anil,

Today Anil wrote:

> It's working fine with 2.6.24-17-server. Even with udma2 selected, the responsiveness is great.
> The one difference is saw in the config of generic and server kernel that might me affecting is this
>
> CONFIG_DEFAULT_IOSCHED="cfq"
> CONFIG_DEFAULT_IOSCHED="deadline"

you can switch the ioscheduler on the fly:

echo cfq >/sys/block/sda/queue/scheduler

echo deadline >/sys/block/sda/queue/scheduler

(instead of sda use the name of your disk devices)

cheers
tobi

--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten
http://it.oetiker.ch <email address hidden> ++41 62 213 9902

Revision history for this message
Anil (anilkumar-as) wrote :

How do you know that the scheduler has changed?
I tried changing it, but it didn't make much difference.

Revision history for this message
Tobias Oetiker (tobi-oetiker) wrote :

Anil

Do a cat on the file; the word in [...] is the active scheduler.

The reason I am interested in this bug is that we are seeing
similar issues on file servers and have not been able to pin
them down reliably. We found tweaks here and there, but nothing
decisive. :-(

cheers
tobi

Today Anil wrote:

> How do you know that scheduler is changed ?
> I tried changing it, but didn't have much difference.
>
>

--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten
http://it.oetiker.ch <email address hidden> ++41 62 213 9902
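The bracketed word Tobias describes can be extracted mechanically. This sketch parses a saved sample of the sysfs format (so it runs anywhere) rather than the real /sys/block/sda/queue/scheduler file:

```shell
# /sys/block/<dev>/queue/scheduler lists all schedulers and brackets
# the active one, e.g. "noop anticipatory deadline [cfq]".
# Extract the bracketed name; a saved sample stands in for the real file.
sample='noop anticipatory deadline [cfq]'
active="$(printf '%s\n' "$sample" | sed 's/.*\[\(.*\)\].*/\1/')"
echo "$active"   # prints: cfq
```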

Revision history for this message
Anil (anilkumar-as) wrote :

OK, so the I/O scheduler makes no difference in either kernel.
And the new kernel made my touchpad useless :(

Revision history for this message
Francisco Borges (francisco-borges) wrote :

On Thu, May 29, 2008 at 10:27 PM, Francisco Borges
<email address hidden> wrote:
> On Thu, May 29, 2008 at 6:10 PM, Ravindran K <email address hidden> wrote:
>> Hi ppl.. Pls try the server kernels (eg. 2.6.24-17-server ) and check
>> whether you have such issues.
>
> I just booted with 2.6.24-17. It appears to solve the problem.

I just saw that I wasn't entirely clear here. FWIW, what I meant to
say is that I had used

                                  2.6.24-17-server (server!)

and that it appeared to solve the problem.

Cheers,
--
Francisco

Revision history for this message
Ravindran K (ravindran-k) wrote :

hii ppl...greetings and TGIF..

My sincere apologies; I forgot to mention one major boot setting which might have an influence. Note the combined_mode= and elevator=deadline options enabled in my kernel command line. The writeback option is for better write performance, but I'm sure it doesn't affect read performance.

title Ubuntu 8.04, kernel 2.6.24-17-server
root (hd0,0)
kernel /boot/vmlinuz-2.6.24-17-server root=UUID=b9f4c570-8b44-413d-bcc8-300f0a0890f9 ro combined_mode=libata clocksource=acpi_pm elevator=deadline rootflags=data=writeback splash vga=795
initrd /boot/initrd.img-2.6.24-17-server

Please try the same. Note that to make the changes permanent for all future kernels, you have to modify:
# kopt=root=UUID=b9f4c570-8b44-413d-bcc8-300f0a0890f9 ro combined_mode=libata clocksource=acpi_pm elevator=deadline rootflags=data=writeback
and run update-grub.

@tobiaz...anil.. thanks for helping me realize the same.
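The kopt change can be scripted with sed. This is a sketch operating on a throwaway copy (the UUID is the one from Ravindran's menu.lst above); on a real system the file is /boot/grub/menu.lst, edited as root and followed by update-grub.

```shell
# Append the scheduler/journal options to the '# kopt=' template line of
# a grub menu.lst. Done on a temp copy here; on a real system edit
# /boot/grub/menu.lst as root and then run update-grub.
menu="$(mktemp)"
cat > "$menu" <<'EOF'
# kopt=root=UUID=b9f4c570-8b44-413d-bcc8-300f0a0890f9 ro
EOF
sed -i 's|^# kopt=.*|& combined_mode=libata elevator=deadline rootflags=data=writeback|' "$menu"
cat "$menu"
```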

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

I tried the server kernel (Ubuntu 2.6.24-17.31-server) with all schedulers.
It's much better, but the performance is only at Gutsy's level. Working with vmware is still awful: mouse freezes, text delays, long application start times. Feisty on my old Pentium M machine is twice as fast as my Core 2 Duo on Gutsy or Hardy (Ubuntu 2.6.24-17.31-server).

I have also realized that this is not just a hard drive issue: I can reproduce the problem with high network I/O too.

Revision history for this message
Rocko (rockorequin) wrote :

I retried my 2.8GB copy tests on the -generic kernel after manually switching to deadline scheduling and back to cfq, and for me the desktop is definitely more responsive under deadline (note: I didn't try writeback or any other settings, just the scheduling).

In one test with deadline scheduling firefox didn't grey out at all, and in another it greyed out twice but only for half a second or so (and nautilus greyed out once for a couple of seconds). With cfq, firefox was greying out for five to ten seconds at a time. Irrespective of which scheduling I choose, the average throughput reported by nautilus is the same.

Thanks to Anil, Tobias, and Ravindran for the info about how to change the scheduling.

@Thomas Pi: I actually don't normally have problems with vmware-server 1.05 (XP runs at a similar speed to natively), but I find that occasionally the VM will get itself into a state where it thrashes the disk whenever I try to do something (even scrolling), and at this point it becomes unusable like you say. Sometimes vmware-server does this when I first boot the VM, in which case it takes forever to boot. Whenever it happens, a power reset from the VMWare menu fixes the problem. So maybe there's a separate bug in vmware that is making it look like this bug?

Revision history for this message
Anil (anilkumar-as) wrote :

Well, is this some kind of problem with kjournald? It seems to have happened in past releases as well; some googling shows it started with the Gutsy upgrade.

Can anybody tell me why this bug is "Medium"? It has made my system unusable, and I think it is the same for you guys.

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

I built the generic kernel without the "Fair group CPU scheduler", "Tickless System (Dynamic Ticks)" and "High Resolution Timer Support". My system seems to be faster; can someone check this?
My application startup speed is now between Feisty and Gutsy/Hardy, but Firefox is still unusable under high I/O load.
I will try a 100Hz version and check some more options from the server kernel too.

Revision history for this message
Milan Bouchet-Valat (nalimilan) wrote :

Please note also that Firefox 3 has a problem with fsync that makes it hang when I/O load is high, and this bug was not present in Gutsy. So be careful when comparing slowness based only on Firefox.

Revision history for this message
Rocko (rockorequin) wrote :

Have there been any beneficial changes in the 2.6.24-18 kernel?

I installed it this morning (still using deadline scheduling) and just tried copying six 2GB files within the same ext3 partition. The desktop apps didn't slow down significantly or grey out at all (I was running FF3 RC1, a vmware VM, and Thunderbird). In fact, the "file operations" window greyed out briefly while I was opening a new nautilus window. In the past, the new nautilus window (and FF3 and Thunderbird) used to grey out instead.

So hopefully the new kernel is working better...

I also recently switched to the 32bit kernel, but I don't think the 32bit 2.6.24-17 kernel was any different from the amd64 one with respect to desktop responsiveness under disk I/O load.

Revision history for this message
Austin Lund (austin-lund) wrote :

Using 2.6.24-19-generic from hardy-proposed with CFQ works fine for me now. Unzipping the kernel tarball and compiling the kernel don't affect responsiveness at all, and FF3 only very slightly.

Revision history for this message
Anil (anilkumar-as) wrote :

I have installed the new kernel, 2.6.24-19-generic. It doesn't seem to make any difference; I still wait a long time when I switch windows :(

Revision history for this message
Rocko (rockorequin) wrote :

I find that the desktop is much more responsive under heavy disk I/O with either 2.6.24-18-generic or 2.6.24-19-generic (64 bit and 32 bit) when you compare it to 2.6.24-17-generic and earlier. Both deadline and cfq scheduling work fine.

My test was to start my 2.8GB test file copy from one sda partition to another and then try opening lots of webpages in different tabs in FF3RC1, opening lots of new nautilus windows from the gnome 'Places' menu, and starting up a number of new apps. The desktop runs slower than if you aren't copying a large file, and once an open Thunderbird window greyed out briefly, but the desktop is definitely usable now, and it wasn't in the original Hardy release.

I do notice the following though: if you start the copy and leave everything alone for a few seconds (eg around 400MB, when the file operations window tells you its throughput) and then try to switch between windows, sometimes there's a delay in the desktop responding. It's just not as long as it used to be, and it only seems to happen once for me.

Revision history for this message
Tobias Oetiker (tobi-oetiker) wrote :

Not that this helps much for a laptop setup, but since I think the
problem is more deeply rooted than this, I tried what happens
when the ext3 journal is kept on a fast external device. It
seems to take the pressure off the VM, so that its fairness bugs
no longer hurt much.

http://insights.oetiker.ch/linux/external-journal-on-ssd.html

cheers
tobi

--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten
http://it.oetiker.ch <email address hidden> ++41 62 213 9902

Revision history for this message
Ravindran K (ravindran-k) wrote :

Greetings..

I'm convinced that I no longer face the issue.
However, after various simple tests, I observe that I get the best disk performance with kernel 2.6.24-18-server. I guess the UI responsiveness should be good with this kernel as well.

With other kernels, either my SATA drives are faster but my IDE drives run a lot slower, or vice versa. Attaching some logs.

Revision history for this message
Ravindran K (ravindran-k) wrote :
Revision history for this message
Ravindran K (ravindran-k) wrote :
Revision history for this message
Ravindran K (ravindran-k) wrote :
Revision history for this message
Ravindran K (ravindran-k) wrote :
Revision history for this message
Ravindran K (ravindran-k) wrote :

I get excellent performance with this custom kernel, but unfortunately I am unable to use VMWare 2.0 under it. Sad :(

Revision history for this message
laga (laga) wrote :

Ravindran K schrieb:
> I get excellent performance in this custom kernel, but unfortunately
> unable to use VMWare 2.0 under this kernel. Sad :(
>
> ** Attachment added: "diskperf_2.6.25-rc8-custom0.txt Custom kernel"
> http://launchpadlibrarian.net/15828991/diskperf_2.6.25-rc8-custom0.txt
>
Can you post the .config? Or at least tell us what you changed in the
kernel?

Revision history for this message
Ravindran K (ravindran-k) wrote :

Yes, sure. I customized the kernel for my motherboard, an Intel DG33TL (plus my older ASUS p2bVT), removing all other unnecessary drivers.
Here it is.

I was trying to enable a 64-bit kernel because I have 4 GB of RAM (older 32-bit kernels show only 3 GB); after I found the 2.6.24-x-server kernels, I stopped trying.
Moreover, as I said, I'm unable to run VMWare with the 2.6.25.x kernel.

Revision history for this message
Martin (martin615) wrote :

FWIW, delayed allocation was added for ext4 in the 2.6.27 merge window.

Revision history for this message
Tom Badran (tom-badran) wrote :

Does this mean this will likely ship with intrepid?

On Fri, Jul 25, 2008 at 5:34 PM, Martin <email address hidden> wrote:

> FWIW, delayed allocation was added for ext4 in the 2.6.27 merge window.
>
>

Revision history for this message
tomaszr (tomasz-rosinski) wrote :

I confirm this; life is very hard when disk I/O is occupied all the time :(
I was copying some files (a 4GB dvd.iso) and could not do anything. Hard life...

What can I change, or when will this be fixed?

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

The Ubuntu Kernel Team is planning to move to the 2.6.27 kernel for the upcoming Intrepid Ibex 8.10 release. As a result, the kernel team would appreciate it if you could please test this newer 2.6.27 Ubuntu kernel. There are two ways you should be able to test:

1) If you are comfortable installing packages on your own, the linux-image-2.6.27-* package is currently available for you to install and test.

--or--

2) The upcoming Alpha5 for Intrepid Ibex 8.10 will contain this newer 2.6.27 Ubuntu kernel. Alpha5 is set to be released Thursday Sept 4. Please watch http://www.ubuntu.com/testing for Alpha5 to be announced. You should then be able to test via a LiveCD.

Please let us know immediately if this newer 2.6.27 kernel resolves the bug reported here or if the issue remains. More importantly, please open a new bug report for each new bug/regression introduced by the 2.6.27 kernel and tag the bug report with 'linux-2.6.27'. Also, please specifically note if the issue does or does not appear in the 2.6.26 kernel. Thanks again, we really appreciate your help and feedback.

Revision history for this message
Ravindran K (ravindran-k) wrote :

I think the desktop responsiveness is OK, at least for me.

OT: old IDE I/O performance has come down again in the new kernel (2.6.27-1-server) [it was up to 45 MB/s in 2.6.26-5-server]

2.6.27-1-server

Date & Time:
Sat Aug 30 07:55:04 IST 2008
----------------------------------------------------------------------------------------------
SATA 250 GB HDD

/dev/sda:
 Timing cached reads: 8118 MB in 2.00 seconds = 4066.33 MB/sec
 Timing buffered disk reads: 252 MB in 3.02 seconds = 83.35 MB/sec
----------------------------------------------------------------------------------------------
IDE 160 GB HDD

/dev/sdb:
 Timing cached reads: 7034 MB in 2.00 seconds = 3523.05 MB/sec
 Timing buffered disk reads: 96 MB in 3.03 seconds = 31.71 MB/sec
----------------------------------------------------------------------------------------------
IDE 250 GB HDD

/dev/sdc:
 Timing cached reads: 6638 MB in 2.00 seconds = 3324.28 MB/sec
 Timing buffered disk reads: 230 MB in 3.02 seconds = 76.04 MB/sec
----------------------------------------------------------------------------------------------
USB 160 GB HDD

/dev/sdd:
 Timing cached reads: 6438 MB in 2.00 seconds = 3223.58 MB/sec
 Timing buffered disk reads: 100 MB in 3.02 seconds = 33.06 MB/sec

*************************************************************
2.6.26-5-server
Date & Time:
Wed Aug 27 20:42:10 IST 2008
----------------------------------------------------------------------------------------------
SATA 250 GB HDD

/dev/sda:
 Timing cached reads: 6746 MB in 2.00 seconds = 3378.94 MB/sec
 Timing buffered disk reads: 244 MB in 3.01 seconds = 80.97 MB/sec
----------------------------------------------------------------------------------------------
IDE 160 GB HDD

/dev/sdb:
 Timing cached reads: 6494 MB in 2.00 seconds = 3252.63 MB/sec
 Timing buffered disk reads: 138 MB in 3.01 seconds = 45.79 MB/sec
----------------------------------------------------------------------------------------------
IDE 250 GB HDD

/dev/sdc:
 Timing cached reads: 6600 MB in 2.00 seconds = 3305.97 MB/sec
 Timing buffered disk reads: 230 MB in 3.00 seconds = 76.57 MB/sec

******************************************************************

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

I was not able to test Alpha 5 on my notebook; I will make another attempt soon.

But I have a workaround for everyone who cannot work on their systems. I am currently using Fedora 9 with the RHEL kernel (CentOS 2.6.18-92.1.10.el5) and see a speedup of 10x or more. It's great to have all the advantages of an up-to-date user interface and tools together with a stable and fast kernel. Everything works fine on my one-year-old T61p: I have no sound problems with vmware, and I can even use firefox at a load average of 12 or more.

I think Hardy users can use the Debian kernel as well; all the needed modules should be available for it too, at least from a third-party repository.

Perhaps the Ubuntu team can offer an older kernel as an option in their repositories as long as the problem is not solved.

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

Now I have run some tests with Intrepid. The I/O wait time is lower and the throughput is higher, with and without concurrent disk access, than in Hardy or Gutsy, but the desktop responsiveness problem still exists.
The overall throughput of concurrent disk access is about 30% lower than with my 2.6.18 kernel.

When writing eight 2GB files to the disk concurrently (all writes started at the same time), the difference in data written between the files during the operation is up to 500 MB. This difference is much lower (~200MB) under the 2.6.18 kernel.

The kernel signature is Ubuntu 2.6.27-2.3-generic.

My test results are from a single run only, because I was not able to use vmware or virtualbox and create a real working environment.

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

I have some new information on this topic. I tried to bypass the problem by using a fast SSD, but the desktop responsiveness became horrible; I think that is because I get a write throughput of only 20MB/s on sequential writes to the block device. After some research, I found more information: the problem is caused by there being no fair scheduling between read and write access.

https://fcp.surfsite.org/modules/newbb/viewtopic.php?viewmode=thread&topic_id=52598&forum=10&post_id=247938

After some more tests. I got these results.

# dd if=/dev/zero of=/dev/sda6 bs=1M count=1500
1572864000 bytes (1.6 GB) copied, 57.5956 s, 27.3 MB/s
And poor desktop responsiveness.

# dd if=/dev/zero of=/dev/sda6 bs=1M count=1500 oflag=direct
1572864000 bytes (1.6 GB) copied, 20.9958 s, 74.9 MB/s
And even firefox does not freeze.
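The MB/s figures are consistent with the byte counts and elapsed times dd printed (dd reports decimal megabytes, i.e. bytes / 1e6 / seconds); a quick cross-check:

```shell
# Recompute the throughput dd reported for the two runs above.
awk 'BEGIN {
    printf "buffered: %.1f MB/s\n", 1572864000 / 57.5956 / 1e6
    printf "direct:   %.1f MB/s\n", 1572864000 / 20.9958 / 1e6
}'
# prints:
# buffered: 27.3 MB/s
# direct:   74.9 MB/s
```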

Revision history for this message
Jeffery Davis (heavensblade23) wrote :

Still very much present in Ibex as of 10-22-2008.

Revision history for this message
Jeffery Davis (heavensblade23) wrote :

Changing to the deadline scheduler appears to alleviate this bug for the most part.

Revision history for this message
Bálint Magyar (balintm) wrote :

I can confirm that running Intrepid on a notebook with 512MB of RAM is much, much more tolerable with elevator=deadline, which mostly gets rid of the long pauses the heavy swapping caused.

Revision history for this message
isecore (isecore) wrote :

Running Intrepid Ibex, same issue as in Hardy: the desktop goes numb when heavy disk I/O occurs. Changing the scheduler to deadline makes it slightly more tolerable at the cost of applications and the desktop feeling slower, which is unacceptable. Changing the scheduler to elevator=as makes the system intolerably sluggish.

Revision history for this message
Jeffery Davis (heavensblade23) wrote :

Things I can reasonably confirm are not the cause:
-Hardware drivers (I've had this problem across several different machines)
-Versions of Ubuntu (I've been having this bug at least since Hardy and I believe before that)
-Different distros (I found a forum thread where people were having the same issue on Fedora)
-Dist-upgrade (I always install fresh)
-Search Indexing (Problem occurs even with indexing completely disabled...if there is an issue, it's a symptom and not a cause)
-Firefox versions (Fsync bug was fixed a long time ago, and people have tried reverting to Firefox 2 without success)
-Filesystem in use (People have reported the problem on ReiserFS as well as ext3)

Reasons I think it's the scheduler:
-It was reported switching schedulers helped this problem on Fedora which also uses CFQ.
-I've tried both deadline and anticipatory schedulers and then copied large numbers of files and the system was 95% more responsive.
-People first started reporting this problem on the Ubuntu forums around the time CFQ was switched to the default scheduler, which I believe was around the Edgy/Feisty timeframe.
-People have reported the problem doesn't exist on the 'server' version of Ubuntu, which uses the deadline scheduler instead of CFQ.

There may be other issues at work that cause similar unresponsiveness, but I think the scheduler thing is it for most people.

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

I don't think it is only a scheduler problem. Switching the scheduler does not result in significant changes; the desktop responsiveness is still bad with all schedulers. The server kernel is less affected, but desktop responsiveness is still bad there too.

Fedora and ArchLinux are less affected, but the problem still exists. It's perhaps like using the server kernel in Ubuntu.

For me the problem first appeared in Gutsy. Feisty had great desktop responsiveness under heavy I/O on my old machine with the cfq scheduler, and CentOS with the 2.6.18 kernel and the cfq scheduler works really well.

I once produced the same issue with a crash of network manager while transferring a big file over a WLAN connection. I think there was no logging I/O at the time, but I am not sure.

Are only a few people affected by this problem? It makes my system nearly unusable while, e.g., updating Intrepid; there are sometimes desktop freezes of more than 20 seconds on my machine.

Revision history for this message
AvitarX (ddwornik) wrote :

I have been looking at this and running informal (read sloppy) tests today.

For people upgrading from older systems, perhaps the problem is the relatime option missing from fstab.

I was testing by running "vi 123" while "dd if=/dev/zero of=~/test2 bs=2M count=2048&"

I tried all 4 schedulers and all took a long while (30 secs+) to an active vi screen.

Now it is below 10.

I tried different schedulers and they were all slow. Now I am using CFQ (the default?); I have not compared whether the less-than-10-seconds time would drop below 5.

The system-cleaner (Applications --> System Tools --> System Cleaner) identified the lack of relatime for me.

It could also simply be the reboot that caused the speedup, though.

It is only speculation that the upgrade missed the relatime option; I may have removed it myself by accident at some point.

To change schedulers without rebooting:

as root (run "sudo bash", or someone correct me on how to redirect while using sudo):

echo "scheduler name" > /sys/block/sdX/queue/scheduler (where X is the drive, e.g. sda)

For a list of schedulers, and to see which is selected, run cat on the same file.

example:
$cat /sys/block/sda/queue/scheduler
noop anticipatory deadline [cfq]

I am not at my desktop, so I can't confirm whether this is simply for the command line and whether I will see problems similar to Jamie's above.

In a few days I can follow up on whether long uptime (for a desktop) is the problem.

Revision history for this message
Irrlicht (irrlicht) wrote :

irrlicht@home:~$ cat /sys/block/sda/queue/scheduler
noop anticipatory deadline [cfq]
irrlicht@home:~$ sudo echo deadline > /sys/block/sda/queue/scheduler
bash: /sys/block/sda/queue/scheduler: Permission denied

I am copying a dvd full of data to my local drive via cp. My mouse and all Desktop apps are lagging like if I would use glx without the proper drivers (slow responsiveness and screen update). Using Kubuntu 8.10 amd64... Can I help in any way? Or someone has an idea how to change the scheduler in my case?

Cheers,
Daniel

Revision history for this message
AvitarX (ddwornik) wrote :

I don't know how to redirect with sudo.

I had to run:
sudo bash

this gave a root prompt, then I could run:
echo deadline > /sys/block/sda/queue/scheduler

without running sudo again (running bash with sudo keeps the prompt as root until you type exit).

I am curious about your fstab though, that was what really changed things for me.

Also, do you have lots of small files or a few big ones? I can try testing. It was primarily running updates and the daily file indexing that killed my machine.

Revision history for this message
Irrlicht (irrlicht) wrote :

No, there are only big files on the DVD: 4.5 GB of files, each ~250 MB. I have changed to all available schedulers now; it doesn't change anything. I noticed this for the first time today, so I searched Google and found this bug.

What did you change in your fstab? Is it fixed for you?

My /etc/fstab:
root@home:~/# cat /etc/fstab
# /etc/fstab: static file system information.
#
# <file system> <mount point> <type> <options> <dump> <pass>
proc /proc proc defaults 0 0
# /dev/sdb2
UUID=3a3e3337-0416-46f4-9336-253bd7dfbeac / ext3 relatime,errors=remount-ro 0 1
# /dev/sdb1
UUID=f3eba35a-325c-41db-bebe-03680a0b1f89 /boot ext3 relatime 0 2
# /dev/sda1
UUID=a665a132-3023-4b70-b1f6-c38120307a6a /home ext3 relatime 0 2
# /dev/sdb3
UUID=27c83b47-7ec7-4781-a2c5-d900291b92d4 none swap sw 0 0
/dev/scd0 /media/cdrom0 udf,iso9660 user,noauto,exec,utf8 0 0

Cheers,
Daniel

Revision history for this message
John (john-m-lang) wrote :

You should always use 'sudo -i' to get a root prompt rather than 'sudo
bash', 'sudo su -', or any other method.

On Mon, Oct 27, 2008 at 4:02 PM, Irrlicht <email address hidden> wrote:

> No there are only big files on the DVD. 4.5 GB of files ~250 MB. I
> changed to all available schedulers now, it doesn't change anything. I
> noticed this the first time today, so I searched Google and found this
> bug.
>
> What did you change in your fstab? Is it fixed for you?
>
> My /etc/fstab:
> root@home:~/# cat /etc/fstab
> # /etc/fstab: static file system information.
> #
> # <file system> <mount point> <type> <options> <dump> <pass>
> proc /proc proc defaults 0 0
> # /dev/sdb2
> UUID=3a3e3337-0416-46f4-9336-253bd7dfbeac / ext3
> relatime,errors=remount-ro 0 1
> # /dev/sdb1
> UUID=f3eba35a-325c-41db-bebe-03680a0b1f89 /boot ext3 relatime
> 0 2
> # /dev/sda1
> UUID=a665a132-3023-4b70-b1f6-c38120307a6a /home ext3 relatime
> 0 2
> # /dev/sdb3
> UUID=27c83b47-7ec7-4781-a2c5-d900291b92d4 none swap sw
> 0 0
> /dev/scd0 /media/cdrom0 udf,iso9660 user,noauto,exec,utf8 0 0
>
> Cheers,
> Daniel
>
> --
> Heavy Disk I/O harms desktop responsiveness
> https://bugs.launchpad.net/bugs/131094
> You received this bug notification because you are a direct subscriber
> of the bug.
>

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

There is a new article on Phoronix, which compares the performance of different ubuntu versions (Feisty - Intrepid).
(see http://www.phoronix.com/scan.php?page=article&item=ubuntu_bench_2008&num=1 )

There is a huge difference between Feisty and the following versions in the "RAM sequential read" test (3100 MB/s for Feisty and about 1800 MB/s for the other versions). Perhaps the poor desktop performance is related to this issue.
(see http://www.phoronix.com/scan.php?page=article&item=ubuntu_bench_2008&num=3 )

Revision history for this message
dr4cul4 (dr4cul4) wrote :

Using the current build of the Intrepid Ibex kernel (at the moment of writing), the problem still exists. I have 3 machines running Ubuntu, and on all of them there is high unresponsiveness during disk activity. Two machines are laptops with SiS and Intel chipsets and IDE drives. The big machine is a VIA with SATA and IDE drives. The last one has issues only during heavy disk activity, but the laptops have this issue all the time. For example, each right click on the desktop takes about 4 seconds to show the menu, with a lot of disk reading (using noatime partly solved that: only the first click takes long, subsequent ones are fast). This issue is driving me crazy; will someone investigate it deeply?

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

I have used three different hard drives on the same machine: a Seagate Momentus 7200.2 with a throughput of about 70 MB/s, a Western Digital Scorpio WD2500BEVE with a throughput of about 60 MB/s, and an OCZ Core Series 64GB with a throughput of about 75 MB/s and a real write performance of 25 MB/s.

The desktop responsiveness with the Seagate is bad, with the WD awful, and with the OCZ unusable.

What's wrong with the current kernel versions? Does anyone have the same problems on a SCSI system?

Revision history for this message
Luka Renko (lure) wrote :

Thomas Pi, first I need to thank you for the very detailed testing you have performed. I can more or less confirm the same on my HP nw8440: Feisty was the last version that worked nicely on my laptop. Even the latest Intrepid release did not help.

I notice that when the machine gets unresponsive, most of the CPU time (and this means both cores here) is spent in io-wait. You can best see this with the "htop" utility - just install it from the repository.
Also, during this "storm of load", the CPU load gets to 8-10, resulting in high heat from my laptop and heavy work for the fan. "acpi -t" shows a temperature around 80°C, which is much more than I want to see on average.
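For a quick look at the same counter without installing anything, the cumulative iowait time is also exposed in /proc/stat (a sketch; field positions follow the standard Linux proc(5) layout):

```shell
# The "cpu" summary line in /proc/stat lists jiffies spent in
# user, nice, system, idle, iowait, ... since boot; iowait is field 6.
awk '/^cpu /{print "iowait jiffies:", $6}' /proc/stat
```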

Since I expect that this is not just scheduler related, but may also be something to do with the HW, I would like to know what kind of IDE/SATA controllers you all have. I have an ICH7.

I agree that priority should be increased and this should get more attention from kernel developers.

Revision history for this message
exactt (giesbert) wrote :

As you are asking about IDE/SATA controllers: I have an
ATI Technologies Inc SB600 Non-Raid-5 SATA controller.
I first experienced the problem when I enabled AHCI mode in BIOS.

I am running Intrepid AMD64

dmesg | grep ahci
[ 2.480262] ahci 0000:00:12.0: version 3.0
[ 2.480288] ahci 0000:00:12.0: PCI INT A -> GSI 22 (level, low) -> IRQ 22
[ 2.480315] ahci 0000:00:12.0: controller can't do 64bit DMA, forcing 32bit
[ 2.480415] ahci 0000:00:12.0: AHCI 0001.0100 32 slots 4 ports 3 Gbps 0xf impl SATA mode
[ 2.480418] ahci 0000:00:12.0: flags: ncq sntf ilck led clo pmp pio

Revision history for this message
Milan Bouchet-Valat (nalimilan) wrote :

If you have some ideas about the cause of this bug, I suggest you file a report directly on bugzilla.kernel.org; you'll get attention from people who have (relatively) more time to work on it. This is not likely to be a bug specific to Ubuntu.

Revision history for this message
Luka Renko (lure) wrote :

Two upstream bug reports that may be related:
1. http://bugzilla.kernel.org/show_bug.cgi?id=7372
2. http://bugzilla.kernel.org/show_bug.cgi?id=12072

First one has very similar pattern of when regression was first detected. Second one has similar symptoms, but not clear if older kernels were ok.

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

I was using an ICH4-M chipset and currently an ICH8-M system. The I/O wait is at 100% when the problem occurs.
This issue occurs on my desktop machine with an AMD 790G chipset too. I know someone who uses a VIA KT800 and is also affected by this bug. But as he uses his computer only for office work, it does not appear as often as on my machines.

At http://bugzilla.kernel.org/show_bug.cgi?id=7372 there are at least three different bugs. The initial bug should be solved (see https://bugzilla.redhat.com/show_bug.cgi?id=444759).

The 64-bit problem cannot be the cause of the desktop responsiveness issue, because I was using a Pentium M when I switched from Feisty to Gutsy.

And it is right that this is not a bug specific to Ubuntu, but Ubuntu seems to be the most affected by it. The desktop responsiveness with Fedora 9 and 10 is much better on my machine.

I have done some research and tests. Here are some interesting results.

As the desktop performance in CentOS became "worse" when updating from kernel-2.6.18-92.1.10.el5 to kernel-2.6.18-92.1.13.el5, I checked the differences. One patch, "linux-2.6-fs-dio-use-kzalloc-to-zero-out-struct-dio.patch", was applied to the Gutsy kernel for the first time. I know that the patch should not have affected performance.
I reverted the patch in fs/direct-io.c in my Hardy test installation (790G chipset, kernel 2.6.24-22-generic) and ran some tests.
Compared to the patched kernel, I mostly see application startup speedups of up to 10x. The boot process takes 45 seconds instead of 60 seconds. During heavy write I/O with a load average of 18, I could sometimes even "use" Firefox (freezes of at most 2-3 seconds). The start of GIMP sometimes takes only 30 seconds instead of 140 seconds. I have run these tests more than five times, using an identical procedure for both kernels, and always with these differences.
But when copying a file, instead of executing eight concurrent dd write operations, Firefox freezes immediately and it takes about a minute to connect over ssh, although the load average is only 2. This should be the problem caused by unfair scheduling between read and write access. It was described in the thread below, but I cannot access it anymore.
https://fcp.surfsite.org/modules/newbb/viewtopic.php?viewmode=thread&topic_id=52598&forum=10&post_id=247938

Then I tried to simulate a slower hard disk (ICH8-M) and installed Hardy on a fully encrypted disk, which reduces the write speed to 40 MB/s. There were no more differences between the patched and unpatched kernels; both were unusable.

The bug is affected by different timings, as there must be a threshold of drive speed at which the system switches from bad to unusable. That must be the reason why Ubuntu kernels are more affected than the Fedora ones.

This bug is annoying. Please help the kernel team to solve this bug.

Revision history for this message
Luka Renko (lure) wrote :

Interesting that you mention it: I started to notice this slowdown much more with Hardy, which is also when I switched to an almost fully encrypted disk on my laptop. It may be that kcryptd is making it worse...

Revision history for this message
yarly (ih8junkmai1) wrote :

Luka,

I've noticed general sluggish performance when using dm-crypt/kcryptd for a fully encrypted disk (minus boot partition).

Similar post...

https://lists.linux-foundation.org/pipermail/bugme-new/2008-May/018830.html

Revision history for this message
Launchpad Janitor (janitor) wrote : Kernel team bugs

Per a decision made by the Ubuntu Kernel Team, bugs will no longer be assigned to the ubuntu-kernel-team in Launchpad as part of the bug triage process. The ubuntu-kernel-team is being unassigned from this bug report. Refer to https://wiki.ubuntu.com/KernelTeamBugPolicies for more information. Thanks.

Revision history for this message
yarly (ih8junkmai1) wrote :

Launchpad Janitor,

How about breaking that down into understandable terms instead of being vague?

If Ubuntu Kernel Team is not going to address this issue then who is?

Revision history for this message
peddy (peddy22) wrote :

While I agree with yarly, it should be noted that Launchpad Janitor is in fact a bot and is not capable of interpreting and acting on comments posted here.

Revision history for this message
Milan Bouchet-Valat (nalimilan) wrote :

Don't worry, this is just a matter of convention, it does not affect the work by the team.

If somebody could provide a kernel package (for old versions like 2.6.20, or for patched versions) usable in Intrepid, we could run further tests to find the root cause of the problem. This would allow checking the nice information provided by Thomas Pi. Otherwise nothing will be done by anybody, I guess.

Revision history for this message
Ben Gamari (bgamari) wrote :

I opened a new kernel.org bug, #12309 ( http://bugzilla.kernel.org/show_bug.cgi?id=12309 ), to replace #7372. Hopefully this one will be a little more productive.

Changed in linux:
status: Unknown → Confirmed
Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

After comparing some kernel code, I have found some really interesting facts. I think the poor desktop responsiveness is caused by the changed process scheduler (e.g. tickless kernel / high-resolution timers ...) and not by the disk scheduler. I have written a test program (sorry for the dirty code) which amplifies the problem and allows it to be measured.

Here are some facts. I executed the tests in recovery mode (kernel parameter "single"), once with 20 processes * 1,000,000 messages and once with 100 processes * 100,000 messages. The result values are: echo time of ~80-90% of the messages / longest echo time / test duration.

CentOS
2.6.18-92.el5 - 20/1M 4µs / 1s / 38,4s - 100/1k 4µs / 1s / 18,7s
Ubuntu 6.04 - 8.10
2.6.15-53 - 20/1M 3-33µs / 1s / 33,6s - 100/1k 3-40µs / 1s / 17,7s
2.6.20-17 - 20/1M 3µs / 1s / 32s - 100/1k 3-9µs / 1s / 16,0s
2.6.22-16 - 20/1M 3-4µs / 7s / 51,5s - 100/1k 4µs / 1s / 25,9s
2.6.24-23 - 20/1M 53µs / 64s / 73ms - 100/1k 77-250µs / 41ms / 32,0s
2.6.27-9 - 20/1M 120-200µs / 120ms / s - 100/1k 500-1000µs / 1s / 84s

While executing the test with 100/1M under Xorg/GNOME, the problem is amplified. There are no problems on CentOS and Feisty. I could not test it on Ubuntu 6.06. I had heavy responsiveness problems with Hardy, Intrepid and Fedora 10. With 2.6.22 (installed in Feisty) the problem sometimes occurs and sometimes does not.

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

Correction to the table above: 2.6.27-9 - 20/1M 120-200µs / 120ms / 159s - 100/1k 500-1000µs / 1s / 84s

Revision history for this message
Milan Bouchet-Valat (nalimilan) wrote :

What can you conclude? What I can see is that 2.6.27 is much worse than previous kernels, and that 2.6.24 is not really good either. 2.6.22 seems to be worse than before too, though it's less visible.

Can you give this information to the kernel developers in the upstream bug report? We need to find which change introduced the problem. What is strange is that the regression is progressive, getting worse with each kernel...

Revision history for this message
Andy Whitcroft (apw) wrote : Re: [Bug 131094] Re: Heavy Disk I/O harms desktop responsiveness

Any chance you could test with the latest Jaunty (2.6.28-based) kernel
as well? You should be able to put that kernel on an Intrepid base for
the purposes of a test. It would be interesting to see if the problem is
still there. If it's truly a CPU scheduler issue, then we can point the
scheduler developers at the stats.

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

Ubuntu 2.6.28-4.9-generic
20/1M 5-7µs / 1s / 81,2s - 100/1k 5-7µs / 1s / 40,2s
The new kernel freezes the system while executing my test in a normal runtime environment.

Revision history for this message
theparanoidone (theparanoidone) wrote :

Greetings~

I am not sure if my team is suffering from the problems described in this thread... but we've been having very strange I/O problems.

We have also found a slight solution:

Compile the kernel with:
CONFIG_HZ_1000=y
CONFIG_HZ=1000

(as opposed to CONFIG_HZ_100=y CONFIG_HZ=100)

I say slight because things run *much* better; however, I don't think it's the complete fix. This has sped things up quite a bit in our test cases. (I have yet to run the ProcessSchedulerTest.cpp attached to this thread, but I will do so ASAP and report back our findings.) Feel free to post your results if you beat me to it.

(For those interested in the scenario we have been facing, you can reference my forum post here, but I think it would be best if people keep their feedback here at bugs.launchpad.net:
http://ubuntuforums.org/showthread.php?t=1039476 )
(I'll post feedback about our sysbench test on a 2.6.15 or earlier kernel as soon as I can.)

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

Can someone check whether clocksource=jiffies as a kernel boot parameter helps?
Not for Intel IGM users, as Xorg does not start.

Revision history for this message
Søren Holm (sgh) wrote :

Hi

I currently have the following running:

2 x "bzip2 -9 -c /dev/urandom >/dev/null", since I have 2 cores,
and one "dd if=/dev/zero of=test.10g bs=1M count=10000"

Only small lockups happened during that time, which was about 9 minutes.
By small lockups I mean a couple of seconds.

After the dd command had finished, the lockups were still occurring, but they
were generally much shorter.

For me it is definitely a fix.

Revision history for this message
Bogdan Gribincea (bogdan-gribincea) wrote :

I'm running a kernel compiled with:
CONFIG_HZ_1000=y
CONFIG_HZ=1000

Also using "elevator=anticipatory" boot param.
Everything is running much smoother now. I still get some random stuttering, but only under very heavy loads, and it lasts only about 500 ms or thereabouts.
Until yesterday I had the default Jaunty kernel, just using the anticipatory elevator, and the desktop was much snappier than with CFQ under heavy I/O loads.
The 1000 Hz option doesn't seem to help as much, but I have yet to test CFQ with 1000 Hz.

Revision history for this message
Bogdan Gribincea (bogdan-gribincea) wrote :

Hmm, I was wrong. Running 4 x "bzip2 -9 -c /dev/urandom >/dev/null" (quad core) and one "dd if=/dev/zero of=test.10g bs=1M count=10000" like the comment above still gives me more stuttering than I would like. With CFQ it is MUCH worse, though.

Revision history for this message
gururise (gururise) wrote :

I can confirm this bug. Running on 64-bit Hardy 8.04 LTS with the latest updates on a Quad Core Q6600 Processor with 4GB of Ram, and 1TB SATA drive.

Un-raring/unzipping large files, or any prolonged disk I/O such as viewing a directory of many thumbnails in Nautilus, will bring my system to a halt, with the mouse either freezing or stuttering until the disk I/O is complete.

I used to run older versions of Ubuntu on a much humbler machine and had no such problems.

Revision history for this message
Vadim Peretokin (vperetokin) wrote :

Confirmed. Deleting 80 MB of files rendered Firefox unusable and gave other programs a delayed response.

Revision history for this message
KhaaL (khaal) wrote :

I'm running 64-bit Jaunty with up-to-date packages and I still have this issue. My computer uses SATA hard drives, if that is of any help.

Revision history for this message
KhaaL (khaal) wrote :

Argh, the lack of an edit button... I forgot to mention that I'm running ext4. I'll have to try the 2.6.29 kernel and see if there is any improvement with that.

Revision history for this message
Gary Trakhman (gary-trakhman) wrote :

I thought I was affected by this bug, but as it turns out, it was another one. My swap file had gone missing due to an invalid UUID, which is another bug. This caused unresponsiveness when ram got filled up. Maybe others have this problem.

Revision history for this message
Vadim Peretokin (vperetokin) wrote :

That's definitely not the case here

Revision history for this message
Peter Hoeg (peterhoeg) wrote :

I am experiencing the exact same problems on my 64-bit Intrepid laptop, but strangely enough not on my 64-bit Intrepid desktop - both use SATA drives.

But can somebody also confirm a similar problem when writing to USB disks? For example, copying a large music collection to an iPod has the same effect as 'normal' disk I/O.

Revision history for this message
Søren Holm (sgh) wrote :

Sure, the problem exists there too, but it also relates to faulty accounting of the number of pages in the write cache, causing huge amounts of memory to be used when the disk is slow.
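One knob often mentioned in this context (an aside, not a fix confirmed in this thread) is the pair of dirty-page thresholds that control how much unwritten data may pile up in the page cache before writers are throttled:

```shell
# Current writeback thresholds, as a percentage of RAM:
cat /proc/sys/vm/dirty_ratio /proc/sys/vm/dirty_background_ratio
# To lower them temporarily (as root), e.g.:
#   echo 5 | sudo tee /proc/sys/vm/dirty_ratio
```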

Revision history for this message
Tobias Oetiker (tobi-oetiker) wrote :

The whole issue finally got the attention of the kernel developers
...

see http://www.gossamer-threads.com/lists/linux/kernel/1053130?do=post_view_threaded#1053130
and related ...

cheers
tobi

--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch <email address hidden> ++41 62 775 9902 / sb: -9900

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

I have tried the two block_write_full_page patches with ext4, but there is still no improvement.

The only "working" patch is the "mm fix page writeback accounting to fix oom condition under heavy I/O" one from Mathieu Desnoyers, which does not fix the problem but makes it tolerable for me.

I am currently using the 2.6.29 kernel, in which (a part of) the fsync bug was fixed. At least Firefox works smoothly for me, without any interruptions.
(see SQLite-Test at http://global.phoronix-test-suite.com/?k=profile&u=ebird-3722-22013-9288 )
I think it's commit 78f707bfc723552e8309b7c38a8d0cc51012e813, as it reverts parts of the 2.6.26 commit 18ce3751ccd488c78d3827e9f6bf54e6322676fb, and that fits with my benchmark results.
(see 2.6.29 Changelog http://www.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.29 )

It should be reverted in the Ubuntu 9.04 kernel too.

Revision history for this message
Jarkko Lietolahti (jarkko-jab) wrote :

I'm also suffering from the same issue (slow HD I/O).

I also ran the SQLite tests. Results are here: http://global.phoronix-test-suite.com/index.php?k=profile&u=jarkko-20379-5630-13562

Later I noticed that I was using the XFS filesystem, so the results are not comparable. But with the deadline scheduler and nobarrier, the results are similar to ext3.

What's more worrying is that with the CFQ scheduler and barriers, the SQLite results are extremely bad: cfq/barrier = 1603.34 seconds vs. deadline/nobarrier = ~155 seconds.

The kernel version didn't seem to matter; 2.6.29-020629-generic with deadline/nobarrier was about the same as 2.6.28-11-generic with deadline/nobarrier.

Revision history for this message
Rocko (rockorequin) wrote :

I experience this bug whenever the system hits swap in Jaunty. I have 4 GB of RAM and 4 GB of swap, and I'm using ext4 with default delayed allocation, so I shouldn't be running into ext3 fsync problems. The desktop normally responds 'reasonably' fast, but if I try loading two 1.8 GB VMs in VirtualBox, for example, once the system hits swap the desktop stops responding *completely*, apart from jerky mouse cursor movements (both times I tried this I gave up after 15 minutes and hard-reset the PC).

Revision history for this message
Tobias Oetiker (tobi-oetiker) wrote :

It could be that the "Give kjournald a IOPRIO_CLASS_RT io priority" patch (http://lwn.net/Articles/301467/) would help. It was integrated in 2.6.29.1, but it is so simple that it should work for other releases too, I guess.

Revision history for this message
unggnu (unggnu) wrote :

Anyone who couldn't confirm this bug: just use dm-crypt with LUKS for your system and home partitions, then copy many big files from one to the other, or to an encrypted USB device. You will have fun.

Music doesn't stop, but sometimes a click is registered three seconds later, while no core has heavy usage.

Revision history for this message
Hendrik van den Boogaard (chasake) wrote :

I just installed a fresh Jaunty and I am experiencing the same problem. When I'm copying large files, the responsiveness of the system goes down dramatically; you practically cannot use the system any more. It takes ages for the top menu to load (its icons), and when you start a simple program like the terminal, it also takes ages for it to appear. I never experienced this on the Intrepid/Hardy/Gutsy/Feisty/Dapper releases.

The only difference from my Intrepid install is that I did a fresh install on a SATA hard disk, whereas my older releases were on an IDE drive.

I can try to install the Intrepid kernel on this Jaunty install and see if the problem persists.

Revision history for this message
Milan Bouchet-Valat (nalimilan) wrote :

Hendrik: your situation could be interesting for debugging SATA problems. Could you explain more about what changed from Intrepid? I mean: did you only change the disk, keeping the same machine? Please try using the Intrepid kernel, and if you still have the problem, it would be nice to report this information upstream, on the bug linked above, where people are tracking down the causes of this problem. Otherwise, if the problem only occurs with Jaunty's kernel, opening a separate bug will help.

Revision history for this message
Tobias Oetiker (tobi-oetiker) wrote :

Hi Hendrik,

try 2.6.29.1 or 2.6.30-rcX; both kernels should have a number of fixes
for this problem, which has been affecting a wide range of systems.

cheers
tobi

Today Hendrik van den Boogaard wrote:

> I just installed a fresh Jaunty and I am experiencing the same problem.
> When I'm copying large files the responsiveness of the system goes down
> dramatically. You practically cannot use the system any more. It takes
> ages for the top menu to load (its icons) and when you start a simple
> program like the terminal it also takes ages for the terminal to appear.
> I have never experienced this on the Intrepid/Hardy/Gutsy/Feisty/Dapper
> releases before.
>
> The only difference from my Intrepid install is that I did a fresh
> install on a SATA harddisk where my older releases were on an IDE drive.
>
> I can try to install the Intrepid kernel on this Jaunty install and see
> if the problem persists.
>
>

--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch <email address hidden> ++41 62 775 9902 / sb: -9900

Revision history for this message
Amit Kucheria (amitk) wrote :

You can find .deb kernel packages for upstream kernels conveniently packaged at http://kernel.ubuntu.com/~kernel-ppa/mainline/

Note, however, that these are not supported kernels. They are meant for users who want to test whether their HW works on newer kernels.

On Sun, Apr 19, 2009 at 05:23:39PM -0000, Tobias Oetiker wrote:
> Hi Hendrick,
>
> try 2.6.29.1 or 2.6.30rcX both kernels should have anumber of fixes
> for this problem which has been affecting a wide range of systems.

Revision history for this message
Hendrik van den Boogaard (chasake) wrote :

Milan: I did a fresh install of Jaunty on an empty SATA hard disk. The rest of the machine is the same, and I kept my PATA hard disk in to copy the old files over. That's when I noticed that the interface was sluggish while copying. That is something I always cursed Windows for, and one of the things I truly believed Linux was capable of: true multitasking, not starving something as important (or unimportant, for that matter) as the interface just because some bytes are being copied to disk (why did they invent DMA in the first place? ;)).

While copying some large files, you can see that the MB/s remains at about the same level when trying to open a console window, whereas in Intrepid the MB/s clearly drops and performance is divided between the copy task and the open task, in favour of showing the drop-down menu and starting the program you want to access.

Some hardware specs of the machine:
* AMD Athlon(tm) 64 X2 Dual Core Processor 5200+
* 6 GB RAM (2x2GB + 2x1GB)
* Asus A8N-VM CSM motherboard running on nVidia GeForce 6150 nForce 430
* 1x PATA Samsung SP1614N
* 1x SATA Seagate ST3500841AS

I was copying from an XFS partition on the PATA disk to an XFS partition on the SATA disk.

I will now restart the machine using the Intrepid kernel and then I can try 'linux-image-2.6.29-02062901-generic_2.6.29-02062901_amd64.deb' from the kernel ppa mentioned above (thanks Tobi/Amit).

Is there some tool to measure responsiveness? I would like to get some objective results instead of 'it feels sluggish'.

Revision history for this message
KhaaL (khaal) wrote :

Hendrik, you should comment on and subscribe to the bug here: http://bugzilla.kernel.org/show_bug.cgi?id=12309, since it's an upstream bug and not something the Ubuntu kernel team is working on. AFAIK there's no objective way to measure desktop responsiveness - yet. If you find an improvement with the .29 or .30 kernel, please let me know, as I have quite a similar setup to yours.

Revision history for this message
Milan Bouchet-Valat (nalimilan) wrote :

Thanks for this detailed information. So you suggest that the I/O scheduler is not giving enough priority to tasks other than the file copy; that's an interesting way of finding the cause of the problem, indeed! I suggest you try the new kernel, and if it's not fixed, go to the upstream report and explain your case. You can find many different scripts there to test the system's responsiveness, and you'll notice that it's really tricky (long thread...).

Now, another test would be interesting: copy the same file from the SATA disk to itself and see if there's any difference from what you've already done (in terms of responsiveness, not speed, since the speed will obviously differ).

Revision history for this message
Jamie Lokier (jamie-shareable) wrote :

Milan wrote:
> Thanks for these detailed informations. So you suggest that the IO
> scheduler is not giving enough priority to tasks other than the file
> copy; that's an interesting way of finding the cause of the problem,
> indeed! I suggest you try the new kernel, and if it's not fixed, go to
> the upstream report and explain them your case. You can find many
> different scripts to test the system's responsiveness there, and you'll
> notice that's really tricky (long thread...).

I suggest it's not just about giving I/O priority to tasks other than
the copy, but also giving them enough "anticipatory" time so there
isn't a pair of head seeks for every I/O operation by the non-copying
tasks.

It has to look like this to be efficient:

    I/O for copy
    I/O for copy
    I/O for copy
    I/O for copy
    I/O for copy
                <head seeks>
                           I/O for other thing
                           I/O for other thing
                           I/O for other thing
                           I/O for other thing
                           I/O for other thing
                <head seeks>
    I/O for copy
    I/O for copy
    I/O for copy
    I/O for copy
    I/O for copy
                <head seeks>
                           I/O for other thing
                           I/O for other thing
                           I/O for other thing
                etc.

And not

    I/O for copy
                <head seeks>
                           I/O for other thing
                <head seeks>
    I/O for copy
                <head seeks>
                           I/O for other thing
                <head seeks>
    I/O for copy
                <head seeks>
                           I/O for other thing
                <head seeks>
                etc.

Revision history for this message
Hendrik van den Boogaard (chasake) wrote :

KhaaL: thanks for pointing to that thread, but there is so much information there that I cannot really separate all the issues at hand. The test suite does not give any satisfactory results that I could interpret. For now my best test is just to copy big files from the PATA to the SATA disk; however, the XFS allocation strategy may scatter the files (or parts of every file) across the disk, which makes I/O performance a bit different every time.

I've tried that using the Intrepid default kernel and the Jaunty default kernel. The Jaunty kernel seems a lot more sluggish. From the PPA I downloaded 2.6.29, which seems to be no good either. Some people have mentioned slow fsync behaviour in 2.6.29, just as in 2.6.28, which is the Jaunty default.

Currently I am running 2.6.30 from http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.30-rc2/linux-image-2.6.30-020630rc2-generic_2.6.30-020630rc2_amd64.deb, in which I don't really notice slow GUI behaviour. While copying the large files I can still open a terminal and do stuff. I will reboot and try this again with the 'mem=512m' boot option, to make sure there is not too much caching going on.

Revision history for this message
Hendrik van den Boogaard (chasake) wrote :

Update: running with 'mem=512m' seems to make performance just as bad as on older/other kernels. Then I recalled some information about native command queueing (thanks for the hint in your picture, Jamie :)) and how it could thrash your disk on some older drive models. I had already disabled it on my file server a couple of months/years ago, and now tried the following:

Copy the large files in one window (PATA -> SATA) and do a 'find /' (SATA drive) in the other window. I found out that the 'find' command is a *lot* more responsive pushing file names to the screen when I put the NCQ buffer on 1 item (effectively disabling NCQ). The mouse and cursor still lock up often for less than a second and performance is still sluggish, but the machine is not completely unusable. While typing this I put the NCQ setting back to 31, as it was before, and now I cannot even finish typing this sentence without waiting for it to appear on the screen. As far as I can remember, somewhere in the 2.6.1x kernels NCQ was added together with all the new libata stuff (that is when a lot of trouble started for me; a bad sector on a disk created kernel oopses on another NVidia-based computer with a 4 TB software RAID 5 array). This might also explain why I have not seen this problem before, as my old 160 GB drive is PATA and has no NCQ.

I am wondering if anyone else tried this or can verify this against my experiences.

Another interesting observation: When I 'cat * > /dev/null' from the large files directory on the SATA disk, performance is sluggish and tabbing through windows is slow to unbearably slow. When I do the 'cat * > /dev/null' on the PATA drive there is no sluggishness AT ALL! The swap was turned off to make sure that alt-tabbing would not load paged-out data from the SATA disk, but even after turning swap over to the PATA disk, performance stayed the same when catting on the PATA disk. Same for xfs_fsr defragmentation on SATA: slow and sluggish (also, but a bit less, with NCQ off), while on PATA my GUI just remains responsive!

So after all, this really does seem to be SATA related!

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

I don't think this bug is SATA related. I noticed this bug while switching from 2.6.20 to 2.6.22 (Feisty to Gutsy) on a PATA drive. People also noticed it while switching from 6.06 to 8.10.

Can you try executing "time cat * > /dev/null" on the SATA drive with and without NCQ, and on the PATA drive?

Revision history for this message
KhaaL (khaal) wrote :

@Hendrik Try to post your information there, as it will catch more attention from those working on this bug. AFAIK this bug appeared somewhere around 2.6.16 or 2.6.17, when a new I/O scheduler was introduced (I believe). I've tried with different schedulers (AS, CFQ and deadline) and they didn't improve anything. Right now I mount my partitions with noatime and writeback in order to minimize I/O operations. I'd like to participate in your testing, but you'll have to tell me what NCQ is and how I can set its values :-)
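For anyone repeating the scheduler comparison described above, the active I/O scheduler can be inspected and switched per drive through sysfs at runtime. A minimal sketch (the sysfs root is a parameter purely so the function can be exercised safely; the device name sda below is only an example):

```shell
# show_sched [sysfs-root]: print the scheduler line for every block device.
# The active scheduler is shown in brackets, e.g. "noop anticipatory deadline [cfq]".
show_sched() {
    root="${1:-/sys}"
    for f in "$root"/block/*/queue/scheduler; do
        [ -r "$f" ] && printf '%s: %s\n' "$f" "$(cat "$f")"
    done
    return 0
}

# Switching (as root) is a plain write, e.g.:
#   echo deadline > /sys/block/sda/queue/scheduler
show_sched
```

The change takes effect immediately but does not survive a reboot; for that, the `elevator=` kernel boot parameter is the usual route.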

@Thomas I think you're right, but I can't confirm since all my connectors are SATA. Most of the people who have reported this problem are on SATA, though.

Thank you both for your engagement in this bug.

Revision history for this message
Milan Bouchet-Valat (nalimilan) wrote :

Hendrik: The fact that work on your SATA drive makes the system sluggish, contrary to the PATA one, is normal since your system files are on that drive. Schedulers only deal with processes competing for the same drive access. If the problem is actually with SATA, the only proof we have is that you only changed your drive to SATA, and nothing else.

Anyway, we must be very prudent here since there may be very different issues that affect all users, or only some hardware models. But please go there and report, that can be useful:
http://bugzilla.kernel.org/show_bug.cgi?id=12309

Revision history for this message
Jamie Lokier (jamie-shareable) wrote :

Hendrik van den Boogaard wrote:
> Copy the large files in one window (PATA -> SATA) and do a 'find /'
> (SATA drive) in the other window. I found out that the 'find' command is
> a *lot* more responsive pushing file names to the screen when I put the
> NCQ buffer on 1 item (effectively disabling NCQ). The mouse and cursor
> still lock up often for less than a second and performance is still
> sluggish but the machine is not completely unusable. While I am typing
> this I put the NCQ setting back to 31 as it was on before but now I
> cannot even type this complete sentence without seeing it appear on the
> screen.

That's quite surprising: NCQ should in theory always make it faster,
unless you have a terrible drive implementation.

> As far as I can remember somewhere in the 2.6.1x kernels NCQ was
> added together with all the new libata stuff (that's for me when a lot
> of trouble started; a bad sector on a disk created kernel-oopses on
> another NVidia based computer with a 4TB software Raid5 Array). This
> might also explain why I have not seen this problem before as my old
> 160G drive is PATA and has no NCQ.

Both of these things (the oopses and the NCQ reduction in performance)
ought to be reported to Linux's libata maintainer...

-- Jamie

Revision history for this message
Hendrik van den Boogaard (chasake) wrote :

@Thomas, I can try the 'time cat..' line later, but I don't think it will reveal anything other than that catting the SATA hard disk is probably faster, because the drive is generally faster (higher capacity per platter, more cache).

@KhaaL, I must say that for the last test I used Anticipatory as the queueing mechanism; I found that out just before rebooting. But both the PATA and SATA hard disks were set to Anticipatory, and both had the same amount of read-ahead set, at that time 4096.

Native Command Queueing can be changed by changing the value of 'queue_depth' for a specific drive. You can find it like this:

cd /sys
find |grep queue_depth

Now the system will report a file like
./devices/pci0000:00/0000:00:0e.0/host2/target2:0:0/2:0:0:0/queue_depth

If you look inside that file you can see the value it is on (just 'cat' it) and you can change the value by something like

echo 1 > ./devices/pci0000:00/0000:00:0e.0/host2/target2:0:0/2:0:0:0/queue_depth

If it is already 1, your drive might not support NCQ.
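The steps above can be collected into a small helper; a sketch, assuming the sysfs layout described in this comment (the sysfs root is a parameter purely for safe testing, and writing to the real /sys requires root):

```shell
# list_ncq [sysfs-root]: print every queue_depth file and its current value.
list_ncq() {
    find "${1:-/sys}/devices" -name queue_depth 2>/dev/null | while read -r f; do
        printf '%s = %s\n' "$f" "$(cat "$f")"
    done
}

# set_ncq DEPTH [sysfs-root]: write DEPTH to every queue_depth file.
# A depth of 1 effectively disables NCQ; 31 was the default here.
set_ncq() {
    find "${2:-/sys}/devices" -name queue_depth 2>/dev/null | while read -r f; do
        echo "$1" > "$f"
    done
}

list_ncq
```

Note that this sets the same depth on every SCSI/SATA device it finds; on a multi-drive machine you may want to write only to the path of the drive under test.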

I will post my findings in the other thread later. What I can also try is to make an exact copy of the contents of the SATA drive to an other SATA or PATA drive laying around and see if the sluggishness is different.

@Milan: you might think that the system on SATA is more sluggish than on PATA because it contains the system files, but even just alt-tabbing through windows is sluggish. If I do this with no I/O load on the drive, tabbing through programs does not touch the disk at all, as all programs are in memory and swap is turned off. When I start the load on the PATA drive, the system remains responsive and windows just appear when I alt-tab, perhaps with a short delay, but that is OK. When I start the load on the SATA drive, however, the alt-tab may take seconds to complete before the selected window appears. That's strange, isn't it? To make sure, I want to do the test mentioned above by 'dd-ing' the full SATA drive to the PATA drive or some other PATA/SATA drive, and do the same test on the drive the system boots from.

@Jamie: NCQ might be horrible on the drive; AFAIK it is one of the first 500 GB 7200.10 drives from Seagate. I can try to update its firmware, but if that removes the problem I cannot help out anymore ;). On my 7200.11 1 TB drives (also among the first 1 TB drives on the market) I also disabled NCQ, because I found some thread saying it might kill the contents of the file system. If you want, I can look up where I found that. I had a problem with my RAID 5 array in a server with these 1 TB disks and contacted the libata maintainers, but I had to restore the bad sector before my array was destroyed by not having a spare disk, so after fixing the bad sector I could not reproduce the problem anymore. In the meantime I switched that server to an older kernel, probably a stock kernel from Gutsy or Feisty.


Revision history for this message
Jamie Lokier (jamie-shareable) wrote :

Hendrik van den Boogaard wrote:
> @Jamie:
> On my 7200.11 1 TB drives (also one of the first 1 TBs on the
> market) I also disabled NCQ because I found some thread that it
> might kill the contents of the file system. If you want I can lookup
> where I found that.

I'm guessing it's barriers not being implemented or enabled properly
in the kernel? (See ext3 "barrier=1" off by default, controversial
threads about it..) Even if the filesystem does barriers, Linux
software RAID does not. If it's not that, I'm interested in why NCQ
should be disabled. In principle, if used right, it should always be
an improvement or about the same, and it would be quite bad firmware
to fail at that.

> This machine however has a SATA disk and I never experienced any
> sluggishness on this machine, so far running Hardy and Intrepid. So the
> sluggishness may even be drive specific?

It might. There are tools, such as blktrace, which can help diagnose
if it's the drive if you know how to read the output. It is quite
outside what I have time for though :-)

-- Jamie

Revision history for this message
Rocko (rockorequin) wrote :

I get the same behaviour as I reported earlier (https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.22/+bug/131094/comments/192) with the vanilla kernel 2.6.30-rc3. Once the system tries to use swap, the disk starts thrashing heavily and X basically freezes.

Is this behaviour related to this particular bug, or is it something else? I'm finding that X becomes unusable and I have to hard reset the PC.

Revision history for this message
Milan Bouchet-Valat (nalimilan) wrote :

That must be the same bug, or at least a member of that family of bugs, because we still don't know how many different issues there are. See http://bugzilla.kernel.org/show_bug.cgi?id=12309#c316 for example.

Changed in linux:
status: Confirmed → Invalid
Revision history for this message
daneel (daneel) wrote :

I found a (very dirty) workaround: install an old kernel (yes, it's possible; I'm using Intrepid with the Feisty kernel).

I installed this one:

Image: http://www.michaelhallquist.com/ubuntu-cfs/linux-image-2.6.20.16-ubuntu-cfs-v20.4_Custom_i386.deb
Headers: http://www.michaelhallquist.com/ubuntu-cfs/linux-headers-2.6.20.16-ubuntu-cfs-v20.4_Custom_i386.deb

I found it here: http://ubuntuforums.org/showthread.php?t=538068
It's working very well, but I have some problems with the nvidia driver (can't use dkms).
If you don't have sound, try:

sudo chmod 777 /dev/snd/*

This nasty bug should be fixed. Thanks for keeping up the hard work, guys!

Revision history for this message
Hendrik van den Boogaard (chasake) wrote :

For me the problem disappeared completely after a fresh install of Jaunty. I think this is very strange, but a few things were different from my original install, which my previous posts were about.

* I did a fresh install of the final version of Jaunty AMD64, not the release candidate
* The first time, I installed Jaunty from inside a virtual machine running under Intrepid with VirtualBox, where I actually installed Jaunty onto a real hard disk, from which I rebooted after installation (I did this because I didn't have a blank CD available at the time, and this way I could install from the ISO image)
* I now formatted the root/boot partition where Jaunty was installed to ext4

Did something in the kernel or kernel settings change between the release candidate and the final version? Is it possible that when installing from within a virtual machine some default settings are different than when installing directly? (I can imagine some timer settings are different in a virtual machine, and in a VirtualBox guest the CPU is recognized as single-core only.)

In Windows I can imagine the system parameters during installation are critical for running the system later, but I thought that when booting Linux everything (all hardware) is recognized during startup, so it does not matter on which host it is running (as long as the architecture is the same).

I used the exact same hardware and installed to the exact same partition as the first time. I don't think changing from ext3 to ext4 is the key here, because the slowdowns appeared when catting files from an xfs partition (however on the same physical disk as the root partition). When I do this 'cat * > /dev/null' on the large files the machine is not slow anymore, and everything just seems normal and works as it does in 8.10.

So on one hand I am a happy user now, because everything is normal again, but on the other hand I would like to know what the cause of all this was.

Revision history for this message
Paulo J. S. Silva (pjssilva) wrote :

On my machine I found a workaround after reading many threads on the subject. If you are using ext3, try changing the data mode. The ext3 filesystem has three modes: the default one is "ordered", and the other two are "writeback" and "journal". They differ basically in the amount of information that is written to the journal before the real write to disk; the more information, the better the recovery from a system crash. The safest mode is journal, followed by ordered and then writeback.

In my machine, if I change the mode from ordered to journal or writeback the slowness under heavy load becomes much more bearable (it is not completely gone, but acceptable). In my case journal mode is the best, even though it is supposed to be the slowest mode (but the safest). I can now use tracker again.

To change the mode of your disk partitions (you need to do it for each partition) use tune2fs. For example

sudo tune2fs -o journal_data /dev/sda6

changes the mode to journal in partition sda6. To change the mode to writeback try

sudo tune2fs -o journal_data_writeback /dev/sda6

and to ordered (the default in Ubuntu)

sudo tune2fs -o journal_data_ordered /dev/sda6

After using tune2fs you need to reboot.

Note: It seems that writeback may become the default mode in future kernels (or maybe they will use a new mode called guarded). The new kernels are supposed to have lots of fixes for this issue.

Revision history for this message
cornbread (corn13read) wrote :

This is happening to me and I have a fresh install of jaunty x64. Is Paulo's solution working for others? Is it safe to try?

I do a lot of large movie transfers. I didn't notice this issue in 8.10 x64 but after 9.04 install performance during large transfer is unbearable. Reminds me of dialup days but for local transfers.

Revision history for this message
KhaaL (khaal) wrote :

Changing to writeback mode is not harmful; however, it did not help in my
case. I got improved performance, but I still have stutters during I/O
activity: the more intense the I/O, the less responsive the GUI.

Revision history for this message
cornbread (corn13read) wrote :

For the first time since coming to Ubuntu with 7.04, I am thinking of switching distros. I do a lot of large file copying, and I might as well go get coffee during large copies. For the entire duration my computer is so slow it is unusable (even for browsing the web).

A Core 2 Duo E8400 at 3.0 GHz with 4 GB RAM is brought to its knees when copying a simple 1 GB+ file.

Revision history for this message
Ben Gamari (bgamari) wrote :

@cornbread

Comments like that really don't help. Moreover, this is a kernel issue that is affecting all distributions across the board; I recently came to Ubuntu from Fedora where it was just as bad.

However, things are looking pretty good for getting this fixed by 2.6.31, which as it stands will be in Karmic. Last month there was a set of patches[1] posted to the LKML reworking the page eviction code to give executable code priority over streaming pages, which should help the thrashing situation quite a bit.

Secondly, there are Jens Axboe's per-bdi flusher threads, which seem to be kicking some serious ass in initial testing[2].

Lastly, there was very recently a breakthrough on the kernel.org incarnation of this bug[3], where Thomas Pilarski's tireless efforts in bisecting the issue finally resulted in some measurably regressing commits. Jens has already looked at the commits in question, and it looks very promising that we'll see at least some improvement come of this.

All in all, don't fret; things are looking up. It's certainly a frustrating bug for users and developers alike, but I think the efforts of the community may be about to pay off.

- Ben

[1] http://thread.gmane.org/gmane.linux.kernel.mm/33818
[2] http://lkml.org/lkml/2009/5/28/164
[3] http://bugzilla.kernel.org/show_bug.cgi?id=12309#c360

Revision history for this message
pinepain (pinepain) wrote :

No solution for about 2 years!!! Wow, this is really cool. But the latest Ubuntu distros hang nicely without any need to copy large data. They just hang (the Windows way?). Sorry.

Hmm... maybe I'm wrong, but this bug appears only (or almost always) in Ubuntu-based distros, doesn't it?

Has anyone tried to reproduce this bug on other distros?

What do you think: could the firmware of the HDD (or other device we copy data from/to) be a factor in the low performance?

Revision history for this message
Bryan Wu (cooloney) wrote :

Unfortunately it seems this bug is still an issue. Can you confirm whether this issue exists with the most recent Jaunty Jackalope 9.04 release - http://www.ubuntu.com/news/ubuntu-9.04-desktop ? If the issue remains in Jaunty, please test the latest upstream kernel build - https://wiki.ubuntu.com/KernelMainlineBuilds . Let us know your results. Thanks.

-Bryan

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
KhaaL (khaal) wrote :

Bryan, this bug is still alive and kicking in Jaunty with the 2.6.30 kernel. I've been following this bug on kernel.org's bugzilla, and there seems to have been a breakthrough lately; see this comment: http://bugzilla.kernel.org/show_bug.cgi?id=12309#c366.

Unfortunately, since I don't know how to compile a kernel, or even how to apply patches, I cannot test it. If someone can guide me through the process or even provide prebuilt kernels, I'd be grateful.

Revision history for this message
Milan Bouchet-Valat (nalimilan) wrote :

Bryan, please check what a report is about before asking stock questions. This one is very hard to work out, and work is going on upstream to find where the regression may have been introduced. Asking people to check if it's in Jaunty doesn't make sense; we're not even sure everybody here is experiencing the same problem.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
qwerty (escalantea) wrote :

Just an idea ... try tuning the pdflush parameters (you must be root):

echo 200 > /proc/sys/vm/dirty_writeback_centisecs
echo 400 > /proc/sys/vm/dirty_expire_centisecs

If it works, make the changes permanent by editing /etc/sysctl.conf
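For reference, the persistent form of the two settings above would be lines like these in /etc/sysctl.conf (the key names are the /proc/sys paths with the prefix dropped and slashes replaced by dots; apply without rebooting via `sudo sysctl -p`):

```conf
# /etc/sysctl.conf additions mirroring the echo commands above
vm.dirty_writeback_centisecs = 200
vm.dirty_expire_centisecs = 400
```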

Revision history for this message
Bryan Wu (cooloney) wrote :

@Milan, I have followed this thread for a long time and am trying to help here. Although I used the stock questions, I want to make sure everyone here knows this issue still remains in all the releases. This is a long story, so we need to add some checkpoints to let people understand this issue. I will try to provide a 2.6.30 kernel + Jens's patches and call for testing.

Thanks

-Bryan

Revision history for this message
Milan Bouchet-Valat (nalimilan) wrote :

Providing a testing kernel package would really be great! Then it would make sense to ask people to confirm the bug is still here. Though we have learned that I/O and responsiveness are very difficult to measure - not sure we'll be able to clearly confirm anything on so large a thread... ;-)

Revision history for this message
Ben Gamari (bgamari) wrote :

@Bryan, @Milan, it is unlikely that Jens' bdi patches will substantially affect the issue. It appears that the problem is in large part due to poor eviction choices on the part of the VM system. There are some patches in mm to fix this; see my previous comment. If you are going to put together a testing kernel, I believe you would be much better off trying these.

Revision history for this message
Jamie Lokier (jamie-shareable) wrote :

Milan Bouchet-Valat wrote:
> Providing a testing kernel package would really be great! Then it would
> make sense to ask people to confirm the bug is still here. Though we
> have learned that I/O and responsiveness are very difficult to measure -
> not sure we'll be able to clearly confirm anything on a so large
> thread... ;-)

There are several reports that copying a large file always makes the
desktop very slow, so that should be simple to test - on those systems.

Revision history for this message
Jared Wiltshire (jaredwiltshire) wrote :

Is this the same bug that is being talked about here?
http://ubuntuforums.org/showthread.php?t=1152176

People in that thread indicate they only have problems when using AHCI mode SATA disks.

Revision history for this message
Igor Lautar (igorl) wrote :

Just changed from AHCI to IDE in BIOS (HP 8530w).

Initial feeling is that it makes a huge difference (for the better).

Revision history for this message
exactt (giesbert) wrote :

For me, the problem also appeared when turning on AHCI.

check out https://bugs.launchpad.net/bugs/343371 .

Revision history for this message
Petter (pettno) wrote :

Jared, Igor, Exactt and others with problems introduced in recent Ubuntu versions: this bug relates to something introduced in Gutsy (version 7.10). I still (!) have problems related to this. Please move along and follow bug #343371, or create your own bug reports.

Revision history for this message
ReneS (mail-03146f06) wrote :

AHCI on/off does not make a difference. When copying a 10 GB file on disk, the machine becomes unresponsive. Top shows I/O wait as high as 80%. Applications do not start until the copy operation has finished.

Running Ubuntu 9.04 with Linux 2.6.30-020630-generic #020630 SMP Wed Jun 10 09:04:38 UTC 2009 x86_64 GNU/Linux

Revision history for this message
Jim Lieb (lieb) wrote :

This regression has been around since about the 2.6.18 timeframe and has eluded a lot of testing to isolate the root cause. The most promising fix is in the VM subsystem (mm), where the LRU scan has been changed to favor keeping executable pages active longer. Most of these symptoms come down to VM thrashing to make room for I/O pages. The key change/commit is 8cab4754d24a0f2e05920170c845bd84472814c6, "vmscan: make mapped executable pages the first class citizen". For those interested in the details and familiar with 'git', the commit changelog entry has a complete description of the problem and the fix. You can find this in either the ubuntu-karmic git repository or on kernel.org.

This change was merged into the 2.6.31-rc1 kernel. The Karmic Alpha 3 snapshot, currently scheduled for the last week of July, will have a 2.6.31 kernel containing this change. Please test this version and report back whether your latency issues have been resolved. There is no guarantee that this change will solve the latency problems in any particular workload, so as much testing as possible across a variety of machines and workloads is important.

Thank you.

NOTE: This new version of the Karmic kernel will also have the new KMS patches to match the upgrade of the Xorg server. Since most of the latency complaints center around GUI latencies, this adds a new set of variables. There are mainline kernel packages available now, for those who cannot wait for the Alpha 3 release, that can run on either Jaunty or Karmic (alpha), but they may have problems with the older Xorg server. If you find X-related problems with these kernels, please wait for the Alpha 3 release and do not bother reporting X problems unless they are also present in the A3 release.

Backporting note:
The commit mentioned above is just one change in the VM subsystem. Backporting it and its associated patches to Jaunty (2.6.28) or earlier kernels would probably not be productive and might create new stability problems of its own, given the amount of change between the two versions.

Changed in linux (Ubuntu):
assignee: nobody → Jim Lieb (lieb)
status: Confirmed → In Progress
Revision history for this message
Ben Gamari (bgamari) wrote :

While there are certainly a lot of considerations here, I fail to see how KMS (kernel mode setting) could ever even _possibly_ affect desktop responsiveness. Most sessions change modes once, if that. Once the mode is set and the framebuffer is set up, KMS is entirely out of the picture. Let's not pretend there are more variables than there really are.

Revision history for this message
Jim Lieb (lieb) wrote :

Sorry, I did not make myself clear. KMS only enters into this picture because there have been some reports that during this transition period to KMS, the kernel and Xorg have not played well together. This is simply a "heads up" until the next Alpha appears. There have been plenty of side issues in the history of this and other reports already. I intended to mention KMS as a side issue up front to keep testing focused on the I/O + Latency issue, not on something that might be broken in pre-release packages. Consider it, "Warning, do not step here."

Revision history for this message
JQ (bazs111) wrote :

can latencytop help with anything?

Revision history for this message
cornbread (corn13read) wrote :

+1 for making this a critical bug.

I have to warn every Ubuntu user I set up that this will cause them issues when doing large data transfers. Quite harmful to user friendliness.

Canonical should make this one of their higher priorities to get fixed.

Revision history for this message
cl333r (cl333r) wrote :

The importance is (already) marked as "high". I wonder: is it difficult to fix, or to reproduce? Some people claim this is a regression since kernel 2.6.15.

Anyway, the bug is real. I (like many others) tested it on my dual-booting computer, and the Ubuntu desktop feels significantly less responsive than WinXP when doing file I/O in the background. It's also a bit annoying that the bug was filed back in 2007 and there still seems to be considerable uncertainty around it.

Revision history for this message
Paulo J. S. Silva (pjssilva) wrote :

Did anyone try Koala alpha 3?

I did, and observed some interesting behavior.

My machine has an Intel motherboard with a G33 chipset and a Core 2 Duo E6600 processor. It has 2 SATA drives under AHCI, and I use a 64-bit kernel.

I can easily reproduce the problem using the fsync-tester program that you can get from kernel bug #12309 (http://bugzilla.kernel.org/show_bug.cgi?id=12309). All I need to do is run the program alongside a dd that creates a very large file, with the command line:

dd if=/dev/zero of=./bigfile bs=1M count=15000 & ./fsync-tester
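For anyone without the fsync-tester binary handy, a rough shell stand-in (a sketch, not the original program; it assumes GNU dd, date and awk) that times repeated 1 MiB appends, each followed by an fsync:

```shell
# probe_fsync [N] [FILE]: append 1 MiB to FILE N times, timing each
# write+fsync pair, in the spirit of fsync-tester's output.
probe_fsync() {
    n="${1:-5}"
    f="${2:-./fsync-probe}"
    i=0
    while [ "$i" -lt "$n" ]; do
        start=$(date +%s.%N)
        # conv=fsync forces an fsync of the file before dd exits;
        # oflag=append + conv=notrunc keeps growing the same file.
        dd if=/dev/zero of="$f" bs=1M count=1 \
           oflag=append conv=notrunc,fsync 2>/dev/null
        end=$(date +%s.%N)
        awk -v s="$start" -v e="$end" 'BEGIN { printf "fsync time: %.4f\n", e - s }'
        i=$((i + 1))
    done
    rm -f "$f"
}

probe_fsync 5
```

The absolute numbers will differ from fsync-tester's, but the relative jumps while a concurrent dd load is running are what matter.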

In jaunty (ext3) the desktop becomes unresponsive right away. The times given by fsync-tester look like:

fsync time: 0.5422
fsync time: 6.3691
fsync time: 8.5983
fsync time: 0.7820
fsync time: 0.7695
fsync time: 4.6577
fsync time: 5.6024
fsync time: 9.4238
fsync time: 10.8609

I tried some variations of koala. Here are my findings:

1) 64bit Koala in default configuration (ext4).

The desktop is more responsive, but not really good yet. The fsync-tester gives:

fsync time: 0.1351
fsync time: 0.9104
fsync time: 9.1311
fsync time: 1.9133
fsync time: 11.9529
fsync time: 1.6751
fsync time: 2.7171
fsync time: 8.4801

So the better responsiveness is not related to the fsync times; it is probably related to other changes in the kernel. There is no clear change if I turn AHCI off in the BIOS.

2) If I change the journal mode to writeback, the fsync times improve a lot (and responsiveness improves):

fsync time: 0.0781
fsync time: 0.0581
fsync time: 1.7040
fsync time: 1.4743
fsync time: 1.5957
fsync time: 1.7751
fsync time: 1.9164
fsync time: 1.4886
fsync time: 1.3991
fsync time: 1.8332

3) If, on the other hand, I use an i386 kernel with the default ordered mode, the times are also much better (as good as writeback + amd64):

fsync time: 0.0677
fsync time: 3.8825
fsync time: 1.4467
fsync time: 2.7759
fsync time: 1.5819
fsync time: 3.2423
fsync time: 3.4318
fsync time: 1.5432
fsync time: 1.3225

4) i386 + writeback gets a little better:

fsync time: 0.0946
fsync time: 1.3528
fsync time: 1.4029
fsync time: 1.3787
fsync time: 1.0880
fsync time: 1.2656
fsync time: 0.9047
fsync time: 0.8842
fsync time: 0.8008
fsync time: 1.4933
fsync time: 1.3645

(There are no 3s delays as above.)

5) Now comes the interesting surprise. If I install amd64 Koala using ext3 with its default mode (which I think is writeback, but I am not sure), I get very good responsiveness and times:

fsync time: 0.0329
fsync time: 0.0156
fsync time: 0.2369
fsync time: 0.1274
fsync time: 0.2285
fsync time: 0.2196
fsync time: 0.2563
fsync time: 0.2147
fsync time: 0.2968
fsync time: 0.2602
fsync time: 0.1131
fsync time: 0.2348

I didn't try i386+ext3. I ran out of time and patience :-)

So, on my computer the problem seems to be related to many factors: amd64 vs i386, writeback vs ordered mode, and ext4 vs ext3.

It would be very interesting if anyone can reproduce my little experiment.

And remember: if you are annoyed by this bug, you may want to stick with ext3 in Koala for now.

Revision history for this message
cornbread (corn13read) wrote :

Bug exists for me with 9.10 Alpha 3 64 bit

Revision history for this message
Ben Gamari (bgamari) wrote :

Can we please stop referring to this as a bug? It may be a problem, it may be
the product of a collection of bugs, but it is almost certainly not one bug.
This report has to date accumulated almost 250 comments, including numerous
incomparable benchmarks, dozens of descriptions of subtly different problems,
countless flawed workarounds, and yet not a single bisection attempt.

In fact, this report is in far worse shape than the kernel.org report which was
closed months ago due to lack of focus. I strongly believe that this report
should see the same end. So far the patchset which was most likely to fix this
has already been merged (8cab4754: vmscan: make mapped executable pages the
first class citizen). Since this clearly hasn't improved things, it is time
that we go back to the drawing board.

At this point, the only responsible course forward is to close this bug and
start from scratch, this time taking greater care to keep independent bugs in
separate reports; otherwise we will end up in the same situation as we
currently find ourselves. In general, I believe that Ubuntu's bug tracker
really isn't an appropriate forum for discussing what is demonstrably a
cross-distribution kernel issue. While we can certainly have a tracker here,
true technical discussion belongs on the kernel.org report.

As has been demonstrated in the past, this bug is quite difficult to pin down.
A responsive desktop is the product of interactions between components in all
layers of the stack, including (perhaps) most importantly the memory management
and block layers. We must avoid complicating things any more than they already
are by tying together matters which are fundamentally independent (no more
driver references; this has been shown to be a largely hardware-independent
bug, so treat it as such).

Anyways, despite all of these considerations, I am hopeful that a solution will
be found. As a first order of business, someone with the proper permissions
must put this bug out of its long-lived misery. Then perhaps we can move
forward to isolating the true cause of this issue.

Revision history for this message
Sam Davies (seivadmas) wrote :

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/427210

^^ Bug report that actually discusses one particular possible solution for this problem with an easily reproducible test.

Revision history for this message
Rocko (rockorequin) wrote :

In case it's relevant for Karmic, this email mentions latency issues reintroduced in kernel 2.6.30/31 but apparently fixed now in 2.6.32: http://lkml.org/lkml/2009/10/5/247

Revision history for this message
LGB [Gábor Lénárt] (lgb) wrote :

Hmm, Rocko, that sounds good ;) I'm wondering if it's possible to backport these patch(es?) into the kernel used by karmic (or is the change too big?), since this seems to be a serious problem for most desktop users. Even my friends and relatives who use ubuntu said that ubuntu became totally unusable because of performance problems (responsiveness even under smaller loads) :( So I think it's a quite serious issue for desktop users, at least.

Revision history for this message
Tobias Oetiker (tobi-oetiker) wrote : FsOpBench shows only data=writeback with cfq works

I have just completed extensive benchmarking on 2.6.31.2 and 2.6.24
with a new benchmark program I have written, to measure real-world
I/O performance in high-load situations where readers and writers
are competing.

I am looking at HW RAID setups as well as normal single hard drive
situations. The bottom line is that on a single hard drive only

  data=writeback with cfq scheduler

has decent performance when readers and writers are in competition,
and even then, there will be huge outliers of many seconds happening
every now and then. See

  http://insights.oetiker.ch/linux/fsopbench/

for details.

cheers
tobi

--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch <email address hidden> ++41 62 775 9902 / sb: -9900

Revision history for this message
Jamie Lokier (jamie-shareable) wrote : Re: [Bug 131094] FsOpBench shows only data=writeback with cfq works

Tobias Oetiker wrote:
> I have just completed extensive benchmarking on 2.6.31.2 and 2.6.24
> with a new benchmark program I have written, to measure real-world
> io performance in high load situations where readers and writers
> are competing.
>
> I am looking at HW RAID setups as well as normal single hard drive
> situations. The bottom line is that on a single hard drive only
>
> data=writeback with cfq scheduler
>
> has decent performance when readers and writers are in competition
> and even then, there will be huge outliers of many seconds happening
> every now and then. See
>
> http://insights.oetiker.ch/linux/fsopbench/
>
> for details.

Is that with ext4 or ext3?

Revision history for this message
Tobias Oetiker (tobi-oetiker) wrote :

Today Jamie Lokier wrote:

> Tobias Oetiker wrote:
> > I have just completed extensive benchmarking on 2.6.31.2 and 2.6.24
> > with a new benchmark program I have written, to measure real-world
> > io performance in high load situations where readers and writers
> > are competing.
> >
> > I am looking at HW RAID setups as well as normal single hard drive
> > situations. The bottom line is that on a single hard drive only
> >
> > data=writeback with cfq scheduler
> >
> > has decent performance when readers and writers are in competition
> > and even then, there will be huge outliers of many seconds happening
> > every now and then. See
> >
> > http://insights.oetiker.ch/linux/fsopbench/
> >
> > for details.
>
> Is that with ext4 or ext3?

I have tested with ext3

cheers
tobi

--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch <email address hidden> ++41 62 775 9902 / sb: -9900

Revision history for this message
Yan Li (yanli) wrote :

Tobi, your testing and results are great and very useful. It would be even better if you could run those tests on ext4. Thank you.

Revision history for this message
Tobias Oetiker (tobi-oetiker) wrote : Re: [Bug 131094] Re: Heavy Disk I/O harms desktop responsiveness

Hi Yan,

Yesterday Yan Li wrote:

> Tobi, your testing and results are great and very useful. It would be
> even better if you could run those tests on ext4. Thank you.

I have now also put ext4 through its paces ... its overall
behaviour seems to be the same as with ext3; the same settings
yield the best performance. Overall, the single-reader scenario
seems to suffer a performance drop of 20% to 30%, while the
three-reader scenario gains about 30%. Large maximum latencies
have become bigger, if anything.

I have updated the report on

http://insights.oetiker.ch/linux/fsopbench/

including the detailed results ...

The cfq scheduler seems to do a pretty good job at being fair. The
main problem in my eyes (which no one seems to be talking about) is the
hangups, where suddenly an mkdir call takes up to 19 seconds to
complete. This makes all the rest seem like minor issues.

cheers
tobi

--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch <email address hidden> ++41 62 775 9902 / sb: -9900

Revision history for this message
Yan Li (yanli) wrote :

Tobias Oetiker:

Thank you very much for the update. I'm a bit surprised to see that the single-reader case on ext4 is worse than on ext3. I'm going to postpone the upgrade of my systems to ext4. I dare not use data=writeback yet.

I'm a bit confused about why you ran this on a RAID6 system. The RAID card/driver might have affected the performance in a way yet to be understood. IMHO, the fewer layers between the Linux kernel and the hard drive, the better we can understand the kernel's I/O scheduler, filesystem, etc.

Again, thank you for the great data.

Revision history for this message
Tobias Oetiker (tobi-oetiker) wrote :

Today Yan Li wrote:

> Tobias Oetiker:
>
> Thank you very much for the update. I'm a bit surprised to see that the
> single-reader case on ext4 is worse than on ext3. I'm going to postpone
> the upgrade of my systems to ext4. I dare not use data=writeback yet.
>
> I'm a bit confused about why you ran this on a RAID6 system. The RAID
> card/driver might have affected the performance in a way yet to be
> understood. IMHO, the fewer layers between the Linux kernel and the hard
> drive, the better we can understand the kernel's I/O scheduler, filesystem, etc.
>
> Again, thank you for the great data.

As you can see from the results, running the test on a RAID6 gives
vastly different results. The fact is that, for reliability, we are
running all our servers on RAID6, so this is the configuration I am
most interested in seeing work well ... good performance on a
single disk does not help me much ... (I am glad to see that it is,
to some extent, even worse than my RAID6 performance).

I think at the heart of the problem lies the fact that benchmarks focus
on single aspects of subsystems, which then get optimized without
looking at the overall impact.

cheers
tobi


--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch <email address hidden> ++41 62 775 9902 / sb: -9900

Revision history for this message
cornbread (corn13read) wrote :

Fixed for me on my desktop with the karmic beta!

Revision history for this message
Rocko (rockorequin) wrote :

@cornbread: I think they've put the 'no new fair sleepers' patch into ubuntu's 2.6.31 kernel, which does help with responsiveness (though it slows down some high-CPU tasks like games). This might explain it.

However, I noticed a massive drop in responsiveness yesterday while copying 14GB to a flash drive, including a frozen mouse cursor for up to thirty seconds. Although atop showed that my internal hard drive (/dev/sda) was only being used very lightly (as you'd expect when copying to flash), it was thrashing away constantly. So there are still some issues.

Revision history for this message
Rocko (rockorequin) wrote :

An update on my comment above: when doing heavy I/O from an external drive (/dev/sdb) to a slow external flash key (/dev/sda), I can hear my internal hard drive (/dev/sda) thrashing away constantly, even though its light indicates no disk read/write operations. So something in the kernel must be making it constantly seek, and this is affecting /dev/sda access and hence desktop responsiveness.

The desktop is now much faster for applications that are already loaded, but anything that has to access the disk still experiences long wait times.

Revision history for this message
Rocko (rockorequin) wrote :

I opened http://bugzilla.kernel.org/show_bug.cgi?id=14491 for this disk thrashing issue. I couldn't get 2.6.32-rc5 to do it, but I can repeat it in 2.6.31.5 when the PC RAM is near full.

Revision history for this message
Geoffrey Pursell (geoffp) wrote :

For me, apps become sporadically unresponsive (one or another will actually freeze solid for a few seconds at a time) during a simple file copy from one hard drive to another. The source drive is an older IDE drive (ext3) and the destination is a newer SATA drive (ext4). It's a music collection, with files varying in size from 3 MB to 20 MB; the transfer goes at about 27 MB/sec.

This is with a stock Ubuntu 2.6.31-14 kernel on a fresh AMD64 Karmic.

Revision history for this message
Hans van den Bogert (hbogert) wrote :

I've noticed something very weird when running tiobench. When it is run on the root filesystem, I'm experiencing the same as everyone else. When the exact same test is run/written from/to something other than the root fs, iowait is much lower, responsiveness is excellent, and the data rate is excellent too.

I can't seem to find any parameters which are different across filesystems; I've ruled out lvm, and it's on the same disk.

Revision history for this message
LGB [Gábor Lénárt] (lgb) wrote :

I've just had a very annoying experience: on my notebook (running up-to-date ubuntu 9.10, 32-bit) I wanted to copy a large file onto a pendrive. Normally, I think, it shouldn't affect I/O performance too much, since that pendrive is quite slow, so I doubt it can make the hdd (which I was copying from) or any other I/O subsystem too busy, other than the pendrive itself. However, it almost killed the notebook: I couldn't change windows, and alt-tab took 10 minutes (!) to react. Without any I/O, of course, I have no usability problems with it. Nothing interesting in the kernel log ...

Revision history for this message
Jonathan Bower (jonathanbower) wrote :

LGB, Yes, and this is why I can't really recommend Ubuntu to my friends. unfortunately.

Revision history for this message
tankdriver (stoneraider-deactivatedaccount) wrote :

I found out something very interesting:
Test case: Ubuntu Karmic, external USB HDD, USB stick
1. Copy a lot of data via nautilus to the external USB drive.
2. During copying, plug in the USB stick.
Nothing happens.
3. When copying is finished (e.g. after 20 minutes), the USB stick suddenly appears on the desktop.

I can confirm this test case with jaunty & karmic, 32- & 64-bit, on HP laptops and Asrock & Gigabyte desktops, with multiple USB sticks and HDDs (and even with 2 USB sticks in the test case).
Can anyone confirm this? Is this a kernel issue (this bug) or possibly a nautilus issue?

Revision history for this message
Rocko (rockorequin) wrote :

@LGB and tankdriver: I've noticed both these problems, but kernel 2.6.32 fixes them for me. The easiest way to try it is to get the header and image deb files (32 or 64 bit as appropriate) from http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.32.2/ and install them with "sudo dpkg -i".

Revision history for this message
tankdriver (stoneraider-deactivatedaccount) wrote :

@Rocko: 2.6.32 does not change anything for me (except nvidia doesn't compile ;-) )
For me, I think it's a non-kernel issue (nautilus?) because in dmesg the sdc1, sdd, ... stuff shows up every time shortly after I plug in a drive, but on the (gnome) desktop, nothing appears.
I will try Kubuntu for testing.

Revision history for this message
Rocko (rockorequin) wrote :

@tankdriver: you are absolutely right; I found I can reproduce this if I copy a large file to a USB flash device (I was using USB hard drives when I tested it before). I've opened a new bug for it (see bug #504113) because I don't think it's relevant to this bug.

However, 2.6.32 does fix (for me) the unresponsive desktop problem when copying a large file to a slow device that LGB reported.

Changed in linux (Ubuntu):
assignee: Jim Lieb (lieb) → Ubuntu Kernel Team (ubuntu-kernel-team)
status: In Progress → Confirmed
Revision history for this message
LGB [Gábor Lénárt] (lgb) wrote :

@Rocko: I have problems with non-slow drives too :) I have another hdd where I usually have big I/O (let's call it the "data hdd"); the system, my home, and swap are all on another one (the "system hdd"). When I have heavy I/O on the data hdd, I/O on the system hdd is horrible too, even though they are identical fast disks and nothing is being copied between them. Anyway, I will try the newer kernel too; I just need nvidia's "binary blob", so I can't use my desktop system if the newer kernel won't play nice with the nvidia driver :(

Revision history for this message
Rocko (rockorequin) wrote :

@LGB: To use nvidia in the latest kernel, I normally download the latest nvidia driver from either:

64 bit: ftp://download.nvidia.com/XFree86/Linux-x86_64/

32 bit: ftp://download.nvidia.com/XFree86/Linux-x86/

You want the file with the highest number (...pkg1.run or ...pkg2.run).

Then install it manually (instructions here are for the 195.30 beta driver, which works fine on my PC):

1. Remove any existing nvidia drivers, eg the restricted Ubuntu modules.

2. Either reboot into recovery mode; or get a tty console (eg CTRL-ATL-F1), log in, and do "sudo stop gdm" to kill X (make sure you save any data first!).

3. Install by executing the file you downloaded, eg "sudo sh NVIDIA-Linux-x86_64-195.30-pkg2.run". If 64 bit, tell it to install the 32 bit libraries as well or 32 bit games won't work. If you already have /etc/X11/xorg.conf set up for nvidia, there's no need to let the installer alter it.

4. Reboot.

An optional last step if it works fine is to install to dkms with (eg) "sudo sh installdkms.sh 195.30" (195.30 is the nvidia version in this example, and the script is attached). Then it recompiles the nvidia module automatically whenever you install a new kernel (and it will compile it for the stock 2.6.31 kernel, too).

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

The heavy I/O problem is partially an Ubuntu problem. While copying some files from one hard disk to another on Ubuntu (Karmic), my system becomes completely unresponsive with both the ppa (32) and the 2.6.31 kernel. The same copy operation with Fedora 12 and kernel 2.6.32 produces no freezes.

Revision history for this message
djr013 (djr013) wrote :

The latest Lucid kernel ("2.6.32-12.16", 64-bit) seems to have fixed this longstanding problem for me. Also, for this reason or another, initial RAM usage appeared to go down a good ~64 MB during the update. Before, this machine was mysteriously less usable than an otherwise older and slower one (even after accounting for this one having half the RAM). Of course, this bug is fairly general and could have a few potential causes, so my case may not apply to all who reported it.

Revision history for this message
LGB [Gábor Lénárt] (lgb) wrote :

I've upgraded to lucid. Otherwise it runs quite well, but this I/O problem is still here, or even worse! Now a single apt dist-upgrade (with 30 packages to update or so) froze my X session (the mouse cannot be moved, the music player just repeats a single second of audio, I guess from some kind of buffer, etc) for long minutes. I managed to switch to a text console (CTRL-ALT-F1), and after login, according to top, the system load was above 11 and almost all CPU time was in the "wait" state.

Linux oxygene 2.6.32-12-generic #17-Ubuntu SMP Fri Feb 5 08:14:39 UTC 2010 i686 GNU/Linux

After apt-get finished, everything returned to normal ...

Revision history for this message
LGB [Gábor Lénárt] (lgb) wrote :
Download full text (14.4 KiB)

During those "cannot-do-anything" times, I even got kernel messages. One example:

[998882.032583] INFO: task gvfsd-trash:2063 blocked for more than 120 seconds.
[998882.032589] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[998882.032594] gvfsd-trash D 0001ae75 0 2063 1 0x00000000
[998882.032602] d59f7da0 00000086 00000000 0001ae75 00000000 c087c6c0 d5825c5c c087c6c0
[998882.032614] 7a06b2df 00038c42 c087c6c0 c087c6c0 d5825c5c c087c6c0 c087c6c0 dd11a200
[998882.032628] 7a04da23 00038c42 d58259b0 cb0f4078 cb0f407c ffffffff d59f7dcc c05a48a6
[998882.032637] Call Trace:
[998882.032650] [<c05a48a6>] __mutex_lock_slowpath+0xc6/0x130
[998882.032654] [<c05a47c5>] mutex_lock+0x25/0x40
[998882.032660] [<c020e1fe>] real_lookup+0x2e/0x110
[998882.032664] [<c020fc05>] do_lookup+0x95/0xc0
[998882.032669] [<c021043d>] __link_path_walk+0x54d/0xb60
[998882.032673] [<c020f246>] ? path_to_nameidata+0x36/0x50
[998882.032677] [<c0210bf6>] path_walk+0x46/0xa0
[998882.032681] [<c0210d59>] do_path_lookup+0x59/0x90
[998882.032685] [<c02118a1>] user_path_at+0x41/0x80
[998882.032691] [<c01304cc>] ? kmap_atomic_prot+0x4c/0xf0
[998882.032696] [<c01e2f60>] ? __do_fault+0x3a0/0x4b0
[998882.032702] [<c02099ca>] vfs_fstatat+0x3a/0x70
[998882.032706] [<c0209a60>] vfs_lstat+0x20/0x30
[998882.032710] [<c0209a89>] sys_lstat64+0x19/0x30
[998882.032715] [<c05a7f2b>] ? do_page_fault+0x19b/0x380
[998882.032721] [<c0237492>] ? sys_inotify_add_watch+0xc2/0x100
[998882.032727] [<c010344c>] syscall_call+0x7/0xb
[999002.032195] INFO: task gvfsd-trash:2063 blocked for more than 120 seconds.
[999002.032202] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[999002.032208] gvfsd-trash D 0001ae75 0 2063 1 0x00000000
[999002.032218] d59f7da0 00000086 00000000 0001ae75 00000000 c087c6c0 d5825c5c c087c6c0
[999002.032232] 7a06b2df 00038c42 c087c6c0 c087c6c0 d5825c5c c087c6c0 c087c6c0 dd11a200
[999002.032245] 7a04da23 00038c42 d58259b0 cb0f4078 cb0f407c ffffffff d59f7dcc c05a48a6
[999002.032258] Call Trace:
[999002.032274] [<c05a48a6>] __mutex_lock_slowpath+0xc6/0x130
[999002.032283] [<c05a47c5>] mutex_lock+0x25/0x40
[999002.032291] [<c020e1fe>] real_lookup+0x2e/0x110
[999002.032298] [<c020fc05>] do_lookup+0x95/0xc0
[999002.032304] [<c021043d>] __link_path_walk+0x54d/0xb60
[999002.032312] [<c020f246>] ? path_to_nameidata+0x36/0x50
[999002.032318] [<c0210bf6>] path_walk+0x46/0xa0
[999002.032324] [<c0210d59>] do_path_lookup+0x59/0x90
[999002.032331] [<c02118a1>] user_path_at+0x41/0x80
[999002.032338] [<c01304cc>] ? kmap_atomic_prot+0x4c/0xf0
[999002.032348] [<c01e2f60>] ? __do_fault+0x3a0/0x4b0
[999002.032354] [<c02099ca>] vfs_fstatat+0x3a/0x70
[999002.032358] [<c0209a60>] vfs_lstat+0x20/0x30
[999002.032362] [<c0209a89>] sys_lstat64+0x19/0x30
[999002.032367] [<c05a7f2b>] ? do_page_fault+0x19b/0x380
[999002.032374] [<c0237492>] ? sys_inotify_add_watch+0xc2/0x100
[999002.032379] [<c010344c>] syscall_call+0x7/0xb
[999122.032597] INFO: task gvfsd-trash:2063 blocked for more than 120 seconds.
[999122.032603] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[99912...

Revision history for this message
psypher (psypher246) wrote :

Please could the severity/importance of this bug be raised? I would consider a bug which has been prevalent for almost 3 years, and which causes extreme slowdown of the entire desktop whenever there is any kind of high disk activity, to be pretty serious. This drastically affects my usage and productivity on the desktop on a daily basis. Up until now I have always thought it's just how linux is, and that the benefit of all the other great features outweighs a bit of slowdown. Well, it seems to be getting worse, and as I start to use more of the potential of my PCs I am getting a little tired of it. I have recently been testing ubuntuone extensively, and I suspect a big portion of my extremely slow index and read issues, across the thousands of files I have in my ubuntuone folder, is actually caused by this bug. There are still some improvements which could be made in that process, which the ubuntuone team are actively working on, and they are doing some great work. At least someone is working on big issues there, but sadly it seems to be only due to the commercial potential of ubuntuone.

This seems to be a kernel issue as per this bug report: https://bugzilla.kernel.org/show_bug.cgi?id=12309
But the status of that bug is confusing as it's marked as closed due to insufficient information.

Does anyone know who is working on this? I have attempted to contact the person assigned, as well as Ben Gamari, who I see is subscribed to this bug on launchpad, for some clarification. I will patiently await their response. What testing needs to be done, and what can be done from a non-developer's perspective to fix this?

IMO this is the worst bug in linux right now, and I think it deserves more attention than just a slashdot article: http://it.slashdot.org/article.pl?sid=09/01/15/049201

I would not recommend Ubuntu or Linux to any new users until this bug is fixed. It is embarrassing, when trying to praise all the benefits of using linux to a Windows user, to have their entire desktop lock up when trying to do simple things like run a backup or unrar a file.

In the past I have attributed slowness issues in several other applications to application-specific problems. There have been bugs logged for:

Unison, Firefox, the downthemall FF plugin, unrar, ubuntuone, flash, gnome, VMWare, Virtualbox, qemu, kvm, etc.

I think all of these issues could be attributed to this one problem. There are too many apps that experience slowness and grey-outs for it not to be related. This happens between disks or on the same disk, and between disk types like usb, ide or sata. So we must not confuse the problem: if there is any kind of disk IO, the machine freezes. As the bug above says, large I/O operations result in slow performance and high iowait times. That is the problem, as far as I can tell.

I offer any help required to fix this bug. Just let me know what I need to do. But please, let's raise the importance. This is a critical issue, not medium.

Thanks guys, keep up the great work, linux still rocks :)

Revision history for this message
LGB [Gábor Lénárt] (lgb) wrote :

I totally agree. It seems to be a long-standing issue, and it seems to be getting worse and worse every time. Now when I want to do anything reasonable which involves disk I/O (even just copying a CD image from one disk to another, or a dist-upgrade), I let the machine do it and go for a coffee, since the machine is simply unusable during it. I noticed it earlier, but as I've mentioned, it seems to be getting worse with every new kernel. I still remember the good old days with debian (maybe with 2.4 kernels, or at least early 2.6 ones) when even the hardest I/O tasks couldn't interrupt even playing audio from the same disk: now I have to pray not to be interrupted even when the machine is idle and gets a very short I/O task from some process ... And btw, I had much weaker hw before ...

Revision history for this message
John Baptist (jepst79) wrote :

I believe that this problem can be alleviated by using Con Kolivas's BFS scheduler instead of the stock scheduler. There are PPAs on Launchpad where you can get kernels with the BFS scheduler for Lucid and Karmic. On my system, it really seems to make the system much more responsive. I hope that the Linux kernel team considers including the BFS scheduler as an option in future kernel releases, and until then I think the Ubuntu team should consider making a BFS kernel the default for the desktop version of Ubuntu.

Revision history for this message
Paulo J. S. Silva (pjssilva) wrote :

Actually, there is this PPA:

https://launchpad.net/~darxus/+archive/bfsbfq

That has the BFS scheduler and the BFQ I/O scheduler, which may also play
an interesting role here. I have not tried it yet, but I should try it
soon.

best,

Paulo

On Thu, Mar 25, 2010 at 8:50 AM, Jeff Epstein <email address hidden> wrote:
> I believe that this problem can be alleviated by using Con Kolivas's BFS
> scheduler instead of the stock scheduler. There are PPAs on Launchpad
> where you can get kernels with the BFS scheduler for Lucid and Karmic.
> On my system, it really seems to make the system much more responsive. I
> hope that the Linux kernel team consider including the BFS scheduler as
> an option in future kernel releases, and until then I think the Ubuntu
> team should consider making a BFS kernel the default for the desktop
> version of Ubuntu.
>

--
Paulo José da Silva e Silva
Professor Associado, Dep. de Ciência da Computação
(Associate Professor, Computer Science Dept.)
Universidade de São Paulo - Brazil

e-mail: <email address hidden> Web: http://www.ime.usp.br/~pjssilva

Revision history for this message
Alessio Igor Bogani (abogani) wrote :

Hi,

Has anyone already tested whether linux-rt mitigates the issue?

Thanks!

Revision history for this message
daneel (daneel) wrote :

I'm using the bfsbfq kernel. It's just a little better than the generic kernel.

2010/3/25 Paulo J. S. Silva <email address hidden>:
> Actually, there is this PPA:
>
> https://launchpad.net/~darxus/+archive/bfsbfq
>
> That has the BFS scheduler and the BFQ I/O scheduler, which may also play
> an interesting role here.  I have not tried it yet but I should try it
> soon.
>
> best,
>
> Paulo
>
>
> On Thu, Mar 25, 2010 at 8:50 AM, Jeff Epstein <email address hidden> wrote:
>> I believe that this problem can be alleviated by using Con Kolivas's BFS
>> scheduler instead of the stock scheduler. There are PPAs on Launchpad
>> where you can get kernels with the BFS scheduler for Lucid and Karmic.
>> On my system, it really seems to make the system much more responsive. I
>> hope that the Linux kernel team consider including the BFS scheduler as
>> an option in future kernel releases, and until then I think the Ubuntu
>> team should consider making a BFS kernel the default for the desktop
>> version of Ubuntu.
>>

Revision history for this message
Brian Takita (brian-takita) wrote :

I filed a possibly related bug at:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/566841

I think this bug is pretty severe, because it makes my powerful laptop (quad core Dell M4400) act like it is > 10 years old for about 30 minutes after the period of heavy I/O.

Revision history for this message
Brian Takita (brian-takita) wrote :

Basically when I run hdparm -T /dev/sda, I get ~ 6500 MB/sec before the heavy I/O process.

After the heavy I/O process, hdparm -T /dev/sda is ~ 350 MB/sec.

What is strange is that during the heavy I/O process it is ~2000 MB/sec. Performance continues to degrade after the process finishes.
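
Note that `hdparm -T` times reads served from the kernel's buffer cache rather than the disk, which is why a drop from ~6500 to ~350 MB/sec points at something other than the drive itself. A rough userspace analogue, offered as a sketch (it reads a regular file instead of the raw device, so no root is needed; the sizes are arbitrary):

```python
import os
import time

def cached_read_rate(path, seconds=1.0, chunk=1024 * 1024):
    """Re-read the first `chunk` bytes of `path` for `seconds` so the data
    is served from the page cache; return apparent throughput in MB/s."""
    total = 0
    with open(path, "rb") as f:
        f.read(chunk)                       # prime the page cache
        deadline = time.time() + seconds
        while time.time() < deadline:
            f.seek(0)
            total += len(f.read(chunk))
    return total / (1024.0 * 1024.0) / seconds

if __name__ == "__main__":
    # Demo against a scratch file; point `path` at any large file instead.
    with open("cache-test.dat", "wb") as f:
        f.write(b"\0" * (1024 * 1024))
    print("cached read rate: %.0f MB/s" % cached_read_rate("cache-test.dat"))
    os.remove("cache-test.dat")
```

Comparing this figure before and after the heavy I/O helps separate a busy memory subsystem from a slow disk.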

Revision history for this message
Brian Takita (brian-takita) wrote :

Also attempting to clear the buffer cache does not help at all.

sync && sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'

This issue is really mysterious. I would also welcome a workaround, such as clearing some sort of buffer.
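
The drop_caches write above discards clean page-cache pages (the value 3 also drops dentries and inodes). A quick way to check whether it did anything is to watch the Cached figure in /proc/meminfo before and after; a small helper, assuming a Linux /proc is available:

```python
def meminfo_kb(field):
    """Return one /proc/meminfo field (e.g. "Cached" or "MemFree") in kB."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith(field + ":"):
                return int(line.split()[1])
    raise KeyError(field)

if __name__ == "__main__":
    # Run before and after `echo 3 > /proc/sys/vm/drop_caches` (as root).
    print("page cache: %d kB, free: %d kB"
          % (meminfo_kb("Cached"), meminfo_kb("MemFree")))
```

If Cached shrinks but performance is unchanged, as Brian reports, the slowdown is unlikely to be a page-cache problem.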

Revision history for this message
Brian Takita (brian-takita) wrote :

I also want to say that for me, this is one of the most severe bugs in Ubuntu (and Linux in general).

Revision history for this message
Brian Takita (brian-takita) wrote :

What is also strange is that the issue still persists even after a reboot, though a bit less severely.

Now running sudo hdparm -T /dev/sda yields:

saturn:~ $ sudo hdparm -T /dev/sda
/dev/sda:
 Timing cached reads: 2152 MB in 2.00 seconds = 1078.08 MB/sec

I am able to get hdparm back to ~6500 MB/sec if I wait around 30 minutes. Locking the screen also seems to help.

I am also able to get it working faster after shutting down gdm and stopping some processes.

Could there be some persistent buffer out there that is blocking the entire system?

Revision history for this message
Brian Takita (brian-takita) wrote :

I ran inotifywatch on all of the top level directories, and got the following results.

I also tried

> sudo rm -rf /tmp/* && sync && sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'

but the performance problem persisted.

Revision history for this message
Alex Wauck (awauck) wrote :

Brian: what happens if instead of simply rebooting, you power the machine down completely, wait a bit, then start it up again?

Revision history for this message
Brian Takita (brian-takita) wrote :

Hey Alex, I'll go ahead and try it in a second.

After the screen locked (after 10 minutes of inactivity) and I logged back in, my performance fully came back (hdparm -T yielded 6500 MB/s).

I ran inotifywatch again and got the following results.

Revision history for this message
Brian Takita (brian-takita) wrote :

Alex: Powering down completely and waiting seems to work too.

I also notice that when performance is compromised, the hard drive seems to be spinning constantly.

Revision history for this message
Brian Takita (brian-takita) wrote :

Another data point. My work machine does not have this issue.

It is a Mac Pro 8-core desktop with a 10k Raptor drive. I wonder if this is a laptop SATA controller (or power-saving related) issue.
I can try this test on my other Dell laptop tonight.

Revision history for this message
KhaaL (khaal) wrote :

I appreciate you digging so deep into this bug, Brian. I've given up personally!

I have it on my desktop machine, where the hard disks are connected through a SATA interface. It still happens in Lucid...

Revision history for this message
Brian Takita (brian-takita) wrote :

Thanks KhaaL. Juan Flynn suggested that I check the temperature to make sure no throttling is taking place. I'll give that a shot tonight.

So far, the possibilities are some driver issue or some sort of hardware throttling issue.

I ran hdparm -i /dev/sda on my work machine (which does not have this issue).

I'll post the results from my laptop next.

Revision history for this message
Brian Takita (brian-takita) wrote :

Here are the results from my laptop.

Revision history for this message
Milan Bouchet-Valat (nalimilan) wrote :

Brian: If you really want to tackle this, you need time and skills: open a new report on bugzilla.kernel.org, and they will help you to find out what's going on. Here you won't find people that work on the precise I/O issues this bug is about. See the example of the upstream task above. But that's not an easy task, be warned! ;-)

Revision history for this message
Brian Takita (brian-takita) wrote :

That sounds good. I signed up for an account and I'll do as you suggest. I'm pretty curious about this anyway. Thanks for the warning, too. Hopefully it will not be insurmountable. :-)

Revision history for this message
Brian Takita (brian-takita) wrote :

I installed GKrellM and noticed a strong correlation between the temperature and performance of the system. As the temperature went up, the value of hdparm -T went down.

The funny thing is that I have a TypeMatrix keyboard, which I placed on top of my laptop. Air flows through the keys of the keyboard, so it seems that the M4400 uses the keyboard as part of its cooling system.

I assume the hardware is protecting itself against overheating by throttling performance, which makes sense. For me the issue may be solved. Thanks for the help, and sorry for the noise. I hope this helps somebody.

Revision history for this message
LGB [Gábor Lénárt] (lgb) wrote :

I am not using tracker and the like, since it almost renders the machine unusable (it generates some I/O)... As far as I remember I had no I/O problems some years ago; since then it has just gotten worse and worse, even with "better" hardware. Back then I had a PATA disk and controller (also SCSI at one point), but I have been using SATA ever since. Could that be the problem, I wonder? Then again, I can see the problems on PATA disks now too, but could that be caused by libata being used instead of the old IDE driver? I have no idea :(

Revision history for this message
Chase Douglas (chasedouglas) wrote :

There's a tunable parameter called 'swappiness' that determines whether pages of memory are swapped out to hard disk in lieu of using the memory for buffers or not. A value of 0 means memory pages will never be swapped out in favor of buffers. A value of 100 is the opposite. The default value in the kernel is 60. I would be interested to know if anyone sees desktop responsiveness improve if the value is lowered.

Theoretically, if you lower the value your desktop applications should respond better as you switch between them or on return after leaving the machine for a period of time. However, if you are doing a large amount of data transfers or other intensive tasks you may see a drop in overall performance. For a desktop system, many users prefer responsiveness over a small drop in performance.

Please see this page for more details, an overview of what you may expect if you decrease or increase the value, and how to change the value: http://www.pythian.com/news/1913/

Thanks
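As a hedged sketch of the tunable described above (the /proc path and sysctl key are standard on Linux; the value 10 is only an illustrative choice):

```shell
# Read the current swappiness value (the kernel default is 60)
swappiness=$(cat /proc/sys/vm/swappiness)
echo "vm.swappiness is currently $swappiness"

# To lower it for the running session (requires root):
#   sudo sysctl vm.swappiness=10
# To persist the change across reboots:
#   echo 'vm.swappiness=10' | sudo tee -a /etc/sysctl.conf
```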

Revision history for this message
Juan Flynn (juan-launchpad) wrote :

Chase Douglas wrote:
> There's a tunable parameter called 'swappiness' that determines whether
> pages of memory are swapped out to hard disk in lieu of using the memory
> for buffers or not. A value of 0 means memory pages will never be
> swapped out in favor of buffers. A value of 100 is the opposite. The
> default value in the kernel is 60. I would be interested to know if
> anyone sees desktop responsiveness improve if the value is lowered.

I would suggest that swap should be turned off completely (and that the
system should have sufficient RAM to operate comfortably) while
isolating the cause of poor responsiveness in a desktop system. If we
can replicate slow responsiveness under heavy IO conditions where the
heavy IO is not caused by swapping it might be easier to pin down the
reason why the problem is occurring as intensive swapping itself is a
cause of heavy IO.

Juan

Revision history for this message
ReneS (mail-03146f06) wrote :
Download full text (3.3 KiB)

Just to add my recent observations.

I turned swap off because I felt that 4 GB of main memory is plenty. The machine was working OK, but during the day, without doing anything special (just typing along, browsing a little, and opening some windows, but not starting programs), the machine started to block. X was unresponsive for up to 15 minutes, atop displayed heavy page scanning, and the load factor went up to 8. The disk was very active. Interestingly, there was always plenty of free physical memory available: up to 1 GB of cache and 512 MB free or even more. So there was no reason to do anything with memory pages at all.

This repeats over and over again. Sometimes just for a minute, sometimes for two. No fixed time periods, no fixed recurrence pattern. But it is always the same picture: load goes up, page scanning starts, I/O is heavy (no swap configured). Maybe the order of events differs, but because all programs seem to halt for a moment, not all information can be seen.

While checking /var/log/messages during this, without swap, I got memory-allocation-failure messages from pulseaudio. Scanning and disk activity seem to start at this point in time. With swap added, these allocation failures are gone, but the overall behavior is the same. So I guess that an allocation failure of some kind (no messages) causes the rescanning of all pages and therefore the heavy disk activity.

I noticed that committed virtual memory is of course bigger than the available real memory. Some programs, such as nautilus, have a virtual size of up to 700 MB while physical memory usage is around 60 MB; Firefox is around 1 GB virtual and 200 MB real; skype 200 MB virtual and 30 MB real.

I understand that the virtual size is bigger, but it seems that the sum of all virtual sizes going over the limit of the physical memory causes very frequent and hefty memory page scans. What is strange is that without swap, the disk is still used heavily. I am not an expert, but I assume that this is related to relocating/dropping program code that was mapped into virtual memory.

Only observations. Now, I updated my BIOS and bought a n...

Read more...

Revision history for this message
KhaaL (khaal) wrote : Khalid Rashid wants to stay in touch on LinkedIn

LinkedIn
------------

Bug,

I'd like to add you to my professional network on LinkedIn.

- Khalid Rashid

Khalid Rashid
Introductionary secretary at Göteborgs Stad
Sweden

Confirm that you know Khalid Rashid
https://www.linkedin.com/e/isd/1334905321/Y36CthRY/

------
(c) 2010, LinkedIn Corporation

Revision history for this message
jhfhlkjlj (fdsuufijjejejejej-deactivatedaccount) wrote :

While I must say that this spam was pretty funny, should it be removed on account of that link?

Revision history for this message
KhaaL (khaal) wrote : Re: [Bug 131094] Re: Heavy Disk I/O harms desktop responsiveness

Yes, please :-)

I have myself to blame for clicking randomly while being on the phone.
-------------------------
Khalid Rashid

- "In the middle of every difficulty lies opportunity", Albert Einstein.

On Wed, May 26, 2010 at 19:31, Chauncellor <email address hidden> wrote:

> While I must say that this spam was pretty funny, should it be removed
> on account of that link?
>
> --
> Heavy Disk I/O harms desktop responsiveness
> https://bugs.launchpad.net/bugs/131094
> You received this bug notification because you are a direct subscriber
> of the bug.
>
> Status in The Linux Kernel: Invalid
> Status in “linux” package in Ubuntu: Confirmed
> Status in “linux-source-2.6.22” package in Ubuntu: Won't Fix
>
> Bug description:
> Binary package hint: linux-source-2.6.22
>
> When compared with 2.6.15 in feisty, heavy disk I/O causes increased iowait
> times and affects desktop responsiveness in 2.6.22
>
> this appears to be a regression from 2.6.15 where iowait is much lower and
> desktop responsiveness is unaffected with the same I/O load
>
> Easy to reproduce with tracker - index the same set of files with 2.6.15
> kernel and 2.6.22 kernel and the difference in desktop responsiveness is
> massive
>
> I have not confirmed if a non-tracker process which does heavy disk i/o
> (especially writing) replicates this yet - will do further investigation
> soon
>
> To unsubscribe from this bug, go to:
> https://bugs.launchpad.net/linux/+bug/131094/+subscribe
>

Revision history for this message
jhfhlkjlj (fdsuufijjejejejej-deactivatedaccount) wrote :

I hear that the very recent 2.6.35 is helpful in reducing the impact of this bug.

https://bugzilla.kernel.org/show_bug.cgi?id=12309

On the flip side....

http://www.phoronix.com/scan.php?page=article&item=linux_2635_fail&num=1

Revision history for this message
Brian Rogers (brian-rogers) wrote :

The Phoronix link can be ignored. It was a regression that has since been fixed, and their coverage was quite sensationalist to begin with.

Revision history for this message
Johannes H. Jensen (joh) wrote :

I experience the same issue on my ThinkPad X61 running Lucid 64-bit on both 2.6.32-23 and 2.6.34 mainline from http://kernel.ubuntu.com/~kernel-ppa/mainline/.

A simple `dd if=/dev/zero of=big.file bs=1M count=1500' reproduces the problem.

What's interesting is that applications are responsive up until the point where memory is filled up with cache, after which applications become unresponsive and the system extremely slow. I've tried reducing the swappiness to 20 without any noticeable results.

I'm going to give 2.6.35-rc1 a try now, to see if there are any improvements.
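To put a number on the stalls the original report attributes to iowait, one can sample the cumulative iowait counter while the dd test above runs. A minimal sketch reading /proc/stat (on Linux, field 6 of the aggregate "cpu" line is iowait):

```shell
# Sample the cumulative iowait counter before and after a one-second
# interval; the value is in jiffies (typically 10 ms each), summed
# across all CPUs, so the delta is the iowait accrued in that second.
read_iowait() { awk '/^cpu /{print $6}' /proc/stat; }

before=$(read_iowait)
sleep 1
after=$(read_iowait)
echo "iowait jiffies during the last second: $((after - before))"
```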

Revision history for this message
Johannes H. Jensen (joh) wrote :

Same issues with 2.6.35, unfortunately. After a few seconds of the dd command, empathy freezes and keyboard input starts to lag heavily. The overall responsiveness is horrible. I wonder if switching to 32-bit might help...?

Revision history for this message
KhaaL (khaal) wrote :

Johannes, I've tried both the 64 and 32-bit versions of the whole ubuntu
distro, and it's not by any means any less present in the 32-bit version.
-------------------------
Khalid Rashid

- "In the middle of every difficulty lies opportunity", Albert Einstein.

Revision history for this message
cornbread (corn13read) wrote :

Is the issue limited to ubuntu? Is debian or mint affected?

"KhaaL" <email address hidden> wrote:

>Johannes, I've tried both the 64 and 32-bit versions of the whole ubuntu
>distro, and it's not by any means any less present in the 32-bit version.
>-------------------------
>Khalid Rashid
>
>- "In the middle of every difficulty lies opportunity", Albert Einstein.

--
Sent from my EVO 4G

Revision history for this message
KhaaL (khaal) wrote :

In my testing I've experienced this issue on openSUSE as well. This is most
likely a kernel bug, probably this one:
https://bugzilla.kernel.org/show_bug.cgi?id=12309
-------------------------
Khalid Rashid

- "In the middle of every difficulty lies opportunity", Albert Einstein.

Revision history for this message
Johannes H. Jensen (joh) wrote :

Yeah, unfortunately kernel bug #12309 is a complete mess of different
symptoms and problems, and thus completely useless. We should really
submit a new upstream bug regarding this exact issue and link this bug
against it.

FWIW, `stress -d 1' also reproduces the issue here.

- Johannes

On Wed, Jun 23, 2010 at 8:12 AM, KhaaL <email address hidden> wrote:
> In my testing i've experienced this issue on opensuse aswell. This is most
> likely a kernel bug, propably this one:
> https://bugzilla.kernel.org/show_bug.cgi?id=12309
> -------------------------
> Khalid Rashid
>
> - "In the middle of every difficulty lies opportunity", Albert Einstein.

Revision history for this message
Johannes H. Jensen (joh) wrote :

I just tested with the anticipatory scheduler on the stock Ubuntu 2.6.32:

# echo anticipatory > /sys/block/sda/queue/scheduler

This did not seem to have any effect - the problem was still very much present.
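For anyone repeating this test, the active scheduler can be inspected per device through the same sysfs path used above (sda is just an example device; cfq was the default scheduler on kernels of this era):

```shell
# The active I/O scheduler for each block device is shown in brackets,
# e.g. "noop anticipatory deadline [cfq]".
for f in /sys/block/*/queue/scheduler; do
    [ -r "$f" ] || continue      # also skips a non-matching glob
    dev=${f#/sys/block/}
    dev=${dev%/queue/scheduler}
    printf '%s: %s\n' "$dev" "$(cat "$f")"
done

# Switching back to the default (requires root):
#   echo cfq | sudo tee /sys/block/sda/queue/scheduler
```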

Revision history for this message
Peter Hoeg (peterhoeg) wrote :

Have you tried mounting the filesystems with writeback instead of ordered?

/peter

On Wed, Jun 23, 2010 at 15:42, Johannes H. Jensen <email address hidden> wrote:
> I just tested with the anticipatory scheduler on the stock Ubuntu
> 2.6.32:
>
> # echo anticipatory > /sys/block/sda/queue/scheduler
>
> This did not seem to have any effect - the problem was still very much
> present.

Revision history for this message
Johannes H. Jensen (joh) wrote :

I haven't tried writeback, no. Is it possible to remount with this
option, or do I need to modify fstab and reboot?

- Johannes

On Wed, Jun 23, 2010 at 10:00 AM, Peter Hoeg <email address hidden> wrote:
> Have you tried mounting the filesystems with writeback instead of
> ordered?
>
> /peter
>
> On Wed, Jun 23, 2010 at 15:42, Johannes H. Jensen <email address hidden> wrote:
>> I just tested with the anticipatory scheduler on the stock Ubuntu
>> 2.6.32:
>>
>> # echo anticipatory > /sys/block/sda/queue/scheduler
>>
>> This did not seem to have any effect - the problem was still very much
>> present.
>>

Revision history for this message
Ritesh Raj Sarraf (rrs) wrote :

On Wednesday 23 Jun 2010 15:22:04 you wrote:
> I haven't tried writeback, no. Is it possible to remount with this
> option, or do I need to modify fstab and reboot?

An on-the-fly remount with a different data= mode was denied. And then, setting
data=writeback in /etc/fstab ended up with a read-only rootfs.

Unless someone confirms, don't do it. My VMs are gone now.

--
Ritesh Raj Sarraf
RESEARCHUT - http://www.researchut.com
"Necessity is the mother of invention."

Revision history for this message
Thomas Pilarski (thomas.pi) wrote :

I cannot reproduce this bug in every configuration, especially on my fresh test installation. My assumption: this bug depends on the maximum transactions/s of the setup. Fragmentation of the disk has a negative effect on this bug. In particular, my encrypted LVM volumes drop in performance after a lot of usage. Turning on snapshots for a backup increases the effect as well.
Has anyone tried to reproduce this bug with a Postville G2 or an SSD with a SandForce controller?

Revision history for this message
Peter Hoeg (peterhoeg) wrote :

Ritesh.

ext3 has supported writeback mode since at least 2001 (look here:
http://www.ibm.com/developerworks/library/l-fs8.html), so I hardly
think this could have caused any damage. If you have lost some VMs it
must be because something else is terribly wrong with your setup.

/peter

On Wed, Jun 23, 2010 at 18:15, Ritesh Raj Sarraf <email address hidden> wrote:
> On Wednesday 23 Jun 2010 15:22:04 you wrote:
>> I haven't tried writeback, no. Is it possible to remount with this
>> option, or do I need to modify fstab and reboot?
>
> On the fly remount of the data= mode was denied. And then, setting
> data=writeback into /etc/fstab ended up with a read-only rootfs.
>
> Unless someone confirms, don't do it. My VMs are gone now.

Revision history for this message
Ritesh Raj Sarraf (rrs) wrote :

On Wednesday 23 Jun 2010 21:22:43 Peter Hoeg wrote:
> ext3 has supported writeback mode since at least 2001 (look here:
> http://www.ibm.com/developerworks/library/l-fs8.html), so I hardly
> think this could have caused any damage. If you have lost some VMs it
> must be because something else is terribly wrong with your setup.

Okay! Did data=writeback work for you? Was there a performance change?

--
Ritesh Raj Sarraf
RESEARCHUT - http://www.researchut.com
"Necessity is the mother of invention."

Revision history for this message
Johannes H. Jensen (joh) wrote :

So I just tested writeback on my desktop computer which exhibits the
same problems. I mounted both the root filesystem and /home with
data=writeback (ext3).

So far the difference is *huge*! The system is much more responsive -
I'm writing this while 'stress -d 4' is running in the background. The
same applies to the dd test - all apps respond almost instantly with
writeback, as opposed to sluggish and hanging with ordered.
Applications open much faster as well....

I'll do some more testing to confirm - mainly writeback only on /home
vs root and also on my laptop. Is this a bug in ext3 then, or is
ordered mode supposed to be so slow / problematic on desktop systems?
What problems might occur when using writeback mode? I'm a bit
concerned about the following comment from the mount manual:

It guarantees internal filesystem integrity, however it can
allow old data to appear in files after a crash and journal recovery.

By the way, to use writeback on the root filesystem, setting
data=writeback in fstab only is not sufficient. As 'man mount' states:

To use modes other than ordered on the root filesystem, pass the
mode to the kernel as boot parameter, e.g. rootflags=data=journal.

- Johannes


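The changes Johannes describes above would look roughly like the following sketch; the device names and exact fstab layout are assumptions, and note that the man page's example value is data=journal while the mode under test here is writeback:

```shell
# /etc/fstab -- hypothetical ext3 entries using data=writeback
# /dev/sda1   /       ext3   defaults,data=writeback   0   1
# /dev/sda2   /home   ext3   defaults,data=writeback   0   2

# The root filesystem additionally needs the mode as a boot parameter,
# e.g. appended to the kernel line in /boot/grub/menu.lst:
#   kernel /boot/vmlinuz-2.6.32-23-generic root=/dev/sda1 ro rootflags=data=writeback
```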
Revision history for this message
Ravindran K (ravindran-k) wrote :
Download full text (3.4 KiB)

On Thu, Jun 24, 2010 at 1:48 AM, Johannes H. Jensen
<email address hidden>wrote:

> So I just tested writeback on my desktop computer which exhibits the
> same problems. [...]
Read more...

Revision history for this message
Ritesh Raj Sarraf (rrs) wrote :

On Thursday 24 Jun 2010 11:15:56 Ravindran K wrote:
> I'm using Ext4 and when I try to use data=writeback for my root partiton
> (it was ext3 and converted to ext4), I get a error while booting which
> indicates "unable to change mode from ordered to writeback while
> remounting".. I think it is another bug.. Anyone else seeing this?

Yes, I had seen the same bug. My setup was a fresh install with ext4 file
system.

--
Ritesh Raj Sarraf
RESEARCHUT - http://www.researchut.com
"Necessity is the mother of invention."

Revision history for this message
Peter Hoeg (peterhoeg) wrote :
Download full text (3.9 KiB)

Ravindran,

please see Johannes' comment

> To use modes other than ordered on  the  root filesystem,  pass the
> mode to the kernel as boot parameter, e.g. rootflags=data=journal.

/peter

On Thu, Jun 24, 2010 at 13:45, Ravindran K <email address hidden> wrote:
> [...]

Read more...

Revision history for this message
John Baptist (jepst79) wrote :

My understanding of the writeback option is that it greatly decreases the file system's tolerance for crashes, and could result in the file system being put in an inconsistent state, with resulting data loss. This seems a high price to pay, so this is clearly not a good solution.

I would also like to point out that performance degrades even if I am only doing heavy reads, but not writes. In that case, the writeback option shouldn't improve performance at all.

Revision history for this message
Johannes H. Jensen (joh) wrote :
Download full text (3.8 KiB)

Ravindran,

Did you boot with the kernel parameter rootflags=data=writeback?

- Johannes

On Thu, Jun 24, 2010 at 7:45 AM, Ravindran K <email address hidden> wrote:
> [...]

Read more...

Revision history for this message
Johannes H. Jensen (joh) wrote :

Unfortunately, this does not seem to be the case on my ThinkPad X61. I
did not see any noticeable difference between writeback and ordered
mode. With writeback, interactivity is still sluggish during disk
writes. Applications hang, interfaces slow to respond etc. So clearly
this cannot be the main issue...

- Johannes

On Wed, Jun 23, 2010 at 10:18 PM, Johannes H. Jensen
<email address hidden> wrote:
> So I just tested writeback on my desktop computer which exhibits the
> same problems. I mounted both the root filesystem and /home with
> data=writeback (ext3).
>
> So far the difference is *huge*! The system is much more responsive -
> I'm writing this while 'stress -d 4' is running in the background. The
> same applies to the dd test - all apps respond almost instantly with
> writeback, as opposed to sluggish and hanging with ordered.
> Applications open much faster as well....
>
> I'll do some more testing to confirm - mainly writeback only on /home
> vs root and also on my laptop. Is this a bug in ext3 then, or is
> ordered mode supposed to be so slow / problematic on desktop systems?
> What problems might occur when using writeback mode? I'm a bit
> concerned about the following comment from the mount manual:
>
> It  guarantees  internal  filesystem integrity,  however  it  can
> allow old data to appear in files after a crash and journal recovery.
>
> By the way, to use writeback on the root filesystem, setting
> data=writeback in fstab only is not sufficient. As 'man mount' states:
>
> To use modes other than ordered on  the  root filesystem,  pass the
> mode to the kernel as boot parameter, e.g. rootflags=data=journal.
>
> - Johannes
>
>
> On Wed, Jun 23, 2010 at 11:52 AM, Johannes H. Jensen
> <email address hidden> wrote:
>> I haven't tried writeback, no. Is it possible to remount with this
>> option, or do I need to modify fstab and reboot?
>>
>> - Johannes
>>
>>
>> On Wed, Jun 23, 2010 at 10:00 AM, Peter Hoeg <email address hidden> wrote:
>>> Have you tried mounting the filesystems with writeback instead of
>>> ordered?
>>>
>>> /peter
>>>
>>> On Wed, Jun 23, 2010 at 15:42, Johannes H. Jensen <email address hidden> wrote:
>>>> I just tested with the anticipatory scheduler on the stock Ubuntu
>>>> 2.6.32:
>>>>
>>>> # echo anticipatory > /sys/block/sda/queue/scheduler
>>>>
>>>> This did not seem to have any effect - the problem was still very much
>>>> present.
>>>>
>>
>
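
For anyone who wants to repeat the same comparison, the writeback switch looks like this (a sketch only; the device name and mount point are examples, and man mount warns that writeback can expose stale data after a crash):

```shell
# Remount a non-root ext3 filesystem with data=writeback (example device/mount)
sudo umount /home
sudo mount -t ext3 -o data=writeback /dev/sda3 /home

# To keep it across reboots, change the options column in /etc/fstab:
#   /dev/sda3  /home  ext3  defaults,data=writeback  0  2

# The root filesystem needs the mode passed at boot instead (see man mount):
#   rootflags=data=writeback   on the kernel command line
```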

Revision history for this message
Olivier Gagnon (procule) wrote :

Why is this bug only at "Medium" importance? It should be critical. It makes Ubuntu almost unusable. When doing an apt-get upgrade, I can't do anything else since the system freezes and windows grey out. I suggest we change this bug to critical.

Revision history for this message
psypher (psypher246) wrote :

I have also tried writeback and journal mode. Writeback provides a very minimal improvement, not enough to make it worth running permanently. Changing between ATA and AHCI mode makes no difference, nor does changing the scheduler from cfq to anticipatory or deadline.

I am testing this on a Dell Precision M6300 laptop with a SATA drive, but I have experienced this issue on all my various PCs since at least Gutsy or Intrepid.

If this thread has become too large to be of any use, what is the best way to proceed? If this is a collection of bugs, can we at least make a list of the separate bugs to track and get upstream focus on?

Thanks

Revision history for this message
psypher (psypher246) wrote :

Launching a virtual machine and having my PC hang again has prompted me to come back here. Can someone please suggest what the next steps are? There is no activity on this thread; have new bugs been logged? What's the story?

Thanks

Revision history for this message
Exquisite Dead Guy (ben-forlent) wrote :

I feel your pain. This bug affects me just about every day. My computer, which is a fast system with plenty of RAM, starts out lightning fast and over the course of the day gets slower and slower until it's so unusable I have to reboot. When it slows down, 'top' doesn't really show anything except a really high iowait (usually about 80%). I always have plenty of free RAM. I thought it was a bad hard drive, so I bought a new one, and the problem is still there.

I found if I create a very small swap file (like 32 megabytes) just so the system can see some swap space it will freeze slightly less often (I can sometimes go two days on this configuration), but the problem is still there.

I'm having to reboot daily like I'm running Windows ME or something :/

Revision history for this message
Jeremy Nickurak (nickurak) wrote :

I'm having this issue too, with x86-64 Lucid.

I'm on a 1.83Ghz Core2 Duo with 1.5gigs of ram, 2 gigs of swap, and a fast SATA hard drive.

This feels very much what would happen with an old computer when DMA was disabled... but of course this is a SATA hard drive, and I don't know how to confirm if it's configured properly.

It's plenty fast after a reboot, but at some point it becomes barely usable. An apt-get upgrade will generally trigger it. Once it's there, it seems like a little hard drive I/O and CPU just don't mix any more (as if DMA were disabled).

Revision history for this message
Charles Cazabon (charlesc-web-register-launchpad-net) wrote :

I used to be affected quite badly by this problem, through all of Intrepid and into Jaunty. My system's no longer affected.

What eventually appeared to resolve the problem for me was a combination of newer kernels (somewhere in Jaunty's updates, sorry I don't have a version to reference) and a motherboard BIOS update - everything related to SATA disk access got noticeably more stable with newer BIOSes. Jaunty became quite usable, and I haven't had any problems in Karmic.

So to anyone currently still experiencing this problem: see if there is a BIOS update available for your motherboard, and ensure you're running the latest kernel in Jaunty/Karmic. If you're still pre-Jaunty, consider upgrading.

Just my $0.02.

Revision history for this message
Olivier Gagnon (procule) wrote :

For me too, every time I do an apt-get upgrade I have to leave the
machine alone for a while because it becomes unusable. The mouse is
always freezing and everything is lagging, with iowait at around
80%.

I have an AMD Athlon 64 with 2 GB of RAM and a SATA drive. I changed
the hard drive too and the problem is still there.

Very frustrating.

On Thu, Jul 22, 2010 at 11:01 AM, Jeremy Nickurak
<email address hidden> wrote:
> I'm having this issue too, with x86-64 Lucid.
>
> I'm on a 1.83Ghz Core2 Duo with 1.5gigs of ram, 2 gigs of swap, and a
> fast SATA hard drive.
>
> This feels very much what would happen with an old computer when DMA was
> disabled... but of course this is a SATA hard drive, and I don't know
> how to confirm if it's configured properly.
>
> It's plenty fast after a reboot, but at some point, it just gets barely
> usable. At apt-get upgrade will generally trigger it. Once it's there,
> it seems like a little hard drive IO and CPU just don't mix any more (as
> if DMA was disabled)
>

Revision history for this message
AvitarX (ddwornik) wrote :

I just had to restore some data from a bad HD (read errors on my laptop).

When copying from the NTFS partition to my new NTFS partition (files, not image, and in Linux), I got about 7-8 MB/sec of throughput, and a responsive system.

When copying from ext4 to ext4, I was getting about 20 MB/sec and terrible responsiveness.

I will say that overall it is less problematic than it has been, though: no total lockup, and I can run my updates and surf the web fairly effectively.

This is using 10.10, so it looks like the bug is still around.

Revision history for this message
psypher (psypher246) wrote :

Hi All,

It seems there is quite a bit of life again on https://bugzilla.kernel.org/show_bug.cgi?id=12309 , which appears to be the right bug for this issue.

Some guys are getting good results when turning off swap completely.

Please try the following and report back if this improves your system responsiveness:

sudo apt-get install stress
sudo swapoff -a
stress -d 1

Now go use your machine.

swapoff will turn off all swap until reboot. I have 3 GB of RAM so this is not a problem; if you have less than 1 GB you might experience more of a slowdown if you use a lot of RAM.
stress makes the hard drive read and write continuously, so it simulates heavy disk I/O.

If you reboot, swap will be turned on again. You would have to comment out the swap line in your fstab to stop that from happening.
Setting swappiness to a low value or 0 does not make a difference; you have to turn swap off entirely.

Thanks
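
To check that swap really is off and to watch how much time the CPU spends waiting on I/O during the stress run, the standard tools are enough (a sketch; vmstat comes from the procps package):

```shell
# /proc/swaps lists active swap areas; after 'swapoff -a' only the header remains
cat /proc/swaps

# Sample system counters once per second, ten times; the 'wa' column is the
# percentage of CPU time spent waiting for I/O, the symptom this bug is about
vmstat 1 10

# Current swappiness (0-100); per the comment above, lowering it doesn't help
cat /proc/sys/vm/swappiness
```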

Revision history for this message
Ronan Jouchet (ronj) wrote :

Hello!

Phoronix reports good results from patches by Wu Fengguang and Kosaki Motohiro (article: http://www.phoronix.com/scan.php?page=news_item&px=ODQ3OQ , original lkml post: http://lkml.org/lkml/2010/8/1/40 ). This seems like great news to me; maybe it could help close this bug.

1. Was anyone here able to test the patches and confirm the impact?
2. Any chance to see the patches incorporated into Maverick's kernel sauce?
3. Any chance to see the patches backported into Lucid's kernel sauce?

Revision history for this message
psypher (psypher246) wrote :

I don't think we can rejoice yet; there are some mixed results on the above-mentioned kernel bug. I haven't had a chance to test yet. I think it might be a step in the right direction, but I would not mark this as fixed quite yet.

Revision history for this message
Søren Holm (sgh) wrote :

I would say: pull the patch into Maverick and see if it makes a difference for the people running Maverick now. If performance degrades because of it, remove the patches. It would also help upstream if more people tested it before it is included in 2.6.36.

Changed in linux (Ubuntu):
assignee: Ubuntu Kernel Team (ubuntu-kernel-team) → nobody
Changed in linux-source-2.6.22 (Ubuntu):
assignee: Ben Collins (ben-collins) → nobody
Revision history for this message
Martin Meyer (elreydetodo) wrote :

If we can't pull the patch fully into Maverick for testing, can we possibly have someone set up a PPA containing a normal kernel for Lucid and Maverick, except with this patch applied? I would love to see if this patch helps the responsiveness of my desktop at work. I am always under memory pressure because I keep a VM running, and I frequently have heavy disk I/O situations due to log parsing. I think I'm a great tester for this.

The problem I'm already foreseeing here is that there isn't really a quantitative test for success. All I can say is whether or not my desktop "feels" more responsive. How would I actually measure responsiveness? Those types of issues are nearly impossible to reproduce reliably IMO.

Revision history for this message
Brian Rogers (brian-rogers) wrote :

I've set up a PPA here: https://launchpad.net/~brian-rogers/+archive/io-kernel

A Maverick kernel is building right now. The patch didn't cleanly apply to Lucid's kernel. Is there a version of the patch that's already been backported to 2.6.32?

Revision history for this message
Olivier Gagnon (procule) wrote :

I've tried a version of the Maverick kernel ported to Lucid, version
2.6.35-14.20~lucid2. It was supposed to clear the problem, but no. I
still have issues every time there is moderate to high I/O on the
filesystem.

Tomorrow I will try another filesystem than ext4.

O. Gagnon

On Wed, Aug 18, 2010 at 1:56 AM, Brian Rogers <email address hidden> wrote:
> I've set up a PPA here: https://launchpad.net/~brian-rogers/+archive/io-
> kernel
>
> A Maverick kernel is building right now. The patch didn't cleanly apply
> to Lucid's kernel. Is there a version of the patch that's already been
> backported to 2.6.32?
>

Revision history for this message
ReneS (mail-03146f06) wrote :

I tried Brian's kernel (64-bit, ext3). It feels sluggish and the overall I/O impression is slow. I created memory stress and the desktop became unresponsive again, though it seems to happen later. Overall, individual applications are less responsive now.

Sorry, I cannot quantify it in any way.

Revision history for this message
psypher (psypher246) wrote :

Some more news on this issue. Some new patches have arisen:

http://www.phoronix.com/scan.php?page=news_item&px=ODU0OQ

That link claims the first patches made a difference, although not a lot; the new patches claim a big difference.

Revision history for this message
Brian Rogers (brian-rogers) wrote :

I've updated my PPA to include the new scheduler patches with version 2.6.35-iofix+19.28. For Lucid, I've provided both a patched and unpatched backport of Maverick's 2.6.35 kernel so they can be compared with each other to see the effect of just the patches.

Revision history for this message
Trey Blancher (ectospasm) wrote :

I've installed the iofix+19.28 kernel for Lucid from Brian Rogers. So far, it seems to work. While the machine is booting up, it still seems to have the problem, however. iotop (which now works properly) reports the [kdmflush] kernel thread as consuming 99.99% I/O when the system is unresponsive, but I haven't noticed it in the past 24 hours or so (my uptime is currently less than 48 hours). I will continue to monitor, and I'll post back with results.

Revision history for this message
Brian Rogers (brian-rogers) wrote :

That's a 2.6.35 kernel, and Lucid has a 2.6.32 kernel by default. So you can't tell whether 2.6.35 or the patches I added on top of it solved the problem, unless you also test the unpatched 2.6.35.

Revision history for this message
Trey Blancher (ectospasm) wrote :

OK, I'll test the unpatched 2.6.35 and report back.

Revision history for this message
Trey Blancher (ectospasm) wrote :

OK, it looks like the problem is fixed for me in both the stock 2.6.35 and +iofix provided by Brian Rogers. There's still [kdmflush] and a bunch of other programs causing a lot of I/O wait (according to top and iotop), but the system is MUCH more responsive. Usually when the problem occurred, the HDD light would go solid for several moments before I could use my system again. Now, the pause is brief, much less than two seconds if it ceases being responsive at all. I mentioned earlier that the problem still occurred shortly after boot, but that was the HDD light I was referring to, not the perceived responsiveness of the system. So for me, the solution is to upgrade to 2.6.35.

Revision history for this message
psypher (psypher246) wrote :

I have been running the 2.6.35 + iofix kernel for more than a week and unfortunately I am unable to see any difference. For example, upon boot and logging into the desktop, my Ubuntu One account will start syncing. I have about 20 GB in the U1 folder and it takes about 5-10 minutes every boot to scan all the files, check for changes, sync, etc. During that time the hard drive thrashes like crazy, and when monitoring iotop the ubuntuone processes are reading and writing to the disk at about 400 KB/s. During this process my PC is extremely slow and unresponsive. The default test is to boot up, start Firefox and try to click a bookmark folder icon on my toolbar, which drops down a list of bookmarks. Firefox starts up OK, but it takes about 5 minutes for the drop-down list to open once I click on it. No, really: 5 minutes.

Another default test is to boot up, let U1 do its thing and quiet down, then open Firefox. Then I start "stress -d 1" to stress the disk and try to browse using Firefox. I open Google Reader and try to browse through my RSS feeds. While stress is running it is practically impossible to use or browse in Firefox. Note that stress is now reading and writing at 4-10 MB/s. There seems to be no difference in responsiveness between the disk writing at 400 KB/s or 4 MB/s, and there seems to be no difference between the default kernel and this patched one.

Very sad :(

Are the guys who are seeing a difference doing anything else? Turning off swap? Changing the default scheduler? Why do some people see an improvement, even though the improvements are still not good enough?

Revision history for this message
Virgil Brummond (uraharakisuke153) wrote :

This sounds like something hard to pin down. I did a bit of testing using audio as the main indicator. With the generic kernel, audio would stutter when the system went into swap under any CPU load. The server kernel allows audio to play solidly, and generally things seem responsive.

Revision history for this message
Virgil Brummond (uraharakisuke153) wrote :

The iofix kernel does seem to help responsiveness. The problem is that when anything starts to page to swap, the system goes to pieces and nothing works much. I think the problem might be with the CFQ scheduler.

Revision history for this message
psypher (psypher246) wrote :

I have tried different schedulers and it made no difference. That was suggested on the kernel bug page, as was turning off swap. Some had better experiences; the 2.6.32 kernel makes no difference for me.

Revision history for this message
Exquisite Dead Guy (ben-forlent) wrote :

psypher: Turning off swap definitely doesn't help for me, as I've tried it both with and without swap. It's actually a little better with swap turned on. It seems to get bad when I've been running for a while and free memory drops below 500 MB or so. For some reason, using the Firefox All-in-One Gestures add-on and middle-clicking to scroll down the page really aggravates the problem.

I really wish someone could find the cause of this. Sub-daily reboots are reminding me of my Windows ME days.

Revision history for this message
Peter Hoeg (peterhoeg) wrote :

For whatever it's worth, I'm seeing a separate issue where X leaks memory like crazy, which has the interesting effect that I see impressive disk thrashing, and this is with swap turned off. As soon as free memory drops to a few hundred MB, my HDD light is pretty much lit up solid and everything slows to a complete crawl. This is with the .35-22 kernel (the standard Maverick generic kernel).

So something generates a lot of I/O (what, I still don't know) and when that happens, nothing works except the three-finger salute.

I'm willing to try pretty much anything if somebody can tell me what I should do or what information to provide.

Revision history for this message
Exquisite Dead Guy (ben-forlent) wrote :

Peter Hoeg: Mine behaves almost exactly as you're describing. I've found that if I create a 512 MB swapfile, then every time it starts getting a bit laggy, like it's about to freeze up, I tab over to a terminal window and run:

sudo swapon /path/to/swapfile
sudo swapoff -a

Something about turning the swapfile on and back off usually buys me about an hour before it starts locking up again. I know it's not great, but it's better than rebooting several times a day. I really wish someone would fix this problem; it's been consistent for me and a daily struggle/annoyance across 3 releases now. Until this problem is fixed, Ubuntu = Windows ME :(
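
For reference, the small swapfile itself can be created like this (a sketch; the 512 MB size and the /swapfile path are just examples):

```shell
# Create and activate a 512 MB swap file
sudo dd if=/dev/zero of=/swapfile bs=1M count=512
sudo chmod 600 /swapfile     # swap contents should not be world-readable
sudo mkswap /swapfile
sudo swapon /swapfile

# The workaround from the comment above, when things start to lag:
sudo swapoff -a              # ...then 'swapon /swapfile' again afterwards
```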

Revision history for this message
Peter Hoeg (peterhoeg) wrote :

I'll try that on the box tomorrow.

The other odd thing is that turning off swap is extremely slow. As an example, if I have about 60% memory used, the system will have swapped out a few hundred MB. If I then do a "swapoff -a", the box obviously starts swapping back in, but it happens at approximately 500 KB/s.

Revision history for this message
daneel (daneel) wrote :

Have you tried swappiness = 0?

2010/10/21 Peter Hoeg <email address hidden>:
> I'll try that on the box tomorrow.
>
> The other odd thing is that turning off swap is extremely slow. As an
> example if I have about 60% memory used then it will start swapping a
> few 100 MBs. If I then do a "swapoff -a", then the box obviously starts
> swapping in, but it happens at approximately 500KB/s.
>

Revision history for this message
Peter Hoeg (peterhoeg) wrote :

I haven't, no, but what effect would swappiness have if there is no swap anyway?

/Peter

On Thu, Oct 21, 2010 at 23:54, daneel <email address hidden> wrote:
> Have try swappiness = 0 ?
>
> 2010/10/21 Peter Hoeg <email address hidden>:
>> I'll try that on the box tomorrow.
>>
>> The other odd thing is that turning off swap is extremely slow. As an
>> example if I have about 60% memory used then it will start swapping a
>> few 100 MBs. If I then do a "swapoff -a", then the box obviously starts
>> swapping in, but it happens at approximately 500KB/s.
>>

Revision history for this message
exactt (giesbert) wrote :
Revision history for this message
psypher (psypher246) wrote :

This actually hit Slashdot quite a while ago; you might have missed my comments:

http://it.slashdot.org/article.pl?sid=09/01/15/049201

The latest comment on the bugzilla report ( https://bugzilla.kernel.org/show_bug.cgi?id=12309 ) states that new patches have fixed the issue.

I won't jump for joy until I see it for myself. Can anyone do a PPA for the newer 2.6.36 kernel to test?

FYI turning off swap, setting swappiness or changing the scheduler makes no difference to me. Always slow.

If messing with the swap works for you, try: swapoff -a && swapon -a as well

Revision history for this message
Peter Hoeg (peterhoeg) wrote :

Regarding the PPA, you can always get the new kernel from here:

http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.36-maverick/

/Peter

Revision history for this message
KhaaL (khaal) wrote :

Regarding the kernel PPA, is the newest kernel there patched with the desktop responsiveness fix?

Revision history for this message
Peter Hoeg (peterhoeg) wrote :

I haven't looked through the 3 patch files in that directory, but
according to this guy:
https://bugzilla.kernel.org/show_bug.cgi?id=12309#c510 stock .36
fixes the problem.

/Peter

Revision history for this message
Rocko (rockorequin) wrote :

Yes, the stock 2.6.36 kernel (which is in the weekly builds linked to in comment #358) has a patch to improve responsiveness (this is from http://kernelnewbies.org/Linux_2_6_36):

1.7. Improve VM-related desktop responsiveness

There are some cases where a desktop system could be really unresponsive while doing things such as writing to a very slow USB storage device and some memory pressure. This release includes a small patch that improves the VM heuristics to solve this problem.

I.e., it helps improve responsiveness for a particular case.

FWIW, I haven't noticed any major desktop slowdowns on my system with 2.6.36 over the last 6 days.

Revision history for this message
exactt (giesbert) wrote :

maybe we have been waiting for this patch: http://www.phoronix.com/scan.php?page=article&item=linux_2637_video&num=1

could it be back-ported to 10.10?

Revision history for this message
Ofer Chen (oferchen) wrote :

I wish! This looks like a major improvement; I'm tempted to compile the kernel myself...
Is there a bleeding-edge kernel PPA for Maverick?

Revision history for this message
Milan Bouchet-Valat (nalimilan) wrote :

Do people actually read what others write before asking and commenting? PPA is at:
http://kernel.ubuntu.com/~kernel-ppa/mainline/

Revision history for this message
Ofer Chen (oferchen) wrote :

*** Mainline kernels do not include Ubuntu-specific drivers.

I ended up installing Natty Narwhal; performance is much better. I'll stay with Natty's kernel for now... ;)

Revision history for this message
god (humper) wrote :

Is there a PPA with an Ubuntu-specific kernel plus a backported fix for 10.10?

Revision history for this message
Søren Holm (sgh) wrote :
Revision history for this message
Søren Holm (sgh) wrote :

The performance is amazing. On my 1.6 GHz dual-core system I tried compiling a kernel with -j64. 2.6.37-rc2 without the patch crawled; switching windows was a pain. With the patch the system runs smoothly.

Revision history for this message
psypher (psypher246) wrote :

Hi Soren,

Any chance of 64bit pkgs?

Thanks

Revision history for this message
daneel (daneel) wrote :

I was trying this patch in Arch Linux and reading some clarifications
in the forum (https://bbs.archlinux.org/viewtopic.php?id=108516).
Apparently, this patch is not about I/O performance, but only about the
scheduling of processes in different ttys. So, if you launch everything
in the same tty, it is not going to help at all.

2010/11/18 psypher <email address hidden>:
> Hi Soren,
>
> Any chance of 64bit pkgs?
>
> Thanks
>

Revision history for this message
Jamie Lokier (jamie-shareable) wrote :

daneel wrote:
> I was trying this patch in Arch Linux and reading some clarifications
> in the forum (https://bbs.archlinux.org/viewtopic.php?id=108516).
> Apparently, this patch is not about IO performance, but only the
> scheduling of process in different tty. So, if you launch all in the
> same tty is not going to help at all.

True. But as it produces such a difference for tasks which are in
different ttys (i.e. inside terminal windows, and initial daemons),
perhaps it would be good to create scheduling groups for other things
too, such as daemons (do they go in their own group when they detach
from a tty with the patch, or keep the group they had when they were
created?), different classes of background process (especially I/O
kernel tasks), and maybe different X applications?

I presume it's possible to create new scheduling groups manually, to
test the effect on desktop responsiveness? If it's shown to make a
good difference, then it'd be possible to look into whether the kernel
should do so automatically.

Also, there are I/O CFQ cgroups (CONFIG_CFQ_GROUP_IOSCHED) in current
kernels. That may be worth looking into in a similar way.
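
Creating a scheduling group by hand, as suggested above, can be done through the cgroup filesystem that these kernels already ship (a sketch; the /dev/cgroup mount point and the trackerd example are assumptions):

```shell
# Mount the CPU-controller cgroup hierarchy (cgroup v1 interface)
sudo mkdir -p /dev/cgroup/cpu
sudo mount -t cgroup -o cpu none /dev/cgroup/cpu

# Make a group for background work and move a noisy process into it
# (the tasks file accepts one PID per write)
sudo mkdir /dev/cgroup/cpu/background
for p in $(pidof trackerd); do
    echo "$p" | sudo tee /dev/cgroup/cpu/background/tasks
done

# Give the group a small CPU share relative to the default of 1024
echo 128 | sudo tee /dev/cgroup/cpu/background/cpu.shares
```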

Revision history for this message
mattismyname (mattismyname) wrote :

Anyone pushing for the TTY grouping patch, please read: http://ck-hack.blogspot.com/2010/11/create-task-groups-by-tty-comment.html

Revision history for this message
Exquisite Dead Guy (ben-forlent) wrote :

I've installed the "alternative" patch ( http://www.webupd8.org/2010/11/alternative-to-200-lines-kernel-patch.html ) and it's made zero difference: I still have all the lagging issues requiring a daily reboot. No change.

Revision history for this message
John Doe (b2109455) wrote :

I have a Latitude D430 and responsiveness is horrible whenever there is disk I/O. With the scheduler changed from "cfq" to "deadline" for /dev/sda, everything is A LOT better. I would say this solved the problem for me.

Revision history for this message
John Doe (b2109455) wrote :

Ubuntu 10.10, that is
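
The cfq-to-deadline change John Doe describes is per device and can be tried at runtime (the sda device name is an example):

```shell
# Show the available schedulers; the active one appears in [brackets]
cat /sys/block/sda/queue/scheduler

# Switch to deadline immediately (reverts on reboot)
echo deadline | sudo tee /sys/block/sda/queue/scheduler

# To make it permanent, add elevator=deadline to the kernel command line,
# e.g. in GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then run update-grub
```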

Changed in linux:
status: Invalid → Confirmed
Changed in linux:
importance: Unknown → High
Revision history for this message
god (humper) wrote :

It could be a good idea to use ulatencyd once it is mature enough.
https://github.com/poelzi/ulatencyd/

Revision history for this message
god (humper) wrote :

Are there plans to enable CONFIG_SCHED_AUTOGROUP for Ubuntu kernels in some PPA?
At least until Ubuntu switches to systemd initialization?

Revision history for this message
AvitarX (ddwornik) wrote :

Is ubuntu going to throw out upstart?
On Mar 15, 2011 8:47 AM, "MSU" <email address hidden> wrote:
> Are there plans to enable CONFIG_SCHED_AUTOGROUP for ubuntu kernels in
some ppa?
> At least until ubuntu switch to systemd initialization?
>
> --
> You received this bug notification because you are a direct subscriber
> of the bug.
> https://bugs.launchpad.net/bugs/131094
>
> Title:
> Heavy Disk I/O harms desktop responsiveness
>
> To unsubscribe from this bug, go to:
> https://bugs.launchpad.net/linux/+bug/131094/+subscribe

Revision history for this message
Omer Akram (om26er) wrote :

>
> Is ubuntu going to throw out upstart?
>
>
Simple answer: no.

Revision history for this message
AvitarX (ddwornik) wrote :

That's what I assumed, but the previous post tricked me.
On Mar 15, 2011 12:34 PM, "Omer Akram" <email address hidden> wrote:
>>
>> Is ubuntu going to throw out upstart?
>>
>>
> Simple answer: no.
>

Revision history for this message
tankdriver (stoneraider-deactivatedaccount) wrote :

Testing oneiric beta + updates,
Under high I/O load the mouse pointer now feels very choppy (e.g. it freezes for a second at a time).
Can someone confirm this regression from Natty to Oneiric?

Revision history for this message
cometdog (ericctharley) wrote :

Incredibly bad responsiveness under heavy I/O for me on Oneiric. My only recent point of comparison is Lucid; unfortunately that's not a completely fair comparison, since I had a different HDD setup then. But in any case, the desktop becomes nearly unusable when starting up a program, etc. It freezes for multiple seconds at a time.

Revision history for this message
Vadim Peretokin (vperetokin) wrote :

Yeah. Anytime a system has to swap, you know it because your desktop
freezes.

Revision history for this message
Ofer Chen (oferchen) wrote :

I switched to zramswap-enabler instead of a real swap partition; it makes things a lot better if you have the RAM.

sudo add-apt-repository ppa:shnatsel/zram && sudo apt-get update && sudo apt-get install zramswap-enabler

Revision history for this message
DAF (dfiguero) wrote : AUTO: Diego Figueroa is out of the office

I am out of the office from Fri 01/21/2011 until Sun 01/08/2012.

Hi,

I will be out of the office from Wednesday December 21 until Monday January
8. If you need urgent assistance with any of my projects please contact my
manager Miguel Marques at extension 22684.

Thank you,

Diego.

Note: This is an automated response to your message "[Bug 131094] Re:
Heavy Disk I/O harms desktop responsiveness" sent on 11/18/2011 4:01:09 PM.

This is the only notification you will receive while this person is away.

Changed in linux:
status: Confirmed → Fix Released
Revision history for this message
Francisco J. Yáñez (fjyaniez) wrote :

5 years later... too late :(

I had to change to another OS after 8 years using linux... I won't get back now.

Revision history for this message
Vadim Peretokin (vperetokin) wrote : Re: [Bug 131094] Re: Heavy Disk I/O harms desktop responsiveness

I don't think it was actually fixed, if you look at the upstream report.
On Jun 11, 2012 5:06 PM, "Francisco J. Yáñez" <email address hidden> wrote:

> 5 years later... too late :(
>
> I had to change to another OS after 8 years using linux... I won't get
> back now.
>

Revision history for this message
LGB [Gábor Lénárt] (lgb) wrote :

Well, yes: many years ago, on far less powerful machines, I could play MP3s, and we even had a little competition among friends to see who could interrupt the music by doing I/O. It was quite hard. Since then, as far as I can tell, the situation has become worse and worse, which is especially odd given that I've been using more and more powerful machines in the meantime. Now, almost anything I do that generates some I/O stalls the whole desktop: gnome-terminal windows stay white (not updated) for long seconds (sometimes even a minute!), and sometimes even the mouse can't be moved. And no, it can't be a hardware problem, as I've noticed it on many different machines with totally different hardware (SCSI, "normal" IDE/PATA, SATA, both 32- and 64-bit kernels/systems, AMD/Intel CPUs, etc.) and very different kernels and even distributions (Ubuntu and Debian, to be precise) over the years. It is true, though, that the worst of it came in the last 1-2 years, as far as I can remember, although I noticed things getting worse even before that.

Revision history for this message
Mike Mestnik (cheako) wrote :

I had this issue; I've always had this issue. It gets really bad if your disk is doing bad sector relocation(s); then the desktop/GUI and mouse can freeze for 15 minutes.

Revision history for this message
laksdjfaasdf (laksdjfaasdf) wrote :

@Canonical: Why don't you make the lowlatency kernel the default instead of the generic kernel? That should address the poor responsiveness of the graphical user interface.

Even if throughput doesn't improve with the lowlatency kernel, the system feels much faster if the mouse pointer moves without dropouts and menus pop up instantly under heavy disk I/O.

On graphical desktops it's not always raw throughput that makes a system feel fast, but responsiveness. Even if copying a big file takes a second longer, the system "feels" much faster if the mouse pointer still moves without dropouts and menus still pop up instantly.

Revision history for this message
Ronan Jouchet (ronj) wrote :

Interesting proposal. Are you sure about that claim, Felix? Do you
have data to support it?

Now that linux-lowlatency is in universe and is just a build with
different option of the same kernel, it might not be risky at all, and
if that's a real win for responsiveness (which is definitely an
important metric), using -lowlatency by default can be something to
suggest to the kernel team.

Revision history for this message
Jakob Lenfers (jakob-drss) wrote :

Thanks a lot, Felix. Just as an FYI for others: this helped me a lot. I'm writing this from an old (originally 8.04, IIRC) and frequently upgraded Ubuntu Server 12.04 installation, and I switched from the server kernel to the lowlatency one. Now I can run updatedb and start Thunderbird while music is playing. I'm embarrassed to say it, but I haven't been able to do that (without a lot of ionice -c3) for quite some time. This makes this computer usable for me again. I just hope that the nvidia driver stops making problems with my onboard card soon, and my old server & desktop will be golden again. :)

Revision history for this message
LGB [Gábor Lénárt] (lgb) wrote :

OK, but the odd thing is that in the old days everything was much, much better even with the regular kernel (no special low-latency one, etc.) on far weaker hardware than now :(

Revision history for this message
yarly (ih8junkmai1) wrote :

I agree with the comments by Vorname Nachname (post #390). The low-latency kernel provides a much more responsive desktop. The differences between linux-meta-lowlatency and linux-meta-generic are profound when running in a LUKS environment with FDE.

Revision history for this message
Adam Porter (alphapapa) wrote :

It's very true that years ago I/O latency was much less of a problem
with Linux. When I first started using Debian full-time about ten
years ago, I never had problems with music skipping or anything like
that. I guess in the kernel development since then, throughput has
been prioritized over latency. Nowadays with 3.8 kernels and the same
hardware, it's trivial to make my music player skip under load, even
when its buffer is set to 30000 ms.

I haven't thought of trying the lowlatency kernel, so thanks for that
idea. I will be trying that!

Besides that, I wish Ubuntu would make BFQ the default I/O scheduler
(or at least build it in by default so we can easily switch to it,
instead of having to build kernels or install from third-party repos).
 Check out this video from a year ago:

http://youtu.be/J-e7LnJblm8

Seems obvious to me that BFQ is the way to go for desktops.

I have noticed lately that Deadline seems to result in less music
skipping than CFQ, so I can see why Deadline is the default now. But
Deadline doesn't support ionice, so I can't do things like run backups
or upgrades in the background at minimum I/O priority.
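For anyone unfamiliar with it, this is roughly what that looks like under CFQ (deadline accepts the class but silently ignores it; the backup path and PID below are made-up examples):

```shell
# Start a backup at idle I/O priority: it only gets disk time
# when no other process is issuing I/O (CFQ only)
ionice -c3 tar czf /tmp/backup.tar.gz "$HOME/Documents"
# Or demote an already-running process by PID
ionice -c3 -p 1234
```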

Revision history for this message
penalvch (penalvch) wrote :

Jamie McCracken, this bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? If so, could you please test for this with the latest development release of Ubuntu? ISO images are available from http://cdimage.ubuntu.com/daily-live/current/ .

If it remains an issue, could you please run the following command in the development release from a Terminal (Applications->Accessories->Terminal), as it will automatically gather and attach updated debug information to this report:

apport-collect -p linux <replace-with-bug-number>

Also, could you please test the latest upstream kernel available (not the daily folder, but the one at the top) following https://wiki.ubuntu.com/KernelMainlineBuilds ? It will allow additional upstream developers to examine the issue. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this bug is fixed in the mainline kernel, please add the following tags:
kernel-fixed-upstream
kernel-fixed-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested. For example:
kernel-fixed-upstream-v3.13-rc4

This can be done by clicking on the yellow circle with a black pencil icon next to the word Tags located at the bottom of the bug description. As well, please remove the tag:
needs-upstream-testing

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-VERSION-NUMBER

As well, please remove the tag:
needs-upstream-testing

Once testing of the upstream kernel is complete, please mark this bug's Status as Confirmed. Please let us know your results. Thank you for your understanding.

no longer affects: linux-source-2.6.22 (Ubuntu)
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
Vadim Peretokin (vperetokin) wrote :

IO is still an issue on every Ubuntu machine I've used - whenever it
becomes heavily used, everything else slows down, sometimes drastically.
What is there to test - has anything been done to address it?

Revision history for this message
penalvch (penalvch) wrote :

Vadim Peretokin, so your hardware may be tracked, could you please file a new report by executing the following in a terminal while booted into a Ubuntu repository kernel (not a mainline one) via:
ubuntu-bug linux

For more on this, please read the official Ubuntu documentation:
Ubuntu Bug Control and Ubuntu Bug Squad: https://wiki.ubuntu.com/Bugs/BestPractices#X.2BAC8-Reporting.Focus_on_One_Issue
Ubuntu Kernel Team: https://wiki.ubuntu.com/KernelTeam/KernelTeamBugPolicies#Filing_Kernel_Bug_reports
Ubuntu Community: https://help.ubuntu.com/community/ReportingBugs#Bug_reporting_etiquette

When opening up the new report, please feel free to subscribe me to it.

Thank you for your understanding.

Revision history for this message
vsuarez (vsuarez) wrote :

Can this be related with this issue?

http://lwn.net/Articles/572911/

Revision history for this message
penalvch (penalvch) wrote :

vsuarez, so your hardware may be tracked, could you please file a new report by executing the following in a terminal while booted into a Ubuntu repository kernel (not a mainline one) via:
ubuntu-bug linux

For more on this, please read the official Ubuntu documentation:
Ubuntu Bug Control and Ubuntu Bug Squad: https://wiki.ubuntu.com/Bugs/BestPractices#X.2BAC8-Reporting.Focus_on_One_Issue
Ubuntu Kernel Team: https://wiki.ubuntu.com/KernelTeam/KernelTeamBugPolicies#Filing_Kernel_Bug_reports
Ubuntu Community: https://help.ubuntu.com/community/ReportingBugs#Bug_reporting_etiquette

When opening up the new report, please feel free to subscribe me to it.

Thank you for your understanding.

Revision history for this message
Vadim Peretokin (vperetokin) wrote :

I don't think it is related to http://lwn.net/Articles/572911/ because it
is a 32bit machine.

I'll file the report later when I've got access to the said machine.

Revision history for this message
Adam Niedling (krychek) wrote :

Christopher M. Penalver: are you going to tell all 165 people who are affected by this bug to open a new bug report for the same issue, which is not even hardware-related?

If you just took a minute, you could test this bug yourself instead of requiring us to do all that work to test the latest mainline kernel.

I think you are just mass-closing Linux kernel related bugs that are still valid and affect many people. Some of them have upstream bug reports which indicate that no actual work has been done to address those issues. So why do testing? Even if someone does the testing, most likely no work will be done downstream to fix the issue. So what's the point? I think what you're doing does more harm than good.

Revision history for this message
penalvch (penalvch) wrote :

Adam Niedling, thank you for your comments regarding them:
"...are you going to tell all the 165 people that are affected by this bug to open a new bug report..."

Given that the Bug Description ("heavy disk I/O causes increased iowait times") is so vague as to be largely useless, if one has a performance problem, then for hardware-tracking purposes one would want to file a new report. For more on this, please read the official Ubuntu documentation:
Ubuntu Bug Control and Ubuntu Bug Squad: https://wiki.ubuntu.com/Bugs/BestPractices#X.2BAC8-Reporting.Focus_on_One_Issue
Ubuntu Kernel Team: https://wiki.ubuntu.com/KernelTeam/KernelTeamBugPolicies#Filing_Kernel_Bug_reports
Ubuntu Community: https://help.ubuntu.com/community/ReportingBugs#Bug_reporting_etiquette

"...for the same issue which is not even hardware related?"

This is speculation at best.

"If you just took a minute you could test this bug yourself instead of require us to do all that work to test the latest mainline kernel."

I've never had heavy disk I/O affect desktop responsiveness with my hardware, both with an HDD and 3 GB RAM, and now an SSD with 8 GB.

"I think you are just mass closing linux kernel related bugs that are still valid and affect many people."

This is also speculation at best, and incorrect. I've never mass closed any bugs anywhere, and your baseless accusations are not appreciated.

"Some of them have upstream bug reports which indicate that no actual work has been done to address those issues."

One having filed an upstream bug report, on a tracker which has no permission restrictions on who can file, is largely irrelevant if the full hardware isn't known, it hasn't been tested in the latest mainline kernel, it hasn't been bisected if a regression, and doesn't have specific, objective metrics demonstrating the issue.

"So why do testing?"

Testing gets a bug report one step closer to a fix. The best question is why do the complaining, which gets you nowhere? ;)

"Even if someone does the testing most likely no work will be done by downstream to fix the issue."

More incorrect speculation. Downstream has the same information requirements as upstream, as previously noted. No developer is going to take a strong interest in working on any problem, up or down, without it.

"So what's the point? I think doing what you're doing is just making more harm than good."

Wasting time arguing about things previously documented and discussed ad nauseam would be considered doing more harm than good, with the time better spent actually doing the testing and bug report filing previously requested.

If you have further comments, please refrain from making them in this report, as you are not the original reporter, and it already has quite enough "Me too!" and "Why isn't this fixed already?" comments. Instead, you are welcome to contact me directly, and/or redirect them to the appropriate mailing list or forum. http://www.ubuntu.com/support/community/mailinglists might be a good start for determining which mailing list to use.

Thank you for your understanding.

Revision history for this message
Adam Niedling (krychek) wrote :

Thanks for analysing each and every sentence of mine one by one.
Who says only the original reporter can comment on bugs? I'm not the original reporter, I'm just somebody who is affected by this bug which you are trying to close in a very crafty way. It's not a speculation that you're doing this all the time, you did this to 2 or 3 of my own bugs. I'm getting tired of you pasting the same text everywhere. Maybe you're pasting it to hundreds of bugs. There is no effort in pasting some text. However you are asking people to do a lot of work which takes huge effort. Most of the time it's completely unnecessary cause no one has made anything to fix the issue.

"Hey! No developer has ever touched this bug but let's ask the poor user who is suffering from it a ton of questions and half day of working and testing the latest mainline kernel maybe he won't be able to do it or just simply has no idea how to do it so we can close this completely valid bug! And let's just ignore the bug even if the poor user does all that work ha ha ha..... Oh yeah and make sure to paste lots of links about etiquette and what not so I will look official even though I'm not working for Canonical I'm just messing around with people's bugs."

Revision history for this message
Ronan Jouchet (ronj) wrote :

Adam Niedling wrote:
  "I'm just somebody who is affected by this bug which you are trying to close in a very crafty way. It's not a speculation that you're doing this all the time, you did this to 2 or 3 of my own bugs. I'm getting tired of you pasting the same text everywhere. Maybe you're pasting it to hundreds of bugs. There is no effort in pasting some text. However you are asking people to do a lot of work which takes huge effort. Most of the time it's completely unnecessary cause no one has made anything to fix the issue.
  "Hey! No developer has ever touched this bug but let's ask the poor user who is suffering from it a ton of questions and half day of working and testing the latest mainline kernel maybe he won't be able to do it or just simply has no idea how to do it so we can close this completely valid bug! And let's just ignore the bug even if the poor user does all that work ha ha ha..... Oh yeah and make sure to paste lots of links about etiquette and what not so I will look official even though I'm not working for Canonical I'm just messing around with people's bugs."

>> I can definitely recognize some of the behavior described here by Adam, and also suffered from it in https://bugs.launchpad.net/ubuntu/+source/linux/+bug/908691 . In my case I could even pinpoint a specific mainline commit, but my inability to do the non-mainline git bisect requested by M. Penalver meant my request fell on deaf ears. I closed my own bug diplomatically, but it was an extremely disappointing experience to see so little response for all the effort I put in.

I understand Canonical must have lots of bug triage to do, but I'd too love a little more humanity in processing them. Canned answers and strict protocol don't show a lot of empathy, and don't echo into much user love.

Revision history for this message
penalvch (penalvch) wrote :

Quoting from https://bugs.launchpad.net/ubuntu/+source/linux/+bug/336652/comments/15 :
"this is a serious issue but only affects limited hardware..."

Revision history for this message
Adam Niedling (krychek) wrote :

And who is to say that comment #15 is not mere speculation at best? What does he mean by limited hardware? Every computer that has an HDD and not an SSD?

You really had someone's absolutely valid bug report closed because he wasn't able to do a git bisect? Just how many times did you do that? Who gave you the authority? How do you benefit from these kinds of things?

Just as Ronan has said: please show a little more empathy and stop talking to people like a robot with your canned comments.

Revision history for this message
Vadim Peretokin (vperetokin) wrote :

I'm surprised this is being debated. Look at Google:
https://www.google.com.au/search?q=linux+high+io+desktop&oq=linux+high+&aqs=chrome.0.69i59j69i57j69i64l2.1936j0j1&sourceid=chrome&ie=UTF-8

You will clearly see that high enough IO will harm desktop responsiveness.
Surely all of these people aren't making it up?

Revision history for this message
Adam Niedling (krychek) wrote :

Now Christopher is onto me. He started vandalizing another of my bug reports. Bug #1247189.

Changed in linuxmint:
status: New → Invalid
Revision history for this message
Davide Depau (depau) wrote :

This issue is not getting enough attention. I don't know if you all have SSDs, but most people don't. On hard disk drives this is a huge issue. System responsiveness drops when tracker is running, and pretty much nothing else can run smoothly while it's running, even on computers with fast CPUs and large amounts of RAM. I/O is often the cause of system slowdown, and this needs to be reduced as much as possible.
I'm sure this issue can be fixed: a background daemon doesn't need to run at full speed; it can be niced to 19, and internal fixes can be made.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
penalvch (penalvch) wrote :

Davide Depau, it would help immensely if you filed a new report via a terminal:
ubuntu-bug linux

Please feel free to subscribe me to it.

no longer affects: linux (Ubuntu)
affects: linuxmint → linux (Ubuntu)
no longer affects: linux (Ubuntu)
affects: linux → linux (Ubuntu)
Changed in linux (Ubuntu):
importance: High → Undecided
status: Fix Released → New
importance: Undecided → Low
status: New → Incomplete
information type: Public → Public Security
information type: Public Security → Public
Revision history for this message
god (humper) wrote :

I can observe this even on an SSD, with both Ubuntu and mainline kernels, especially when some background task like updatedb.mlocate, which walks the entire filesystem, is triggered.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
penalvch (penalvch) wrote :

god (humper), please file a new report via a terminal:
ubuntu-bug linux

Feel free to subscribe me to it.

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
god (humper) wrote :

done.

Revision history for this message
AZ (m-dev) wrote :

@Christopher: This is not incomplete. Thanks.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Download full text (7.1 KiB)

It might not be good to stir up such an old bug, but it regularly gets updates and new complaints, so maybe a new approach might help.

So let us make one thing clear, IMHO if something overloads your machine with disk I/O it has to stall it.
So the solution paths are more like these:
a) beat it with more processing / I/O HW
b) mitigate the effect as far as possible
c) avoid the overload before it starts

The issue is a common one - so I'll keep my explanations general and not specific to trackerd or any other case that was mentioned before.

### a) beat it with more processing / I/O HW ###
There are far more expensive machines out there which can handle far more I/O without being slowed down. The reason is that they have more I/O cards, virtual functions to spread the handling over CPUs, and, at the high end, servers with totally different I/O IRQ designs.
We should agree that on cheap/slow or even medium machines, I/O overload just *IS* an issue for responsiveness.
But that isn't the important part; the question is what a normal user can do about it, and spending x000000 $ on a machine isn't the solution.

### b) mitigate the effect as far as possible ###
So regarding mitigation, there were already some approaches in this bug discussion,
like using ionice and several dirty-ratio tunings, but none of these prevent the I/O overload.
E.g. if you issue the overloading I/O only in the "best effort" I/O class, the only difference it makes is that "other I/O" might pass faster, but your system is still fairly busy => unresponsive.
Also, dirty ratios come down to spending the process's remaining time slice cleaning up dirty memory as soon as a certain level is reached. While you can configure higher ratios (at the price of endangering integrity), that also won't stop the burst of I/O. Instead it will allow more data to be submitted to dirty the page cache, and thereby indirectly more I/O overloading the system again.

### c) avoid the overload before it starts ###
It must be said that, since this bug goes back to 2007 and a lot of the reports are related to I/O + *sync, various filesystem and general kernel improvements have been made just for sync & journaling. Several posts in this bug confirm this already.
Now, what I didn't see is people trying to throttle the processes that overload the system.
Throttling => https://www.kernel.org/doc/Documentation/cgroups/blkio-controller.txt
Like any approach, this one has certain limitations, but it is a new way to tackle the overall issue.
It also needs certain cgroup and filesystem features (like accounting writeback through the page cache), which might only be available in modern Ubuntu releases.
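As a rough illustration of the idea (cgroup v1 paths; "8:0" is the major:minor of /dev/sda as shown by lsblk, and the 10 MB/s cap and PID are arbitrary examples):

```shell
# Create a throttled blkio group
sudo mkdir /sys/fs/cgroup/blkio/throttled
# Cap reads and writes on device 8:0 (/dev/sda) at 10 MB/s each
echo "8:0 10485760" | sudo tee /sys/fs/cgroup/blkio/throttled/blkio.throttle.read_bps_device
echo "8:0 10485760" | sudo tee /sys/fs/cgroup/blkio/throttled/blkio.throttle.write_bps_device
# Move the offending process into the group
echo 1234 | sudo tee /sys/fs/cgroup/blkio/throttled/tasks
```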

### Experiment ###
As an experiment to prove the solution I use the tools fio and latencytop to compare:
1. no background load, checking latencytop
2. running a random read/write multithreaded fio in the background, checking latencytop
3. running a throttled random read/write multithreaded fio in the background, checking latencytop

# Background Load #
A fio job file like this:

[global]
ioengine=libaio
rw=randrw
bssplit=1k/25:4k/50:64k/25
size=512m
directory=/home/paelzer/latencytest
iodepth=8

[dio]
direct=1
numjobs=8

[pgc]
direct=0
numjobs=8

# Case 1 - No backgroun...

Read more...

Revision history for this message
AZ (m-dev) wrote :

Thanks for driving this forward.

You argue from
> So let us make one thing clear, IMHO if something overloads your machine with disk I/O it has to stall it.

This is a bit tricky, because overload means that the machine will not be able to complete all tasks in the time given, i.e. tasks will accumulate until the resources are exhausted. Though, we usually do not have this situation on desktop machines. There we have tasks to do and want them to complete as fast as possible, though some tasks may take longer than others. For example, copying a 5 GB DVD will take some minutes, but refreshing the browser window or switching windows never should. Overload here would mean the user turns off the machine and buys a Windows licence.

So this bug is mostly about having too big delays in applications using only a small bit of the available resources (like when switching back to a libreoffice window) when some other applications (like background file indexing) are asking for the remaining disk io resource capacities.

> Code improves to mitigate effects but can never be perfect for *ALL* users at once (especially in the default config)

I do not agree. Desktop responsiveness was achieved with older Ubuntu versions on the given hardware, and is achieved with other operating systems (Windows) on a broad range of hardware. I believe desktop responsiveness is something sufficiently specific that a CPU and I/O scheduler can be tuned for it. Using cgroups and the like might help, but should be configured by Ubuntu by default if necessary.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Download full text (4.5 KiB)

Hi AZ,
thanks for your feedback.

>> IMHO if something overloads your machine with disk I/O it has to stall it.
> This is a bit tricky, because overload means that the machine will be able not complete all task in the time given, i.e. tasks will accumulate until the resources are exhausted. Though, we usually do not have this situation on desktop machines.

Excuse me - I didn't want to phrase it too hard - it is surely ok to assume that a system stays responsive :-)
But when you add a background indexer like in the initial example, you add some serious load.
The system might add a few other things, and at some point it is overloaded.
Would you agree to modify your "Though, we usually do not have this situation on desktop machines." to "Though, we usually *should* not have this situation on desktop machines."?

Because that is the point where my suggestion of "throttling the few loads that cause these situations" kicks in.

> So this bug is mostly about having too big delays in applications using only a small bit of the available resources (like when switching back to a libreoffice window) when some other applications (like background file indexing) are asking for the remaining disk io resource capacities.

When I think of an overload case where e.g. a Process submits requests as fast as it can (especially with asynchronous I/O a process can quickly fill up hundreds of I/Os to the block device layer).
Now what should a process scheduler or I/O scheduler do?
1. handle them asap -> achieve good throughput, but might add some stalls to the system
2. throttle them -> improves responsiveness by avoiding overload, but this comes at certain prices
2a) if the process scheduler stalls it people start to ask "there is nothing else on the runqueue, why isn't it running? I want to get all I can from my HW".
2b) if the I/O scheduler stalls it people start to ask "hey my device could handle way more, why isn't it fully utilized with the request queue being filled" (remember all the "fun" people had with anticipatory scheduler)

Both 2a and 2b existed in various patches/tunings and almost every time the decision was that "the default" can not be to stall too much because there are different demands.

That was the reason why I personally didn't think about a cool new piece of code (which surely someone could write), but instead about a good mitigation of the most frequent cases with tools that are already there (like the cgroup I/O throttling I suggested).

>> Code improves to mitigate effects but can never be perfect for *ALL* users at once (especially in the default config)
>I do not agree.

Long story short - a default configuration has to be a tradeoff trying to make everyone happy but no one sad (hard job).

> Desktop responsiveness was achieved with older ubuntu versions on the given hw and is achieved with other operating systems (windows) on a broad range of hardware.

I'm coming from the server world, and there I/O throughput, I/O latency, and even process latency and fairness are clearly superior compared to older releases, as well as compared to other OSes.
But that doesn't negate your experience - it is just a different one.

> I believe desktop ...

Read more...

Revision history for this message
god (humper) wrote :

In my case ( see https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1460985 ) the culprit generating huge I/O throughput was in /etc/cron.daily/man-db

It's such a long-standing and persistent bug that the default advice I give nowadays to people complaining that their Ubuntu "got stuck again" is to run "sudo killall -9 find".

That's really a shame:
- it's not some random IO spike coming from nowhere
- it's not 3rd-party, it's in default install
- it's reproducible

Yet we still don't even have a workaround, let alone proper policing of the I/O of all the background tasks shipped in the default Ubuntu install.

Hopefully the migration to systemd timer units will help tackle this.
