2.6.27 SATA drives not accessible at boot time, 2.6.24 working

Bug #294123 reported by mklemm
This bug affects 7 people

Affects          Status      Importance   Assigned to   Milestone
Linux            Expired     Unknown
linux (Ubuntu)   Won't Fix   Medium       Unassigned
Nominated for Hardy by Morgan Jones
Nominated for Intrepid by mklemm

Bug Description

System info:

Hardware:
Mainboard MSI 965P Neo2, Intel 965 chipset, ICH8 southbridge, chipset 82801H, One SATA Samsung Harddisk 300GB, One SATA optical drive, both connected to the chipset's SATA channels 3 and 4 (behaviour is unchanged if connected to channels 1 and 2).
SATA is running in IDE mode, AHCI mode is not supported by the hardware.

Software:
Affects all kernel versions from version 2.6.26 to 2.6.28-13-generic and (untested) maybe later versions also
Doesn't affect kernel version 2.6.24-19-generic (on Ubuntu 8.04 or Ubuntu 8.10), vanilla 2.6.25, or earlier versions

Symptoms:
Booting Ubuntu 8.10 with the included kernel 2.6.27-7-generic, from either the live CD or a system on disk upgraded via apt-get dist-upgrade, stops at the point where the root partition is to be mounted. The BusyBox initramfs shell prompt appears after a timeout period, and when listing the "/dev" directory I can see that it doesn't contain any disk-related device nodes; even the "/dev/disk" directory is missing.
The output of dmesg shows that all four SATA channels are correctly recognised and initialised, but they all show the message "SATA link down", just as if no disk was connected to them. There are no further SATA/ATA related error messages in the dmesg output.

Reproducibility:
Booting the same system with a still-installed 2.6.24-19 from Ubuntu 8.04 works without any problems. The dmesg output is mostly the same as with 2.6.27; the only notable difference is that instead of the messages "ata3: SATA link down" and "ata4: SATA link down" it now shows the correct initialisation messages for my drives in the same place in the dmesg output.
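
For anyone who wants to check their own logs for the same symptom, here is a minimal sketch that keeps only the per-port link status lines. The function name ata_link_status is made up for illustration, and the pattern assumes the usual libata wording ("ataN: SATA link up/down"), as quoted above.

```shell
# Print only the per-port SATA link status lines from a kernel log on stdin.
# Matches lines such as "ata3: SATA link down ..." or "ata1: SATA link up ...".
ata_link_status() {
    grep -E 'ata[0-9]+: SATA link (up|down)'
}

# Typical use, from the initramfs shell or a running system:
#   dmesg | ata_link_status
```

If every port reports "link down" even though drives are attached, the log matches the behaviour described in this report.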

Reproducibility UPDATE:
Issue persists for Ubuntu 9.04 with kernel 2.6.28-13-generic, either precompiled version or locally compiled.

Reproducibility UPDATE #2:
Issue also persists for Ubuntu 9.10 beta with kernel 2.6.31-14

Reproducibility UPDATE #3:
Issue persists for Ubuntu 10.04 with kernel 2.6.32-22-server

Manually trying to load the sd_mod module from the initramfs prompt didn't bring the missing device nodes into existence either.
Connecting a USB flash memory storage device and mounting it in the BusyBox initramfs shell works, however (in which case the necessary USB-disk-related device node is created automatically under /dev), and this way I was able to capture the dmesg output and attach it below.

The attached files are:
1: The output of lspci to further specify my hardware configuration
2: The output of dmesg from the initramfs shell of kernel 2.6.27
3: As a comparison the output of dmesg from the working kernel 2.6.24 setup

Revision history for this message
mklemm (mirko-cm-klemm) wrote :

Here is the dmesg output captured from the initramfs shell. Note: the sda disk correctly initialized and mounted in this listing is only the USB flash memory storage device which I plugged in to capture the dmesg output... The internal drives are still unusable.

Revision history for this message
mklemm (mirko-cm-klemm) wrote :

And here is the dmesg output from a WORKING 2.6.24 kernel, just for comparison. We can see that there are no really interesting differences besides the "ata3: SATA link down" and "ata4: SATA link down" messages.
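
A rough way to reproduce this comparison on two saved captures is sketched below. The function names and file names are placeholders, and it assumes the logs carry the usual "[ seconds.micros ]" printk timestamps, which are stripped so they don't show up as differences.

```shell
# Keep only the libata / ata_piix lines of a saved dmesg capture,
# with any leading "[   12.345678] " timestamp removed.
ata_lines() {
    sed -E 's/^\[ *[0-9]+\.[0-9]+\] //' "$1" |
        grep -E '(^| )ata[0-9]+(\.[0-9]+)?: |ata_piix'
}

# Show a unified diff of the ATA-related lines of two captures.
# Returns diff's exit status (0 = identical, 1 = differences found).
ata_dmesg_diff() {
    old_tmp=$(mktemp) && new_tmp=$(mktemp) || return 1
    ata_lines "$1" > "$old_tmp"
    ata_lines "$2" > "$new_tmp"
    diff -u "$old_tmp" "$new_tmp"
    status=$?
    rm -f "$old_tmp" "$new_tmp"
    return "$status"
}

# Usage (file names are placeholders):
#   ata_dmesg_diff dmesg-2.6.24.txt dmesg-2.6.27.txt
```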

Revision history for this message
mklemm (mirko-cm-klemm) wrote :

I found the following thread on the internet, which seems to be related:

http://linux.derkeiler.com/Mailing-Lists/Kernel/2008-04/msg05440.html

But for the problem they mention there, they already put in a patch for 2.6.25, so if similar symptoms are still there in 2.6.27, maybe it is something else.

It sounds as if it was a problem with the ata_piix driver

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

Hi mklemm,

This may be related to or even a duplicate of bug 290153. Care to take a look? There's a workaround mentioned there that you may want to try. Thanks.

Changed in linux:
status: New → Incomplete
Revision history for this message
mklemm (mirko-cm-klemm) wrote :

Thank you, I just checked the bug and the workaround, but my bug seems to be a bit different, also the workaround didn't work for me...
However, I found that when I boot with "all_generic_ide", the system comes up. But why is this only necessary on 2.6.27 and not on 2.6.24? Also, the network hardware doesn't work then, and the keyboard configuration is, errm, "odd" (no arrow keys, no right ALT ...) but that's a different story...
And isn't there a performance penalty involved with the all_generic_ide option?

Changed in linux:
assignee: nobody → ubuntu-kernel-team
importance: Undecided → Medium
status: Incomplete → Triaged
Revision history for this message
mklemm (mirko-cm-klemm) wrote :

Could anyone post here as soon as the problem has been reproduced by someone else? I'm feeling a bit lost currently, and I'm not sure whether the "all_generic_ide" option is a fully viable solution rather than just a workaround to get it up and running at all...

Revision history for this message
Morgan Jones (maclover201) wrote :

Yeah, I'm having this problem too. Does anyone have their system working with 2.6.27-9?

Revision history for this message
Angel Carpintero (sack) wrote :

I had this problem too, as I reported in bug 289754. The stable kernel fixes this issue from 2.6.27.5 and 2.6.28-rc3 onward.
Today I installed 2.6.27-10-generic, which solves the issue :)

Revision history for this message
Morgan Jones (maclover201) wrote :

Cool. When I get back from the mountains I'll upgrade. Does it show up in aptitude or need to be downloaded somewhere else?

Revision history for this message
mklemm (mirko-cm-klemm) wrote :

Thank you for the hint to bug 289754 .
Unfortunately, however, the issue isn't solved with 2.6.27-10-generic for me. Also, it seems that bug 289754 is about an issue with the driver for the nvidia chipset SATA controller, where in my case it is the intel ICH8 chipset driver (ata_piix)...
Also, if you look at the dmesg output attached to bug 289754, you can see that in that case SATA error messages occurred *after* the SATA link had already been reported "up". In the case here, however, there are no error messages; it simply says "SATA link down", just as if no disks were connected.

Revision history for this message
Launchpad Janitor (janitor) wrote : Kernel team bugs

Per a decision made by the Ubuntu Kernel Team, bugs will no longer be assigned to the ubuntu-kernel-team in Launchpad as part of the bug triage process. The ubuntu-kernel-team is being unassigned from this bug report. Refer to https://wiki.ubuntu.com/KernelTeamBugPolicies for more information. Thanks.

Revision history for this message
mklemm (mirko-cm-klemm) wrote :

Additional info:
It seems to be between kernel versions 2.6.25 and 2.6.26 where it stops working. I am tied to ata_piix, since my ICH8 chipset is a "non-raid" version, which doesn't support AHCI mode, so I have to run the controller in IDE mode.
I already tried a patch to force the chipset into AHCI mode, which supposedly works for mainboards or laptops which haven't got the BIOS option to switch between AHCI and IDE mode for the SATA controller, but this patch didn't work for me.

Looking at the diffs in libata and ata_piix between 2.6.25 and 2.6.26, I can see quite a bunch of changes, but as I'm not at all familiar with ATA/IDE/SATA driver programming, I couldn't tell which change actually is responsible for the issue I'm experiencing...

Hope someone in the kernel team is already aware of that and tries to create a fix...

Revision history for this message
Morgan Jones (maclover201) wrote :

Kind of an update here-

I tried using the second-latest release of the Linux kernel - version 2.6.27.10. I compiled from source and installed. It works with the following boot options:
all_generic_ide
rootdelay=240
It does work, but takes longer to boot up. The all_generic_ide option doesn't appear to have taken any of the computer's speed away, although running on native IDE would likely be faster.
Anyway, I tried installing the 2.6.28 release of the Linux kernel as well. Not fixed there, either. It also seems not to recognize the rootdelay= option(!), which makes it time out and drop to the initramfs prompt. If I can eventually get 2.6.28 working, I will post a solution.
Due to the problem, I figure I'd better make a tutorial for getting 2.6.27.10 working for those of you that want to use it.

Open a terminal.
Type the following commands. They work the same way for other kernel versions; just substitute the version number. You need to be root, so su root or just sudo it all. Also, you need GCC (aptitude install gcc). The make commands take a LONG time (hours), so just be patient or go do something else. make and make modules take the longest.
cd ~
wget ftp://ftp.kernel.org/pub/linux/kernel/v2.6/linux-2.6.27.10.tar.bz2
bunzip2 -dvv linux-2.6.27.10.tar.bz2
tar xvf linux-2.6.27.10.tar
cd linux-2.6.27.10
make menuconfig   # select the kernel options you want and click "Save an alternate configuration file" before exiting
make clean
make
make modules
make modules_install
make install
cd /boot
update-initramfs -c -v -k 2.6.27.10   # takes a while
update-grub
nano /boot/grub/menu.lst
    # Add the following options (separated by spaces) to the kernel line of both the normal and recovery mode entries for the new kernel (2.6.27.10):
    #     rootdelay=240
    #     all_generic_ide
    # Also, don't forget to update your default entry if it has changed.
reboot
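
The menu.lst edit in the steps above can also be scripted. The following is only a sketch: the function name append_kernel_opts is invented for illustration, and it assumes the GRUB legacy layout where each boot entry has a line starting with the word "kernel". It writes the result to stdout rather than changing the file in place, so the output can be reviewed first.

```shell
# Append extra options to the "kernel" lines of a GRUB legacy menu.lst that
# mention a given kernel version. Reads the file named in $1 and prints the
# modified text to stdout; the original file is left untouched.
# Lines that already contain the options are not changed again.
append_kernel_opts() {
    menu_file=$1; version=$2; opts=$3
    awk -v ver="$version" -v opts="$opts" '
        $1 == "kernel" && index($0, ver) && !index($0, opts) { $0 = $0 " " opts }
        { print }
    ' "$menu_file"
}

# Usage (check the result before copying it over the real file):
#   append_kernel_opts /boot/grub/menu.lst 2.6.27.10 "rootdelay=240 all_generic_ide" > /tmp/menu.lst.new
```

Commented lines (for example a "# kopt=" line) are left alone because their first field is "#", not "kernel".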

Hope this helped those of you that actually want a working 2.6.27 kernel. It takes a while to boot up though but works, so there is a tradeoff.

Revision history for this message
Morgan Jones (maclover201) wrote :

And also:

"Per a decision made by the Ubuntu Kernel Team, bugs will no longer be assigned to the ubuntu-kernel-team in Launchpad as part of the bug triage process. The ubuntu-kernel-team is being unassigned from this bug report. Refer to https://wiki.ubuntu.com/KernelTeamBugPolicies for more information. Thanks."

Bad news...

Revision history for this message
mklemm (mirko-cm-klemm) wrote :

Thank you Morgan for the description of the workaround. I also got my system up and running with the "all_generic_ide" option, but it feels substantially slower than under the 2.6.24 or 2.6.25 kernel, where ata_piix takes care of it.... At boot time, the disk is "noisier"; it sounds as if there are many more physical disk accesses than with ata_piix. As I said, I'm not a driver programmer, but I would assume that the ata_piix driver supports more efficient access models, maybe Native Command Queueing, and other things that can increase disk performance. The speed difference is also noticeable after logging into the gnome desktop (all the applets and menu icons are loaded much faster under 2.6.24 without "all_generic_ide"), and when compiling the kernel.

Revision history for this message
mklemm (mirko-cm-klemm) wrote :

Sorry, I must make a correction to my last statement: After having done some tests with bonnie, and looking at the hdparm output, it really doesn't seem to make too much difference whether all_generic_ide or ata_piix is used.
hdparm also shows that most features of the hard drives are enabled, and hdparm -t gives reasonable transfer rates. Only the processor load seems to be quite high, but even in that the difference between ata_piix and all_generic_ide doesn't seem to be at all significant...

So, I think I'll leave my system running with the all_generic_ide option, and that's it...

Revision history for this message
Morgan Jones (maclover201) wrote :

The transfer rates are reasonable, but I noticed my system seems to run really slow...

I got it working without all_generic_ide or ata_piix though!

Delete all_generic_ide and increase your rootdelay= to 240. It takes a little longer to boot (negligible) with no all_generic_ide, but seems to run faster.

I'll get back to you on 2.6.28.

Revision history for this message
Morgan Jones (maclover201) wrote :

By the way, I filed a bug report for 2.6.28 on the rootdelay problem. I've found that others are having the same problem, so if they acknowledge it, hopefully they'll make a patch up to 2.6.28-1 once they have enough fixes. If 2.6.28 STILL doesn't work once the rootdelay thing is sorted out (it isn't yet, though it should be), I will file yet another bug report to the devs. In fact, I probably will tonight.

Revision history for this message
Morgan Jones (maclover201) wrote :

I also filed a bug report for this bug.

This bug: http://bugzilla.kernel.org/show_bug.cgi?id=12470
Root delay bug: http://bugzilla.kernel.org/show_bug.cgi?id=12452

Revision history for this message
mklemm (mirko-cm-klemm) wrote :

Sorry that I dare to express my disagreement with your analysis, Morgan. I think we're in fact seeing two different issues here.
When I look into the dmesg output you posted to the kernel bug report, I can see "SRST failed" messages from the kernel driver, which - as you wrote - in fact seem to indicate a timeout problem.

However, in my dmesg output, I don't see any error messages - the "SATA link down" messages just suggest that there are no drives connected at all, which obviously isn't true, since I can boot with 2.6.24 (or even 2.6.25) without any issues. So, in the end, I think although symptoms are quite the same, the low-level cause of this might be somewhat different...

Revision history for this message
Morgan Jones (maclover201) wrote :

What kernel version? 2.6.27.10_generic or custom compiled? This is very strange...

Changed in linux:
status: Unknown → In Progress
Revision history for this message
mklemm (mirko-cm-klemm) wrote :

My kernel is the 2.6.27-10-generic from the latest intrepid update... It was like this from 2.6.27-7-generic onward, and the same with a custom compiled 2.6.26 "vanilla" kernel. A custom compiled 2.6.25 worked, however, as well as the standard intrepid "linux-ports-2.6.25".

Revision history for this message
Steinar Bang (sb-dod) wrote :

I've encountered this problem after upgrading from 8.04 to 8.10.

The 2.6.27-11-generic and 2.6.27-9-generic kernels installed by 8.10 fail in exactly the manner described in this bug report, while the 2.6.24-23-generic kernel installed by 8.04 works.

Revision history for this message
Steinar Bang (sb-dod) wrote :

Forgot to say. This is a computer with an Intel Corporation 82371 AB/EB/MB PIIX4 IDE (rev01) IDE controller, and a single ATA (not SATA) hard disk, a CDROM drive, and a floppy drive.

Revision history for this message
Bryan Wu (cooloney) wrote :

Unfortunately it seems this bug is still an issue. Can you confirm this issue exists with the most recent Jaunty Jackalope 9.04 release (http://www.ubuntu.com/news/ubuntu-9.04-desktop)? If the issue remains in Jaunty, please run the following command from a Terminal (Applications->Accessories->Terminal). It will automatically gather and attach updated debug information to this report.

apport-collect -p linux-image-2.6.28-11-generic <bug #>

If you could also test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine this issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance

Changed in linux (Ubuntu):
status: Triaged → Incomplete
tags: added: needs-kernel-logs needs-upstream-testing
Revision history for this message
mklemm (mirko-cm-klemm) wrote :

Yes, this issue still exists in 9.04 with kernel 2.6.28-13
Plus: The all_generic_ide option isn't supported anymore, maybe the generic ide driver was excluded from the standard kernel build.
Kinda annoying this issue...

mklemm (mirko-cm-klemm)
tags: removed: needs-upstream-testing
description: updated
mklemm (mirko-cm-klemm)
description: updated
tags: added: 2.6.28 2.6.28-13-generic 9.04
Revision history for this message
mklemm (mirko-cm-klemm) wrote :

Also affects Karmic beta with kernel 2.6.31-14-generic - exactly the same results and dmesg output. Also, all_generic_ide doesn't work for this distribution, so the only known workaround doesn't work anymore...

tags: added: 2.6.31 2.6.31-14-generic 9.10
description: updated
Revision history for this message
mklemm (mirko-cm-klemm) wrote :

linux-kernel-bugs #12470 IMHO describes a different issue (timeout). When you look at the dmesg output, you will find that in #12470 the controller responds with a message that the drives are slow to respond, while in this bug here the controller doesn't even make an attempt to connect to a drive and simply says "SATA link down".

Revision history for this message
mklemm (mirko-cm-klemm) wrote :

I just did some investigation of the source code of ata_piix.c, and although I'm not familiar with driver programming, I think the issue is that ata_piix sees my SATA controller, which is in IDE mode and cannot be switched to AHCI, as a controller in AHCI mode.
In the dmesg output I see

MAP [P0 P2 P1 P3]

whereas in the source code in kernel 2.6.31.4 drivers/ata/ata_piix.c, line 394, it looks like it should actually read

MAP [P0 P2 IDE IDE]

for a controller in IDE mode, according to the source comments.
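
To check which port map the driver reports on another machine, something like the sketch below should do. The function name piix_port_map is made up here, and the only assumption about the log format is that the line contains "MAP [...]" as quoted above.

```shell
# Extract the port map that ata_piix logs at probe time from a kernel log on stdin.
# Prints the text between the brackets of each "MAP [...]" line, trimmed of
# surrounding spaces; prints nothing if no such line is present.
piix_port_map() {
    sed -n 's/.*MAP \[ *\(.*[^ ]\) *\].*/\1/p'
}

# Usage:
#   dmesg | piix_port_map
```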

Is there a way to force the ata_piix driver into IDE mode in some way?

Revision history for this message
mklemm (mirko-cm-klemm) wrote :

I just found a workaround for 2.6.31-14:

with the kernel parameter

libata.force=1:norst,2:norst,3:norst,4:norst

all the disks come up working with the ata_piix driver.

During boot, there is an error message from the kernel, but this doesn't seem to hurt...
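
For controllers with a different number of ports, the parameter can be generated instead of typed by hand. This is just a convenience sketch; the function name libata_force_norst is invented for illustration, and it only builds the string in the same PORT:norst form used above.

```shell
# Build a libata.force= kernel parameter that applies "norst" to ports 1..N.
# $1 is the number of ATA ports (4 for the setup described in this report).
libata_force_norst() {
    ports=$1
    value=""
    i=1
    while [ "$i" -le "$ports" ]; do
        # Join entries with commas, without a leading comma on the first one.
        value="${value:+$value,}$i:norst"
        i=$((i + 1))
    done
    printf 'libata.force=%s\n' "$value"
}

# Example: libata_force_norst 4
# After rebooting, check that the parameter actually reached the kernel:
#   grep -o 'libata\.force=[^ ]*' /proc/cmdline
```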

Revision history for this message
PabloRQ (pablo-romeroquinteros) wrote :

Same bug.
Installed Ubuntu 9.10 through wubi in a Dell Latitude D520.

I've booted with no acpi options and I could boot.

How to do it:
- In grub, select the option "Ubuntu, Linux 2.6.31-14-generic"
- Edit the options
- Erase 'quiet splash'
- Add 'noapic nolapic noacpi acpi=off apic=off' (some of them are useless, but I never remember which!)
- Ctrl+x to boot

Regards

Revision history for this message
Lord_Alba (lupo-alberto75) wrote :

Hi All,

I have the same problem on a Dell Inspiron 6400 with an Intel SATA controller 82801G (ICH7 Family) on a Seagate SATA Disk ST96812AS.

i'm gonna write here my experience, hoping that this can help having this bad bug solved as soon as possible

yesterday i executed the dist-upgrade from kubuntu 9.04 to 9.10 (last time i installed kubuntu from the install cd was more than 2 yrs ago)

upgrade ran fine till kernel install, then failed and advised me to revert to 9.04
i just did a reboot, system was still functioning but had lost sound card, touchpad was not working, and was extremely slow

moreover, kde notified me of a lot of broken dependencies.

in any case, i rebooted another time (now using the new 2.6.31 - i had to manually edit menu.lst and insert the new kernel entries), and system was more or less usable again.

quite annoyed by these problems, i decided to do a fresh, brand new kubuntu 9.10 install.
downloaded the i386 plain version, but it could not see the disk.
so i tried out the alternate cd, but again no joy.
also with knoppix (ver 6, using 2.6.28) disk cannot be seen by livecd.

with the kubuntu alternate cd i tried to: (list not complete)

1) append rootdelay=90 at kernel boot
2) append libata.force=1:norst,2:norst,3:norst,4:norst at kernel boot
3) append ata_piix.blacklist=yes
4) disable acpi etc., as F6 suggests

tried also to manually insmod several different drivers, and to analyze and compare (through multiple reboots) the installed kubuntu, which sees my harddisk, with all the livecds, which cannot see it (or better, in /dev/.udev/link there is a reference to the device, but i cannot make it usable)
... tried to understand how udev works, googled a lot in multiple directions ... but nothing :(

[POLEMIC MODE]

i'm a linux enthusiast, and for sure this won't make me decide to change to winz ... but how can u think a plain user can react to such a problem?
i don't think this is a great adv for our beloved kern+OS :(
i was quite angry wasting my sat trying to solve a problem caused by a supposed supersecure upgrade (blame on kubuntu), then i started thinking that linux is wonderful, but it would be better if it won't lose time with "strange" arch, giving some more love to the longlasting i386 bugs (blame on linux) ...

[/POLEMIC MODE]

... finally i decided to write in launchpad and report this bug (i never do it, blame on me!)

Alberto

Revision history for this message
mklemm (mirko-cm-klemm) wrote :

Problem persists for Lucid, see updated description above

tags: added: 10.04 2.6.32-22-generic 2.6.32-22-server jaunty karmic lucid
description: updated
Revision history for this message
mklemm (mirko-cm-klemm) wrote :

Seems that the reset issued by the driver on init somehow "disconnects" the drives. At least, that's my interpretation of the fact that "libata.force=1:norst" makes it work...
Does anyone have a more technically grounded explanation of this? Maybe one of the developers of ata_piix?

Revision history for this message
Brad Figg (brad-figg) wrote : Unsupported series, setting status to "Won't Fix".

This bug was filed against a series that is no longer supported and so is being marked as Won't Fix. If this issue still exists in a supported series, please file a new bug.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: Incomplete → Won't Fix
Changed in linux:
status: In Progress → Expired