How do I fix a boot error: "failed command: READ FPDMA QUEUED"

Asked by Daniel Day on 2010-08-24

I'm using a fresh install of the Lucid 10.04.1 release on a Gigabyte GA-890GPA-UD3H motherboard. My SATA controllers are configured as AHCI (although the same happens when they are set to native IDE).

This error comes up several dozen times during boot until it exceeds an error count, sometimes on one drive/SATA channel and sometimes on another. It happens on every boot (I have rebooted about a dozen times today trying various things), but the system does boot successfully. From my dmesg log file:

[ 2.431135] ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
[ 2.431164] ata1.00: irq_stat 0x40000001
[ 2.431191] ata1.00: failed command: READ FPDMA QUEUED
[ 2.431219] ata1.00: cmd 60/08:00:00:2e:ba/00:00:03:00:00/40 tag 0 ncq 4096 in
[ 2.431220] res 41/14:00:00:2e:ba/00:00:03:00:00/40 Emask 0x481 (invalid argument) <F>
[ 2.431275] ata1.00: status: { DRDY ERR }
[ 2.431300] ata1.00: error: { IDNF ABRT }
[ 2.432190] ata1.00: configured for UDMA/133
[ 2.432222] ata1: EH complete
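For reading excerpts like the one above: in the "res" line, the first two hex bytes (41/14 here) are the status and error registers the drive returned, and libata spells them out on the "status:" and "error:" lines. A minimal sketch that decodes them, assuming libata's usual field order (status byte, then error byte) and using the standard ATA register bit names; the function name decode_ata_res is only illustrative.

```shell
#!/bin/sh
# Hypothetical helper: decode the status/error bytes of a libata
# "res SS/EE:..." field. Only the commonly reported bits are named;
# any other bits (e.g. 0x10 in status) are ignored.
decode_ata_res() {
  res=$1
  status=$((0x${res%%/*}))   # text before the first "/" is the status byte
  rest=${res#*/}
  error=$((0x${rest%%:*}))   # text up to the first ":" is the error byte

  s=""
  [ $((status & 0x80)) -ne 0 ] && s="$s BSY"
  [ $((status & 0x40)) -ne 0 ] && s="$s DRDY"
  [ $((status & 0x20)) -ne 0 ] && s="$s DF"
  [ $((status & 0x08)) -ne 0 ] && s="$s DRQ"
  [ $((status & 0x01)) -ne 0 ] && s="$s ERR"

  e=""
  [ $((error & 0x80)) -ne 0 ] && e="$e ICRC"
  [ $((error & 0x40)) -ne 0 ] && e="$e UNC"
  [ $((error & 0x10)) -ne 0 ] && e="$e IDNF"
  [ $((error & 0x04)) -ne 0 ] && e="$e ABRT"
  [ $((error & 0x01)) -ne 0 ] && e="$e AMNF"

  echo "status:${s}"
  echo "error:${e}"
}
```

Running decode_ata_res 41/14:00:00:2e:ba/00:00:03:00:00/40 prints "status: DRDY ERR" and "error: IDNF ABRT", the same as the two lines libata logged for that command.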

The final iteration is:
[ 5.016932] EXT4-fs error (device sda1): ext4_find_entry: reading directory #262261 offset 0
[ 5.077636] ata1: failed to read log page 10h (errno=-5)
[ 5.077665] ata1.00: NCQ disabled due to excessive errors
[ 5.077667] ata1.00: exception Emask 0x1 SAct 0x3 SErr 0x0 action 0x6 frozen
[ 5.077693] ata1.00: irq_stat 0x40000001
[ 5.077719] ata1.00: failed command: READ FPDMA QUEUED
[ 5.077747] ata1.00: cmd 60/28:00:e0:11:80/00:00:02:00:00/40 tag 0 ncq 20480 in
[ 5.077748] res 51/04:08:80:2b:ba/00:00:03:00:00/40 Emask 0x1 (device error)
[ 5.077801] ata1.00: status: { DRDY ERR }
[ 5.077826] ata1.00: error: { ABRT }
[ 5.077851] ata1.00: failed command: READ FPDMA QUEUED
[ 5.077878] ata1.00: cmd 60/08:08:80:2b:ba/00:00:03:00:00/40 tag 1 ncq 4096 in
[ 5.077879] res 51/04:08:80:2b:ba/00:00:03:00:00/40 Emask 0x1 (device error)
[ 5.077934] ata1.00: status: { DRDY ERR }
[ 5.077958] ata1.00: error: { ABRT }
[ 5.077985] ata1: hard resetting link
[ 5.720028] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 5.723447] ata1.00: configured for UDMA/133
[ 5.723453] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 5.723455] sd 0:0:0:0: [sda] Sense Key : Aborted Command [current] [descriptor]
[ 5.723458] Descriptor sense data with sense descriptors (in hex):
[ 5.723459] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
[ 5.723463] 03 ba 2b 80
[ 5.723465] sd 0:0:0:0: [sda] Add. Sense: No additional sense information
[ 5.723467] sd 0:0:0:0: [sda] CDB: Read(10): 28 00 02 80 11 e0 00 00 28 00
[ 5.723471] end_request: I/O error, dev sda, sector 41947616
[ 5.723513] ata1: EH complete
[ 5.723515] EXT4-fs error (device sda1): __ext4_get_inode_loc: unable to read inode block - inode=1317328, block=5243324
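Since the errors move between drives/SATA channels from boot to boot, it can help to count them per port. A small sketch that reads a kernel log on stdin and tallies the "failed command: READ FPDMA QUEUED" lines by ataN.DD; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count "READ FPDMA QUEUED" failures per ATA
# port.device (e.g. ata1.00), reading a kernel log on stdin.
count_ncq_failures() {
  grep -oE 'ata[0-9]+\.[0-9]+: failed command: READ FPDMA QUEUED' \
    | cut -d: -f1 \
    | sort \
    | uniq -c
}

# Usage: dmesg | count_ncq_failures
```

Each output line is a count followed by the port.device it was logged against, which is also the ID format the libata.force option uses.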

I'm totally new here and I hope this is the correct forum for this question; if not, my apologies.
Cheers and thanks in advance...
Dan

Question information

Language:
English
Status:
Solved
For:
Ubuntu
Assignee:
No assignee
Solved by:
actionparsnip
Solved:
2010-08-26
Last query:
2010-08-26
Last reply:
2010-08-24

https://bugs.launchpad.net/ubuntu/+bug/550559

Looks like disabling NCQ is a way forward. Sounds like a Marvell controller issue/bug.

Daniel Day (daniel-day) said : #2

Parsnip, thank you for your help!
OK, I'll give that a go. I did see that solution after some web searching, but I noticed that after the kernel disabled NCQ during boot (see the dmesg line at 5.077665 above), the error still came up three more times, so I thought I'd ask whether there were other solutions.
I found a command to disable NCQ in the libata FAQ at https://ata.wiki.kernel.org/index.php/Libata_FAQ (it has to run as root, hence sudo tee rather than a plain > redirect from a user shell):
$ echo 1 | sudo tee /sys/block/sdX/device/queue_depth

Could you help me with where to put it (so to speak) so that it runs on every boot?
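One way to run the FAQ command on every boot, as a sketch: loop over all sd* disks and write 1 to each queue_depth. The function name and its optional sysfs-root argument are hypothetical (the argument only exists so the loop can be tried on a scratch directory); on a real system it must run as root, for example from /etc/rc.local. Note that rc.local runs late in boot, after the early-boot errors in the log above would already have happened, so this only limits errors from that point on; the kernel command-line option later in this thread takes effect from the start.

```shell
#!/bin/sh
# Hypothetical helper: set queue_depth=1 (NCQ off) on every sd* disk.
# $1 is the sysfs root; it defaults to /sys and only exists so the loop
# can be exercised on a fake directory tree. Writing to the real /sys needs root.
disable_ncq_all() {
  root=${1:-/sys}
  for qd in "$root"/block/sd*/device/queue_depth; do
    # Skip the literal pattern (no sd* disks) and files we cannot write.
    [ -w "$qd" ] || continue
    echo 1 > "$qd"
    dev=${qd%/device/queue_depth}   # strip the trailing path...
    dev=${dev##*/}                  # ...leaving just "sda", "sdb", ...
    echo "$dev: queue_depth=$(cat "$qd")"
  done
}

# Usage (as root, e.g. from /etc/rc.local before its final "exit 0"):
#   disable_ncq_all
```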

Cheers,
Dan

Daniel Day (daniel-day) said : #3

BTW, this motherboard uses the AMD SB850 as its SATA controller, and the problem has come up on my SSD system disk (OCZ Agility 30 GB), my data disk (Hitachi P7K 500 GB) and my backup disk (an old Hitachi 2.5" 250 GB). I'm not sure why it switches.
Dan

Daniel Day (daniel-day) said : #4

OK, I have puzzled it out: I tried three methods, and the third worked for me. Disabling NCQ on my SSD system disk (OCZ Agility) via the kernel command line solved the problem. Funny that the errors showed up first on my Hitachi spinning disk and then, when I took it out, switched to the SSD. Oh well.

For other newbies with this problem:

The actual option goes in /etc/default/grub:

GRUB_CMDLINE_LINUX="libata.force=1.00:noncq"

where 1.00 is the ATA port and device number (PORT.DEVICE) of the misbehaving drive, as shown in the dmesg messages (ata1.00).

Then run "sudo update-grub" in a terminal to apply the change.
Check /boot/grub/grub.cfg to confirm that the option is in there. Presto, reboot!
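To make these checks repeatable (before and after the reboot), a small sketch: it looks for a libata.force value containing noncq in the generated grub.cfg and in /proc/cmdline, which holds the command line of the running kernel. The function name and the optional path arguments are hypothetical; the paths default to the Ubuntu locations used in this thread.

```shell
#!/bin/sh
# Hypothetical check: is a libata.force=...noncq option in the generated
# GRUB config, and did the running kernel actually boot with it?
# $1 defaults to /boot/grub/grub.cfg, $2 to /proc/cmdline.
check_noncq() {
  cfg=${1:-/boot/grub/grub.cfg}
  cmdline=${2:-/proc/cmdline}
  pattern='libata\.force=[^ "]*noncq'

  if grep -q "$pattern" "$cfg"; then
    echo "grub.cfg: present"
  else
    echo "grub.cfg: missing (edit /etc/default/grub, then run sudo update-grub)"
  fi

  if grep -q "$pattern" "$cmdline"; then
    echo "running kernel: active"
  else
    echo "running kernel: not active (reboot after update-grub)"
  fi
}
```

Right after update-grub you would expect "present" plus "not active"; after the reboot, both lines should report the option.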

New log results (from /var/log/dmesg):

[ 1.700190] ata1.00: FORCE: horkage modified (noncq)
[ 1.700368] ata1.00: HPA unlocked: 62531183 -> 62533296, native 62533296
[ 1.700395] ata1.00: ATA-8: OCZ-AGILITY, 1.6, max UDMA/133
[ 1.700421] ata1.00: 62533296 sectors, multi 1: LBA48 NCQ (not used)
[ 1.701750] ata2.00: HPA unlocked: 488388911 -> 488397168, native 488397168
[ 1.701972] ata2.00: ATA-8: Hitachi HTS543225L9A300, FBEOC40C, max UDMA/133
[ 1.701999] ata2.00: 488397168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
[ 1.702249] ata1.00: configured for UDMA/133
[ 1.703265] ata2.00: configured for UDMA/133
[ 1.711655] ata5.00: ATA-8: Hitachi HDP725050GLA360, GM4OA50E, max UDMA/133
[ 1.711683] ata5.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
[ 1.712694] ata5.00: configured for UDMA/133

Note: I only had to disable NCQ for the SSD (ata1); I have no idea why the problem showed up earlier on the Hitachi drive (ata5).
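To see which sdX device sits on which ataN port (the number libata.force refers to), one option is to resolve each disk's link under /sys/block: on newer kernels the resolved path contains an ataN component. That path layout is an assumption about newer kernels; the 2.6.32 kernel in Lucid may not show it, in which case matching the model strings in dmesg, as above, is the fallback. The function name and its optional directory argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print "sdX ataN" for every sd* disk whose resolved
# sysfs path contains an ataN component (newer kernels; assumption).
# $1 is the directory holding the block links; it defaults to /sys/block.
list_ata_ports() {
  dir=${1:-/sys/block}
  for link in "$dir"/sd*; do
    [ -e "$link" ] || continue   # no sd* entries: skip the literal pattern
    # e.g. .../0000:00:11.0/ata3/host2/target2:0:0/2:0:0:0/block/sda
    port=$(readlink -f "$link" | grep -oE '/ata[0-9]+/' | head -n 1 | tr -d /)
    echo "${link##*/} ${port:-unknown}"
  done
}

# Usage: list_ata_ports
```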

Daniel Day (daniel-day) said : #5

Thanks actionparsnip, that solved my question.

Great share, hopefully this will help others.

Glad you got the gold.

Daniel Day (daniel-day) said : #7

Today I have the boot issue again. It looks like the order in which udev starts the controllers (if that's the right term) is somewhat arbitrary, so the ATA port assignments shift. Today my SSD is ata3, not ata1, so it started with NCQ enabled and I got the errors and the boot delay.

My question now is, does anyone know of a way to disable NCQ with an option or rule for udev, so that it applies to all devices?

Here are two consecutive boots:
This one (yesterday) started AHCI first and gave the JMicron PATA controller ata7 and ata8:

[ 1.383604] udev: starting version 151
[ 1.403300] ahci 0000:00:11.0: version 3.0
[ 1.403311] alloc irq_desc for 19 on node 0
[ 1.403313] alloc kstat_irqs on node 0
[ 1.403322] ahci 0000:00:11.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
[ 1.403391] alloc irq_desc for 27 on node 0
[ 1.403392] alloc kstat_irqs on node 0
[ 1.403400] ahci 0000:00:11.0: irq 27 for MSI/MSI-X
[ 1.403481] ahci 0000:00:11.0: AHCI 0001.0200 32 slots 6 ports 6 Gbps 0x3f impl SATA mode
[ 1.403511] ahci 0000:00:11.0: flags: 64bit ncq sntf ilck pm led clo pmp pio slum part

This one (today) started the JMicron controller first and gave it ata1 and ata2:

[ 1.383445] udev: starting version 151
[ 1.404241] pata_jmicron 0000:05:00.1: PCI INT B -> GSI 17 (level, low) -> IRQ 17
[ 1.404299] pata_jmicron 0000:05:00.1: setting latency timer to 64
[ 1.405248] ahci 0000:00:11.0: version 3.0
[ 1.405257] alloc irq_desc for 19 on node 0
[ 1.405258] alloc kstat_irqs on node 0
[ 1.405266] ahci 0000:00:11.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
[ 1.405331] alloc irq_desc for 27 on node 0
[ 1.405332] alloc kstat_irqs on node 0
[ 1.405339] ahci 0000:00:11.0: irq 27 for MSI/MSI-X
[ 1.405417] ahci 0000:00:11.0: AHCI 0001.0200 32 slots 6 ports 6 Gbps 0x3f impl SATA mode
[ 1.405447] ahci 0000:00:11.0: flags: 64bit ncq sntf ilck pm led clo pmp pio slum part

So, because the GRUB option is tied to a numbered port (ata1), my boot problem reappears now that the SSD is on ata3.
Even weirder.
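One answer to the question above that doesn't involve udev: libata.force also accepts a value with no port prefix, and according to the kernel's parameter documentation a value given without an ID applies to all ports and devices, so the option no longer depends on which ataN the SSD lands on. The trade-off is that NCQ is then off for the Hitachi disks too, which had been running with it (depth 31/32 in the log above). A sketch of the edit, assuming the single-port form used in this thread; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: turn a port-specific "libata.force=PORT.DEV:noncq"
# in /etc/default/grub into the port-less "libata.force=noncq", which
# applies to every ATA device. Uses GNU sed's -i (in-place) option.
make_noncq_global() {
  file=${1:-/etc/default/grub}
  sed -i 's/libata\.force=[0-9.]*:noncq/libata.force=noncq/' "$file"
}

# Usage (as root): make_noncq_global && update-grub, then reboot.
```

After the edit, the line reads GRUB_CMDLINE_LINUX="libata.force=noncq".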

Desmond Utara (dutara) said : #8

It's been a while since this thread was active, but it helped me out today.
> The actual command is inserted in /etc/default/grub:
> GRUB_CMDLINE_LINUX="libata.force=1.00:noncq"

This did it for me on Ubuntu 10.04

I had a world of weirdness with 11.04 and have gone retro to 10.04. I'll wait and see how 12.04 looks next year ;^)
I also realized that my hardware (a CQ40 laptop with a Hitachi ATA drive) didn't take well to journaled file systems, so rolling back to the ext2 file system cleared up some grief, and with this fix I can now boot Ubuntu in less than a minute. Thanks DD!
Yeah, I should take this knowledge and try 11.04 again, but after a two-week hobby of reinstalling, picking the nits out of the thing and having it go to doo-doo every time, I think that's enough.

I ran into the same issue after upgrading from 14.04 to 16.04 (64-bit), and the line mentioned above did the trick.
Thank you very much for it. It shortens booting by at least 30 seconds.