Boot crash with Trusty 3.13

Bug #1757193 reported by Juerg Haefliger
This bug affects 1 person
Affects          Status         Importance   Assigned to       Milestone
linux (Ubuntu)   Invalid        High         Unassigned
  Trusty         Fix Released   High         Juerg Haefliger

Bug Description

== SRU Justification ==
Custom compilation of the Trusty 3.13 kernel codebase results in a (reproducible) QEMU boot crash (see below).

== Fix ==
Replace UBUNTU SAUCE patch with proper upstream commit:
548acf19234d ("x86/mm: Expand the exception table logic to allow new handling options")

== Regression Potential ==
Medium. The patch is quite large but the backport was a simple context adjustment. Ran the x86 selftests and perf NMI tests for several hours to verify stability.

== Test Case ==
Compile the Trusty 3.13 kernel code using the default config (make defconfig) and run the resulting kernel in QEMU. The kernel crashes on every boot.
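
The report does not give the exact build or QEMU invocation, so the following is only a sketch of one plausible way to run the test case. The job count, the image path arch/x86/boot/bzImage, the qemu-system-x86_64 binary name and the serial-console options are assumptions, not details taken from this bug. The helper only builds and prints the commands; it is meant to be run from the top of a Trusty 3.13 kernel tree.

```python
import shlex

def repro_commands(jobs=4, image="arch/x86/boot/bzImage"):
    """Return the commands for a defconfig build followed by a QEMU boot.

    All values here are assumptions for illustration; adjust them to the
    local setup.
    """
    return [
        ["make", "defconfig"],
        ["make", f"-j{jobs}", "bzImage"],
        # Boot the freshly built kernel directly, with the console on serial.
        ["qemu-system-x86_64", "-kernel", image,
         "-append", "console=ttyS0", "-nographic"],
    ]

for cmd in repro_commands():
    print(shlex.join(cmd))
```

No initrd or root disk is passed, so even a kernel without this bug would be expected to stop later when it cannot mount a root filesystem. The crash in this report happens earlier, about 0.34 seconds into boot.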

Original bug description:

While doing kernel testing using the Trusty 3.13 code base, I get the following boot crash with QEMU:

[ 0.338393] BUG: unable to handle kernel paging request at ffffffff014142f0
[ 0.338987] IP: [<ffffffff014142f0>] 0xffffffff014142f0
[ 0.339388] PGD 180f067 PUD 0
[ 0.339388] Oops: 0010 [#1] SMP
[ 0.339388] Modules linked in:
[ 0.339388] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.13.11-ckt39-trusty #6
[ 0.339388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 0.339388] task: ffff88003f708000 ti: ffff88003f6fa000 task.ti: ffff88003f6fa000
[ 0.339388] RIP: 0010:[<ffffffff014142f0>] [<ffffffff014142f0>] 0xffffffff014142f0
[ 0.339388] RSP: 0000:ffff88003f6fbe98 EFLAGS: 00050246
[ 0.339388] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 0.339388] RDX: 0000000000000000 RSI: ffff88003deb9eb4 RDI: ffffffff818b8590
[ 0.339388] RBP: ffff88003f6fbe98 R08: 0000000000000000 R09: ffff88003fa14ae0
[ 0.339388] R10: ffffffff81264c68 R11: ffffea0000fdd000 R12: ffffffff818b8590
[ 0.339388] R13: 00000000000000ad R14: 0000000000000000 R15: 0000000000000000
[ 0.339388] FS: 0000000000000000(0000) GS:ffff88003fa00000(0000) knlGS:0000000000000000
[ 0.339388] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.339388] CR2: ffffffff014142f0 CR3: 000000000180c000 CR4: 0000000000360770
[ 0.339388] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 0.339388] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 0.339388] Stack:
[ 0.339388] ffff88003f6fbf08 ffffffff81000402 ffff88003f6fbf00 ffffffff81065f88
[ 0.339388] ffff88003f6fbef0 ffff88003ffd96a1 ffffffff817e9d28 000000ad00060006
[ 0.339388] ffffffff817b013d ffffffff8196cef0 ffffffff8196d018 0000000000000006
[ 0.339388] Call Trace:
[ 0.339388] [<ffffffff81000402>] do_one_initcall+0xf2/0x140
[ 0.339388] [<ffffffff81065f88>] ? parse_args+0x1e8/0x320
[ 0.339388] [<ffffffff8189df8f>] kernel_init_freeable+0x14c/0x1d1
[ 0.339388] [<ffffffff8189d842>] ? do_early_param+0x88/0x88
[ 0.339388] [<ffffffff813fac20>] ? rest_init+0x80/0x80
[ 0.339388] [<ffffffff813fac29>] kernel_init+0x9/0x120
[ 0.339388] [<ffffffff8140fcae>] ret_from_fork+0x6e/0xa0
[ 0.339388] [<ffffffff813fac20>] ? rest_init+0x80/0x80
[ 0.339388] Code: Bad RIP value.
[ 0.339388] RIP [<ffffffff014142f0>] 0xffffffff014142f0
[ 0.339388] RSP <ffff88003f6fbe98>
[ 0.339388] CR2: ffffffff014142f0
[ 0.339388] ---[ end trace a71242bdac7e8632 ]---
[ 0.339388] note: swapper/0[1] exited with preempt_count 1
[ 0.357079] swapper/0 (1) used greatest stack depth: 5424 bytes left
[ 0.357539] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
[ 0.357539]
[ 0.358073] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
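
In this trace the faulting address (CR2) equals the instruction pointer, so the CPU was trying to execute code at that address. A minimal sketch, assuming only two lines copied from the log above as input (the helper names are made up for illustration), compares that address with the relocation range printed on the "Kernel Offset" line:

```python
import re

# Two lines copied from the oops above.
LOG = """\
[ 0.338393] BUG: unable to handle kernel paging request at ffffffff014142f0
[ 0.358073] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
"""

def fault_address(log):
    """Return the address from the 'paging request' line as an int."""
    match = re.search(r"kernel paging request at ([0-9a-f]{16})", log)
    return int(match.group(1), 16)

def relocation_range(log):
    """Return (low, high) from the 'Kernel Offset' line as ints."""
    match = re.search(r"relocation range: 0x([0-9a-f]+)-0x([0-9a-f]+)", log)
    return int(match.group(1), 16), int(match.group(2), 16)

addr = fault_address(LOG)
low, high = relocation_range(LOG)
print(hex(addr), "inside relocation range:", low <= addr <= high)
# prints: 0xffffffff014142f0 inside relocation range: False
```

The address lies below the relocation range, which is consistent with the "Bad RIP value" line: the kernel jumped to a location that is not part of its image.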

Git bisect identified the following commit as the culprit:
commit 56764fdc3a847371531b8044155c70412fc5be76
Author: Andy Whitcroft <email address hidden>
Date: Thu Feb 22 11:24:00 2018 +0100

    UBUNTU: SAUCE: x86, extable: fix uaccess fixup detection

    BugLink: http://bugs.launchpad.net/bugs/1750786

    The existing code intends to identify a subset of fixups which need
    special handling, uaccess related faults need to record the failure.
    This is done by adjusting the fixup code pointer by a (random) constant
    0x7ffffff0. This is detected in fixup_exception by comparing the two
    pointers. The intent of this code is to detect the the delta between
    the original code and its fixup code being greater than the constant.
    However, the code as written triggers undefined comparison behaviour.
    In this kernel this prevents the condition triggering, leading to panics
    when jumping to the corrupted fixup address.

    Convert the code to better implement the intent. Convert both of the
    offsets to final addresses and compare the delta between those. Also add
    a massive comment to explain all of this including the implicit assumptions
    on order of the segments that this comparison implies.

    Fixes: 706276543b69 ("x86, extable: Switch to relative exception table entries")
    Signed-off-by: Andy Whitcroft <email address hidden>
    Acked-by: Colin Ian King <email address hidden>
    Acked-by: Khalid Elmously <email address hidden>
    Signed-off-by: Kleber Sacilotto de Souza <email address hidden>
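
The arithmetic this commit message describes can be modeled with plain integers. The sketch below is not the kernel code: the entry layout (two signed 32-bit self-relative offsets, with the fixup field 4 bytes after the insn field), the helper names and the sample addresses are assumptions made for illustration. It shows that for an entry carrying the 0x7ffffff0 bias the difference of the two stored offsets no longer fits in a signed 32-bit value, and that adding the bias to the faulting address from the oops gives an address inside the kernel relocation range.

```python
BIAS = 0x7ffffff0                 # constant quoted in the commit message
CRASH_IP = 0xffffffff014142f0     # RIP / CR2 from the oops in this report
RANGE = (0xffffffff80000000, 0xffffffff9fffffff)  # relocation range from the oops

def to_s32(value):
    """Wrap an integer the way a signed 32-bit field would store it."""
    value &= 0xffffffff
    return value - (1 << 32) if value >= (1 << 31) else value

def stored_offsets(entry_addr, insn_addr, fixup_addr, biased):
    """Offsets for a hypothetical entry: each field holds its target minus
    the field's own address; the fixup field sits 4 bytes after insn."""
    insn = to_s32(insn_addr - entry_addr)
    fixup = to_s32(fixup_addr - (entry_addr + 4) + (BIAS if biased else 0))
    return insn, fixup

# Made-up addresses: faulting instruction, its fixup stub, the table entry.
INSN, FIXUP, ENTRY = 0xffffffff81200000, 0xffffffff81410000, 0xffffffff81700000

for biased in (False, True):
    insn, fixup = stored_offsets(ENTRY, INSN, FIXUP, biased)
    exact = fixup - insn          # unbounded integer arithmetic
    wrapped = to_s32(exact)       # result if the subtraction wrapped at 32 bits
    print(biased, exact >= BIAS - 4, wrapped >= BIAS - 4)

candidate = CRASH_IP + BIAS
print(hex(candidate), RANGE[0] <= candidate <= RANGE[1])
```

In this model the exact difference separates biased from unbiased entries, while the wrapped one does not. Signed overflow is undefined in C, so which result a given compiler produces is not guaranteed. The last line prints 0xffffffff814142e0 and True, which would fit the bias having been subtracted from a fixup address that never carried it, although the report itself does not state this.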

Revision history for this message
Juerg Haefliger (juergh) wrote :

The upstream commit that fixes 706276543b69 ("x86, extable: Switch to relative exception table entries"), which is what the problematic SAUCE commit also tries to do, is:

commit 548acf19234dbda5a52d5a8e7e205af46e9da840
Author: Tony Luck <email address hidden>
Date: Wed Feb 17 10:20:12 2016 -0800

    x86/mm: Expand the exception table logic to allow new handling options

    Huge amounts of help from Andy Lutomirski and Borislav Petkov to
    produce this. Andy provided the inspiration to add classes to the
    exception table with a clever bit-squeezing trick, Boris pointed
    out how much cleaner it would all be if we just had a new field.

    Linus Torvalds blessed the expansion with:

      ' I'd rather not be clever in order to save just a tiny amount of space
        in the exception table, which isn't really criticial for anybody. '

    The third field is another relative function pointer, this one to a
    handler that executes the actions.

    We start out with three handlers:

     1: Legacy - just jumps the to fixup IP
     2: Fault - provide the trap number in %ax to the fixup code
     3: Cleaned up legacy for the uaccess error hack

    Signed-off-by: Tony Luck <email address hidden>
    Reviewed-by: Borislav Petkov <email address hidden>
    Cc: Linus Torvalds <email address hidden>
    Cc: Peter Zijlstra <email address hidden>
    Cc: Thomas Gleixner <email address hidden>
    Link: http://lkml.kernel.org/r/f6af78fcbd348cf4939875cfda9c19689b5e50b8<email address hidden>
    Signed-off-by: Ingo Molnar <email address hidden>
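
The design this commit message describes, a per-entry handler instead of a magic bias on the fixup pointer, can be sketched as a small model. This is an illustration under stated assumptions, not the kernel implementation: it uses absolute addresses in a dictionary rather than relative offsets, keeps the uaccess error flag in the same dictionary as the registers, and the function names and sample addresses are chosen for illustration only.

```python
def ex_handler_default(entry, regs, trapnr):
    """Legacy: just continue at the fixup address."""
    regs["ip"] = entry["fixup"]
    return True

def ex_handler_fault(entry, regs, trapnr):
    """Fault: continue at the fixup address and pass the trap number in ax."""
    regs["ip"] = entry["fixup"]
    regs["ax"] = trapnr
    return True

def ex_handler_ext(entry, regs, trapnr):
    """uaccess case: record the error in an explicit flag, then continue."""
    regs["uaccess_err"] = True
    regs["ip"] = entry["fixup"]
    return True

# Hypothetical table: faulting instruction address -> (fixup, handler).
TABLE = {
    0x1000: {"fixup": 0x9000, "handler": ex_handler_default},
    0x2000: {"fixup": 0x9100, "handler": ex_handler_fault},
    0x3000: {"fixup": 0x9200, "handler": ex_handler_ext},
}

def fixup_exception(regs, trapnr):
    """Look up the faulting ip and let the entry's handler repair the state."""
    entry = TABLE.get(regs["ip"])
    if entry is None:
        return False              # no entry: the fault is not recoverable
    return entry["handler"](entry, regs, trapnr)

regs = {"ip": 0x2000, "ax": 0, "uaccess_err": False}
print(fixup_exception(regs, 14), hex(regs["ip"]), regs["ax"])
```

Because the special case is selected by the handler field, the fixup address is stored unmodified and no comparison against a bias constant is needed.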

Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1757193

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: trusty
Changed in linux (Ubuntu Trusty):
status: New → Triaged
Changed in linux (Ubuntu):
status: Incomplete → Triaged
Changed in linux (Ubuntu Trusty):
importance: Undecided → High
Changed in linux (Ubuntu):
importance: Undecided → High
Juerg Haefliger (juergh)
Changed in linux (Ubuntu Trusty):
assignee: nobody → Juerg Haefliger (juergh)
Juerg Haefliger (juergh)
description: updated
Revision history for this message
Stefan Bader (smb) wrote :

I am assuming this is only needed for trusty and so invalid for the development task.

Changed in linux (Ubuntu):
status: Triaged → Invalid
Changed in linux (Ubuntu Trusty):
status: Triaged → Fix Committed
Revision history for this message
Kleber Sacilotto de Souza (kleber-souza) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-trusty' to 'verification-done-trusty'. If the problem still exists, change the tag 'verification-needed-trusty' to 'verification-failed-trusty'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-trusty
Revision history for this message
Kleber Sacilotto de Souza (kleber-souza) wrote :

Confirmed to be fixed with Trusty kernel linux 3.13.0-145.194.

tags: added: verification-done-trusty
removed: verification-needed-trusty
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.13.0-145.194

---------------
linux (3.13.0-145.194) trusty; urgency=medium

  * linux: 3.13.0-145.194 -proposed tracker (LP: #1761430)

  * intel-microcode 3.20180312.0 causes lockup at login screen(w/ linux-
    image-4.13.0-37-generic) (LP: #1759920) // CVE-2017-5715 (Spectre v2 Intel)
    - Revert "UBUNTU: SAUCE: x86/mm: Only set IBPB when the new thread cannot
      ptrace current thread"
    - x86/speculation: Use Indirect Branch Prediction Barrier in context switch

  * DKMS driver builds fail with: Cannot use CONFIG_STACK_VALIDATION=y, please
    install libelf-dev, libelf-devel or elfutils-libelf-devel (LP: #1760876)
    - [Packaging] include the retpoline extractor in the headers

  * retpoline hints: primary infrastructure and initial hints (LP: #1758856)
    - [Packaging] retpoline-extract: flag *0xNNN(%reg) branches
    - x86/speculation, objtool: Annotate indirect calls/jumps for objtool
    - x86/speculation, objtool: Annotate indirect calls/jumps for objtool on 32bit
    - x86/paravirt, objtool: Annotate indirect calls
    - x86/asm: Stop depending on ptrace.h in alternative.h
    - [Packaging] retpoline -- add safe usage hint support
    - [Packaging] retpoline-check -- only report additions
    - [Packaging] retpoline -- widen indirect call/jmp detection
    - [Packaging] retpoline -- elide %rip relative indirections
    - [Packaging] retpoline -- clear hint information from packages
    - SAUCE: modpost: add discard to non-allocatable whitelist
    - KVM: x86: Make indirect calls in emulator speculation safe
    - KVM: VMX: Make indirect call speculation safe
    - x86/boot, objtool: Annotate indirect jump in secondary_startup_64()
    - SAUCE: early/late -- annotate indirect calls in early/late initialisation
      code
    - SAUCE: vga_set_mode -- avoid jump tables
    - [Config] retpoline -- switch to new format
    - [Packaging] retpoline hints -- handle missing files when RETPOLINE not
      enabled
    - [Packaging] final-checks -- remove check for empty retpoline files

  * retpoline: ignore %cs:0xNNN constant indirections (LP: #1752655)
    - [Packaging] retpoline -- elide %cs:0xNNNN constants on i386

  * Boot crash with Trusty 3.13 (LP: #1757193)
    - Revert "UBUNTU: SAUCE: x86, extable: fix uaccess fixup detection"
    - x86/mm: Expand the exception table logic to allow new handling options

  * Segmentation fault in ldt_gdt_64 (LP: #1755817) // CVE-2017-5754
    - x86/kvm: Rename VMX's segment access rights defines
    - x86/signal/64: Fix SS if needed when delivering a 64-bit signal

 -- Kleber Sacilotto de Souza <email address hidden> Thu, 05 Apr 2018 16:26:39 +0200

Changed in linux (Ubuntu Trusty):
status: Fix Committed → Fix Released