Linux 4.15.0-23 crashes during the boot process with a "Unable to handle kernel NULL pointer dereference" message

Bug #1777338 reported by Adam Novak
This bug affects 1 person
Affects          Status        Importance  Assigned to       Milestone
linux (Ubuntu)   Fix Released  High        Joseph Salisbury
Bionic           Fix Released  High        Joseph Salisbury

Bug Description

== SRU Justification ==
Mainline commit 1f50ddb4f418 introduced a regression. That commit added
speculative_store_bypass_ht_init() to the per-CPU initialization sequence.
However, speculative_store_bypass_ht_init() needs to be called on each CPU for
PV guests, as well.
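
To make the failure mode concrete, here is a toy Python model (not kernel code). It assumes, purely for illustration, that a per-CPU setup step stores state which later code uses unconditionally, and that one bring-up path does not call that step. All names (ht_init, bring_up_native, bring_up_pv, use_state) are hypothetical stand-ins, and Python's None plays the role of the NULL pointer.

```python
# Toy model of a per-CPU init call missing from one bring-up path.
# Hypothetical names throughout; this is not the kernel implementation.

per_cpu_state = {}  # cpu id -> state dict; absent if never initialized


def ht_init(cpu):
    """Stand-in for the per-CPU initialization step."""
    per_cpu_state[cpu] = {"disable_count": 0}


def bring_up_native(cpu):
    """Native path: calls the init step for each CPU."""
    ht_init(cpu)


def bring_up_pv(cpu, with_fix):
    """PV path: without the fix, the init call is missing."""
    if with_fix:
        ht_init(cpu)


def use_state(cpu):
    """Later code assumes the state exists and touches it."""
    state = per_cpu_state.get(cpu)  # None models the NULL pointer
    state["disable_count"] += 1     # raises TypeError if state is None
    return state["disable_count"]
```

In this model, use_state() works after bring_up_native() and fails after bring_up_pv(cpu, with_fix=False), which mirrors why the problem only shows up when booting under Xen.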

The regression prevents systems from booting.

The patch to fix this regression has also been Cc'd to upstream stable,
but it has not yet landed in Bionic.

== Fix ==
74899d92e666 ("x86/xen: Add call of speculative_store_bypass_ht_init() to PV paths")

== Regression Potential ==
Low. This patch fixes a current regression. It has also been
submitted to upstream stable, so it has had additional upstream review.

== Test Case ==
A test kernel was built with this patch and tested by the original bug reporter.
The bug reporter states the test kernel resolved the bug.

I went to boot my computer today and it wouldn't boot.

I get an "Unable to handle kernel NULL pointer dereference" message during the boot process, and, a bit after that, a message from the kernel watchdog about CPU #0 being stuck. Then the boot process stops completely.

I was able to boot the system by telling Grub to load 4.15.0-22, which works perfectly fine, so there has been a regression.

I am running Ubuntu as a Xen dom0, if that matters. I haven't tried booting the offending kernel version without Xen.

I'm not sure where, if anywhere, these messages go on disk, for posting.

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: linux-image-4.15.0-23-generic 4.15.0-23.25
ProcVersionSignature: Ubuntu 4.15.0-22.24-generic 4.15.17
Uname: Linux 4.15.0-22-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
ApportVersion: 2.20.9-0ubuntu7.1
Architecture: amd64
CurrentDesktop: ubuntu:GNOME
Date: Sun Jun 17 10:12:58 2018
InstallationDate: Installed on 2017-08-06 (314 days ago)
InstallationMedia: Ubuntu 17.04 "Zesty Zapus" - Release amd64 (20170412)
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: linux-signed
UpgradeStatus: Upgraded to bionic on 2018-05-29 (19 days ago)

Adam Novak (interfect) wrote :

This also affects the mainline kernel build 4.17.0-041700.201806041953 that I was testing for another bug.

I've attached a photo of the screen with the issue occurring, in that version.

Booting not under Xen seems to work around the issue, and the system comes up, but that's not useful for me because I need Xen.

affects: linux-signed (Ubuntu) → linux (Ubuntu)
Changed in linux (Ubuntu):
importance: Undecided → High
status: New → In Progress
Changed in linux (Ubuntu Bionic):
importance: Undecided → High
status: New → In Progress
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu):
assignee: nobody → Joseph Salisbury (jsalisbury)
Joseph Salisbury (jsalisbury) wrote :

Can you see if this issue exists in the 4.15.0-24 kernel:
https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/ppa/+build/15010188

Note about installing test kernels:
• If the test kernel is older than 4.15 (Bionic), you need to install the linux-image and linux-image-extra .deb packages.
• If the test kernel is 4.15 (Bionic) or newer, you need to install the linux-modules, linux-modules-extra and linux-image-unsigned .deb packages.

Thanks in advance!
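
The package rule in the note above can be written down as a small helper. This is only a sketch: the function name required_packages is made up for illustration, and it assumes the version string starts with "major.minor" (as in "4.15.0-24").

```python
def required_packages(kernel_version):
    """Return the .deb package families to install for a test kernel.

    Encodes the rule from the note: kernels older than 4.15 use
    linux-image and linux-image-extra; 4.15 and newer use linux-modules,
    linux-modules-extra and linux-image-unsigned.
    """
    # Compare only the leading "major.minor" part, e.g. "4.15" of "4.15.0-24".
    major, minor = (int(part) for part in kernel_version.split(".")[:2])
    if (major, minor) < (4, 15):
        return ["linux-image", "linux-image-extra"]
    return ["linux-modules", "linux-modules-extra", "linux-image-unsigned"]
```

Comparing (major, minor) as a tuple of integers avoids the string-comparison trap where "4.9" would sort after "4.15".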

Kai-Heng Feng (kaihengfeng) wrote :

It is also a good idea to update the BIOS.

Adam Novak (interfect) wrote :

I updated my BIOS and tested the -24 kernel, as was recommended. It
definitely doesn't work any better. It still has the null dereference
problem, and then it prints a bunch of smp_call_function_too_many errors,
apparently forever.

Joseph Salisbury (jsalisbury) wrote :

I started a kernel bisect between v4.15.0-22 and v4.15.0-23. The kernel bisect will require testing of about 7-10 test kernels.

I built the first test kernel, up to the following commit:
8eca6add0defde203282476d7969a7c13bbd7d91

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1777338

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance
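
The estimate of 7-10 test kernels follows from how bisection halves the candidate range at each step. As a rough sketch (the thread does not state how many commits separate the two versions, so the commit counts used here are assumptions), the number of builds is about the base-2 logarithm of the commit count:

```python
import math


def bisect_steps(num_commits):
    """Approximate number of kernels to test when bisecting num_commits commits."""
    if num_commits < 1:
        raise ValueError("need at least one candidate commit")
    # Each test halves the remaining range, so about log2(N) tests are needed.
    return math.ceil(math.log2(num_commits))
```

For example, bisect_steps(128) gives 7 and bisect_steps(1024) gives 10, so an estimate of 7-10 kernels would correspond to somewhere between roughly 128 and 1024 commits.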

Adam Novak (interfect) wrote :

Hello,

Thanks for putting this together. I tried to get the kernels you
built, but I'm having trouble establishing a secure (https) connection
to kernel.ubuntu.com. It looks like the server might not offer https.

Can you give me the expected hashes of the files so I can verify the downloads?

Thanks,
-Adam

Joseph Salisbury (jsalisbury) wrote :

Here are the md5sum hashes of the files needed to install the test kernel:

d5f16dcf0080db1268d3b3477c911cd7 linux-image-unsigned-4.15.0-23-generic_4.15.0-23.26~lp1777338Commit8eca6add0_amd64.deb

1d7504d8e691ba64d552a2d507179882 linux-modules-4.15.0-23-generic_4.15.0-23.26~lp1777338Commit8eca6add0_amd64.deb

b004fa6244e3e038b0d67c86e0218536 linux-modules-extra-4.15.0-23-generic_4.15.0-23.26~lp1777338Commit8eca6add0_amd64.deb
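
Since the downloads come over plain http, the posted hashes can be checked after downloading. The following is a minimal sketch using Python's standard hashlib module; the helper names md5_of and verify are made up for illustration, and the file names and hashes are copied from the comment above.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at path, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(expected):
    """Map each file name to True or False depending on whether its MD5 matches."""
    return {name: md5_of(name) == digest.lower() for name, digest in expected.items()}


# Hashes as posted for the first test kernel.
EXPECTED = {
    "linux-image-unsigned-4.15.0-23-generic_4.15.0-23.26~lp1777338Commit8eca6add0_amd64.deb":
        "d5f16dcf0080db1268d3b3477c911cd7",
    "linux-modules-4.15.0-23-generic_4.15.0-23.26~lp1777338Commit8eca6add0_amd64.deb":
        "1d7504d8e691ba64d552a2d507179882",
    "linux-modules-extra-4.15.0-23-generic_4.15.0-23.26~lp1777338Commit8eca6add0_amd64.deb":
        "b004fa6244e3e038b0d67c86e0218536",
}
```

Calling verify(EXPECTED) from the directory that holds the three .deb files returns a dictionary of file name to match result; any False entry means the file should not be installed.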

Adam Novak (interfect) wrote :

OK, I've tested this kernel, and it works just fine under Xen, from
what I can tell. The system comes up just fine:

Linux octagon 4.15.0-23-generic #26~lp1777338Commit8eca6add0 SMP Wed Jun 27 15:50:08 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

The offending commit must be later.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
fc8704280f2ada9f61f08a2d5adc0dab169cc207

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1777338

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

MD5SUM of files:
d76993d34e1203375090a47c4de8ebfe linux-image-unsigned-4.15.0-23-generic_4.15.0-23.26~lp1777338Commitfc8704280f_amd64.deb

5b89b2341b858bbe2bcae88e3efaf8c8 linux-modules-4.15.0-23-generic_4.15.0-23.26~lp1777338Commitfc8704280f_amd64.deb

6928b4ad5f2a861ad674ff88a4745471 linux-modules-extra-4.15.0-23-generic_4.15.0-23.26~lp1777338Commitfc8704280f_amd64.deb

Adam Novak (interfect) wrote :

Thanks!

I have tested this kernel; it doesn't work, so the problem is between
fc8704280f2ada9f61f08a2d5adc0dab169cc207 and
8eca6add0defde203282476d7969a7c13bbd7d91.

I've gotten set up with a kernel build environment; I think I can
finish the git bisect myself, but looking at the commits in that
range, I have no hope of actually fixing the underlying bug myself.
I'm building 3f6a3b035f91a22c0d3bd27630bf61eac9c8cf6c now.

Adam Novak (interfect) wrote :

OK, I tried that one and it still exhibited the issue. I'm going to
try 91762b4035d9da8c266e2cb3dbc552052434bbf0 next.

Adam Novak (interfect) wrote :

That one works. On to 5856293c78e552c012835e667d66775bba20b4f7.

I suspect 3f6a3b035f91 and abd39ac1da07 may be the real problem, since
the CPU I am using is an AMD Ryzen chip, and those are tinkering
specifically with how the kernel handles those.

Adam Novak (interfect) wrote :

OK, I have finished the bisect.

3f6a3b035f91a22c0d3bd27630bf61eac9c8cf6c is the first bad commit. I
tested it and it displays the problem, and I tested
abd39ac1da07b433fc570332ee9ad938b5071760 right before it and that one
boots fine.
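
The search that ended here (a known-good commit, a known-bad commit, and repeated tests of the midpoint) can be sketched as a binary search. This illustrates the idea behind git bisect and is not its implementation. It assumes a linear history with a single transition from good to bad; the function name first_bad and the commit labels in the usage note are made up.

```python
def first_bad(commits, is_bad):
    """Return (first bad commit, number of tests performed).

    commits is ordered oldest to newest; commits[0] is known good and
    commits[-1] is known bad, like the two endpoints of a bisect.
    """
    good, bad = 0, len(commits) - 1
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1
        if is_bad(commits[mid]):
            bad = mid    # the first bad commit is at mid or earlier
        else:
            good = mid   # the first bad commit is after mid
    return commits[bad], tests
```

With sixteen synthetic commits "c0" to "c15" where "c11" introduces the problem, first_bad(commits, lambda c: int(c[1:]) >= 11) returns ("c11", 4): the commit just before it tested good and it tested bad, which is the same stopping condition reported above.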

Joseph Salisbury (jsalisbury) wrote :

Thanks for finishing up the bisect!

Would it be possible for you to test the proposed kernel and post back if it resolves this bug?
See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed.

Thank you in advance!

Adam Novak (interfect) wrote :

I'm not finding any kernel packages in bionic-proposed; maybe they were
released already?

I've installed and tested 4.15.0-29 from the normal repos; it has the same
null pointer dereference at address 8 issue.

Adam Novak (interfect) wrote :

I've been advised by Juergen Gross that not having "x86/xen: Add call of speculative_store_bypass_ht_init() to PV paths" from the mainline kernel might contribute to this problem. I am trying to pull that in now.

Adam Novak (interfect) wrote :

OK, I grabbed commit 74899d92e66663dc7671a8017b3146dcd4735f3b "x86/xen: Add call of speculative_store_bypass_ht_init() to PV paths" from <email address hidden>:torvalds/linux.git and cherry-picked it on top of the tag Ubuntu-4.15.0-29.31, and built the kernel, and now it boots fine as dom0! I think that that patch fixes this bug.

How can I get this commit to be pulled into the main Ubuntu kernel development branch, to be released in the official kernel builds?
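
The cherry-pick step described above can be expressed as a short command plan. This is a sketch under stated assumptions: the commit object is already present in the local repository (for example after fetching the upstream tree), and the branch name lp1777338-test and the helper cherry_pick_plan are hypothetical. It only builds the git command lines and does not run them.

```python
def cherry_pick_plan(base_tag, commit, branch="lp1777338-test"):
    """Return the git commands (as argv lists) for applying one commit on a tag."""
    return [
        ["git", "checkout", "-b", branch, base_tag],  # new branch starting at the tag
        ["git", "cherry-pick", "-x", commit],         # apply the fix, recording its origin
    ]


PLAN = cherry_pick_plan(
    "Ubuntu-4.15.0-29.31",
    "74899d92e66663dc7671a8017b3146dcd4735f3b",
)
```

Each entry of PLAN can be passed to subprocess.run(argv, check=True) inside a kernel source checkout; the -x option makes git note the original commit ID in the new commit message.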

Joseph Salisbury (jsalisbury) wrote :

I built a test kernel with commit 74899d92e66663dc7671a8017b3146dcd4735f3b. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1777338

Can you test this kernel and see if it resolves this bug?

If this kernel also fixes the bug, I'll submit an SRU request.

Note about installing test kernels:
• If the test kernel is older than 4.15 (Bionic), you need to install the linux-image and linux-image-extra .deb packages.
• If the test kernel is 4.15 (Bionic) or newer, you need to install the linux-modules, linux-modules-extra and linux-image-unsigned .deb packages.

Thanks in advance!

Adam Novak (interfect) wrote :

Thanks for preparing this kernel; I am downloading it now. What are
the expected hashes of the files, so that I can verify the insecure
downloads?

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

The hashes are:

63d7e9ca7710792e1603953b0ec14337 linux-image-unsigned-4.15.0-31-generic_4.15.0-31.34~lp1777338_amd64.deb

8cca692f07f9edbb56dec15375bc9979 linux-modules-4.15.0-31-generic_4.15.0-31.34~lp1777338_amd64.deb

df24a98ec4769f33a9831a5aaeb9f437 linux-modules-extra-4.15.0-31-generic_4.15.0-31.34~lp1777338_amd64.deb
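
One way to compare the downloaded files against these values is sketched below. It assumes the 32-hex-digit strings are MD5 digests (the comment does not name the algorithm), and the function names `md5_of_file` and `check_downloads` are illustrative, not existing tools.

```python
import hashlib
from pathlib import Path

# Digests copied from the comment above; assumed (not confirmed) to be MD5.
EXPECTED_MD5 = {
    "linux-image-unsigned-4.15.0-31-generic_4.15.0-31.34~lp1777338_amd64.deb":
        "63d7e9ca7710792e1603953b0ec14337",
    "linux-modules-4.15.0-31-generic_4.15.0-31.34~lp1777338_amd64.deb":
        "8cca692f07f9edbb56dec15375bc9979",
    "linux-modules-extra-4.15.0-31-generic_4.15.0-31.34~lp1777338_amd64.deb":
        "df24a98ec4769f33a9831a5aaeb9f437",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_downloads(directory, expected=EXPECTED_MD5):
    """Map each expected filename to True (match), False (mismatch),
    or None (file not found in `directory`)."""
    results = {}
    for name, want in expected.items():
        path = Path(directory) / name
        if path.is_file():
            results[name] = md5_of_file(path) == want.lower()
        else:
            results[name] = None
    return results
```

Calling `check_downloads(".")` in the download directory would report the status of each of the three packages. Note that MD5 is not collision-resistant, so a match mainly rules out a corrupted or truncated download.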

Revision history for this message
Adam Novak (interfect) wrote :

OK, I have tested the provided kernel and it works for me under Xen.
Thank you for fixing the bug! I look forward to seeing this in the
real releases.

[anovak@octagon ~]$ uname -a
Linux octagon 4.15.0-31-generic #34~lp1777338 SMP Tue Aug 7 14:57:12 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :
description: updated
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Revision history for this message
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Revision history for this message
Adam Novak (interfect) wrote :

I don't seem to have access to change the tag, but I can confirm that 4.15.0-34-generic from -proposed solves the problem. So the tag should be verification-done-bionic.

Changed in linux (Ubuntu):
status: In Progress → Fix Committed
tags: added: verification-done-bionic
removed: verification-needed-bionic
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-34.37

---------------
linux (4.15.0-34.37) bionic; urgency=medium

  * linux: 4.15.0-34.37 -proposed tracker (LP: #1788744)

  * Bionic update: upstream stable patchset 2018-08-09 (LP: #1786352)
    - MIPS: c-r4k: Fix data corruption related to cache coherence
    - MIPS: ptrace: Expose FIR register through FP regset
    - MIPS: Fix ptrace(2) PTRACE_PEEKUSR and PTRACE_POKEUSR accesses to o32 FGRs
    - KVM: Fix spelling mistake: "cop_unsuable" -> "cop_unusable"
    - affs_lookup(): close a race with affs_remove_link()
    - fs: don't scan the inode cache before SB_BORN is set
    - aio: fix io_destroy(2) vs. lookup_ioctx() race
    - ALSA: timer: Fix pause event notification
    - do d_instantiate/unlock_new_inode combinations safely
    - mmc: sdhci-iproc: remove hard coded mmc cap 1.8v
    - mmc: sdhci-iproc: fix 32bit writes for TRANSFER_MODE register
    - mmc: sdhci-iproc: add SDHCI_QUIRK2_HOST_OFF_CARD_ON for cygnus
    - libata: Blacklist some Sandisk SSDs for NCQ
    - libata: blacklist Micron 500IT SSD with MU01 firmware
    - xen-swiotlb: fix the check condition for xen_swiotlb_free_coherent
    - drm/vmwgfx: Fix 32-bit VMW_PORT_HB_[IN|OUT] macros
    - arm64: lse: Add early clobbers to some input/output asm operands
    - powerpc/64s: Clear PCR on boot
    - IB/hfi1: Use after free race condition in send context error path
    - IB/umem: Use the correct mm during ib_umem_release
    - idr: fix invalid ptr dereference on item delete
    - Revert "ipc/shm: Fix shmat mmap nil-page protection"
    - ipc/shm: fix shmat() nil address after round-down when remapping
    - mm/kasan: don't vfree() nonexistent vm_area
    - kasan: free allocated shadow memory on MEM_CANCEL_ONLINE
    - kasan: fix memory hotplug during boot
    - kernel/sys.c: fix potential Spectre v1 issue
    - KVM: s390: vsie: fix < 8k check for the itdba
    - KVM: x86: Update cpuid properly when CR4.OSXAVE or CR4.PKE is changed
    - kvm: x86: IA32_ARCH_CAPABILITIES is always supported
    - powerpc/64s: Improve RFI L1-D cache flush fallback
    - powerpc/pseries: Restore default security feature flags on setup
    - powerpc/64s: Fix section mismatch warnings from setup_rfi_flush()
    - MIPS: generic: Fix machine compatible matching
    - mac80211: mesh: fix wrong mesh TTL offset calculation
    - ARC: Fix malformed ARC_EMUL_UNALIGNED default
    - ptr_ring: prevent integer overflow when calculating size
    - arm64: dts: rockchip: fix rock64 gmac2io stability issues
    - arm64: dts: rockchip: correct ep-gpios for rk3399-sapphire
    - libata: Fix compile warning with ATA_DEBUG enabled
    - selftests: sync: missing CFLAGS while compiling
    - selftest/vDSO: fix O=
    - selftests: pstore: Adding config fragment CONFIG_PSTORE_RAM=m
    - selftests: memfd: add config fragment for fuse
    - ARM: OMAP2+: timer: fix a kmemleak caused in omap_get_timer_dt
    - ARM: OMAP3: Fix prm wake interrupt for resume
    - ARM: OMAP2+: Fix sar_base inititalization for HS omaps
    - ARM: OMAP1: clock: Fix debugfs_create_*() usage
    - tls: retrun the correct IV in getsockopt
    - xhci: workaround for AMD Promontory disabled ports w...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Brad Figg (brad-figg)
tags: added: cscc
Po-Hsu Lin (cypressyew)
Changed in linux (Ubuntu):
status: Fix Committed → Fix Released