crash:master

Last commit made on 2024-04-29
Get this branch:
git clone -b master https://git.launchpad.net/crash

Branch information

Name:
master
Repository:
lp:crash

Recent commits

9104e87... by Lianbo Jiang <email address hidden>

Mark start of 8.0.6 development phase with version 8.0.5++

Signed-off-by: Lianbo Jiang <email address hidden>

ceaccee... by Kazuhito Hagio <email address hidden>

crash-8.0.4 -> crash-8.0.5

Signed-off-by: Kazuhito Hagio <email address hidden>

eedf12d... by Lianbo Jiang <email address hidden>

gdb: fix "p" command to print module variables correctly

Some object formats may support copy relocations, but currently
maybe_copied is always initialized to 0 in symbol(). In addition, the
symbol type can be 'mst_file_bss', not only 'mst_bss' or 'mst_data',
in lookup_minimal_symbol_linkage(). For example:

(gdb) p *msymbol
$42 = {<general_symbol_info> = {m_name = 0x349812f "test_no_static", value = {ivalue = 8, block = 0x8,
      bytes = 0x8 <error: Cannot access memory at address 0x8>, address = 8, common_block = 0x8, chain = 0x8}, language_specific = {
      obstack = 0x0, demangled_name = 0x0}, m_language = language_auto, ada_mangled = 0, section = 20}, size = 4,
  filename = 0x6db3440 "test_sanity.c", type = mst_file_bss, created_by_gdb = 0, target_flag_1 = 0, target_flag_2 = 0, has_size = 1,
  maybe_copied = 0, name_set = 1, hash_next = 0x0, demangled_hash_next = 0x0}

As a result, the 'p' command does not work as expected and emits an
error or a bogus value:

  crash> mod -s test_sanity /home/test_sanity.ko
       MODULE NAME BASE SIZE OBJECT FILE
  ffffffffc1084040 test_sanity ffffffffc1082000 16384 /home/test_sanity.ko
  crash> p test_no_static
  p: gdb request failed: p test_no_static
  crash>

The issue occurs with Linux 6.2 and later, or with kernels that have
kernel commit 80e4c1cd42ff ("x86/retbleed: Add X86_FEATURE_CALL_DEPTH")
and are configured with CONFIG_CALL_DEPTH_TRACKING=y, including RHEL9.3
and later kernels.

With the patch:
  crash> mod -s test_sanity /home/test_sanity.ko
       MODULE NAME BASE SIZE OBJECT FILE
  ffffffffc1084040 test_sanity ffffffffc1082000 16384 /home/test_sanity.ko
  crash> p test_no_static
  test_no_static = $1 = 5
  crash>

Signed-off-by: Lianbo Jiang <email address hidden>

7d4daf0... by "Aureau, Georges (Kernel Tools ERT)" <email address hidden>

x86_64: Fix "bt" command to handle IRQ exception frames properly

On x86_64, there are cases where crash cannot handle IRQ exception
frames properly. For example, with a RHEL9.3 kernel, the "bt" command
fails with "WARNING: possibly bogus exception frame":

  crash> bt -c 30
  PID: 2898241 TASK: ff4cb0ce0da0c680 CPU: 30 COMMAND: "star-ccm+"
   #0 [fffffe4658d88e58] crash_nmi_callback at ffffffffa00675e8
   #1 [fffffe4658d88e68] nmi_handle at ffffffffa002ebab
  ...
  --- <NMI exception stack> ---
  ...
  #13 [ff5eba269937cf90] __do_softirq at ffffffffa0c6c007
  #14 [ff5eba269937cfe0] __irq_exit_rcu at ffffffffa010ef61
  #15 [ff5eba269937cff0] sysvec_apic_timer_interrupt at ffffffffa0c58ca2
  --- <IRQ stack> ---
      RIP: 0000000000000010 RSP: 0000000000000018 RFLAGS: ff5eba26ddc9f7e8
      RAX: 0000000000000a20 RBX: ff5eba26ddc9f940 RCX: 0000000000001000
      RDX: ffffffb559980000 RSI: ff4cb12d67207400 RDI: ffffffffffffffff
      RBP: 0000000000001000 R8: ff5eba26ddc9f940 R9: ff5eba26ddc9f8af
      R10: 0000000000000003 R11: 0000000000000a20 R12: ff5eba26ddc9f8b0
      R13: 000000283c07f000 R14: ff4cb0f5a29a1c00 R15: 0000000000000001
      ORIG_RAX: ffffffffa07c4e60 CS: 0206 SS: 7000001cf0380001
  bt: WARNING: possibly bogus exception frame

Running "crash" with the "--machdep irq_eframe_link=0xffffffffffffffe8"
option (i.e. irq_eframe_link = -24) works properly:

  PID: 2898241 TASK: ff4cb0ce0da0c680 CPU: 30 COMMAND: "star-ccm+"
   #0 [fffffe4658d88e58] crash_nmi_callback at ffffffffa00675e8
   #1 [fffffe4658d88e68] nmi_handle at ffffffffa002ebab
  ...
  --- <NMI exception stack> ---
  ...
  #13 [ff5eba269937cf90] __do_softirq at ffffffffa0c6c007
  #14 [ff5eba269937cfe0] __irq_exit_rcu at ffffffffa010ef61
  #15 [ff5eba269937cff0] sysvec_apic_timer_interrupt at ffffffffa0c58ca2
  --- <IRQ stack> ---
  #16 [ff5eba26ddc9f738] asm_sysvec_apic_timer_interrupt at ffffffffa0e00e06
      [exception RIP: alloc_pte.constprop.0+32]
      RIP: ffffffffa07c4e60 RSP: ff5eba26ddc9f7e8 RFLAGS: 00000206
      RAX: ff5eba26ddc9f940 RBX: 0000000000001000 RCX: 0000000000000a20
      RDX: 0000000000001000 RSI: ffffffb559980000 RDI: ff4cb12d67207400
      RBP: ff5eba26ddc9f8b0 R8: ff5eba26ddc9f8af R9: 0000000000000003
      R10: 0000000000000a20 R11: ff5eba26ddc9f940 R12: 000000283c07f000
      R13: ff4cb0f5a29a1c00 R14: 0000000000000001 R15: ff4cb0f5a29a1bf8
      ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
  #17 [ff5eba26ddc9f830] iommu_v1_map_pages at ffffffffa07c5648
  #18 [ff5eba26ddc9f8f8] __iommu_map at ffffffffa07d7803
  #19 [ff5eba26ddc9f990] iommu_map_sg at ffffffffa07d7b71
  #20 [ff5eba26ddc9f9f0] iommu_dma_map_sg at ffffffffa07ddcc9
  #21 [ff5eba26ddc9fa90] __dma_map_sg_attrs at ffffffffa01b5205
  ...

Some background:

asm_common_interrupt:
      callq error_entry
      movq %rax,%rsp
      movq %rsp,%rdi
      movq 0x78(%rsp),%rsi
      movq $-0x1,0x78(%rsp)
      call common_interrupt # rsp pointing to regs

common_interrupt:
      pushq %r12
      pushq %rbp
      pushq %rbx
      [...]
      movq hardirq_stack_ptr,%r11
      movq %rsp,(%r11)
      movq %r11,%rsp
      [...]
      call __common_interrupt # rip:common_interrupt

So frame_size(rip:common_interrupt) = 32 (3 push + ret).

Hence "machdep->machspec->irq_eframe_link = -32;" (see x86_64_irq_eframe_link_init()).

Now:

asm_sysvec_apic_timer_interrupt:
      pushq $-0x1
      callq error_entry
      movq %rax,%rsp
      movq %rsp,%rdi
      callq sysvec_apic_timer_interrupt

sysvec_apic_timer_interrupt:
      pushq %r12
      pushq %rbp
      [...]
      movq hardirq_stack_ptr,%r11
      movq %rsp,(%r11)
      movq %r11,%rsp
      [...]
      call __sysvec_apic_timer_interrupt # rip:sysvec_apic_timer_interrupt

Here frame_size(rip:sysvec_apic_timer_interrupt) = 24 (2 push + ret)

We should also notice that:

rip = *(hardirq_stack_ptr - 8)
rsp = *(hardirq_stack_ptr)
regs = rsp - frame_size(rip)

But x86_64_get_framesize() does not work with IRQ handlers (it returns 0).
So there are not many options other than hardcoding the most likely value
and looking around it. Until now, x86_64_irq_eframe_link() tried -32
(the default), and then -40, but not -24.
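
The arithmetic and the "try the likely offsets" idea above can be sketched
as follows. This is only an illustration, not the code in crash: the
function names, the callback type and the candidate order are hypothetical,
and the stand-in check simply compares against a known address where a real
tool would validate the register contents.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size of one stack slot on x86_64. */
#define SLOT_SIZE 8

/* Frame size of a handler that pushes `pushes` registers before switching
 * to the hardirq stack: the pushed registers plus the return address. */
static long irq_frame_size(int pushes)
{
    assert(pushes >= 0);
    return (long)(pushes + 1) * SLOT_SIZE;
}

/* Hypothetical callback: returns nonzero if a plausible exception frame
 * (pt_regs) starts at regs_addr. */
typedef int (*eframe_ok_fn)(uint64_t regs_addr, void *ctx);

/* Stand-in check for the example: accept exactly one known address. */
static int matches_expected(uint64_t regs_addr, void *ctx)
{
    return regs_addr == *(const uint64_t *)ctx;
}

/* Try each candidate link offset (negative frame size) in order and return
 * the first one for which saved_rsp + offset looks like a valid frame.
 * Returns 0 if no candidate fits. */
static long find_irq_eframe_link(uint64_t saved_rsp, const long *candidates,
                                 size_t n, eframe_ok_fn ok, void *ctx)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t regs_addr = saved_rsp + (uint64_t)candidates[i];
        if (ok(regs_addr, ctx))
            return candidates[i];
    }
    return 0;
}
```

With candidates { -32, -40, -24 }, a handler that pushes only two registers
is matched by the third entry, which is the case that was missing.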

Signed-off-by: Georges Aureau <email address hidden>

ced754d... by Tao Liu <email address hidden>

Fix segmentation fault in value_search_module_6_4()

The following segmentation fault occurred during session initialization:

  $ crash vmlinux vmcore
  ...
  please wait... (determining panic task)Segmentation fault

Here is the backtrace of the crash-utility:

  (gdb) bt
  #0 value_search_module_6_4 (value=18446603338276298752, offset=0x7ffffffface0) at symbols.c:5564
  #1 0x0000555555812bd0 in value_to_symstr (value=18446603338276298752,
      buf=buf@entry=0x7fffffffb9c0 "", radix=10, radix@entry=0) at symbols.c:5872
  #2 0x00005555557694a2 in display_memory (addr=<optimized out>, count=2048, flag=208,
      memtype=memtype@entry=1, opt=opt@entry=0x0) at memory.c:1740
  #3 0x0000555555769e1f in raw_stack_dump (stackbase=<optimized out>, size=<optimized out>)
      at memory.c:2194
  #4 0x00005555557923ff in get_active_set_panic_task () at task.c:8639
  #5 0x00005555557930d2 in get_dumpfile_panic_task () at task.c:7628
  #6 0x00005555557a89d3 in panic_search () at task.c:7380
  #7 get_panic_context () at task.c:6267
  #8 task_init () at task.c:687
  #9 0x00005555557305b3 in main_loop () at main.c:787
  ...

This is due to a missing existence check on the module symbol table. Not
every mod_mem_type exists for a module, e.g. in the following module
case:

  (gdb) p lm->symtable[0]
  $1 = (struct syment *) 0x4dcbad0
  (gdb) p lm->symtable[1]
  $2 = (struct syment *) 0x4dcbb70
  (gdb) p lm->symtable[2]
  $3 = (struct syment *) 0x4dcbc10
  (gdb) p lm->symtable[3]
  $4 = (struct syment *) 0x0
  (gdb) p lm->symtable[4]
  $5 = (struct syment *) 0x4dcbcb0
  (gdb) p lm->symtable[5]
  $6 = (struct syment *) 0x4dcbd00
  (gdb) p lm->symtable[6]
  $7 = (struct syment *) 0x0

MOD_RO_AFTER_INIT(3) and MOD_INIT_RODATA(6) do not exist and should be
skipped; otherwise the segmentation fault occurs.
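
A minimal sketch of the kind of check described above, assuming a simplified
per-type table. The struct and function names are hypothetical stand-ins and
do not reproduce the actual code in symbols.c; the point is only that a NULL
table entry is skipped before it is dereferenced.

```c
#include <assert.h>
#include <stddef.h>

/* Seven memory types, matching indices 0..6 in the gdb output above. */
#define NUM_MEM_TYPES 7

/* Hypothetical stand-in for a per-type symbol table entry. */
struct sym_range {
    unsigned long start;   /* first address covered */
    unsigned long end;     /* one past the last address covered */
    const char *name;
};

/* Return the table whose range contains `value`, or NULL if none does.
 * Types with no table (NULL entry) are skipped instead of dereferenced. */
static const struct sym_range *
search_tables(const struct sym_range *const tables[NUM_MEM_TYPES],
              unsigned long value)
{
    assert(tables != NULL);
    for (int t = 0; t < NUM_MEM_TYPES; t++) {
        const struct sym_range *sp = tables[t];
        if (sp == NULL)
            continue;   /* this memory type does not exist for the module */
        if (value >= sp->start && value < sp->end)
            return sp;
    }
    return NULL;
}
```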

Fixes: 7750e61fdb2a ("Support module memory layout change on Linux 6.4")
Closes: https://github.com/crash-utility/crash/issues/176
Reported-by: Naveen Chaudhary <email address hidden>
Signed-off-by: Tao Liu <email address hidden>

ce47cb8... by Lianbo Jiang <email address hidden>

x86_64: Fix for "bt" command incorrectly printing "bogus exception frame" warning

The "bogus exception frame" warning was observed again on a specific
vmcore, and the remaining frames were truncated on an x86_64 machine,
when executing the "bt" command as below:

  crash> bt 0 -c 8
  PID: 0 TASK: ffff9948c08f5640 CPU: 8 COMMAND: "swapper/8"
   #0 [fffffe1788788e58] crash_nmi_callback at ffffffff972672bb
   #1 [fffffe1788788e68] nmi_handle at ffffffff9722eb8e
   #2 [fffffe1788788eb0] default_do_nmi at ffffffff97e51cd0
   #3 [fffffe1788788ed0] exc_nmi at ffffffff97e51ee1
   #4 [fffffe1788788ef0] end_repeat_nmi at ffffffff980015f9
      [exception RIP: __update_load_avg_se+13]
      RIP: ffffffff9736b16d RSP: ffffbec3c08acc78 RFLAGS: 00000046
      RAX: 0000000000000000 RBX: ffff994c2f2b1a40 RCX: ffffbec3c08acdc0
      RDX: ffff9948e4fe1d80 RSI: ffff994c2f2b1a40 RDI: 0000001d7ad7d55d
      RBP: ffffbec3c08acc88 R8: 0000001d921fca6f R9: ffff994c2f2b1328
      R10: 00000000fffd0010 R11: ffffffff98e060c0 R12: 0000001d7ad7d55d
      R13: 0000000000000005 R14: ffff994c2f2b19c0 R15: 0000000000000001
      ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
  --- <NMI exception stack> ---
   #5 [ffffbec3c08acc78] __update_load_avg_se at ffffffff9736b16d
   #6 [ffffbec3c08acce0] enqueue_entity at ffffffff9735c9ab
   #7 [ffffbec3c08acd28] enqueue_task_fair at ffffffff9735cef8
  ...
  #18 [ffffbec3c08acf90] blk_complete_reqs at ffffffff977978d0
  #19 [ffffbec3c08acfa0] __do_softirq at ffffffff97e66f7a
  #20 [ffffbec3c08acff0] do_softirq at ffffffff9730f6ef
  --- <IRQ stack> ---
  #21 [ffffbec3c022ff18] do_idle at ffffffff97368288
      [exception RIP: unknown or invalid address]
      RIP: 0000000000000000 RSP: 0000000000000000 RFLAGS: 00000000
      RAX: 0000000000000000 RBX: 000000089726a2d0 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
      RBP: ffffffff9726a3dd R8: 0000000000000000 R9: 0000000000000000
      R10: ffffffff9720015a R11: e48885e126bc1600 R12: 0000000000000000
      R13: ffffffff973684a9 R14: 0000000000000094 R15: 0000000040000000
      ORIG_RAX: 0000000000000000 CS: 0000 SS: 0000
  bt: WARNING: possibly bogus exception frame
  crash>

Actually, there is no exception frame when called from do_softirq().
With the patch:

  crash> bt 0 -c 8
  ...
  #18 [ffffbec3c08acf90] blk_complete_reqs at ffffffff977978d0
  #19 [ffffbec3c08acfa0] __do_softirq at ffffffff97e66f7a
  #20 [ffffbec3c08acff0] do_softirq at ffffffff9730f6ef
  --- <IRQ stack> ---
  #21 [ffffbec3c022ff28] cpu_startup_entry at ffffffff973684a9
  #22 [ffffbec3c022ff38] start_secondary at ffffffff9726a3dd
  #23 [ffffbec3c022ff50] secondary_startup_64_no_verify at ffffffff9720015a
  crash>

Reported-by: Jie Li <email address hidden>
Signed-off-by: Lianbo Jiang <email address hidden>

5b24e36... by Aditya Gupta <email address hidden>

get vmalloc start address from vmcoreinfo

The error below is noticed when running crash on a vmcore collected from a
linux-next kernel crash (linux-next tag next-20240221):

  # crash /boot/vmlinuz-6.8.0-rc5-next-20240221 ./vmcore
  ...
  For help, type "help".
  Type "apropos word" to search for commands related to "word"...

  crash: page excluded: kernel virtual address: c00000000219a2c0 type: "vmlist"

This occurred because getting the vmalloc area base address no longer
works in crash, due to 'vmap_area_list' being removed in Linux kernel
6.9-rc1 by the commit below:

    commit 55c49fee57af99f3c663e69dedc5b85e691bbe50
         mm/vmalloc: remove vmap_area_list

As an alternative, the commit introduced 'VMALLOC_START' in vmcoreinfo to
get the base address of the vmalloc area. Use it to return the vmalloc
start address instead of depending on vmap_area_list and vmlist.
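
As a rough illustration of reading such a value, the sketch below looks up a
"key=value" line in vmcoreinfo-style text and parses the value as a number.
The helper name and the exact key string used in the usage note are
assumptions for the example; they are not taken from crash or from the
kernel commit.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Search newline-separated "key=value" text for `key` and parse its value
 * as an unsigned number (base auto-detected, so a "0x" prefix means hex).
 * Returns 1 and stores the value on success, 0 if the key is missing or
 * the value is not numeric. */
static int vmcoreinfo_read_number(const char *info, const char *key,
                                  unsigned long long *out)
{
    const char *line = info;
    size_t klen;

    assert(info != NULL && key != NULL && out != NULL);
    klen = strlen(key);

    while (*line != '\0') {
        if (strncmp(line, key, klen) == 0 && line[klen] == '=') {
            const char *val = line + klen + 1;
            char *end;
            unsigned long long v = strtoull(val, &end, 0);
            if (end == val)
                return 0;   /* nothing numeric after '=' */
            *out = v;
            return 1;
        }
        line = strchr(line, '\n');
        if (line == NULL)
            break;
        line++;             /* move to the start of the next line */
    }
    return 0;
}
```

A caller could try a key such as "NUMBER(VMALLOC_START)" first and fall back
to the older symbol-based lookup only when the function returns 0, which
keeps dumps from older kernels working.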

Reported-by: Sachin Sant <email address hidden>
Signed-off-by: Aditya Gupta <email address hidden>
Tested-by: Sachin Sant <email address hidden>
Acked-by: Hari Bathini <email address hidden>

18bf18c... by Huang Shijie <email address hidden>

arm64: Add support for vmemmap symbol in vmcoreinfo

With kernel commit d3246b6ee42a ("crash_core: export vmemmap when
CONFIG_SPARSEMEM_VMEMMAP is enabled") in Linux 6.9-rc1 and later, we can
use the vmemmap symbol in vmcoreinfo to optimize machdep->is_page_ptr.
vmemmap is just an array of struct page after all.

This patch tries to:
  1.) Get the "vmemmap" from the vmcore file. If it's available,
      machdep->is_page_ptr is set to arm64_vmemmap_is_page_ptr.
  2.) Implement the fast page_to_pfn code in arm64_vmemmap_is_page_ptr.
  3.) Dump it in "help -m".

With the patch, the "files -p" command for the inode of a 441M vmlinux
takes only 3 seconds, compared with 185 seconds without the patch.

Signed-off-by: Huang Shijie <email address hidden>

3f205d1... by Ming Wang <email address hidden>

LoongArch64: Fixed link errors when build on LOONGARCH64 machine

The following link errors occur when building on a LOONGARCH64
machine:

/usr/bin/ld: proc-service.o: in function `.LVL71':
proc-service.c:(.text+0x324): undefined reference to `fill_gregset ...
/usr/bin/ld: proc-service.o: in function `.LVL77':
proc-service.c:(.text+0x364): undefined reference to `supply_gregset ...
/usr/bin/ld: proc-service.o: in function `.LVL87':
proc-service.c:(.text+0x3c4): undefined reference to `fill_fpregset ...
/usr/bin/ld: proc-service.o: in function `.LVL93':
proc-service.c:(.text+0x404): undefined reference to `supply_fpregset
collect2: error: ld returned 1 exit status

The cause of the error is that functions such as fill_gregset are not
implemented. This patch fixes the error.

[ kh: added rm command for gdb files added and modified multiple times. ]

Reported-by: Xiujie Jiang <email address hidden>
Signed-off-by: Ming Wang <email address hidden>
Signed-off-by: Kazuhito Hagio <email address hidden>

cc30490... by Kazuhito Hagio <email address hidden>

gdb-10.2.patch: Fix duplicated code by re-applying patch

When adding a patch to gdb-10.2.patch, a LOONGARCH64 build fails with
the following redefinition errors, and the gdb-10.2 directory has to be
removed before rebuilding. This is because the patch command cannot
detect previously applied patches for newly created loongarch files, so
those files get duplicated code.

  $ git am /tmp/0001-LoongArch64-Fixed-link-errors-when-build-on-LOO.patch
  Applying: LoongArch64: Fixed link errors when build on LOONGARCH64 machine
  $ make -j 16 warn target=LOONGARCH64
  ...
  patching file gdb-10.2/bfd/configure.ac
  Reversed (or previously applied) patch detected! Skipping patch.
  1 out of 1 hunk ignored
  patching file gdb-10.2/bfd/cpu-loongarch.c <<-- cannot detect previously applied patch
  patching file gdb-10.2/bfd/elf-bfd.h
  patching file gdb-10.2/bfd/elf.c
  ...
  libtool: compile: gcc -DHAVE_CONFIG_H -I. -DBINDIR=\"/usr/local/bin\" ...
  cpu-loongarch.c:86:33: error: redefinition of 'bfd_loongarch32_arch'
     static const bfd_arch_info_type bfd_loongarch32_arch =
                                     ^~~~~~~~~~~~~~~~~~~~
  ...
  make: *** [Makefile:254: all] Error 2

To fix this, change the file path of newly created files from "*.orig"
to "/dev/null" so that the patch command can detect previously applied
patches.

Signed-off-by: Kazuhito Hagio <email address hidden>