Page 1 of 1

Using ramoops/pstore to capture kernel panics

Posted: Sat Dec 02, 2017 6:07 pm
by notro
I'm doing gadget kernel development on a Zero where I use the uart for network. This means I can't see the NULL pointer deref that I inevitably will trigger. It will just die without telling me why. Yes I know, the stores have a Zero Wireless now, so I could get that console :-)

Luckily for me the kernel has the necessary bits builtin and to my suprise it did actually work.

Kernel config:

Code: Select all

$ zgrep PSTORE /proc/config.gz
CONFIG_PSTORE=y
CONFIG_PSTORE_ZLIB_COMPRESS=y
# CONFIG_PSTORE_LZO_COMPRESS is not set
# CONFIG_PSTORE_LZ4_COMPRESS is not set
CONFIG_PSTORE_CONSOLE=y
# CONFIG_PSTORE_PMSG is not set
# CONFIG_PSTORE_FTRACE is not set
CONFIG_PSTORE_RAM=y

DT overlay:

Code: Select all

/dts-v1/;
/plugin/;

/ {
	compatible = "brcm,bcm2835";

	[email protected] {
		target-path = "/";
		__overlay__ {
			reserved-memory {
				#address-cells = <1>;
				#size-cells = <1>;
				ranges;

				ramoops: [email protected] {
					compatible = "ramoops";
					reg = <0x0b000000 0x10000>; /* 64kB */
					record-size = <0x4000>; /* 16kB */
					console-size = <0>; /* disabled by default */
				};
			};
		};
	};

	__overrides__ {
		total-size = <&ramoops>,"reg:4";
		record-size = <&ramoops>,"record-size:0";
		console-size = <&ramoops>,"console-size:0";
	};
};

Check that it's loaded:

Code: Select all

$ dmesg | egrep 'pstore|ramoops'
[    0.097928] pstore: using zlib compression
[    0.097991] pstore: Registered ramoops as persistent store backend
[    0.098031] ramoops: attached [email protected], ecc: 0/0

$ sudo grep "" /sys/module/ramoops/parameters/*
/sys/module/ramoops/parameters/console_size:0
/sys/module/ramoops/parameters/dump_oops:1
/sys/module/ramoops/parameters/ecc:0
/sys/module/ramoops/parameters/ftrace_size:0
/sys/module/ramoops/parameters/mem_address:184549376
/sys/module/ramoops/parameters/mem_size:65536
/sys/module/ramoops/parameters/mem_type:0
/sys/module/ramoops/parameters/pmsg_size:0
/sys/module/ramoops/parameters/record_size:16384

Test by setting panic reboot timeout to 10 seconds and trigger a panic:

Code: Select all

$ sudo sh -c "echo 10 > /proc/sys/kernel/panic"
$ sudo sh -c "echo c > /proc/sysrq-trigger"
[   81.809076] sysrq: SysRq : Trigger a crash
[   81.816251] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[   81.830647] pgd = d6c9c000
[   81.836479] [00000000] *pgd=00000000
[   81.843303] Internal error: Oops: 805 [#1] ARM
[   81.850946] Modules linked in: gpio_backlight mi0283qt mipi_dbi tinydrm backlight bcm2835_rng rng_core
[   81.866822] CPU: 0 PID: 685 Comm: sh Not tainted 4.14.0-rc3+ #7
[   81.876142] Hardware name: BCM2835
[   81.882888] task: d7785400 task.stack: d6d66000
[   81.890821] PC is at sysrq_handle_crash+0x2c/0x38
[   81.898971] LR is at sysrq_handle_crash+0x2c/0x38
[   81.907005] pc : [<c042660c>]    lr : [<c042660c>]    psr: 60000013
[   81.916687] sp : d6d67e68  ip : d6d67e68  fp : d6d67e7c
[   81.925267] r10: 00000000  r9 : d6d66000  r8 : 00000000
[   81.933788] r7 : 00000063  r6 : 00000003  r5 : 00000001  r4 : 00000000
[   81.943607] r3 : 00000000  r2 : c0c1a338  r1 : c0c0e424  r0 : 00000063
[   81.953388] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[   81.963837] Control: 00c5387d  Table: 16c9c008  DAC: 00000051
[   81.972893] Process sh (pid: 685, stack limit = 0xd6d66208)
[   81.981773] Stack: (0xd6d67e68 to 0xd6d68000)
[   81.989443] 7e60:                   c0c3fa70 c0c0e484 d6d67ea4 d6d67e80 c0426b90 c04265ec
[   82.004386] 7e80: 00000000 00000002 d76af300 00000001 00000000 00000002 d6d67ebc d6d67ea8
[   82.019689] 7ea0: c0427000 c0426b0c d6d67f80 00000000 d6d67edc d6d67ec0 c027cdc8 c0426fa0
[   82.035044] 7ec0: d6d05cc0 c027cd50 01ff4210 d6d67f80 d6d67f4c d6d67ee0 c0229bc4 c027cd5c
[   82.050756] 7ee0: 01ff6214 c0c07d34 d6d67fb0 00c5387d 00002000 000001ff d6d67fac d6d67f08
[   82.066796] 7f00: c0101218 c0113948 c0155dc0 c015807c d742f000 c0229e08 d742f000 c0158c9c
[   82.082895] 7f20: d6d67f44 c022b8c8 d6d67f4c 00000002 d6d05cc0 01ff4210 d6d67f80 c01080c4
[   82.099235] 7f40: d6d67f7c d6d67f50 c0229e54 c0229b94 c0244e0c c0244d78 d6d05cc0 d6d05cc0
[   82.116012] 7f60: 00000002 01ff4210 c01080c4 d6d66000 d6d67fa4 d6d67f80 c022a014 c0229d94
[   82.133122] 7f80: 00000000 00000000 01ff4210 00000002 00000001 00000004 00000000 d6d67fa8
[   82.150582] 7fa0: c0107f00 c0229fc8 01ff4210 00000002 00000001 01ff4210 00000002 00472874
[   82.168377] 7fc0: 01ff4210 00000002 00000001 00000004 00000002 00000000 0047a5ac bedf3a34
[   82.186311] 7fe0: 00000000 bedf3814 004719dc b6e7289c 60000010 00000001 00000000 00000000
[   82.204488] [<c042660c>] (sysrq_handle_crash) from [<c0426b90>] (__handle_sysrq+0x90/0x124)
[   82.222959] [<c0426b90>] (__handle_sysrq) from [<c0427000>] (write_sysrq_trigger+0x6c/0x7c)
[   82.241430] [<c0427000>] (write_sysrq_trigger) from [<c027cdc8>] (proc_reg_write+0x78/0x8c)
[   82.259924] [<c027cdc8>] (proc_reg_write) from [<c0229bc4>] (__vfs_write+0x3c/0x130)
[   82.277801] [<c0229bc4>] (__vfs_write) from [<c0229e54>] (vfs_write+0xcc/0x158)
[   82.290303] [<c0229e54>] (vfs_write) from [<c022a014>] (SyS_write+0x58/0x98)
[   82.302510] [<c022a014>] (SyS_write) from [<c0107f00>] (ret_fast_syscall+0x0/0x48)
[   82.319947] Code: e3a04000 e5835000 ee074f9a ebf3bc33 (e5c45000)
[   82.331582] ---[ end trace b2f0b85a6d453129 ]---
[   82.351320] Kernel panic - not syncing: Fatal exception
[   82.371279] Rebooting in 10 seconds..
Uncompressing Linux... done, booting the kernel.
[    0.000000] Booting Linux on physical CPU 0x0
<..>

After reboot, look at the log and delete it (or it stays):

Code: Select all

# cat /sys/fs/pstore/dmesg-ramoops-0
Oops#1 Part1
<6>[    0.000000] Booting Linux on physical CPU 0x0
<5>[    0.000000] Linux version 4.14.0-rc3+ ([email protected]) (gcc version 4.9.3 (crosstool-NG crosstool-ng-1.22.0-88-g8460611)) #7 Sat Dec 2 14:45:34 CET 2017
<...>
<6>[   17.380893] smsc95xx 1-1.1:1.0 eth0: hardware isn't capable of remote wakeup
<6>[   19.000970] smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xC1E1
<6>[   81.809076] sysrq: SysRq : Trigger a crash
<1>[   81.816251] Unable to handle kernel NULL pointer dereference at virtual address 00000000
<1>[   81.830647] pgd = d6c9c000
<1>[   81.836479] [00000000] *pgd=00000000
<0>[   81.843303] Internal error: Oops: 805 [#1] ARM
<4>[   81.850946] Modules linked in: gpio_backlight mi0283qt mipi_dbi tinydrm backlight bcm2835_rng rng_core
<4>[   81.866822] CPU: 0 PID: 685 Comm: sh Not tainted 4.14.0-rc3+ #7
<4>[   81.876142] Hardware name: BCM2835
<4>[   81.882888] task: d7785400 task.stack: d6d66000
<4>[   81.890821] PC is at sysrq_handle_crash+0x2c/0x38
<4>[   81.898971] LR is at sysrq_handle_crash+0x2c/0x38
<4>[   81.907005] pc : [<c042660c>]    lr : [<c042660c>]    psr: 60000013
<4>[   81.916687] sp : d6d67e68  ip : d6d67e68  fp : d6d67e7c
<4>[   81.925267] r10: 00000000  r9 : d6d66000  r8 : 00000000
<4>[   81.933788] r7 : 00000063  r6 : 00000003  r5 : 00000001  r4 : 00000000
<4>[   81.943607] r3 : 00000000  r2 : c0c1a338  r1 : c0c0e424  r0 : 00000063
<4>[   81.953388] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
<4>[   81.963837] Control: 00c5387d  Table: 16c9c008  DAC: 00000051
<0>[   81.972893] Process sh (pid: 685, stack limit = 0xd6d66208)
<0>[   81.981773] Stack: (0xd6d67e68 to 0xd6d68000)
<0>[   81.989443] 7e60:                   c0c3fa70 c0c0e484 d6d67ea4 d6d67e80 c0426b90 c04265ec
<0>[   82.004386] 7e80: 00000000 00000002 d76af300 00000001 00000000 00000002 d6d67ebc d6d67ea8
<0>[   82.019689] 7ea0: c0427000 c0426b0c d6d67f80 00000000 d6d67edc d6d67ec0 c027cdc8 c0426fa0
<0>[   82.035044] 7ec0: d6d05cc0 c027cd50 01ff4210 d6d67f80 d6d67f4c d6d67ee0 c0229bc4 c027cd5c
<0>[   82.050756] 7ee0: 01ff6214 c0c07d34 d6d67fb0 00c5387d 00002000 000001ff d6d67fac d6d67f08
<0>[   82.066796] 7f00: c0101218 c0113948 c0155dc0 c015807c d742f000 c0229e08 d742f000 c0158c9c
<0>[   82.082895] 7f20: d6d67f44 c022b8c8 d6d67f4c 00000002 d6d05cc0 01ff4210 d6d67f80 c01080c4
<0>[   82.099235] 7f40: d6d67f7c d6d67f50 c0229e54 c0229b94 c0244e0c c0244d78 d6d05cc0 d6d05cc0
<0>[   82.116012] 7f60: 00000002 01ff4210 c01080c4 d6d66000 d6d67fa4 d6d67f80 c022a014 c0229d94
<0>[   82.133122] 7f80: 00000000 00000000 01ff4210 00000002 00000001 00000004 00000000 d6d67fa8
<0>[   82.150582] 7fa0: c0107f00 c0229fc8 01ff4210 00000002 00000001 01ff4210 00000002 00472874
<0>[   82.168377] 7fc0: 01ff4210 00000002 00000001 00000004 00000002 00000000 0047a5ac bedf3a34
<0>[   82.186311] 7fe0: 00000000 bedf3814 004719dc b6e7289c 60000010 00000001 00000000 00000000
<4>[   82.204488] [<c042660c>] (sysrq_handle_crash) from [<c0426b90>] (__handle_sysrq+0x90/0x124)
<4>[   82.222959] [<c0426b90>] (__handle_sysrq) from [<c0427000>] (write_sysrq_trigger+0x6c/0x7c)
<4>[   82.241430] [<c0427000>] (write_sysrq_trigger) from [<c027cdc8>] (proc_reg_write+0x78/0x8c)
<4>[   82.259924] [<c027cdc8>] (proc_reg_write) from [<c0229bc4>] (__vfs_write+0x3c/0x130)
<4>[   82.277801] [<c0229bc4>] (__vfs_write) from [<c0229e54>] (vfs_write+0xcc/0x158)
<4>[   82.290303] [<c0229e54>] (vfs_write) from [<c022a014>] (SyS_write+0x58/0x98)
<4>[   82.302510] [<c022a014>] (SyS_write) from [<c0107f00>] (ret_fast_syscall+0x0/0x48)
<0>[   82.319947] Code: e3a04000 e5835000 ee074f9a ebf3bc33 (e5c45000)
<4>[   82.331582] ---[ end trace b2f0b85a6d453129 ]---

# rm /sys/fs/pstore/*

It even worked when I rebooted by quickly unplugging the power, ram didn't loose it's contents!

ramoops can be useful when we move to vc4, because DRM drivers don't output panic messages (maybe if fbcon is displayed, I don't remember). I did send some patches, but it didn't get any traction.

I haven't tried it, but this seems to be the thing for those that need to log this automatically: https://github.com/abrt/abrt/wiki

Refs:
https://www.kernel.org/doc/html/v4.14/a ... moops.html
https://www.kernel.org/doc/Documentatio ... amoops.txt


So finally my question: Is there a preferred place in ram to put this? I just picked a random address.