daniiy
Posts: 3
Joined: Tue Jun 15, 2021 12:31 pm

Raspberry blocked

Tue Jun 15, 2021 12:34 pm

Hello.

I am running several tests on different Raspberry Pis, and from time to time one of them crashes. By "crashes" I mean that I can't reach it via SSH or FTP, although I can still ping it from another Raspberry Pi.

After such a crash, the Pi sometimes recovers by itself after a while without any intervention; other times it doesn't, and I have to restart it manually.

I would like to know how to debug the problem so that I can stop it from happening.

B.Goode
Posts: 12255
Joined: Mon Sep 01, 2014 4:03 pm
Location: UK

Re: Raspberry blocked

Tue Jun 15, 2021 2:35 pm

daniiy wrote:
Tue Jun 15, 2021 12:34 pm
Hello.

I am running several tests on different Raspberry Pis, and from time to time one of them crashes. By "crashes" I mean that I can't reach it via SSH or FTP, although I can still ping it from another Raspberry Pi.

After such a crash, the Pi sometimes recovers by itself after a while without any intervention; other times it doesn't, and I have to restart it manually.

I would like to know how to debug the problem so that I can stop it from happening.


It's going to be a mystery to most volunteer helpers here as well....


Unknown models of RPi board, running an unspecified Operating System, connected to some undefined network infrastructure.


Do you have local consoles on the machines? You could run a monitoring tool such as top to watch for odd behaviour prior to or during one of these episodes.
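
One drawback of watching top on a console is that the screen contents are easy to miss once the machine locks up. A minimal sketch of an alternative, assuming Python 3 on a Linux-based OS that provides /proc/meminfo (the OS in use has not been stated in this thread): append the load average and available memory to a file every few seconds and force each line to disk, so the last samples before a hang survive. The file name pi_health.log, the 10-second interval and all function names are illustrative choices, not anything from the original posts.

```python
import os
import time
from datetime import datetime


def read_meminfo(path="/proc/meminfo"):
    """Return the MemAvailable and SwapFree values (in kB) found in path."""
    wanted = {"MemAvailable", "SwapFree"}
    values = {}
    with open(path) as fh:
        for line in fh:
            key, _, rest = line.partition(":")
            if key in wanted:
                values[key] = int(rest.split()[0])
    return values


def format_sample(now, load, mem):
    """Build one log line from a timestamp, a load tuple and a meminfo dict."""
    return "{} load1={:.2f} load5={:.2f} mem_available_kb={} swap_free_kb={}".format(
        now.isoformat(timespec="seconds"),
        load[0],
        load[1],
        mem.get("MemAvailable", -1),  # -1 marks a field that was not found
        mem.get("SwapFree", -1),
    )


def monitor(log_path="pi_health.log", interval_s=10, samples=None):
    """Append one sample every interval_s seconds; run forever if samples is None."""
    taken = 0
    with open(log_path, "a") as log:
        while samples is None or taken < samples:
            line = format_sample(datetime.now(), os.getloadavg(), read_meminfo())
            log.write(line + "\n")
            log.flush()
            os.fsync(log.fileno())  # push the line to disk before a possible hang
            taken += 1
            time.sleep(interval_s)
```

Calling monitor() from a local console or at boot leaves a file whose last lines show what load and memory looked like just before SSH stopped answering. If the samples simply stop and later resume, that gap gives the time window to look at in the system logs.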

bjtheone
Posts: 1400
Joined: Mon May 20, 2019 11:28 pm
Location: The Frozen North (AKA Canada)

Re: Raspberry blocked

Tue Jun 15, 2021 7:30 pm

Suggestions:

1) Share far more detail on models, OS, network configuration, etc. so folks can offer better advice
2) Run local monitoring on the machines
3) Look at the log files on the local machines
4) Run network monitoring to validate connectivity and see when issues occur; try to sort out triggering events
5) Look for time/process patterns in the outages
6) Do a deep dive on the logs once you know the timeframe of the loss of connectivity
7) Look for events that line up with your connectivity issues (network backups, large downloads, local cron jobs, DHCP lease renewals, etc.)
8) Consider keep-alive processes/cron jobs on the machines if you can figure out any differences
9) Sort out why they sometimes recover (is it multiple issues?)
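
For point 4, since ping keeps working while SSH and FTP do not, a probe that tests the TCP ports themselves may be more telling than ping. The following is only a sketch, assuming Python 3 on a second machine on the same network; the host name, the port list (22 for SSH, 21 for FTP), the 15-second interval and the function names are placeholders. It prints a timestamped line each time a port changes state, which gives the timeframe needed for points 5 and 6.

```python
import socket
import time
from datetime import datetime


def port_open(host, port, timeout_s=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def watch(host, ports=(22, 21), interval_s=15, rounds=None, probe=port_open):
    """Print a line whenever a port changes state; return the list of changes.

    rounds=None keeps probing until interrupted. probe can be swapped out
    for testing without touching the network.
    """
    last = {}
    changes = []
    done = 0
    while rounds is None or done < rounds:
        for port in ports:
            state = probe(host, port)
            if last.get(port) != state:
                stamp = datetime.now().isoformat(timespec="seconds")
                print(f"{stamp} {host}:{port} {'up' if state else 'DOWN'}", flush=True)
                changes.append((port, state))
                last[port] = state
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval_s)
    return changes
```

Running something like watch("address-of-the-pi") on another Pi and redirecting the output to a file gives a list of DOWN/up times that can be lined up against the logs and against scheduled jobs.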

daniiy
Posts: 3
Joined: Tue Jun 15, 2021 12:31 pm

Re: Raspberry blocked

Wed Jun 16, 2021 8:21 am

Hello.

I looked at yesterday's syslog from around the time the problem occurred. As far as I can see in the syslog, the system was blocked at 19:53:43, and then I get the following messages starting at 20:01:57:

(I cut the log because it was much longer.)

Code:

Jun 15 20:01:57 crd3domar kernel: [63810.020101] rcu: INFO: rcu_sched self-detected stall on CPU
Jun 15 20:01:57 crd3domar kernel: [63810.020125] rcu: 	1-...!: (2099 ticks this GP) idle=a5e/1/0x40000002 softirq=3723424/3723424 fqs=0 
Jun 15 20:01:57 crd3domar kernel: [63810.020139] 	(t=2100 jiffies g=8018113 q=42)
Jun 15 20:01:57 crd3domar kernel: [63810.020154] rcu: rcu_sched kthread starved for 2100 jiffies! g8018113 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
Jun 15 20:01:57 crd3domar kernel: [63810.020165] rcu: 	Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
Jun 15 20:01:57 crd3domar kernel: [63810.020174] rcu: RCU grace-period kthread stack dump:
Jun 15 20:01:57 crd3domar kernel: [63810.020185] task:rcu_sched       state:R  running task     stack:    0 pid:   12 ppid:     2 flags:0x00000000
Jun 15 20:01:57 crd3domar kernel: [63810.020215] Backtrace: 
Jun 15 20:01:57 crd3domar kernel: [63810.020252] [<c0b6d050>] (__schedule) from [<c0b6da28>] (schedule+0x68/0xe4)
Jun 15 20:01:57 crd3domar kernel: [63810.020269]  r10:c137a6e0 r9:eff33540 r8:eff33540 r7:c1205048 r6:c1933ee4 r5:c192ae80
Jun 15 20:01:57 crd3domar kernel: [63810.020279]  r4:ffffe000
Jun 15 20:01:57 crd3domar kernel: [63810.020301] [<c0b6d9c0>] (schedule) from [<c0b71d94>] (schedule_timeout+0x1d0/0x384)
Jun 15 20:01:57 crd3domar kernel: [63810.020312]  r5:c1203d00 r4:0060e039
Jun 15 20:01:57 crd3domar kernel: [63810.020334] [<c0b71bc4>] (schedule_timeout) from [<c029c544>] (rcu_gp_kthread+0x4d0/0xb98)
Jun 15 20:01:57 crd3domar kernel: [63810.020349]  r9:00000000 r8:00000001 r7:ffffe000 r6:c1203d00 r5:00000001 r4:c1276580
Jun 15 20:01:57 crd3domar kernel: [63810.020367] [<c029c074>] (rcu_gp_kthread) from [<c0245a50>] (kthread+0x170/0x174)
Jun 15 20:01:57 crd3domar kernel: [63810.020377]  r7:c1932000
Jun 15 20:01:57 crd3domar kernel: [63810.020394] [<c02458e0>] (kthread) from [<c02000ec>] (ret_from_fork+0x14/0x28)
Jun 15 20:01:57 crd3domar kernel: [63810.020404] Exception stack(0xc1933fb0 to 0xc1933ff8)
Jun 15 20:01:57 crd3domar kernel: [63810.020417] 3fa0:                                     00000000 00000000 00000000 00000000
Jun 15 20:01:57 crd3domar kernel: [63810.020431] 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Jun 15 20:01:57 crd3domar kernel: [63810.020443] 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000
Jun 15 20:01:57 crd3domar kernel: [63810.020458]  r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c02458e0
Jun 15 20:01:57 crd3domar kernel: [63810.020468]  r4:c18ced80
Jun 15 20:01:57 crd3domar kernel: [63810.020482] Sending NMI from CPU 1 to CPUs 0:
Jun 15 20:01:57 crd3domar kernel: [63810.020813] NMI backtrace for cpu 0
Jun 15 20:01:57 crd3domar kernel: [63810.020819] CPU: 0 PID: 13 Comm: migration/0 Tainted: G         C        5.10.17-v7l+ #1421
Jun 15 20:01:57 crd3domar kernel: [63810.020823] Hardware name: BCM2711
Jun 15 20:01:57 crd3domar kernel: [63810.020827] PC is at rcu_momentary_dyntick_idle+0x34/0x90
Jun 15 20:01:57 crd3domar kernel: [63810.020831] LR is at multi_cpu_stop+0x104/0x17c
Jun 15 20:01:57 crd3domar kernel: [63810.020836] pc : [<c029abd8>]    lr : [<c02e80e4>]    psr: 60000013
Jun 15 20:01:57 crd3domar kernel: [63810.020840] sp : c1935ec0  ip : 00000000  fp : c1935ecc
Jun 15 20:01:57 crd3domar kernel: [63810.020844] r10: 00000000  r9 : a0000013  r8 : c120509c
Jun 15 20:01:57 crd3domar kernel: [63810.020848] r7 : 00000001  r6 : 00000001  r5 : cbfb9dcc  r4 : cbfb9de0
Jun 15 20:01:57 crd3domar kernel: [63810.020852] r3 : eff13240  r2 : eff132c4  r1 : c1935ec0  r0 : 2ee7d000
Jun 15 20:01:57 crd3domar kernel: [63810.020857] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Jun 15 20:01:57 crd3domar kernel: [63810.020861] Control: 30c5383d  Table: 04251b00  DAC: fffffffd
Jun 15 20:01:57 crd3domar kernel: [63810.020866] CPU: 0 PID: 13 Comm: migration/0 Tainted: G         C        5.10.17-v7l+ #1421
Jun 15 20:01:57 crd3domar kernel: [63810.020869] Hardware name: BCM2711
Jun 15 20:01:57 crd3domar kernel: [63810.020873] Backtrace: 
Jun 15 20:01:57 crd3domar kernel: [63810.020881] [<c0b63050>] (dump_backtrace) from [<c0b633e4>] (show_stack+0x20/0x24)
Jun 15 20:01:57 crd3domar kernel: [63810.020885]  r7:ffffffff r6:00000000 r5:60000193 r4:c12e69fc
Jun 15 20:01:57 crd3domar kernel: [63810.020889] [<c0b633c4>] (show_stack) from [<c0b67774>] (dump_stack+0xcc/0xf8)
Jun 15 20:01:57 crd3domar kernel: [63810.020894] [<c0b676a8>] (dump_stack) from [<c0208b6c>] (show_regs+0x1c/0x20)
Jun 15 20:01:57 crd3domar kernel: [63810.020898]  r10:00000000 r9:c1934000 r8:c1810800 r7:00000001 r6:2ee7d000 r5:c1935e70
Jun 15 20:01:57 crd3domar kernel: [63810.020902]  r4:00000000 r3:8343686f
Jun 15 20:01:57 crd3domar kernel: [63810.020907] [<c0208b50>] (show_regs) from [<c076a63c>] (nmi_cpu_backtrace+0xb0/0xf4)
Jun 15 20:01:57 crd3domar kernel: [63810.020911] [<c076a58c>] (nmi_cpu_backtrace) from [<c020ee58>] (do_handle_IPI+0x50/0x340)
Jun 15 20:01:57 crd3domar kernel: [63810.020915]  r5:00000000 r4:00000007
Jun 15 20:01:57 crd3domar kernel: [63810.020919] [<c020ee08>] (do_handle_IPI) from [<c020f170>] (ipi_handler+0x28/0x30)
Jun 15 20:01:57 crd3domar kernel: [63810.020924]  r9:c1934000 r8:c1810800 r7:00000001 r6:2ee7d000 r5:00000017 r4:c1804d80
Jun 15 20:01:57 crd3domar kernel: [63810.020929] [<c020f148>] (ipi_handler) from [<c028dd78>] (handle_percpu_devid_fasteoi_ipi+0x80/0x154)
Jun 15 20:01:57 crd3domar kernel: [63810.020934] [<c028dcf8>] (handle_percpu_devid_fasteoi_ipi) from [<c0286ddc>] (generic_handle_irq+0x44/0x54)
Jun 15 20:01:57 crd3domar kernel: [63810.020938]  r7:00000001 r6:00000000 r5:00000000 r4:c1094d10
Jun 15 20:01:57 crd3domar kernel: [63810.020942] [<c0286d98>] (generic_handle_irq) from [<c028753c>] (__handle_domain_irq+0x6c/0xc4)
Jun 15 20:01:57 crd3domar kernel: [63810.020947] [<c02874d0>] (__handle_domain_irq) from [<c020135c>] (gic_handle_irq+0x90/0xa4)
Jun 15 20:01:57 crd3domar kernel: [63810.020951]  r9:c1934000 r8:c1094d1c r7:c1935e70 r6:f081400c r5:f0814000 r4:c1205b3c
Jun 15 20:01:57 crd3domar kernel: [63810.020956] [<c02012cc>] (gic_handle_irq) from [<c0200abc>] (__irq_svc+0x5c/0x7c)
Jun 15 20:01:57 crd3domar kernel: [63810.020960] Exception stack(0xc1935e70 to 0xc1935eb8)
Jun 15 20:01:57 crd3domar kernel: [63810.020964] 5e60:                                     2ee7d000 c1935ec0 eff132c4 eff13240
Jun 15 20:01:57 crd3domar kernel: [63810.020969] 5e80: cbfb9de0 cbfb9dcc 00000001 00000001 c120509c a0000013 00000000 c1935ecc
Jun 15 20:01:57 crd3domar kernel: [63810.020973] 5ea0: 00000000 c1935ec0 c02e80e4 c029abd8 60000013 ffffffff
Jun 15 20:01:57 crd3domar kernel: [63810.020978]  r9:c1934000 r8:c120509c r7:c1935ea4 r6:ffffffff r5:60000013 r4:c029abd8
Jun 15 20:01:57 crd3domar kernel: [63810.020983] [<c029aba4>] (rcu_momentary_dyntick_idle) from [<c02e80e4>] (multi_cpu_stop+0x104/0x17c)
Jun 15 20:01:57 crd3domar kernel: [63810.020987] [<c02e7fe0>] (multi_cpu_stop) from [<c02e7e60>] (cpu_stopper_thread+0x90/0x164)
Jun 15 20:01:57 crd3domar kernel: [63810.020992]  r10:cbfb9de4 r9:eff0ca90 r8:eff0ca98 r7:cbfb9dcc r6:c02e7fe0 r5:ffffe000
Jun 15 20:01:57 crd3domar kernel: [63810.020995]  r4:eff0ca8c
Jun 15 20:01:57 crd3domar kernel: [63810.021000] [<c02e7dd0>] (cpu_stopper_thread) from [<c024a19c>] (smpboot_thread_fn+0xfc/0x1b8)
Jun 15 20:01:57 crd3domar kernel: [63810.021005]  r10:c1909dcc r9:00000000 r8:00000001 r7:c127d0dc r6:00000000 r5:ffffe000
Jun 15 20:01:57 crd3domar kernel: [63810.021008]  r4:c18ced80
Jun 15 20:01:57 crd3domar kernel: [63810.021012] [<c024a0a0>] (smpboot_thread_fn) from [<c0245a50>] (kthread+0x170/0x174)
Jun 15 20:01:57 crd3domar kernel: [63810.021017]  r9:c18ced80 r8:c024a0a0 r7:c1934000 r6:00000000 r5:c18cee40 r4:c18cee80
Jun 15 20:01:57 crd3domar kernel: [63810.021021] [<c02458e0>] (kthread) from [<c02000ec>] (ret_from_fork+0x14/0x28)
Jun 15 20:01:57 crd3domar kernel: [63810.021025] Exception stack(0xc1935fb0 to 0xc1935ff8)
Jun 15 20:01:57 crd3domar kernel: [63810.021030] 5fa0:                                     00000000 00000000 00000000 00000000
Jun 15 20:01:57 crd3domar kernel: [63810.021035] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Jun 15 20:01:57 crd3domar kernel: [63810.021039] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
Jun 15 20:01:57 crd3domar kernel: [63810.021043]  r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c02458e0
Jun 15 20:01:57 crd3domar kernel: [63810.021047]  r4:c18cee40
Jun 15 20:01:57 crd3domar kernel: [63810.021499] NMI backtrace for cpu 1
Jun 15 20:01:57 crd3domar kernel: [63810.021513] CPU: 1 PID: 774 Comm: CFtpDomARCRD.ru Tainted: G         C        5.10.17-v7l+ #1421
Jun 15 20:01:57 crd3domar kernel: [63810.021522] Hardware name: BCM2711
Jun 15 20:01:57 crd3domar kernel: [63810.021531] Backtrace: 
Jun 15 20:01:57 crd3domar kernel: [63810.021559] [<c0b63050>] (dump_backtrace) from [<c0b633e4>] (show_stack+0x20/0x24)
Jun 15 20:01:57 crd3domar kernel: [63810.021573]  r7:ffffffff r6:00000000 r5:60000193 r4:c12e69fc
Jun 15 20:01:57 crd3domar kernel: [63810.021590] [<c0b633c4>] (show_stack) from [<c0b67774>] (dump_stack+0xcc/0xf8)
Jun 15 20:01:57 crd3domar kernel: [63810.021608] [<c0b676a8>] (dump_stack) from [<c076a654>] (nmi_cpu_backtrace+0xc8/0xf4)
Jun 15 20:01:57 crd3domar kernel: [63810.021622]  r10:80000193 r9:c0e23070 r8:c0c02058 r7:c0c02060 r6:00000001 r5:00000000
Jun 15 20:01:57 crd3domar kernel: [63810.021633]  r4:00000001 r3:8343686f
Jun 15 20:01:57 crd3domar kernel: [63810.021653] [<c076a58c>] (nmi_cpu_backtrace) from [<c076a7a8>] (nmi_trigger_cpumask_backtrace+0x128/0x140)
Jun 15 20:01:57 crd3domar kernel: [63810.021663]  r5:c1205aec r4:c020ecb0
Jun 15 20:01:57 crd3domar kernel: [63810.021682] [<c076a680>] (nmi_trigger_cpumask_backtrace) from [<c020fbdc>] (arch_trigger_cpumask_backtrace+0x20/0x24)
Jun 15 20:01:57 crd3domar kernel: [63810.021694]  r7:c1276580 r6:c12050a4 r5:c1205148 r4:00000001
Jun 15 20:01:57 crd3domar kernel: [63810.021711] [<c020fbbc>] (arch_trigger_cpumask_backtrace) from [<c0b6511c>] (rcu_dump_cpu_stacks+0x10c/0x144)
Jun 15 20:01:57 crd3domar kernel: [63810.021727] [<c0b65010>] (rcu_dump_cpu_stacks) from [<c029f538>] (rcu_sched_clock_irq+0x7c0/0xa38)
Jun 15 20:01:57 crd3domar kernel: [63810.021741]  r10:c1203d00 r9:2ee91000 r8:00000000 r7:c1096240 r6:c1096240 r5:eff27240
Jun 15 20:01:57 crd3domar kernel: [63810.021750]  r4:c1276580
Jun 15 20:01:57 crd3domar kernel: [63810.021768] [<c029ed78>] (rcu_sched_clock_irq) from [<c02accdc>] (update_process_times+0x70/0x9c)
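
To look for time/process patterns across several of these episodes, a small script can pull out just the stall reports and the tasks named in the per-CPU backtraces. This is a rough sketch, assuming Python 3 and that the kernel messages have exactly the layout shown in the excerpt above; the regular expressions and function name are illustrative, and the log path used in the note below is an assumption.

```python
import re

# First line of an RCU stall report, e.g.
# "Jun 15 20:01:57 host kernel: [63810.020101] rcu: INFO: rcu_sched self-detected stall on CPU"
STALL_RE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: \[\s*(\d+\.\d+)\] "
    r"rcu: INFO: \S+ self-detected stall"
)
# Header line of a per-CPU backtrace, e.g. "... CPU: 1 PID: 774 Comm: name Tainted: ..."
TASK_RE = re.compile(r"kernel: \[\s*\d+\.\d+\] CPU: (\d+) PID: (\d+) Comm: (\S+)")


def summarize(lines):
    """Return one dict per stall report: wall-clock time, uptime, tasks seen."""
    stalls = []
    for line in lines:
        stall = STALL_RE.match(line)
        if stall:
            stalls.append(
                {"time": stall.group(1), "uptime_s": float(stall.group(2)), "tasks": []}
            )
            continue
        task = TASK_RE.search(line)
        if task and stalls:
            entry = (int(task.group(1)), int(task.group(2)), task.group(3))
            if entry not in stalls[-1]["tasks"]:  # the header repeats within one dump
                stalls[-1]["tasks"].append(entry)
    return stalls
```

Passing the lines of the syslog file to summarize() (for example from open("/var/log/syslog", errors="replace"), if that is where the system keeps it) yields one entry per stall. For the excerpt above that would be a single stall at Jun 15 20:01:57 with migration/0 on CPU 0 and CFtpDomARCRD.ru on CPU 1. Whether the same task names show up at every stall is the kind of pattern worth checking.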

daniiy
Posts: 3
Joined: Tue Jun 15, 2021 12:31 pm

Re: Raspberry blocked

Fri Jun 18, 2021 6:50 am

Can anyone guide me, please? :|

RonR
Posts: 2260
Joined: Tue Apr 12, 2016 10:29 pm
Location: US

Re: Raspberry blocked

Fri Jun 18, 2021 8:18 am

daniiy wrote:
Fri Jun 18, 2021 6:50 am
Can anyone guide me, please? :|

Here's one approach
