rraszews
Posts: 8
Joined: Sat Sep 01, 2012 3:09 am

RPi 3 (not +) Page Allocation Stalls and Lockup During Heavy IO

Thu Jul 12, 2018 10:19 pm

Hi,

I've been having an issue where my Raspberry Pi 3 is either announcing a Page Allocation Stall or just locking up outright. It doesn't happen every time, but when it does happen, it's coincident with copying a large file from an external USB hard drive to an NFS mount.

I have a daemon running on the pi which regularly checks a website to see when a certain file is published. When it is, it downloads it to the local USB hard drive. Once the download is complete, it moves it to the NFS mount. When the issue triggers, it's copying a file around 1 gb, but like I said, it doesn't happen every time.

I'm running Raspbian Jesse Lite. I started seeing the issue with 4.9.35-v7+. I updated to 4.14.37-v7+ and it seemed like the problem went away; I didn't have any trouble for about a month or so. Then this weekend, I had a power outage that forced it to reboot, and on Wednesday the problem came back.

Here's the timeline:
Saturday Morning: Rebooted.
Tuesday Morning: Successful move of a large(1.6gb) file
Tuesday Evening: Successful move of a small (400mb) file
Wednesday Morning: Page allocation stall while moving a large (1gb) file
Wednesday Evening: Successful moves of two small (300mb) files
Thursday Morning: Page allocation stall while moving a large (1gb) file
Later Thursday Morning: Lockup while moving a large (1.5gb) file
Just now: Successful move of a small (400mb) file

During the lockup, I think the pi still responded to pings, but was otherwise unresponsive. I wasn't at home when it happened this time, so I couldn't check, but in the past, "ping -w1 -c1 hostname >> /dev/null && echo ALIVE" printed "ALIVE" (That's part of my network monitoring script) , the open SSH session froze, and though I had a monitor plugged in, the screen had powered down at the time and didn't wake up when I plugged in a keyboard and tapped on it.

Here's my cmdline.txt:

Code: Select all

 dwc_otg.lpm_enable=0 console=serial0,115200 console=tty1 root=/dev/mmcblk0p2 rootfstype=ext4 elevator=deadline fsck.repair=yes rootwait quiet
And /proc/cmdline:

Code: Select all

8250.nr_uarts=0 bcm2708_fb.fbwidth=592 bcm2708_fb.fbheight=448 bcm2708_fb.fbdepth=16 bcm2708_fb.fbswap=1 vc_mem.mem_base=0x3f000000 vc_mem.mem_size=0x3f600000  dwc_otg.lpm_enable=0 console=ttyS0,115200 console=tty1 root=/dev/mmcblk0p2 rootfstype=ext4 elevator=deadline fsck.repair=yes rootwait quiet
And here's what it logged

Code: Select all

Jul 11 06:53:35 toymaker kernel: [364784.153338] kthreadd: page allocation stalls for 11090ms, order:1, mode:0x15000c0(GFP_KERNEL_ACCOUNT), nodemask=(null)
Jul 11 06:53:35 toymaker kernel: [364784.153370] kthreadd cpuset=/ mems_allowed=0
Jul 11 06:53:35 toymaker kernel: [364784.153395] CPU: 2 PID: 2 Comm: kthreadd Tainted: G         C      4.14.37-v7+ #1111
Jul 11 06:53:35 toymaker kernel: [364784.153400] Hardware name: BCM2835
... (Omitted backtrace for brevity, will paste if asked)
Jul 11 06:53:35 toymaker kernel: [364784.153593] Mem-Info:
Jul 11 06:53:35 toymaker kernel: [364784.153615] active_anon:117408 inactive_anon:103930 isolated_anon:0
Jul 11 06:53:35 toymaker kernel: [364784.153615]  active_file:7018 inactive_file:6808 isolated_file:0
Jul 11 06:53:35 toymaker kernel: [364784.153615]  unevictable:0 dirty:0 writeback:10956 unstable:2129
Jul 11 06:53:35 toymaker kernel: [364784.153615]  slab_reclaimable:2240 slab_unreclaimable:3393
Jul 11 06:53:35 toymaker kernel: [364784.153615]  mapped:1213 shmem:647 pagetables:689 bounce:0
Jul 11 06:53:35 toymaker kernel: [364784.153615]  free:6028 free_pcp:0 free_cma:1696
Jul 11 06:53:35 toymaker kernel: [364784.153632] Node 0 active_anon:469632kB inactive_anon:415720kB active_file:28072kB inactive_file:27232kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:4852kB dirty:0kB writeback:43824kB shmem:2588kB writeback_tmp:0kB unstable:8516kB all_unreclaimable? no
Jul 11 06:53:35 toymaker kernel: [364784.153654] Normal free:24112kB min:16384kB low:20480kB high:24576kB active_anon:469888kB inactive_anon:415520kB active_file:28100kB inactive_file:27232kB unevictable:0kB writepending:43824kB present:1021952kB managed:1000184kB mlocked:0kB kernel_stack:968kB pagetables:2756kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:6784kB
Jul 11 06:53:35 toymaker kernel: [364784.153657] lowmem_reserve[]: 0 0
Jul 11 06:53:35 toymaker kernel: [364784.153672] Normal: 462*4kB (UMEC) 242*8kB (UMEHC) 92*16kB (UMEHC) 44*32kB (UMEHC) 23*64kB (UMEC) 12*128kB (UEHC) 11*256kB (UMEHC) 7*512kB (UMEHC) 2*1024kB (HC) 3*2048kB (HC) 0*4096kB = 24264kB
Jul 11 06:53:35 toymaker kernel: [364784.153765] 15251 total pagecache pages
Jul 11 06:53:35 toymaker kernel: [364784.153774] 771 pages in swap cache
Jul 11 06:53:35 toymaker kernel: [364784.153781] Swap cache stats: add 34278, delete 33507, find 69001/78480
Jul 11 06:53:35 toymaker kernel: [364784.153786] Free swap  = 2064636kB
Jul 11 06:53:35 toymaker kernel: [364784.153791] Total swap = 2097148kB
Jul 11 06:53:35 toymaker kernel: [364784.153795] 255488 pages RAM
Jul 11 06:53:35 toymaker kernel: [364784.153800] 0 pages HighMem/MovableOnly
Jul 11 06:53:35 toymaker kernel: [364784.153804] 5442 pages reserved
Jul 11 06:53:35 toymaker kernel: [364784.153809] 2048 pages cma reserved
There's several more messages along the same lines from other process immediately afterward.

Anyone have any wisdom about how to fix this? Or what more information would help?

rraszews
Posts: 8
Joined: Sat Sep 01, 2012 3:09 am

Re: RPi 3 (not +) Page Allocation Stalls and Lockup During Heavy IO

Fri Jul 13, 2018 9:19 pm

Update: Still happening on 4.14.54-v7+. Didn't happen the first time I did a large file copy, but it did the second time.

Edit:
Another thing I should mention. Every day, I do an rsync of the root filesystem (not the external USB drive) to another system (different from the system that I'm copying the large file to via NFS). There's nothing preventing that from coinciding with the large file copy, but it hasn't happened to work out that way in the time I have logging for. The rsync is run by the remote machine and connects over ssh. I'm going to try switching the NFS filesystem I'm copying my large files to to sshfs to see if I still get stalls/lockups.

Return to “Troubleshooting”

Who is online

Users browsing this forum: Baidu [Spider] and 36 guests