A bit of a "hit" for those of us with machines that are stable at 500 MHz:
firmware: platform: Pi3 B+ reduce sdram freq to 450 while investigations are ongoing
See: viewtopic.php?f=28&t=208821

PeterO
PeterO wrote: ↑Fri Apr 20, 2018 7:51 am
From https://github.com/raspberrypi/firmware ... 4376902cbd
firmware: platform: Pi3 B+ reduce sdram freq to 450 while investigations are ongoing
See: viewtopic.php?f=28&t=208821
A bit of a "hit" for those of us with machines that are stable at 500 MHz.
PeterO

It would be interesting to see performance figures comparing the settings; I suspect there is very little difference.
You can always add to /boot/config.txt
e-raser wrote: ↑Fri Apr 20, 2018 10:48 am
Mine is dead again, all services down. Probably in emergency mode after an automatic reboot (no console access currently). And I'm running the latest updates with SDRAM = 450 MHz. Not solved, not even a workaround. When I swap the SD card into my good old gorgeously stable Pi 2, the system runs like forever; the Pi 3 B+ only made ~12 hours since the (manual) reboot initiated by the apt updates yesterday.
This. Really. Sucks.

We know it's not solved for everyone, and we are still working on it. Dropping SDRAM to 400, does that help?
jamesh wrote: ↑Fri Apr 20, 2018 12:39 pm
We know it's not solved for everyone, and we are still working on it. Dropping SDRAM to 400, does that help?

I have a 2nd Pi 3 B+ here now and will try this one first. Unfortunately I think we owners can't check whether two Pis are from the same batch, right? Well, at least I have different resellers and different order times, so fingers crossed. But the 1st one will very likely be sent back to the seller (if you don't need it for analyzing).
e-raser wrote: ↑Fri Apr 20, 2018 2:06 pm
I have a 2nd Pi 3 B+ here now and will try this one first. Unfortunately I think we owners can't check whether two Pis are from the same batch, right? Well, at least I have different resellers and different order times, so fingers crossed. But the 1st one will very likely be sent back to the seller (if you don't need it for analyzing).

I'll check to see if we need any more, but we have a few in house now, so unlikely. Return to supplier will probably be the best bet. Sorry about that - hope the new one behaves better!
I'm no very low level HW expert, so I'll pass on answering that one!
When we do qualification testing of a new piece of silicon, we make boards with split lots on. These are special SoCs that have been "skewed" in the semiconductor manufacturing process to emulate the maximum differences in performance/speed/power that you would get with production parts.
Apr 22 16:03:12 raspberry kernel: [61359.155507] swapper/0: page allocation failure: order:0, mode:0x1080020(GFP_ATOMIC), nodemask=(null)
Apr 22 16:03:15 raspberry kernel: [61359.251047] swapper/0 cpuset=/ mems_allowed=0
Apr 22 16:03:15 raspberry kernel: [61359.323400] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G C 4.14.34-v7+ #1110
Apr 22 16:03:15 raspberry kernel: [61359.384753] Hardware name: BCM2835
Apr 22 16:03:15 raspberry kernel: [61359.445324] [<8010ffd8>] (unwind_backtrace) from [<8010c240>] (show_stack+0x20/0x24)
Apr 22 16:03:15 raspberry kernel: [61359.506234] [<8010c240>] (show_stack) from [<807840a4>] (dump_stack+0xd4/0x118)
Apr 22 16:03:15 raspberry kernel: [61359.567230] [<807840a4>] (dump_stack) from [<80228268>] (warn_alloc+0xcc/0x17c)
Apr 22 16:03:15 raspberry kernel: [61359.627754] [<80228268>] (warn_alloc) from [<80229428>] (__alloc_pages_nodemask+0x105c/0x11e0)
Apr 22 16:03:15 raspberry kernel: [61359.688349] [<80229428>] (__alloc_pages_nodemask) from [<8027491c>] (new_slab+0x454/0x558)
Apr 22 16:03:15 raspberry kernel: [61359.749393] [<8027491c>] (new_slab) from [<80276760>] (___slab_alloc.constprop.11+0x228/0x2c0)
Apr 22 16:03:15 raspberry kernel: [61359.810705] [<80276760>] (___slab_alloc.constprop.11) from [<8027683c>] (__slab_alloc.constprop.10+0x44/0x90)
Apr 22 16:03:15 raspberry kernel: [61359.872349] [<8027683c>] (__slab_alloc.constprop.10) from [<80276fd4>] (kmem_cache_alloc+0x1f4/0x230)
Apr 22 16:03:15 raspberry kernel: [61359.934059] [<80276fd4>] (kmem_cache_alloc) from [<806735e8>] (__alloc_skb+0x4c/0x144)
Apr 22 16:03:16 raspberry kernel: [61359.995596] [<806735e8>] (__alloc_skb) from [<806771e8>] (__netdev_alloc_skb+0x50/0x158)
Apr 22 16:03:16 raspberry kernel: [61360.057025] [<806771e8>] (__netdev_alloc_skb) from [<8059ec2c>] (rx_submit.constprop.8+0x34/0x1e4)
Apr 22 16:03:16 raspberry kernel: [61360.118575] [<8059ec2c>] (rx_submit.constprop.8) from [<8059ef80>] (rx_complete+0x1a4/0x1a8)
Apr 22 16:03:16 raspberry kernel: [61360.180008] [<8059ef80>] (rx_complete) from [<805aefe0>] (__usb_hcd_giveback_urb+0x80/0x160)
Apr 22 16:03:16 raspberry kernel: [61360.241375] [<805aefe0>] (__usb_hcd_giveback_urb) from [<805af10c>] (usb_hcd_giveback_urb+0x4c/0xfc)
Apr 22 16:03:16 raspberry kernel: [61360.302644] [<805af10c>] (usb_hcd_giveback_urb) from [<805d9264>] (completion_tasklet_func+0x6c/0x98)
Apr 22 16:03:16 raspberry kernel: [61360.363989] [<805d9264>] (completion_tasklet_func) from [<805e83b4>] (tasklet_callback+0x20/0x24)
Apr 22 16:03:16 raspberry kernel: [61360.425505] [<805e83b4>] (tasklet_callback) from [<80123bbc>] (tasklet_hi_action+0x74/0x10c)
Apr 22 16:03:16 raspberry kernel: [61360.487122] [<80123bbc>] (tasklet_hi_action) from [<80101694>] (__do_softirq+0x18c/0x3d8)
Apr 22 16:03:16 raspberry kernel: [61360.548731] [<80101694>] (__do_softirq) from [<80123794>] (irq_exit+0xe0/0x144)
Apr 22 16:03:16 raspberry kernel: [61360.610510] [<80123794>] (irq_exit) from [<80175534>] (__handle_domain_irq+0x70/0xc4)
Apr 22 16:03:16 raspberry kernel: [61360.672536] [<80175534>] (__handle_domain_irq) from [<80101504>] (bcm2836_arm_irqchip_handle_irq+0xa8/0xac)
Apr 22 16:03:16 raspberry kernel: [61360.734882] [<80101504>] (bcm2836_arm_irqchip_handle_irq) from [<8079fcbc>] (__irq_svc+0x5c/0x7c)
Apr 22 16:03:16 raspberry kernel: [61360.797532] Exception stack(0x80c01ef0 to 0x80c01f38)
Apr 22 16:03:16 raspberry kernel: [61360.860571] 1ee0: 00000000 ee03b258 3a3a9000 00000000
Apr 22 16:03:16 raspberry kernel: [61360.923947] 1f00: 80c00000 80c03dcc 80c03d68 80c88172 00000001 80b60a30 bb7ffa40 80c01f4c
Apr 22 16:03:16 raspberry kernel: [61360.987636] 1f20: 80c04174 80c01f40 80108a4c 80108a50 60000013 ffffffff
Apr 22 16:03:16 raspberry kernel: [61361.051417] [<8079fcbc>] (__irq_svc) from [<80108a50>] (arch_cpu_idle+0x34/0x4c)
Apr 22 16:03:16 raspberry kernel: [61361.115545] [<80108a50>] (arch_cpu_idle) from [<8079f434>] (default_idle_call+0x34/0x48)
Apr 22 16:03:16 raspberry kernel: [61361.180149] [<8079f434>] (default_idle_call) from [<801611cc>] (do_idle+0xd8/0x150)
Apr 22 16:03:16 raspberry kernel: [61361.243845] [<801611cc>] (do_idle) from [<801614e0>] (cpu_startup_entry+0x28/0x2c)
Apr 22 16:03:16 raspberry kernel: [61361.306412] [<801614e0>] (cpu_startup_entry) from [<80799184>] (rest_init+0xbc/0xc0)
Apr 22 16:03:16 raspberry kernel: [61361.367752] [<80799184>] (rest_init) from [<80b00df8>] (start_kernel+0x3d4/0x3e0)
Apr 22 16:03:16 raspberry kernel: [61361.428015] Mem-Info:
Apr 22 16:03:16 raspberry kernel: [61361.487314] active_anon:111703 inactive_anon:111675 isolated_anon:183
Apr 22 16:03:16 raspberry kernel: [61361.487314] active_file:528 inactive_file:624 isolated_file:0
Apr 22 16:03:16 raspberry kernel: [61361.487314] unevictable:440 dirty:0 writeback:8465 unstable:0
Apr 22 16:03:16 raspberry kernel: [61361.487314] slab_reclaimable:4247 slab_unreclaimable:4666
Apr 22 16:03:16 raspberry kernel: [61361.487314] mapped:4424 shmem:4135 pagetables:2055 bounce:0
Apr 22 16:03:16 raspberry kernel: [61361.487314] free:944 free_pcp:328 free_cma:0
Apr 22 16:03:16 raspberry kernel: [61361.835271] Node 0 active_anon:446812kB inactive_anon:446700kB active_file:2112kB inactive_file:2496kB unevictable:1760kB isolated(anon):732kB isolated(file):0kB mapped:17696kB dirty:0kB writeback:33860kB shmem:16540kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Apr 22 16:03:16 raspberry kernel: [61361.955015] Normal free:3776kB min:3900kB low:4872kB high:5844kB active_anon:446812kB inactive_anon:446700kB active_file:2076kB inactive_file:2472kB unevictable:1760kB writepending:33980kB present:983040kB managed:961632kB mlocked:1760kB kernel_stack:2696kB pagetables:8220kB bounce:0kB free_pcp:1312kB local_pcp:352kB free_cma:0kB
Apr 22 16:03:16 raspberry kernel: [61362.146908] lowmem_reserve[]: 0 0
Apr 22 16:03:16 raspberry kernel: [61362.211899] Normal: 10*4kB (H) 10*8kB (H) 10*16kB (H) 4*32kB (H) 4*64kB (H) 1*128kB (H) 1*256kB (H) 1*512kB (H) 0*1024kB 1*2048kB (H) 0*4096kB = 3608kB
Apr 22 16:03:16 raspberry kernel: [61362.277609] 24605 total pagecache pages
Apr 22 16:03:16 raspberry kernel: [61362.343908] 18973 pages in swap cache
Apr 22 16:03:16 raspberry kernel: [61362.409428] Swap cache stats: add 909699, delete 890700, find 9709847/10167255
Apr 22 16:03:16 raspberry kernel: [61362.474711] Free swap = 1013500kB
Apr 22 16:03:16 raspberry kernel: [61362.539562] Total swap = 1572860kB
Apr 22 16:03:16 raspberry kernel: [61362.603691] 245760 pages RAM
Apr 22 16:03:16 raspberry kernel: [61362.668193] 0 pages HighMem/MovableOnly
Apr 22 16:03:16 raspberry kernel: [61362.732620] 5352 pages reserved
Apr 22 16:03:16 raspberry kernel: [61362.796170] 2048 pages cma reserved
Apr 22 16:03:16 raspberry kernel: [61362.858719] SLUB: Unable to allocate memory on node -1, gfp=0x1080020(GFP_ATOMIC)
Apr 22 16:03:16 raspberry kernel: [61362.922256] cache: kmalloc-192, object size: 192, buffer size: 192, default order: 0, min order: 0
Apr 22 16:03:16 raspberry kernel: [61362.986585] node 0: slabs: 616, objs: 12936, free: 0
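As a reading aid for the log above, here is a minimal Python sketch that pulls the zone watermarks and the per-order free lists out of two of its lines. The function names are made up for this example. The log shows free memory in the Normal zone (3776kB) sitting below the min watermark (3900kB) at the moment a GFP_ATOMIC allocation was attempted from the network receive path. Atomic allocations cannot wait for reclaim, so this looks like a failure under memory pressure; whether it is related to the SDRAM issue discussed in this thread is not something the log itself establishes.

```python
import re

# Two lines copied from the log above, with the syslog prefix stripped.
ZONE_LINE = "Normal free:3776kB min:3900kB low:4872kB high:5844kB"
BUDDY_LINE = ("Normal: 10*4kB (H) 10*8kB (H) 10*16kB (H) 4*32kB (H) 4*64kB (H) "
              "1*128kB (H) 1*256kB (H) 1*512kB (H) 0*1024kB 1*2048kB (H) "
              "0*4096kB = 3608kB")


def zone_watermarks(line):
    """Return the free/min/low/high values (in kB) from a zone summary line."""
    pairs = re.findall(r"\b(free|min|low|high):(\d+)kB", line)
    return {name: int(value) for name, value in pairs}


def buddy_total_kb(line):
    """Sum count*size over the per-order free lists (the trailing total is skipped)."""
    return sum(int(count) * int(size)
               for count, size in re.findall(r"(\d+)\*(\d+)kB", line))


marks = zone_watermarks(ZONE_LINE)
print(marks)
print("free below min watermark:", marks["free"] < marks["min"])
print("free lists add up to", buddy_total_kb(BUDDY_LINE), "kB")
```

The same two helpers can be pointed at lines from other page allocation failure reports to compare how far below the watermark each one was.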
jdb wrote: ↑Sat Apr 21, 2018 11:42 am
When we do qualification testing of a new piece of silicon, we make boards with split lots on. These are special SoCs that have been "skewed" in the semiconductor manufacturing process to emulate the maximum differences in performance/speed/power that you would get with production parts.
The split lots are skewed fast/slow and you can also buy qualification samples of LPDDR2 that are also skewed fast/slow. By testing the matrix of possibilities (FF/SF/FS/SS for RAM/SoC) you map out the performance at each "corner" of the semiconductor process. The issue we're seeing here didn't pop up in pre-production testing, which makes it look a lot more like a batch failure or some other cluster-type issue.
The fact that people are getting two Pis in a row from the same supplier and both exhibit the issue would be extremely unlikely if the failure was randomly distributed.

Thanks (is that the shmoo plot that you plot out?)
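To make the "matrix of possibilities" in jdb's description concrete, the four corners can be enumerated in a few lines of Python. This is only an illustration: the names are hypothetical, and putting the RAM letter first is an assumption based on the "FF/SF/FS/SS for RAM/SoC" wording.

```python
from itertools import product

SPEEDS = ("F", "S")  # F = fast-skewed part, S = slow-skewed part


def corner_matrix():
    """List every RAM/SoC pairing as a two-letter code (RAM letter first, by assumption)."""
    return [ram + soc for ram, soc in product(SPEEDS, repeat=2)]


LABEL = {"F": "fast", "S": "slow"}
for corner in corner_matrix():
    ram, soc = corner
    print(f"{corner}: RAM {LABEL[ram]}, SoC {LABEL[soc]}")
```

Each printed line corresponds to one board build that would be stress-tested during qualification.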
jamesh wrote:
Have you tried the 450 SDRAM freq setting and run memtester to exercise it?

Over the last couple of days I have been doing some more testing.
YorkshireTyke wrote: ↑Tue Apr 24, 2018 7:01 am
In reply...
jamesh wrote:
Have you tried the 450 SDRAM freq setting and run memtester to exercise it?
Over the last couple of days I have been doing some more testing.
(1) Flashed NOOBS onto an SD card, ran sudo apt-get update/upgrade, then ran memtester. It crashed within 5 minutes!
(2) As above, but setting arm_freq=1200 & sdram_freq=450. Ran memtester for about 7 hours.
(3) As above, but with my IQaudio PiDAC+ HAT attached. Using it normally, as I would as my media player, for streaming BBC radio programmes and podcasts and for watching a film or two on BBC iPlayer catch-up. So far no problems.
So what is the underlying problem that setting arm_freq & sdram_freq seems to fix, and is this problem limited to only a few RPi 3B+ boards?

Not sure of the exact reason - I'm not a silicon level HW engineer! Probably a timing/voltage issue between the memory controller and the SDRAM.
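When comparing results between boards, it helps to confirm which arm_freq and sdram_freq overrides are actually present in /boot/config.txt. The following is a minimal Python sketch under stated assumptions: the function name and sample text are hypothetical, the sketch simply keeps the last value it sees for each key, and it ignores conditional sections such as [pi3+], so it is not a full config.txt parser.

```python
def parse_overrides(config_text, keys=("arm_freq", "sdram_freq")):
    """Return {key: int} for the requested keys; the last value seen wins in this sketch."""
    found = {}
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key in keys and value.isdigit():
            found[key] = int(value)
    return found


# The workaround values reported in this thread, used here as sample input.
SAMPLE = """\
# workaround under test
arm_freq=1200
sdram_freq=450
"""

print(parse_overrides(SAMPLE))
```

On a Pi, the same function can be fed the real file with `parse_overrides(open("/boot/config.txt").read())`.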
bensimmo wrote: ↑Sun Apr 22, 2018 9:16 pm
Thanks (is that the shmoo plot that you plot out?)
My RAM was with reference to Jamesh's "I do wonder (guessing here) if it's down to the particular wafer that the SoC came from.".
I asked as most seem to be asking for RAM 'slowing down' and just wondered.

The Shmoo plot maps out the "stable area" of a set of N adjustable hardware parameters - the config.txt sdram_schmoo=0xN setting twiddles various low-level bits inside the SDRAM PHY to adjust timings/thresholds/drive strengths. Iterating over these settings (which are usually not entirely independent of each other) lets you build a plot of stability across the various knobs, chip silicon speed and voltage.
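The idea of a shmoo plot can be sketched in a few lines of Python: sweep two knobs over a grid, run a pass/fail test at each point, and print the results as a map. Everything here is hypothetical. The `fake_stable` function stands in for a real stress test, the frequency and voltage axes are chosen purely for illustration, and none of the numbers are measurements or real sdram_schmoo values.

```python
def shmoo(xs, ys, passes):
    """Return one text row per y value; '#' marks a passing point, '.' a failing one."""
    return ["".join("#" if passes(x, y) else "." for x in xs) for y in ys]


def fake_stable(freq_mhz, millivolts):
    """Made-up stability rule: higher frequency needs more voltage. Not measured data."""
    return millivolts >= 1100 + (freq_mhz - 400)


freqs = [400, 450, 500, 550]      # columns, left to right
volts = [1250, 1200, 1150, 1100]  # rows, top to bottom

for mv, row in zip(volts, shmoo(freqs, volts, fake_stable)):
    print(f"{mv} mV  {row}")
```

In a real characterisation run the pass/fail callback would be a long memory stress test, and the sweep would be repeated on each skewed board so that the stable regions can be compared.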
baallrog wrote: ↑Tue Apr 24, 2018 4:35 pm
Hi,
I've got the same problem here.
I've got LibreELEC on the RPi 3B+ and it freezes after some minutes. This is not random at all.
Adding arm_freq=1200 seems to fix these freezes for me.
If you want some details about the board, like the serial number or something else, I can provide them.
Hope this can help.
And thank you guys for all the work you are doing.

Does dropping the SDRAM to 450 and putting frequency back up to 1400 still work?
jamesh wrote: ↑Tue Apr 24, 2018 4:47 pm
Does dropping the SDRAM to 450 and putting frequency back up to 1400 still work?

Adding sdram_freq=450 doesn't help for me.
jdb wrote: ↑Sat Apr 21, 2018 11:42 am
When we do qualification testing of a new piece of silicon, we make boards with split lots on. These are special SoCs that have been "skewed" in the semiconductor manufacturing process to emulate the maximum differences in performance/speed/power that you would get with production parts.
The split lots are skewed fast/slow and you can also buy qualification samples of LPDDR2 that are also skewed fast/slow. By testing the matrix of possibilities (FF/SF/FS/SS for RAM/SoC) you map out the performance at each "corner" of the semiconductor process. The issue we're seeing here didn't pop up in pre-production testing, which makes it look a lot more like a batch failure or some other cluster-type issue.
The fact that people are getting two Pis in a row from the same supplier and both exhibit the issue would be extremely unlikely if the failure was randomly distributed.

Agreed, but as you know, sometimes corner lots are not controlled as tightly as we would wish.
blachanc wrote: ↑Tue Apr 24, 2018 8:13 pm
I guess you cannot (secret sauce) answer this question:
Is the info about the specific wafer ID / X and Y coordinates burned into a per-die OTP fuse at wafer probe?
I am asking because that is very valuable info to have in conjunction with the wafer process parameters.
It makes things easier when trying to correlate with corner lot results.

No, but I agree it would be useful information.