mogul
Posts: 4
Joined: Fri Jul 06, 2012 9:21 am

Pi failing torture test

Sun Jul 08, 2012 5:13 am

Quite often my Pi corrupts its file system during reboot. When this happens I have to write a new image to the SD card.

I have tried a wide range of possible combinations within the list below:
- 3 different USB power supplies; TP1 measures 4.85 V, 4.92 V and 5.01 V. I chose to continue with the last two.
- 3 different SD cards, a Transcend class10, a Sandisk class4 and a no-name class4
- both HDMI and composite output
- only a keyboard connected to the USB ports
- 3 different distributions, the default debian6-19-04-2012, archlinuxarm-13-06-2012, and 2012-06-18-wheezy-beta
- images written to the SD cards from two different machines, one Linux and one Windows

To reproduce the problem I do the following:
- install a fresh image to SD card
- run updates and rpi-update (I have tried without doing this also)
- in .ssh/authorized_keys add a key pointing back to my main desktop linux
- from my desktop Linux machine I run a script like this:
N=0; while true; do let N++; echo `date` : $N; ssh root@rpi /sbin/reboot; sleep 90; done
Then it's only a matter of time before the Pi won't boot any longer. It has ranged from 3 to 81 reboots so far.
The 90-second delay seems to be more than enough to let the Pi come alive and stabilize before being hit again.
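
For anyone repeating this test, here is a minimal sketch of the same loop that stops at the first cycle where the Pi no longer answers, so the failing cycle number is recorded. The names reboot_loop and SSH_CMD, the argument order, and the ssh options are my assumptions, not part of the original script. It checks liveness with a harmless command first, because the exit status of the reboot command itself is unreliable (the connection can drop while it runs).

```shell
#!/bin/sh
# Sketch only: reboot_loop and SSH_CMD are illustrative names.

# Command used to reach the Pi; overridable so the loop can be dry-run.
# BatchMode avoids hanging on a password prompt, ConnectTimeout bounds the wait.
SSH_CMD=${SSH_CMD:-"ssh -o BatchMode=yes -o ConnectTimeout=10"}

# usage: reboot_loop HOST DELAY_SECONDS MAX_CYCLES
# Prints one timestamped line per cycle; returns 1 at the first failed check.
reboot_loop() {
    local host="$1" delay="$2" max="$3" n=0
    while [ "$n" -lt "$max" ]; do
        n=$((n + 1))
        echo "$(date) : $n"
        # Liveness check: can we still log in and run a trivial command?
        if ! $SSH_CMD "root@$host" true; then
            echo "cycle $n: $host not reachable, stopping" >&2
            return 1
        fi
        # Ignore the exit status here; ssh may report an error when the
        # remote side goes down mid-session.
        $SSH_CMD "root@$host" /sbin/reboot || true
        sleep "$delay"
    done
}
```

Usage would be something like `reboot_loop rpi 90 1000`; a non-zero return means the last printed cycle number is the one where the Pi did not come back.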

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 5749
Joined: Wed Aug 17, 2011 7:41 pm
Location: Cambridge

Re: Pi failing torture test

Sun Jul 08, 2012 9:23 am

So after it fails to boot, is that permanent? Does power cycling make it boot?
Can you copy the files from boot partition off before the test and then diff them after it's failed, and see which files are different. Check if windows/Linux spots any problems when scanning for errors.
Can you just confirm whether your reboot test ever writes to boot partition? From your description it sounds like it doesn't, so corruption is surprising.

mogul
Posts: 4
Joined: Fri Jul 06, 2012 9:21 am

Re: Pi failing torture test

Sun Jul 08, 2012 11:32 am

dom wrote:So after it fails to boot, is that permanent?
Yes, until I load a new image on the SD card and start over
dom wrote:Does power cycling make it boot?
Nope
dom wrote:Can you copy the files from boot partition off before the test and then diff them after it's failed, and see which files are different.
I just did the opposite: sha1sum'ed the 13 files on the cannot-boot card, reloaded an Arch Linux image and, before moving the card to the Pi, ran the same sha1sum on the files on the boot partition. 13 identical files!
dom wrote:Check if windows/Linux spots any problems when scanning for errors. Can you just confirm whether your reboot test ever writes to boot partition?
Boot partition seems untouched.
I will try the same on the root partition: checksum all files there, reboot-loop the Pi to death and then sha1sum all the files on the card again. I expect it to be random files that change.
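
The before/after checksum comparison described here could be sketched roughly as follows. The function names make_manifest and changed_files and the manifest file names are made up for illustration, and the mount point in the usage note is an assumption; only sha1sum itself comes from the post.

```shell
#!/bin/sh
# Sketch only: make_manifest and changed_files are illustrative names.

# usage: make_manifest DIR > manifest.txt
# Writes one "sha1  ./relative/path" line per regular file, sorted by path
# so that two manifests of the same tree line up.
make_manifest() {
    ( cd "$1" && find . -type f -exec sha1sum {} + | sort -k 2 )
}

# usage: changed_files BEFORE_MANIFEST AFTER_MANIFEST
# Prints the paths whose checksum differs, plus files that appeared or vanished.
changed_files() {
    # diff exits 1 when the manifests differ, which is the expected case here.
    { diff "$1" "$2" || true; } | sed -n 's/^[<>] [0-9a-f]*  //p' | sort -u
}
```

With the root partition mounted on the desktop at, say, /mnt/pi-root, one would run `make_manifest /mnt/pi-root > before.txt` on the fresh image, repeat into `after.txt` once the card no longer boots, and then `changed_files before.txt after.txt`. Files that legitimately change on every boot (logs and the like) will show up too, so the interesting entries are the ones that should be static.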
dom wrote:From your description it sounds like it doesn't, so corruption is surprising.
Perhaps I was not perfectly clear in my initial post. The symptoms look like it's the root file system that gets corrupted, as if the reboot sequence forgets to unmount or sync the file system before the big reset flushes the kernel.
A friend of mine, TinHead http://letsmakerobots.com/user/3886, suggested adding an additional sync before and after rc.shutdown remounts root as read-only. I did that on Arch Linux, since it's more hackable. It did not help.

The "cannot boot" symptom is not quite identical every time: sometimes the boot sequence sprays error messages insanely fast, other times the console simply ignores anything I type at it. But every time, the SSH login is not working. (That's how I detect the problem: my external script fails.)

mogul
Posts: 4
Joined: Fri Jul 06, 2012 9:21 am

Re: Pi failing torture test

Thu Jul 12, 2012 6:21 am

Now I have come to the conclusion that my Rasp-Pi board is defective. With help from another Raspberry Pi owner, we did the following test:

- wrote the Arch Linux image to an 8GB class-10 Transcend SD card
- made a crontab line to call an external web server for tracking and then reboot the board every 5 minutes
- installed the SD card in a working Rasp-Pi
- after 24 hours the log file on the external web server showed 288 hits

- now we replaced the working board with mine: same PSU, same Ethernet, same place on the desk, same SD card
- after 54 cycles my Rasp-Pi failed to boot, and the filesystem on the SD card is damaged beyond repair (by fsck at least)
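
As a sanity check on the numbers above, here is a small sketch. The crontab line in the comment is a guess at what such an entry could look like (tracker.example is a placeholder URL, and it assumes wget is installed on the image); the helper names expected_hits and minutes_survived are made up for illustration.

```shell
#!/bin/sh
# Hypothetical root crontab entry on the Pi (placeholder URL, not the real one):
#   */5 * * * * wget -q -O /dev/null http://tracker.example/hit ; /sbin/reboot

# usage: expected_hits HOURS INTERVAL_MINUTES
# Number of tracker hits if every scheduled cycle succeeds.
expected_hits() {
    echo $(( $1 * 60 / $2 ))
}

# usage: minutes_survived CYCLES INTERVAL_MINUTES
# Rough run time before failure, given the number of completed cycles.
minutes_survived() {
    echo $(( $1 * $2 ))
}

expected_hits 24 5       # prints 288
minutes_survived 54 5    # prints 270
```

So 288 hits in 24 hours means the working board did not miss a single cycle, while 54 cycles on the other board corresponds to roughly four and a half hours before the failure.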

Think it's Farnell time now...

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 5749
Joined: Wed Aug 17, 2011 7:41 pm
Location: Cambridge

Re: Pi failing torture test

Thu Jul 12, 2012 9:05 am

@mogul
I'd be interested if you could repeat the test a few more times to be sure it wasn't just random variation.
I guess you could have unreliable memory, or a dodgy joint on an sdcard pin.
However the corruption of files on FAT partition is something a number of people have seen, whilst having no problems on EXT4 partition which doesn't make a lot of sense, except as a software problem.
Any evidence of corruption on EXT4 partition?

jojopi
Posts: 3478
Joined: Tue Oct 11, 2011 8:38 pm

Re: Pi failing torture test

Thu Jul 12, 2012 10:12 am

dom wrote:Any evidence of corruption on EXT4 partition?
As I read it, mogul's corruption is entirely on the ext4 partition. This is what you would expect, since it is much larger and the only filesystem being written in the tests described.

To test for bad RAM, I would recommend booting wheezy with a 224/32 split; "sudo apt-get install memtester"; then "sudo memtester 176M".

mogul
Posts: 4
Joined: Fri Jul 06, 2012 9:21 am

Re: Pi failing torture test

Thu Jul 12, 2012 2:00 pm

dom wrote: I'd be interested if you could repeat the test a few more times to be sure it wasn't just random variation.
As my first post indicates, this has already been done. To this point, according to my notes, I have made more than 15 installations, using different SD cards, PSUs and Linux distributions.
dom wrote: I guess you could have unreliable memory, or a dodgy joint on an sdcard pin.
I don't know much about hardware; you are possibly right there, which again just adds more weight to my conclusion: the board is defective.
dom wrote: However the corruption of files on FAT partition is something a number of people have seen, whilst having no problems on EXT4 partition which doesn't make a lot of sense, except as a software problem.
Any evidence of corruption on EXT4 partition?
- System cannot boot: the kernel loads fine but fails to mount the root filesystem
- Trying to repair the ext4 filesystem on a "real" Linux machine yields an extraordinarily long list of corrective actions, and afterwards many files have vanished. The errors do not only involve "expected files in RW" but also static library files that should not be open for write.

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 5749
Joined: Wed Aug 17, 2011 7:41 pm
Location: Cambridge

Re: Pi failing torture test

Thu Jul 12, 2012 3:17 pm

@mogul
Seems most likely faulty hardware. (the "more reliable on another pi" test is quite convincing). RMA.

nmalinoski
Posts: 2
Joined: Mon Feb 25, 2013 12:35 am

Re: Pi failing torture test

Mon Feb 25, 2013 12:58 am

Thread, I command you to rise from the dead!

My Pi is exhibiting behavior similar to that described, using a brand-new SanDisk 16GB MicroSD card (SU016G?), rated both Class 10 and UHS-1.

I copy boot files to the card (I have tried only BerryBoot and the Raspbian installer), and it boots just fine; the system seems to operate with no issues until I reboot, at which point the green ACT LED comes on and stays steady. Power cycling does not help; I have to pop the card in my PC and re-copy the boot files. If I copy only bootcode.bin, then the ACT light blinks 3 times, signalling that it cannot find start.elf. Copying the boot files only seems to serve as a bandaid; it'll let me boot once, but once I reboot, again, the ACT light comes on and stays on and the board refuses to boot. I feel I should also add that when I run memtester (image pulled and booted via BerryBoot) from this 16GB card, it only seems to detect half the available RAM: it only allocates about 190MB.

Unlike the original poster, however, I have a 1GB normal SD card that doesn't suffer from this behavior; I can reboot as often as I want, successfully, and memtester requests/allocates 450/422MB, which I believe are the amounts it should be allocating.

I've been running H2testw on the 16GB card for about half an hour now on endless verify without issue, and I didn't have any issues formatting or reading the card in my phone (So it doesn't sound like it'd be a total waste of money if it turns out it isn't compatible with the RasPi). Still, I'm not sure why the Pi would behave differently with the cards.

Any thoughts?

nmalinoski
Posts: 2
Joined: Mon Feb 25, 2013 12:35 am

Re: Pi failing torture test

Tue Feb 26, 2013 5:08 am

Update: I was able to successfully install the card images via Win32 Disk Imager; I tried RiscOS, Arch, then Raspbian using the 16GB SanDisk card, and I was able to successfully reboot all of them.

I still find it strange that BerryBoot would cause my Pi to behave like it did; I'll have to do some experimentation, or just wait for a new version to come out.
