SD Card Mystery

Posted: Thu Jun 19, 2014 5:02 pm
by hidekiai
First the scenario of what troubles I am having, followed by what I have tried/tested/verified...

We're using several Pi's (clusters of 16 Pi's per set) in a production setup that needs to run 24/7, constantly streaming videos (reading, not writing) from the SD card. This is similar to what others have described at http://raspberrypi.stackexchange.com/qu ... my-sd-card: people who want to run their applications for long periods of time, such as a 24/7 security camera.

After a few months of running Pi's in the lab during development, we began to encounter Pi's failing to run. We kept returning the Pi's, thinking they had failed due to heat, power supplies, or other anomalies of running 24/7, but our supplier has consistently told us that most of the Pi's we returned are fine.

In the end, we noticed that some SD cards would boot on certain Pi's and fail on others (100% success or 100% failure). I've verified that '/proc/cpuinfo' shows all Pi's to be rev "0x000e", so it's not an issue of China versus England factories, which some have blamed for reliability problems, or of Samsung RAM, etc. This is justified by the fact that the exact same SD card image, dd'ed to a reliable SD card, works on all Pi's (the RAM issue depended on whether people used an older or a newer kernel; for us, our kernel and drivers work on all Pi's as long as the SD card is reliable).
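For anyone wanting to repeat the revision check, here is a minimal sketch, assuming a POSIX shell with awk on the Pi. The function name `pi_revision` and its optional file argument are made up for illustration; the argument only exists so the function can be tried against a saved copy of cpuinfo.

```shell
# Print the board revision code from a cpuinfo-style file.
# pi_revision and its optional argument are illustrative names;
# with no argument it reads the live /proc/cpuinfo.
pi_revision() {
    awk -F': *' '/^Revision/ { print $2 }' "${1:-/proc/cpuinfo}"
}
```

Running `pi_revision` on each board and comparing the output is a quick way to confirm that they all report the same revision. Note that the kernel normally prints the code without the "0x" prefix.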

I've concluded that it must be due to bad SD cards. When I dd a card that won't boot on some Pi's to an image (i.e. 'dd if=/dev/mmcblk0 of=badcard.img bs=4M') and then dd that image to a new or good SD card, the good card boots on all Pi's (I've tried up to 7 Pi's in our lab, 4 of which have this finicky issue of not wanting to boot some old/used SD cards). Some are now thinking, "how can you read that bad SD card for dd'ing?" I've been fortunate to have an SD card reader on my laptop that can read them; luck, I guess. When I fsck the first partition of the card on the laptop, I sometimes get a dirty bit error and sometimes I don't. Incidentally, the red herring was that all the Pi's that have the number "1325" on the back work, while the ones marked "1318" and "1308" did not...
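When copying images between cards like this, it can help to confirm that the target card really holds what was written. The following is only a sketch, assuming GNU `cmp` (for the `-n` byte limit); `verify_image` is a hypothetical helper name, and the device path in the usage note is a placeholder.

```shell
# Compare an image file with the first bytes of a card (or another file).
# Succeeds only if the card starts with exactly the image's contents.
# verify_image is an illustrative name, not an existing tool.
verify_image() {
    # Size of the image in bytes; only this many bytes of the card are compared.
    size=$(wc -c < "$1") || return 1
    cmp -n "$size" "$1" "$2"
}
```

For example, `verify_image badcard.img /dev/sdX` (with /dev/sdX standing in for the card reader device) after writing the image. Re-inserting the card before the check is probably wise, so that the data is read back from the card and not from the page cache.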

Hence the mystery (the subject title): why would this worn-out card work on some Pi's and not on others? The same card that ran for about a month and then suddenly fails to boot will, when inserted into another Pi that is not as finicky, boot 100% of the time (always), yet fail on the finicky ones 100% of the time (always). It would be much more reassuring if the card did not work on *all* Pi's, so we could conclude that it's just a bad, worn-out card...

For now, we're writing it off as a worn-out SD card and doing our best to identify which daemons are doing lots of writing (e.g. to /var/log and /tmp), hoping to make the SD cards last as long as we can so we stay maintenance free.

If there are other suggestions, comments, ideas, or even sanity checks toward solving this mystery, I would greatly appreciate them.

Sources and References:
* http://raspberrypi.stackexchange.com/qu ... my-sd-card
* http://www.makeuseof.com/tag/extend-lif ... s-sd-card/

Re: SD Card Mystery

Posted: Thu Jun 19, 2014 6:30 pm
by ilvalle
Hi hidekiai,
I don't have an answer for you, but I have exactly the same issue: on my desk I have several SD cards in this condition. I've been using the Raspberry Pi for long-term deployments (see: http://traffic.integreen-life.bz.it/). In my case all partitions are mounted read-only, so I don't think it is related to writing to the card too much. I tried filling the SD card with zeros using dd, but nothing changed.
I've been using only two types of SD card: the Transcend SDHC 4GB/8GB Class 10 and the one from element14 with Debian 6 pre-installed. What about your cards?
Which power supply are you using? I recently started to use http://www.amazon.it/Nuovo-Alimentatore ... spberry+pi but it seems to me that it provides too much voltage (on the RPi board I measured above 5.40V).

Re: SD Card Mystery

Posted: Thu Jun 19, 2014 10:07 pm
by hidekiai
ilvalle wrote:...snip...
element14 with debian6 pre-installed. What about your cards?
Which power supply are you using? I recently started to use http://www.amazon.it/Nuovo-Alimentatore ... spberry+pi but it seems to me that it provides too much voltage (On the rpi board the voltage I measured it is above 5.40V)
Hello, thank you very much for your response. We generally seem to be favoring the Transcend SDHC Class 10 8GB cards. They do (mostly) last for several months, but we cannot mount ours as "ro" (we do use "noatime") because we need to write to the card sometimes. Although we have "/home" as a separate (3rd) partition, it is part of the same SD card, so writes still go to the card and will surely wear it (though you can argue that partition 1, commonly vfat, can stay "ro" and at least remain bootable...).

As for the power supply, in our experience we can usually spot a bad p/s within the first few hours. Once they are running, we haven't had any issues for a month or more (we keep them on 24/7). We don't use any of the USB ports, so the power draw shouldn't be high or fluctuate much. Again, inserting a "good" SD card will make the problematic/finicky Pi's run for days without problems, so we can rule out the p/s.

In regards to the kernel, we are using:

Code:

pi@demopi1:~$ uname -a
Linux demopi1 3.10.24 #2 PREEMPT Mon Dec 23 05:18:12 UTC 2013 armv6l GNU/Linux

pi@demopi1:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 7.1 (n/a)
Release:        7.1
Codename:       n/a

Again, the kernel should not be an issue since this same image is used on all Pi's (although one thing that irks me about this kernel version is that I cannot use "iotop" because the kernel config is not set up for it; "dstat" works, though, so I can get some idea of the disk writes going on).
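Where iotop is unavailable, one rough alternative is to read the kernel's own counters. This is a sketch only, assuming the usual /proc/diskstats layout in which the third field is the device name and the tenth is the number of sectors written (counted in 512-byte units). `sectors_written` is a made-up helper name, and its second argument exists only so it can be tried on a saved copy of the file.

```shell
# Print the cumulative "sectors written" counter for one block device.
# sectors_written is an illustrative name; $1 is the device name as it
# appears in /proc/diskstats (e.g. mmcblk0), $2 an optional stats file.
sectors_written() {
    awk -v dev="$1" '$3 == dev { print $10 }' "${2:-/proc/diskstats}"
}

# Possible usage on the Pi (not run here): sample twice and subtract.
#   before=$(sectors_written mmcblk0); sleep 60
#   after=$(sectors_written mmcblk0)
#   echo "$(( (after - before) * 512 )) bytes written in 60 s"
```

This only gives a total per device, not a per-process breakdown, so it shows whether the card is being written to at all but not which daemon is responsible.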

We do not underclock (though we do overclock slightly) because we are also aware that it can cause write corruption on the SD card.

Thanks (once again) for responding, and please keep any suggestions coming!

P.S.: I am not sure if it is the same power supply, but I did see that some of the USB power cables our suppliers provided have an Amazon logo *grin*, and it surely looks similar.

Re: SD Card Mystery

Posted: Fri Jun 20, 2014 8:23 am
by ilvalle
I've encountered the issue with different kernels. Actually, in my previous post I forgot to say that at the same time I changed not only the power supply but also the kernel (from 3.10.* up to 3.12.21).
Today I tried a simple test: using an SD card I had marked as 'damaged' with new, fresh Raspberry Pis: 14 out of 16 booted!
The strange thing is that the other 2 Raspberry Pis booted with a new SD card (same model, capacity, speed, brand etc).
Hope it helps.

Re: SD Card Mystery

Posted: Fri Jun 20, 2014 8:40 am
by RaTTuS
Are the SD cards physically bent a bit ...

Re: SD Card Mystery

Posted: Mon Jun 23, 2014 2:50 pm
by hidekiai
RaTTuS wrote:are the SDcards physically bent a bit ...
Indeed, a good point to bring up, but as mentioned, whether or not I wiggle the card in its contacts, the Pi's that do not boot never boot (always, as in 100%), while the Pi's that boot always do... If the card were bent or the contact were sensitive, wouldn't the failure be intermittent on all Pi's?

Re: SD Card Mystery

Posted: Mon Jun 23, 2014 4:20 pm
by hampi
The problem can be the Raspberry Pi too - not always the ignorant user or the wrong type of SD card. The RPi has been tested at the factory to boot at least once, but that does not mean that it will always boot.

Here is my story

http://www.raspberrypi.org/forums/viewt ... 28&t=78648

and it explains the MFT (Magic Finger Trick).

Re: SD Card Mystery

Posted: Mon Jun 23, 2014 5:09 pm
by hidekiai
hampi wrote:The problem can be the Raspberry Pi too - not always the ignorant user or wrong type of SD card. The RPi has been tested at factory to boot atleast once, but that does not mean that it will always boot.

Here is my story

http://www.raspberrypi.org/forums/viewt ... 28&t=78648

and it explains the MFT (Magic Finger Trick).
Thanks for the lead. If pressing hard on the SD card (the MFT) is what fixes it, wouldn't the problem affect every SD card on that board, intermittently?

In our case, certain SD cards boot 100% of the time on some Pi's, and that exact card fails 100% of the time on the other, finicky Pi's (giving the operator the impression of "this card booted up fine on that Pi, so this Pi must be defective"), but brand new SD cards (with the exact same image, kernel, card capacity, brand/make, etc.) boot on all of them (and we have about 20+ Pi's in our lab), finicky or not (without pressing hard on the SD card or anything; they just do)...

Our Pi's are also all rev "0x000e"; they are not Rev A.

Re: SD Card Mystery

Posted: Mon Jun 23, 2014 5:39 pm
by hampi
This is a different version of MFT and not the classical "press the SD card to make better contacts". The scientifically correct way is to use a logic analyser or a scope to see the data transfer and signal levels between the SoC and SD-card. Most likely the problem is due to a hardware issue IMHO.

Re: SD Card Mystery

Posted: Mon Jun 23, 2014 8:23 pm
by Cancelor
Some cards are bent concave, some are bent convex.
Some of the centre contacts are weak, some of the outer contacts are weak.

IF this really is the area causing failure (we are by no means sure) then it can be the combination of any number of things.

Re: SD Card Mystery

Posted: Mon Jun 23, 2014 9:11 pm
by hidekiai
hampi wrote:This is a different version of MFT and not the classical "press the SD card to make better contacts". The scientifically correct way is to use a logic analyser or a scope to see the data transfer and signal levels between the SoC and SD-card. Most likely the problem is due to a hardware issue IMHO.
That's over my head, but just for self-education, is that still the case when all 16 Pi's were running for several months (i.e. since November, 2013 - about 8 months) without any issues? Or is it that these finicky Pi's have been this way with some SD cards all along, but we did not notice it until recently, when the SD cards started getting old?

And the logic analyzer will tell us what? That the card is old, or that the card no longer transfers signals on certain Pi's? I do not know how to explain this to my superiors if they ask me why I need a logic analyzer... Perhaps we'll pass this information on to the distributor who provides us with the Pi's...

Re: SD Card Mystery

Posted: Mon Jun 23, 2014 10:37 pm
by AndrewS
hidekiai wrote:This is justified by the fact that same exact dd'ed SD card images that are dd'ed to a reliable SD cards will work on all Pi's
Dunno if it's the same issue or not, but I recently had a very similar problem where a card would get into an unbootable state, and after dd-ing from the non-booting card onto an identical card of the same model, the new card would work fine!
It got fixed by this commit https://github.com/raspberrypi/firmware ... 6f0db58728

Re: SD Card Mystery

Posted: Mon Jun 23, 2014 11:22 pm
by hidekiai
AndrewS wrote:
hidekiai wrote:This is justified by the fact that same exact dd'ed SD card images that are dd'ed to a reliable SD cards will work on all Pi's
Dunno if it's the same issue or not, but I recently had a problem very similar to that where a card would get into an unbootable state, and dd-ing from the non-booting card onto an identical card of the same model, and the new card would work fine!
It got fixed by this commit https://github.com/raspberrypi/firmware ... 6f0db58728
firmware: bootcode: increase sdcard timeout to avoid observed boot failure under certain conditions
Holy cow! That's kernel v3.12? We're currently locked to v3.10.24. If I can burn a v3.12 (or higher) kernel to many SD cards and this problem goes away, then I think your fix is what we need! Thank you so much for the great lead! I wonder if I can just set "boot_delay" in '/boot/config.txt' to get a similar effect (though I highly doubt it).

Once again, thank you VERY MUCH for this lead!

Re: SD Card Mystery

Posted: Mon Jun 23, 2014 11:27 pm
by AndrewS
You can't copy *just* the kernel, you'll need to copy the matching kernel modules too... :geek:

Re: SD Card Mystery

Posted: Tue Jun 24, 2014 7:30 am
by hampi
AndrewS wrote:You can't copy *just* the kernel, you'll need to copy the matching kernel modules too... :geek:
Sure. Just apt-get update, apt-get upgrade and reboot. The 3.12.* seems to be the production version now.

Re: SD Card Mystery

Posted: Tue Jun 24, 2014 7:45 am
by hampi
hidekiai wrote:And the logic analyzer will tell us what?
I tried to debug the mysterious SoC myself. From the Raspberry Pi circuit diagram it seems that the four-bit SD bus mode is used. You can connect the logic analyser to the pins CLK, CMD, DAT0, DAT1, DAT2 and DAT3. You can then trigger on the falling or rising edge of CLK and see some data on the DAT0 to DAT3 lines. If nothing happens, there is no clock signal generated by the SoC. If you see some signals, you can try to see if there is any obvious problem in them or how long it takes before the problem appears. All in all it is quite useful and can be used to debug other serial interfaces too.

Re: SD Card Mystery

Posted: Tue Jun 24, 2014 10:04 am
by AndrewS
https://github.com/raspberrypi/linux/issues/415 also discusses debugging the SD interface using a low-level analyser.

Re: SD Card Mystery

Posted: Tue Jun 24, 2014 2:57 pm
by hidekiai
AndrewS wrote:You can't copy *just* the kernel, you'll need to copy the matching kernel modules too... :geek:
Hahaha, indeed (I hope you are joking, right?). I always tell people that drivers (and other *.ko) are unique creatures that live between the O/S and userland apps. I'm a Gentoo user, and (fortunately or unfortunately, depending on whether I successfully boot that day after rebuilding my kernel) I have been building kernels (no science involved, just "make all && make install && make modules_install ..." etc., tweaking .config and "make menuconfig") since about a year after Gentoo emerged (pun intended *grin*). I'm also aware of how annoyingly libc, glib, and other gcc libs are sometimes tied to O/S headers, so much so that the spaghetti of dependencies makes me want to switch to Debian or Fedora (there's something wrong when you start using ldd religiously and freak out if you cannot find ldd and strace in a distro *grin*).

Once again, thanks for your advice!

P.S.: One of the reasons why I do not use Gentoo (or even Arch) on the Raspberry Pi (I did for about 6 months or so) is that it takes quite a long time to compile/build on the Pi (yes, I know how to cross-compile for ARM on my x86_64), so I figured I'd let the smart people who maintain the distros compile and verify dependencies on their time and dime, and only invest my own time in downloading...

Re: SD Card Mystery

Posted: Tue Jun 24, 2014 3:02 pm
by hidekiai
AndrewS wrote:https://github.com/raspberrypi/linux/issues/415 also discusses debugging the SD interface using a low-level analyser.
Thanks (again) for this gem of info; it makes for a much stronger justification!

Re: SD Card Mystery

Posted: Tue Jun 24, 2014 7:56 pm
by hidekiai
hampi wrote:
AndrewS wrote:You can't copy *just* the kernel, you'll need to copy the matching kernel modules too... :geek:
Sure. Just apt-get update, apt-get upgrade and reboot. The 3.12.* seems to be the production version now.
Or follow http://www.raspberrypi.org/forums/viewt ... re#p551316 for Debian hybrids... Or compile by hand :)

Or for Raspbian/Debian hybrids that don't have rpi-update, follow https://github.com/Hexxeh/rpi-update...

Re: SD Card Mystery

Posted: Tue Jun 24, 2014 10:02 pm
by hidekiai
Upgrading to kernel v3.12.x did not help...

Summary:
* Have 3 finicky Pi's and 1 flexible Pi (to prove that the SD card works)
* Upgraded the firmware manually from https://github.com/raspberrypi/firmware (I more or less mimicked the script written by Hexxeh, but copied the files by hand)
* Tried the Debian hybrid from http://raspbian.org/ with the latest 3.12 kernel (http://downloads.raspberrypi.org/raspbian_latest) by dd'ing the image to a known old/unreliable SD card

I first tried manually upgrading the kernel from the firmware repository (I should have tried the dd'ed image first; it would have made more sense to start with something less prone to human error) by just updating the files in '/boot', '/opt', and '/lib/modules' on an existing SD card with the v3.10 kernel+modules.

I booted it (did 'uname -a') and launched our application on the "good" Pi just to make sure that the kernel, the ko, and the libs all work. I then plugged this SD card into the 3 Pi's that won't work. Result: none of the 3 Pi's liked the SD card.

Next, I dd'ed the latest Raspbian image, expanded the SD card, changed its hostname, enabled the SSH daemon, and even did apt-get update; I rebooted it at least 2 more times to verify that it always boots (yes, I did "uname -a" to verify the kernel version). I plugged this SD card into the 3 Pi's and none booted.

Finally, to verify that the 3 Pi's aren't faulty, I inserted a "good" SD card, which booted on all 4 Pi's.

In conclusion (so far), why it behaves this way is still a mystery to me. All we can do is ask our customers to replace the SD card with a new one every 6 to 8 months or so (probably longer than that, now that I've moved most of the stuff that writes to the SD card onto tmpfs, and only intermittently write persistently to the SD card).
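After moving write-heavy paths to tmpfs, it may be worth double-checking that the mounts actually took effect. A small sketch, assuming the standard /proc/mounts format (device, mount point, filesystem type, ...). `tmpfs_mounts` is a hypothetical helper name, and /tmp and /var/log in the usage note are only examples, since the post does not list which paths were moved.

```shell
# List every mount point whose filesystem type is tmpfs.
# tmpfs_mounts is an illustrative name; the optional argument lets it
# read a saved copy instead of the live /proc/mounts.
tmpfs_mounts() {
    awk '$3 == "tmpfs" { print $2 }' "${1:-/proc/mounts}"
}
```

For instance, `tmpfs_mounts | grep -x /tmp` returns success only if /tmp is really backed by tmpfs; the same check could be repeated for /var/log or whichever directories were moved.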

thanks && peace!

Re: SD Card Mystery

Posted: Wed Jun 25, 2014 12:36 am
by AndrewS
hidekiai wrote:Upgrading to kernel v3.12.x did not help...
To clarify, I believe the specific thing that fixed it for me was in the "firmware" (i.e. bootcode.bin and start.elf) and not necessarily the newer 3.12 kernel.
hidekiai wrote:Next, I dd'ed the latest Raspbian image, expanded the SD card, changed its hostname, enabled the SSH daemon, and even did apt-get update; rebooted it at least 2 more times to verify that it boots up always (yes, I did "uname -a" to verify kernel version). Plugged in this SD card to the 3 Pi's and none booted.
But if the latest 2014-06-20 Raspbian (which already includes the latest firmware) still doesn't work on your "faulty" Pis, but does work on your "good" Pis, then you've obviously found some different bug that hasn't been fixed yet! :? :(
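Since the fix mentioned here lives in the firmware files and not in the kernel, one way to rule out a copying mistake is to compare the firmware files on two boot partitions. This is a hedged sketch, assuming `sha256sum` from coreutils and that both partitions are mounted somewhere on a Linux machine. The function names and the mount points in the usage note are made up for illustration.

```shell
# Print checksums of the firmware files found in one boot directory.
# firmware_sums / same_firmware are illustrative names, not existing tools.
firmware_sums() {
    ( cd "$1" && sha256sum bootcode.bin start*.elf )
}

# Succeed only if both boot directories hold identical firmware files.
same_firmware() {
    sums_first=$(firmware_sums "$1") || return 1
    sums_second=$(firmware_sums "$2") || return 1
    [ -n "$sums_first" ] && [ "$sums_first" = "$sums_second" ]
}
```

For example, `same_firmware /mnt/reference-boot /mnt/card-boot` (both paths are placeholders) would show whether a hand-updated card carries the same bootcode.bin and start*.elf as a card written from the official image.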

Re: SD Card Mystery

Posted: Wed Jun 25, 2014 11:08 am
by MaxK1
What was the result of adding boot_wait=1 to config.txt to the "bad" sd card? other than taking an extra second to boot ;-) Can you post the contents of your config.txt and cmdline.txt? (I assume the good/working SD card/Pi setups are identical. I'm not sure what I'm fishing for - Samsung/Hynix/Micron memory issue, MMC issue or just another path to investigate... )

Re: SD Card Mystery

Posted: Wed Jun 25, 2014 3:56 pm
by hidekiai
MaxK1 wrote:what was the result of adding boot_wait=1 to config.txt to the "bad" sd card? other than taking an extra second to boot ;-) Can you post the contents of your config.txt and cmdline.txt? (I assume the good/working SD card/Pi setups are identical. I'm not sure what I'm fishing for - Samsung/Hynix/Micron memory issue, MMC issue or just another path to investigate... )
Firstly, thank you for responding and for the suggestions. To clarify, whether it is a virgin copy of the latest Raspbian (released a few days ago) or the version we have been locked to since December of last year, it fails on the "bad" cards. The virgin copy of Raspbian is left completely as-is; not even config.txt or the command line is altered. And yes, you are correct: we've written a mass-duplicator tool for our manufacturer which duplicates up to 20 SD cards at a time and updates '/etc/hostname', but they all come from a single raw dd'ed image file (as you've mentioned, "setups are identical").

But I do believe you may be on to something with the config.txt/cmdline.txt route. Typically, if the ACT light flashes immediately when USB power is plugged in, the Pi will boot. If the ACT light does not flash even briefly, it's stuck.

Our config.txt is as follows:

Code:

arm_freq=800
force_turbo=1
gpu_mem=128
boot_delay=5
#display_rotate=2
disable_splash=1
disable_overscan=1
start_file=start_x.elf
fixup_file=fixup_x.dat

Normally, we do not need to rotate the display (thus it's commented out), but some monitors that our distributor provided required this, so when a field operator sees the image flipped, we can just recommend uncommenting that line.

We are using "boot_delay". I even tried searching for "boot_wait", so I assume you meant "boot_delay"? We were already setting it to "5" because monitor EDID identification would sometimes be delayed while the monitor warmed up and turned on; without the delay, the screen resolution and/or aspect ratio would be wrong.

And our cmdline.txt is as follows:

Code:

dwc_otg.lpm_enable=0 root=/dev/mmcblk0p2 rootfstype=ext4 noatime logo.nologo quiet rootwait loglevel=1 sdhci-bcm2708.enable_llm=1 dwc_otg.microframe_schedule=1 dwc_otg.fiq_fix_enable=0 dwc_otg.fiq_split_enable=0 dwc_otg.trans_backoff=3000

As for the Raspbian version, we left '/boot/config.txt' and '/boot/cmdline.txt' as-is (unaltered), but I can post them upon request if needed. The only unconventional things I can point out are "dwc_otg.trans_backoff=3000" and "logo.nologo"; I think the rest are the defaults.

The majority (probably all) of the SD cards we use are the Transcend SDHC Class 10 8GB (interestingly, they do not show in http://www.transcend-info.com/Product/MemoryCards , but it's this one: http://www.amazon.com/Transcend-Class-F ... B003VNKNEG), but as mentioned, they have been running fine for about 7 months (since November of last year) nonstop 24/7.

Once again, thank you for your responses and any suggestions would be appreciated.

Re: SD Card Mystery

Posted: Wed Jun 25, 2014 4:04 pm
by hidekiai
AndrewS wrote:...snip...
But if the latest 2014-06-20 Raspbian (which already includes the latest firmware) still doesn't work on your "faulty" Pis, but does work on your "good" Pis, then you've obviously found some different bug that hasn't been fixed yet! :? :(
If that is the case, in your educated guess, would that be a software or a hardware bug?

I am told that if anybody credible seriously wants to have a look, we can ship the Pi (that always fails) and the SD card (which works on some Pi's and not on others) to them. Would you be interested?