First, the scenario and the trouble I'm having, followed by what I have tried, tested, and verified...
We're using several Pis (in sets of 16) in a production environment where they need to run 24/7, constantly streaming video from the SD card (reading, not writing). This is similar to others who have posted at http://raspberrypi.stackexchange.com/qu ... my-sd-card wanting to run their applications for long periods of time, such as a 24/7 security camera.
After a few months of running Pis in the lab during development, we began to encounter Pis failing to run. We kept returning them, thinking they had failed due to heat, power supplies, or other anomalies of running 24/7, but our supplier consistently told us that most of the Pis we returned were fine.
Eventually we noticed that some SD cards would boot on certain Pis and fail on others, with 100% consistency either way. I've verified that '/proc/cpuinfo' reports revision "0x000e" on all our Pis, so it's not an issue of China- versus UK-made boards (which some have blamed for reliability problems), or of Samsung RAM, etc. This is supported by the fact that the exact same image, dd'ed onto a reliable SD card, works on all the Pis. (The RAM issue came down to whether people were running an older or a newer kernel; in our case, the kernel and drivers work on all Pis as long as the SD card is reliable.)
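For anyone wanting to repeat the revision check across a whole cluster, here is a minimal Python sketch that pulls the `Revision` field out of `/proc/cpuinfo`. It assumes the usual Raspbian `Revision	: 000e` line layout; the helper names `parse_revision` and `read_revision` are hypothetical.

```python
from pathlib import Path
from typing import Optional


def parse_revision(cpuinfo_text: str) -> Optional[str]:
    """Return the value of the 'Revision' field from /proc/cpuinfo text, or None."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Keys are padded with tabs/spaces, e.g. "Revision\t: 000e"
        if sep and key.strip() == "Revision":
            return value.strip()
    return None


def read_revision(path: str = "/proc/cpuinfo") -> Optional[str]:
    """Read the board revision of the local machine (Linux only)."""
    return parse_revision(Path(path).read_text())
```

Calling `read_revision()` on each Pi (e.g. over ssh) should return the same string on every board if they really are the same revision.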
I've concluded that it must be due to bad SD cards. When I dd an image of a card that won't boot on some Pis (i.e. 'dd if=/dev/mmcblk0 of=badcard.img bs=4M') and then dd that image onto a new or good SD card, the new card boots on all Pis. I've tried up to 7 Pis in our lab, 4 of which have this finicky habit of refusing to boot some old/used SD cards. (Some will now be thinking, "How can you even read that bad SD card to dd it?" I've been fortunate that the SD card reader on my laptop can read them; luck, I guess? When I run fsck on the first partition of the card from the laptop, I sometimes get a dirty bit error and sometimes I don't.) Incidentally, the red herring was that all the Pis with the number "1325" on the back worked, and the ones marked "1318" and "1308" did not...
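To make the "same exact image" comparison rigorous, one option is to hash the image taken from the failing card and compare it with the card it was written to. Below is a minimal sketch using Python's `hashlib`; `sha256_file` and `copy_matches` are hypothetical helpers. It assumes the comparison is done right after dd and before the new card is booted (booting writes to the filesystem), and that the new card may be larger than the image, so only the first `len(image)` bytes of the device are hashed. Reading a raw device such as a hypothetical `/dev/sdX` normally requires root.

```python
import hashlib
import os
from typing import Optional

CHUNK = 4 * 1024 * 1024  # 4 MiB, matching the bs=4M used with dd


def sha256_file(path: str, limit: Optional[int] = None) -> str:
    """SHA-256 of a file or block device, optionally of only its first `limit` bytes."""
    digest = hashlib.sha256()
    remaining = limit
    with open(path, "rb") as f:
        while remaining is None or remaining > 0:
            size = CHUNK if remaining is None else min(CHUNK, remaining)
            block = f.read(size)
            if not block:  # end of file/device
                break
            digest.update(block)
            if remaining is not None:
                remaining -= len(block)
    return digest.hexdigest()


def copy_matches(image_path: str, device_path: str) -> bool:
    """True if the first len(image) bytes of the device are identical to the image."""
    size = os.path.getsize(image_path)
    return sha256_file(image_path) == sha256_file(device_path, limit=size)
```

For example, `copy_matches("badcard.img", "/dev/sdX")` (device path hypothetical) would confirm that the card that boots everywhere really carries byte-for-byte the same data as the card that doesn't.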
Hence the mystery in the subject title: why would this worn-out card work on some Pis and not on others? A card that ran for about a month and then suddenly failed to boot will, when inserted into a less finicky Pi, boot every time, yet fail every time on the finicky ones. It would be much more reassuring if the card failed on *all* Pis, so we could conclude that it's simply a worn-out card...
For now, we're writing it off as a worn-out SD card and doing our best to identify which daemons are doing a lot of writing (e.g. to /var/log and /tmp), in the hope of making the SD cards last as long as possible so the Pis can stay maintenance-free.
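One way to find the heavy writers is to sample the `write_bytes` counter that Linux exposes in `/proc/<pid>/io` twice and rank processes by the increase. The sketch below assumes Python 3 on the Pi and root privileges (without root, other users' `/proc/<pid>/io` files are unreadable and are silently skipped); the function names are hypothetical.

```python
import os
from typing import Dict, List, Tuple

Key = Tuple[int, str]  # (pid, command name)


def parse_io(text: str) -> Dict[str, int]:
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = int(value)
    return fields


def snapshot() -> Dict[Key, int]:
    """Map (pid, command name) -> write_bytes for every readable process."""
    result = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/io") as f:
                io = parse_io(f.read())
            with open(f"/proc/{entry}/comm") as f:
                name = f.read().strip()
        except OSError:
            # Process exited meanwhile, or we lack permission to read it.
            continue
        result[(int(entry), name)] = io.get("write_bytes", 0)
    return result


def top_writers(before: Dict[Key, int], after: Dict[Key, int],
                n: int = 10) -> List[Tuple[Key, int]]:
    """Processes with the largest increase in write_bytes between two snapshots."""
    deltas = [(key, after[key] - before.get(key, 0)) for key in after]
    deltas.sort(key=lambda item: item[1], reverse=True)
    return deltas[:n]
```

Typical use: take `first = snapshot()`, wait a few minutes with `time.sleep(...)` while the video workload runs, then print `top_writers(first, snapshot())`. A longer window catches loggers that only flush periodically.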
Any other suggestions, comments, ideas, or even sanity checks toward solving this mystery would be greatly appreciated.
Sources and References:
[*] http://raspberrypi.stackexchange.com/qu ... my-sd-card
[*] http://www.makeuseof.com/tag/extend-lif ... s-sd-card/