Ingen
Posts: 4
Joined: Sun Sep 29, 2019 4:56 pm

SD Card power failure resilience ideas

Sun Sep 29, 2019 8:24 pm

I have a few questions and a potentially novel idea regarding SD Card data integrity in the event of sudden power loss to the Raspberry Pi, particularly in an embedded environment.

Regarding the SD Card (SDC):
The SDC/eMMC communication protocol, the embedded Flash Translation Layer (FTL), and the underlying FLASH memory...

My understanding is that the main cause of SDC filesystem corruption is power failure during critical FTL operations on the FLASH memory, i.e. writing block data, erasing blocks, running the wear levelling algorithm etc.

There is excellent documentation available at the SD Association website:
https://www.sdcard.org/downloads/pls/

One document that drew my attention:
Part1_Physical_Layer_Simplified_Specification_Ver6.00.pdf

On page 96
4.6.2 Read, Write and Erase Timeout Conditions
A card shall complete the command within the time period defined as follows or give up and return an error message.
If the host does not get any response with the given timeout it should assume that the card is not going to respond and try to recover (e.g. reset the card, power cycle, reject, etc.).

Read Timeout (max) = 100ms
Write Timeout (max) = 250ms
Erase Timeout (max) = 250ms
This therefore sets an upper bound on the time taken to complete a potentially critical set of tasks by the FTL at 250ms.

>> My proposed idea for handling a system power failure event is:

Rather than maintaining power for the entire board: Raspberry Pi BCM chip, peripherals, memory etc, which requires a large battery or super-capacitor...

[1a] Simply keep a separate 3.3V power supply alive that only drives the SDC, and thus maintain only a modest voltage and current to keep the SDC comms and FLASH activity alive for a short amount of time.
i.e. Tens of milliamps for a few hundred milliseconds.

[1b] The intent is to 'wait-out' the comms dropout for enough time so that the SDC will complete its current operation and then enter the IDLE state.

[1c] Then gracefully ramp down the SDC 3.3V power supply.

My assumption here is that only the SDC communication pipeline is broken, i.e. CLK, CMD, DAT[3:0]; the SD Card device itself should be kept active for ~500ms
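
As a rough feel for the size of such a hold-up supply, here is a back-of-envelope sketch in Python. The 50 mA load, the 500 ms hold time and the 2.7 V lower limit are illustrative assumptions (not measurements of any particular card), and the formula assumes a constant-current load fed straight from the capacitor with no regulator in between.

```python
def holdup_capacitance(current_a, hold_time_s, v_start, v_min):
    """Capacitance (farads) needed to supply a constant current for
    hold_time_s while the rail droops from v_start down to v_min.

    Uses C = I * t / dV, which ignores ESR, leakage and any regulator.
    """
    if v_start <= v_min:
        raise ValueError("v_start must be greater than v_min")
    return current_a * hold_time_s / (v_start - v_min)


# Assumed figures for illustration only: 50 mA for 500 ms, 3.3 V -> 2.7 V.
capacitance_f = holdup_capacitance(0.05, 0.5, 3.3, 2.7)
print(f"{capacitance_f * 1000:.1f} mF")
```

With these assumed numbers the answer comes out at roughly 42 mF. A lower card current, a shorter hold time or a boost regulator behind the capacitor would change the figure considerably.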

This should result in a few possible scenarios:

[2a] Comms is broken part way through BCM sending a command to the SDC:
SDC should reject the malformed command and go into the Stand-by state.

[2b] Comms is broken immediately after BCM has sent a command + data payload to the SDC:
SDC would commence FLASH interaction, then attempt to return its response.

[2c] Comms is broken after BCM sends command, but only partially sends the data payload:
Would the SDC timeout and go into the Stand-by state, or hang?

The question remains:
When the BCM shuts down and several files are still 'open' in the SDC, or a file read/write operation was 'ragged' (non-atomic), would the EXT4 journaling file system be able to recover from this fault?

Another concern is:
The BCM chip supplies the master clock for the SDC comms protocol (similar to SPI comms). Would the SDC still be able to operate without the CLK input or is that only required for the CMD buffer?

I am aware of several excellent solutions for unexpected Raspberry Pi power loss e.g. a read-only root filesystem, and also some great UPS-style circuits in association with "dtoverlay=gpio-shutdown" to hold up the main power supply long enough to cleanly shut down the system.

I feel there has to be a fine-tuned hardware solution to this, possibly offered as an add-on extra to the basic board.

I'm sure most users would accept a small increase in cost if some basic data resilience features were added to mitigate irreparable SD Card filesystem damage following power failures or accidental power-offs, particularly in fully embedded environments where a user has no control over formally shutting down the Raspberry Pi.

W. H. Heydt
Posts: 11016
Joined: Fri Mar 09, 2012 7:36 pm
Location: Vallejo, CA (US)

Re: SD Card power failure resilience ideas

Sun Sep 29, 2019 8:42 pm

The problem here is that SD cards are NOT actually designed to be used as mass storage devices for computers, especially "mains powered" computers. In a phone or camera, you are not likely to get a sudden power drop.

If cards are designed to withstand the issue of dropping power suddenly, it would be a wasted effort for the vast majority of usage, thus making the overwhelming majority of people who don't need the feature pay for it for the few for whom it is useful. This is, by the way, the same general reason why many of the specialized feature requests for Pis are easily dismissed out of hand.

The general solution is: Be sure you have a robust and reliable power supply. This can be easily and inexpensively achieved by using a commercial UPS.

RonR
Posts: 563
Joined: Tue Apr 12, 2016 10:29 pm
Location: US

Re: SD Card power failure resilience ideas

Sun Sep 29, 2019 8:51 pm

Ingen wrote:
Sun Sep 29, 2019 8:24 pm
>> My proposed idea for handling a system power failure event is:

Rather than maintaining power for the entire board: Raspberry Pi BCM chip, peripherals, memory etc, which requires a large battery or super-capacitor...

[1a] Simply keep a separate 3.3V power supply alive that only drives the SDC, and thus maintain only a modest voltage and current to keep the SDC comms and FLASH activity alive for a short amount of time.
i.e. Tens of milliamps for a few hundred milliseconds.

[1b] The intent is to 'wait-out' the comms dropout for enough time so that the SDC will complete its current operation and then enter the IDLE state.

[1c] Then gracefully ramp down the SDC 3.3V power supply.

My assumption here is that only the SDC communication pipeline is broken, i.e. CLK, CMD, DAT[3:0]; the SD Card device itself should be kept active for ~500ms

The corruption that can occur at the hardware level is only a tiny piece of the problem. Even if you provide a mechanism to ensure the current block write completes successfully, it doesn't solve the problem that the operating system probably has a huge amount of cached filesystem changes that won't be written to SD card. While the card may be logically intact following a loss of power, filesystem structures (directories and file contents) will still be corrupted.

jahboater
Posts: 4778
Joined: Wed Feb 04, 2015 6:38 pm

Re: SD Card power failure resilience ideas

Sun Sep 29, 2019 8:59 pm

RonR wrote:
Sun Sep 29, 2019 8:51 pm
Even if you provide a mechanism to ensure the current block write completes successfully, it doesn't solve the problem that the operating system probably has a huge amount of cached filesystem changes that won't be written to SD card. While the card may be logically intact following a loss of power, filesystem structures (directories and file contents) will still be corrupted.
Perhaps add something like commit=1 to the ext4 entry in /etc/fstab ?
This should flush the disk cache every second instead of every 5 seconds which is the default.
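
For illustration only, a small Python sketch of adding such an option to an fstab entry. The sample line and its PARTUUID are made up; check the real entry in /etc/fstab before changing anything, and note that this helper only rewrites a string, it does not touch the file.

```python
def add_mount_option(fstab_line, option):
    """Return fstab_line with option added to the fourth (options) field.

    An existing option with the same key (e.g. an older commit=N) is
    replaced. Comments and lines with fewer than four fields are
    returned unchanged.
    """
    fields = fstab_line.split()
    if len(fields) < 4 or fields[0].startswith("#"):
        return fstab_line
    key = option.split("=")[0]
    options = [o for o in fields[3].split(",") if o.split("=")[0] != key]
    options.append(option)
    fields[3] = ",".join(options)
    return " ".join(fields)


# Hypothetical root entry, not taken from a real system.
line = "PARTUUID=abcd-02 / ext4 defaults,noatime 0 1"
print(add_mount_option(line, "commit=1"))
```

The new option only takes effect after a remount or reboot.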

RonR
Posts: 563
Joined: Tue Apr 12, 2016 10:29 pm
Location: US

Re: SD Card power failure resilience ideas

Sun Sep 29, 2019 9:10 pm

jahboater wrote:
Sun Sep 29, 2019 8:59 pm
RonR wrote:
Sun Sep 29, 2019 8:51 pm
Even if you provide a mechanism to ensure the current block write completes successfully, it doesn't solve the problem that the operating system probably has a huge amount of cached filesystem changes that won't be written to SD card. While the card may be logically intact following a loss of power, filesystem structures (directories and file contents) will still be corrupted.
Perhaps add something like commit=1 to the ext4 entry in /etc/fstab ?
This should flush the disk cache every second instead of every 5 seconds which is the default.

This would simply impact overall performance. An inevitable failure/corruption would still be in your future.

Fault tolerance for power failures is best solved with a UPS and appropriate shutdown software (Network UPS Tools [NUT] works very well on Raspberry Pis).

ejolson
Posts: 3724
Joined: Tue Mar 18, 2014 11:47 am

Re: SD Card power failure resilience ideas

Sun Sep 29, 2019 11:00 pm

RonR wrote:
Sun Sep 29, 2019 9:10 pm
jahboater wrote:
Sun Sep 29, 2019 8:59 pm
RonR wrote:
Sun Sep 29, 2019 8:51 pm
Even if you provide a mechanism to ensure the current block write completes successfully, it doesn't solve the problem that the operating system probably has a huge amount of cached filesystem changes that won't be written to SD card. While the card may be logically intact following a loss of power, filesystem structures (directories and file contents) will still be corrupted.
Perhaps add something like commit=1 to the ext4 entry in /etc/fstab ?
This should flush the disk cache every second instead of every 5 seconds which is the default.

This would simply impact overall performance. An inevitable failure/corruption would still be in your future.

Fault tolerance for power failures is best solved with a UPS and appropriate shutdown software (Network UPS Tools [NUT] works very well on Raspberry Pis).
Agreed. The exact problem of potentially hundreds of IoT devices needing safe nonvolatile memory is what FOG and edge computing is all about.

Instead of creating 100 battery backup systems for the 100 IoT devices, one installs a single battery-backed Pi 4B on premises that then functions as data storage for all the other devices. Since all data is written and stored in the FOG, the IoT devices never need to write to flash memory. Since the FOG device is on premises, then Internet outages do not affect daily operations.

This is a simple application of the principle of subsidiarity in the context of computer networking:
  • The IoT device itself is not competent due to the possibility of flash memory corruption from unexpected power loss.
  • A cloud hosted on the Internet is not local enough to provide resiliency from network disruption.
Therefore, a single on-premise Raspberry Pi backed by a suitable uninterruptible power supply turns out to be the most-local competent authority available to provide the needed storage.
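
A minimal sketch of the device side of this arrangement, in Python: each reading is serialised and sent to the on-premises storage Pi over TCP, so nothing is written to local flash. The host name fog-pi.local, port 9000 and the JSON-lines format are all hypothetical choices for illustration; a real deployment would also want retries and some form of authentication.

```python
import json
import socket


def encode_reading(reading):
    """Serialise one reading as a single newline-terminated JSON line."""
    return (json.dumps(reading, sort_keys=True) + "\n").encode("utf-8")


def send_reading(host, port, reading, timeout=5.0):
    """Send one reading to the storage server and return the bytes sent.

    Nothing is cached on local storage; if the connection fails the
    caller gets an OSError and can decide whether to retry or drop.
    """
    payload = encode_reading(reading)
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload)
    return len(payload)


# Example call (hypothetical host and port, so left commented out):
# send_reading("fog-pi.local", 9000, {"sensor": "t1", "value": 21.5})
```

The storage Pi on the other end is then the only machine that needs battery backing and a clean shutdown path.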

Ingen
Posts: 4
Joined: Sun Sep 29, 2019 4:56 pm

Re: SD Card power failure resilience ideas

Tue Oct 01, 2019 8:02 am

Thanks for the feedback and comments folks, much appreciated. And I have learned a bit more about the Linux cache (commit). I know that there are several thousand open file descriptors (?), even with just the basic Linux desktop running! But I have been tinkering with an RPI 3B+ for over a year and have accidentally cut off the power a few times (without clean shutdown) and not incurred a failure yet with a 16GB SanDisk SDC.

My team are about to develop a new compact prototype board that needs to use an RPI Compute Module, which I know has a 'Lite' version without the eMMC memory directly populated, so I wondered if it was worth investigating a 'micro' UPS for the FLASH device only. I will report back my findings if we decide to test this idea.

I read @ejolson's OverlayFS read-only root filesystem posting on here a while ago and figured that would be the best option for our new project since the Raspbian desktop will not be visible to the user, and the user data files are to be stored on a USB memory stick. We might even run the system 'headless'. But ultimately, I guess we should really provide a master UPS main power supply and set "dtoverlay=gpio-shutdown" using GPIO3 to give us the ~5 second window to cleanly shut down the RPI, since the system is required to run as a simple embedded 'black box'.

Andyroo

Re: SD Card power failure resilience ideas

Tue Oct 01, 2019 12:00 pm

Ingen wrote:
Tue Oct 01, 2019 8:02 am
...
and the user data files are to be stored on a USB memory stick. We might even run the system 'headless'.
...
Will the user have access to this USB stick? If so you may hit the same issue as the OS boot SD.

What is to stop the user pulling the USB stick out without ejecting it or even worse - while the OS is writing data to it?

Ingen
Posts: 4
Joined: Sun Sep 29, 2019 4:56 pm

Re: SD Card power failure resilience ideas

Wed Oct 02, 2019 8:29 am

Andyroo wrote:
Tue Oct 01, 2019 12:00 pm
Ingen wrote:
Tue Oct 01, 2019 8:02 am
...
and the user data files are to be stored on a USB memory stick. We might even run the system 'headless'.
...
Will the user have access to this USB stick? If so you may hit the same issue as the OS boot SD.

What is to stop the user pulling the USB stick out without ejecting it or even worse - while the OS is writing data to it?
Yes - there will be a visual front end, but not from the RPI! There will be adequate information provided and the ability for users to close the USB stick cleanly - hopefully most users will be aware not to remove a memory stick in the middle of a file transfer! There will be a data transfer progress bar.

The main concern for us is to make sure that the Raspbian Operating System continues to boot reliably following power down. As mentioned above, I think the read-only OverlayFS solution provided by ejolson seems to be the best overall design for us.

gsh
Raspberry Pi Engineer & Forum Moderator
Posts: 1442
Joined: Sat Sep 10, 2011 11:43 am

Re: SD Card power failure resilience ideas

Wed Oct 09, 2019 9:14 am

sudo apt upgrade
sudo apt dist-update

sudo reboot (for good measure)

Click on the Menu -> Preferences -> Raspberry Pi Configuration -> Performance -> Overlay file system -> Configure...

Da Dah...

Zero writes to your SD card!
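
A quick way to confirm the overlay is active after the reboot is to look at what is mounted on /. A sketch in Python, assuming the overlay root shows up in /proc/mounts with filesystem type "overlay" (worth verifying on your own image, since this is an assumption about how the feature mounts the root):

```python
def root_filesystem_type(mounts_text):
    """Return the filesystem type mounted on / according to text in
    /proc/mounts format, or None if no root entry is found.

    If several entries are mounted on /, the last one wins, since later
    mounts shadow earlier ones.
    """
    fs_type = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == "/":
            fs_type = fields[2]
    return fs_type


def root_is_overlay(mounts_text):
    """True if the root filesystem is reported as type 'overlay'."""
    return root_filesystem_type(mounts_text) == "overlay"
```

On the Pi itself this would be called as root_is_overlay(open("/proc/mounts").read()).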
--
Gordon Hollingworth PhD
Raspberry Pi - Director of Software Engineering

ejolson
Posts: 3724
Joined: Tue Mar 18, 2014 11:47 am

Re: SD Card power failure resilience ideas

Wed Oct 09, 2019 12:46 pm

gsh wrote:
Wed Oct 09, 2019 9:14 am
sudo apt upgrade
sudo apt dist-update

sudo reboot (for good measure)

Click on the Menu -> Preferences -> Raspberry Pi Configuration -> Performance -> Overlay file system -> Configure...

Da Dah...

Zero writes to your SD card!
That's a nice feature! Is it documented in more detail anywhere?

gsh
Raspberry Pi Engineer & Forum Moderator
Posts: 1442
Joined: Sat Sep 10, 2011 11:43 am

Re: SD Card power failure resilience ideas

Wed Oct 09, 2019 1:00 pm

https://github.com/ghollingworth/overlayfs

It's just a script (which has now been integrated into raspi-config by Simon) I wrote to do the instructions written by Mattias Wikstrom

You can read the code in /usr/bin/raspi-config

Gordon
--
Gordon Hollingworth PhD
Raspberry Pi - Director of Software Engineering

fanoush
Posts: 485
Joined: Mon Feb 27, 2012 2:37 pm

Re: SD Card power failure resilience ideas

Wed Oct 09, 2019 2:35 pm

Ingen wrote:
Sun Sep 29, 2019 8:24 pm
[1a] Simply keep a separate 3.3V power supply alive that only drives the SDC, and thus maintain only a modest voltage and current to keep the SDC comms and FLASH activity alive for a short amount of time.
i.e. Tens of milliamps for a few hundred milliseconds.

[1b] The intent is to 'wait-out' the comms dropout for enough time so that the SDC will complete its current operation and then enter the IDLE state.

[1c] Then gracefully ramp down the SDC 3.3V power supply.

My assumption here is that only the SDC communication pipeline is broken, i.e. CLK, CMD, DAT[3:0]; the SD Card device itself should be kept active for ~500ms
The suggestion to add a capacitor near the SD slot to keep the SD card alive a bit longer than the CPU, so it has time to finish writing while the CPU is already down, was already proposed here. The idea was really to prevent physical damage to the card, where it becomes permanently read-only, rather than filesystem corruption.

Not sure what the result was, or whether it is doable or has other issues. I guess one complication is that you really don't want this capacitor to also power the rest of the system, just the SD card, so you also need a diode or something to prevent the current flowing back. The capacitor may also be an issue at power-on time: it would delay powering up the card.

I suppose microSD cards are so small there is no space for such a capacitor inside. I wonder if modern (e.g. M.2) SSDs have something like that in place so they can finish work safely when power is cut in the middle of a write.

rpdom
Posts: 15431
Joined: Sun May 06, 2012 5:17 am
Location: Chelmsford, Essex, UK

Re: SD Card power failure resilience ideas

Wed Oct 09, 2019 4:33 pm

fanoush wrote:
Wed Oct 09, 2019 2:35 pm
I suppose microsd cards are so small there is no space for such capacitor inside, I wonder if modern (e.g. M2) SSD disks have something like that in place so they can finish work safely when power is cut to them in the middle of write.
I would not be surprised if they did. They are sold for use in critical systems, after all. Also, many spinning rust storage devices (hard disks) used the motor as a generator to power the circuitry long enough to ensure a clean write was completed and the head parked safely when power was lost, so I'd assume SSDs had something in place to ensure clean shutdown on power fail as well.

ejolson
Posts: 3724
Joined: Tue Mar 18, 2014 11:47 am

Re: SD Card power failure resilience ideas

Wed Oct 09, 2019 4:55 pm

rpdom wrote:
Wed Oct 09, 2019 4:33 pm
fanoush wrote:
Wed Oct 09, 2019 2:35 pm
I suppose microsd cards are so small there is no space for such capacitor inside, I wonder if modern (e.g. M2) SSD disks have something like that in place so they can finish work safely when power is cut to them in the middle of write.
I would not be surprised if they did. They are sold for use in critical systems, after all. Also, many spinning rust storage devices (hard disks) used the motor as a generator to power the circuitry long enough to ensure a clean write was completed and the head parked safely when power was lost, so I'd assume SSDs had something in place to ensure clean shutdown on power fail as well.
There is also the nuclear option.

I see two issues: the possibility of power interruption during a write causing damage to the hardware, and the possibility of a filesystem being left with inconsistent data. While the first is the greater problem, either can prevent a Pi from rebooting when power is restored.

For inexpensive devices used in IoT applications, battery backups, diesel generators and super capacitors seem like too complicated and expensive a solution compared to writing the data over the network rather than to flash.

Doug_
Posts: 2
Joined: Tue Oct 08, 2019 6:40 pm

Re: SD Card power failure resilience ideas

Thu Oct 10, 2019 12:07 pm

gsh wrote:
Wed Oct 09, 2019 1:00 pm
https://github.com/ghollingworth/overlayfs

It's just a script (which has now been integrated into raspi-config by Simon) I wrote to do the instructions written by Mattias Wikstrom

You can read the code in /usr/bin/raspi-config

Gordon
Thanks!!! This is exactly what I need and to find this already within the RPI system is just brilliant.

Ingen
Posts: 4
Joined: Sun Sep 29, 2019 4:56 pm

Re: SD Card power failure resilience ideas

Tue Oct 15, 2019 7:58 am

gsh wrote:
Wed Oct 09, 2019 9:14 am
sudo apt upgrade
sudo apt dist-update

sudo reboot (for good measure)

Click on the Menu -> Preferences -> Raspberry Pi Configuration -> Performance -> Overlay file system -> Configure...

Da Dah...

Zero writes to your SD card!
Fantastic! Thanks very much for including this new feature.

I did a full system upgrade to Raspbian 'Buster' and can now see the read-only features in raspi-config.

Time for some power-cycling tests now...

remcohn
Posts: 1
Joined: Thu Oct 17, 2019 8:31 am

Re: SD Card power failure resilience ideas

Thu Oct 17, 2019 8:41 am

This is a very interesting discussion.

Some things I'd like to point out that I haven't seen yet:
1) EXT4 or any other journaling filesystem should take care of corruption resulting from *incomplete* writes.
2) SATA etc. uses 'barriers' to tell the SSD that it's not allowed to re-order writes across that barrier: it must commit everything before the barrier to flash before it's allowed to start meddling with anything after it.
3) My assumption is that corruption occurs because the FTL re-orders writes, and might even corrupt parts of the filesystem that you are not writing to at the time of the power failure, as a result of the FTL moving things around to level out wear.

That last one causes issues (if my assumption is correct). *But*: if we just let the Pi crash, and keep just the flash chip running a little longer so it can finish its FTL remapping, then the uncommitted writes in the cache should be handled by the EXT4 journal.

Any thoughts on this ?

gsh
Raspberry Pi Engineer & Forum Moderator
Posts: 1442
Joined: Sat Sep 10, 2011 11:43 am

Re: SD Card power failure resilience ideas

Thu Oct 17, 2019 9:44 am

I believe one of the largest issues is with ext4. You're right that the journal should keep a log of the structural (metadata) changes that it is about to make, but it doesn't affect the data (the data is written directly to the disk, not to the journal first).

When a poweroff event occurs, it is possible that a block (a critically important structural block like the superblock) can get corrupted and incorrectly saved to the SD card. But there are multiple copies of the superblock so surely you can just read one of the other entries?

Yes you can, but this is where the difference occurs between what the ext4 kernel driver is able to do and what fsck.ext4 is able to do. The kernel driver, I think (this is mostly conjecture based on my own research), can only work around (it won't fix) problems that it finds when it tries to mount its root; if fsck is required to fix the problem, the driver cannot do it.

So to fix this type of corruption you need to remove the SD card and insert it into another Pi or Linux computer and run an fsck.ext4 on it... It can then boot again...
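
As a sketch of what that fsck invocation might look like when scripted from the other machine, in Python. The device name /dev/sdX2 is a placeholder and the backup superblock number 32768 is only an example value; -f, -n, -y and -b are standard e2fsck options, but run the read-only form first and only ever on an unmounted partition.

```python
def build_fsck_command(device, repair=False, backup_superblock=None):
    """Build an fsck.ext4 argument list for an unmounted ext4 partition.

    repair=False gives a read-only check (-n answers 'no' to everything);
    repair=True answers 'yes' to every proposed fix (-y).
    backup_superblock, if given, is passed with -b so that fsck uses an
    alternative copy of the superblock instead of the primary one.
    """
    command = ["fsck.ext4", "-f", "-y" if repair else "-n"]
    if backup_superblock is not None:
        command += ["-b", str(backup_superblock)]
    command.append(device)
    return command


# Placeholder device name; substitute the card's real root partition.
print(" ".join(build_fsck_command("/dev/sdX2")))
print(" ".join(build_fsck_command("/dev/sdX2", repair=True, backup_superblock=32768)))
```

The resulting list can be handed to subprocess.run() as root once the device name has been double-checked.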

Up to now, I've never seen a corruption where the FTL damages something on the card that it was not writing to (a different partition for example), but since the FTL is closed in these cases it is really difficult to know whether this is actually the problem. What the overlayfs solution gives you is never writing to the card and therefore never creating the chance for a corruption to occur!

Gordon
--
Gordon Hollingworth PhD
Raspberry Pi - Director of Software Engineering

Heater
Posts: 13694
Joined: Tue Jul 17, 2012 3:02 pm

Re: SD Card power failure resilience ideas

Thu Oct 17, 2019 11:48 am

My experience of the many SD card failures in the early days of the Pi was that they were always more than just FS corruption. It became impossible to write to some blocks of the SD card. I spent a good while with those SD cards plugged into a PC, using dd to write all over them and read back. There were always blocks that did not take the data.
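
For anyone wanting to repeat that kind of write-and-read-back test, here is a rough Python sketch of the idea (not the actual dd commands used). It overwrites the target, so it destroys whatever is there. The block size, block count and pattern are arbitrary choices, and the read-back may be served from the OS page cache rather than the card unless the cache is dropped or the card is re-inserted in between.

```python
import os
import tempfile


def block_pattern(index, block_size):
    """Deterministic pattern that differs for every block index."""
    stamp = index.to_bytes(8, "little")
    return (stamp * (block_size // 8 + 1))[:block_size]


def find_bad_blocks(path, block_size=4096, num_blocks=256):
    """Write a known pattern to each block of path, read it back and
    return the indices of blocks whose contents do not match.

    DESTRUCTIVE: overwrites the first block_size * num_blocks bytes.
    """
    with open(path, "r+b") as target:
        for index in range(num_blocks):
            target.seek(index * block_size)
            target.write(block_pattern(index, block_size))
        target.flush()
        os.fsync(target.fileno())  # push the data out of OS buffers

    bad = []
    with open(path, "rb") as target:
        for index in range(num_blocks):
            target.seek(index * block_size)
            if target.read(block_size) != block_pattern(index, block_size):
                bad.append(index)
    return bad


def demo_on_scratch_file():
    """Run the check against a throwaway temp file instead of a device."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        scratch = tmp.name
    try:
        return find_bad_blocks(scratch, block_size=512, num_blocks=16)
    finally:
        os.remove(scratch)
```

Trying it on a scratch file first, as demo_on_scratch_file() does, is a sensible check before pointing it at a real card.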

Perhaps they were dodgy/fake SD cards, but they all came new from reputable local stores.

Over the years I have noticed others coming here complaining that they cannot format their SDs or that they are write-protected somehow.

Since my Pi 3s, though, I have not seen such problems, even though I am cavalier with my use of the power switch and yank out USB adapters willy-nilly.

Is this all just random chance for me? Or did something change in software and hardware over the years? I might never find out.

Still, when I want reliability it's read-only root for me.
Memory in C++ is a leaky abstraction .

gsh
Raspberry Pi Engineer & Forum Moderator
Posts: 1442
Joined: Sat Sep 10, 2011 11:43 am

Re: SD Card power failure resilience ideas

Thu Oct 17, 2019 12:07 pm

There were significant bug fixes in the SD host driver (it's been re-written twice), so it's possible some of those early failures were due to that. So it's possible these failures don't really exist quite to the extent that they did...

Gordon
--
Gordon Hollingworth PhD
Raspberry Pi - Director of Software Engineering

Heater
Posts: 13694
Joined: Tue Jul 17, 2012 3:02 pm

Re: SD Card power failure resilience ideas

Thu Oct 17, 2019 12:35 pm

Yes, I strongly suspect there were issues in software/hardware that caused a lot of failures in the early days. I don't think they were ever pinned down and when people came here with stories of failure the discussion always took off on the lines of bad power supplies, fake SD cards, file system corruption due to unclean shutdown etc, etc. Thus the actual root cause was never known.

What I forgot to say above is that I think this is why the Pi has a reputation for corrupting SD cards to this day, even if the original cause is long gone and the situation is an awful lot better.

I'm happy with my current SD card failure rate. Being zero as it is. And a read-only root is easy to do when it's called for.
Memory in C++ is a leaky abstraction .

ejolson
Posts: 3724
Joined: Tue Mar 18, 2014 11:47 am

Re: SD Card power failure resilience ideas

Thu Oct 17, 2019 3:32 pm

Heater wrote:
Thu Oct 17, 2019 12:35 pm
Yes, I strongly suspect there were issues in software/hardware that caused a lot of failures in the early days. I don't think they were ever pinned down and when people came here with stories of failure the discussion always took off on the lines of bad power supplies, fake SD cards, file system corruption due to unclean shutdown etc, etc. Thus the actual root cause was never known.

What I forgot to say above is that I think this is why the Pi has a reputation for corrupting SD cards to this day, even if the original cause is long gone and the situation is an awful lot better.

I'm happy with my current SD card failure rate. Being zero as it is. And a read-only root is easy to do when it's called for.
The original bug was independently fixed in the upstream Linux kernel, however, I can't find the reference.

Heater
Posts: 13694
Joined: Tue Jul 17, 2012 3:02 pm

Re: SD Card power failure resilience ideas

Thu Oct 17, 2019 4:14 pm

ejolson,

What?! You mean all this time I have been bringing this up in discussions here over the years, there was actually a bug identified and fixed? Nobody ever mentioned it.

Ah well, never mind, fixed and done now.
Memory in C++ is a leaky abstraction .

PiGraham
Posts: 3666
Joined: Fri Jun 07, 2013 12:37 pm
Location: Waterlooville

Re: SD Card power failure resilience ideas

Thu Oct 17, 2019 4:30 pm

This is excellent.
I've used something similar on a Windows Embedded system ("Enhanced Write Filter" IIRC).
I found that Adafruit have a script for RO filesystem but I like this better.

Was this a typo:
gsh wrote:
Wed Oct 09, 2019 9:14 am
sudo apt upgrade
sudo apt dist-update

I think that should be (first line just for completeness):

sudo apt update
sudo apt upgrade
sudo apt dist-upgrade
