paulv
Posts: 557
Joined: Tue Jan 15, 2013 12:10 pm
Location: Netherlands

Prevent/Minimize USB (Flash) stick corruption

Fri Mar 02, 2018 4:23 pm

No, this is not for the SD card!

I use an 8GB USB Flash stick "drive" as data storage connected to one of the RPi USB ports. The data storage is less than 1MB in total, so there should be plenty of free space for data management and re-allocation, I think.

For this project, I use a classic Model (1)B RPi. It has a full-blown UPS (see some of my other posts on this Forum) to protect against power-related issues.

While developing the Python applications, I use the Python logger functionality to create a "trace" of the activity, with the log rotated at midnight so there is one log file per day. There are 3 separate applications that I'm tracking. On average, an entry is logged every 5 seconds, sometimes more often.
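For reference, a once-a-day rotation like this can be set up with the standard library's TimedRotatingFileHandler. This is a minimal sketch, not the original application code: the logger name, the file path and the 7-day retention are assumptions. Pointing the path at a tmpfs mount such as /var/log (it is a tmpfs in the fstab shown further down) would keep these frequent writes off the stick, at the cost of losing the trace on a crash.

```python
import logging
import logging.handlers


def make_daily_logger(path, name="trace", keep_days=7):
    """Return a logger that writes to `path` and starts a new file at midnight.

    `path`, `name` and `keep_days` are illustrative defaults, not values
    taken from the original applications.
    """
    handler = logging.handlers.TimedRotatingFileHandler(
        path, when="midnight", backupCount=keep_days)
    # Same layout as the trace above: timestamp, level padded to 8, message.
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)-8s %(message)s"))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

Usage would be along the lines of `log = make_daily_logger("/var/log/temp_app.log")` followed by `log.info("starting temp sensor requests")`; the rename of the old file at midnight is the doRollover() step that shows up in the traceback below.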

Code:

2018-03-02 10:02:01,206 INFO     process_cv_qual_data
2018-03-02 10:02:10,679 INFO     received data from the cv sensors
2018-03-02 10:02:10,684 INFO     -- loop time=30 sleep time=60
2018-03-02 10:03:10,746 INFO     Process sensor data
2018-03-02 10:03:10,748 INFO     Prep the sensor results for the server
2018-03-02 10:03:10,753 INFO     /usr/bin/rrdtool update /media/data/cv/cv_mon.rrd N:54.5:44.6
2018-03-02 10:03:10,840 INFO     /home/pi/create_cv_graphs.sh
2018-03-02 10:03:15,279 INFO     updated database and server
2018-03-02 10:03:15,281 INFO     starting temp sensor requests
2018-03-02 10:03:25,317 INFO     waiting for sensor replies : 1
2018-03-02 10:03:35,350 INFO     waiting for sensor replies : 2
Needless to say, there is a lot of writing on the USB stick. In addition, the applications collect data that go into separate rrdtool RRD databases with a 3 min. update rate.

I mount the USB drive (/dev/sda1) as follows:

Code:

proc            /proc           proc    defaults          0       0
/dev/mmcblk0p1  /boot           vfat    defaults          0       2
/dev/mmcblk0p2  /               ext4    defaults,noatime  0       1
/dev/sda1       /media/data     ext4    defaults          0       0
tmpfs           /tmp            tmpfs   defaults,noatime,nosuid,size=100m  0       0
tmpfs           /var/tmp        tmpfs   defaults,noatime,nosuid,size=30m 0       0
tmpfs           /var/log        tmpfs  defaults,noatime,nosuid,mode=0755,size=2M       0       0
The system automatically makes backups of the data structure on the stick every 6 hours, and they go to my NTFS RAID drive.
Checking for corruption on the stick is not possible while the system is running, because you need to unmount the drive, and while the file check runs, I lose sensor information. I could implement a file check routine, but I don't want to go there yet.

In any case, I'm seeing file corruption on the USB stick. Most of the time the system crashed without showing much evidence as to why. The USB drive ended up unusable, needing a full repair and setup again.

However, I saw some evidence today: the log rotation raised an exception, OSError: [Errno 117] Structure needs cleaning. It happened during the rotation at midnight, which saves the old log and creates a new one (doRollover() in Python's logging handlers, as the traceback shows).

Code:

Traceback (most recent call last):
  File "/usr/lib/python2.7/logging/handlers.py", line 77, in emit
    self.doRollover()
  File "/usr/lib/python2.7/logging/handlers.py", line 350, in doRollover
    os.rename(self.baseFilename, dfn)
OSError: [Errno 117] Structure needs cleaning
Logged from file temp_app.py, line 197
Starting Temp Monitor V 7.1.1
Traceback (most recent call last):
  File "/usr/lib/python2.7/logging/handlers.py", line 77, in emit
    self.doRollover()
  File "/usr/lib/python2.7/logging/handlers.py", line 350, in doRollover
    os.rename(self.baseFilename, dfn)
OSError: [Errno 117] Structure needs cleaning
Logged from file temp_app.py, line 197
I unmounted the stick and ran a file check (which went through several iterations); the first run ended with an error. Running it a second time, again with several iterations, produced a clean report. However, the drive now contained only one entry, the lost+found directory. The rest went to bit heaven.
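Since a full file check needs the stick unmounted, a lightweight probe that the applications run periodically might give an earlier warning when the kernel starts returning errors like the Errno 117 above. This is a hypothetical sketch for Python 3 (the function names, the probe file name and the chosen set of errno values are assumptions); it only notices errors the kernel reports and is no replacement for fsck.

```python
import errno
import os

# errno values that usually point at a damaged filesystem or device rather than
# at a bug in the application. EUCLEAN is "Structure needs cleaning"
# (Errno 117 in the traceback above).
CORRUPTION_ERRNOS = {errno.EUCLEAN, errno.EIO, errno.EROFS}


def is_corruption_error(exc):
    """True if `exc` is an OSError/IOError carrying one of the errno values above."""
    return isinstance(exc, (IOError, OSError)) and exc.errno in CORRUPTION_ERRNOS


def probe_filesystem(mount_point, name=".fs_probe"):
    """Write, fsync, read back and delete a small file on `mount_point`.

    Returns (True, None) if that works, or (False, reason) if the kernel reports
    one of the errno values above; other errors are re-raised. The read-back is
    normally served from the page cache, so this spots filesystem-level errors;
    it does not test the flash cells themselves.
    """
    path = os.path.join(mount_point, name)
    payload = os.urandom(4096)
    try:
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())
        with open(path, "rb") as f:
            same = f.read() == payload
        os.remove(path)
        return (same, None if same else "read-back mismatch")
    except (IOError, OSError) as exc:
        if is_corruption_error(exc):
            name = errno.errorcode.get(exc.errno, exc.errno)
            return (False, "%s: %s" % (name, exc))
        raise
```

Called as `probe_filesystem("/media/data")`, a (False, ...) result would be the cue to stop writing and schedule a proper file check.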

I know that in principle, SD cards and USB flash drives are very close in architecture. A lot has been written about SD card corruption; unfortunately, I could not find much relevant information about the USB flash kind. So my question here is whether there is something that can be done to minimize or prevent failures of these USB sticks?

Please don't tell me to use a rotating HDD or an SSD one, I know that much and I will go that route, but I just want to find out what tricks, if any, can be used on USB flash "drives".

Thanks!
Last edited by paulv on Fri Mar 09, 2018 2:52 pm, edited 1 time in total.

FTrevorGowen
Forum Moderator
Posts: 4956
Joined: Mon Mar 04, 2013 6:12 pm
Location: Bristol, U.K.

Re: Prevent/Minimize USB stick corruption

Fri Mar 02, 2018 7:45 pm

paulv wrote: No, this is not for the SD card!

I use an 8GB USB Flash stick "drive" as data storage connected to one of the RPi USB ports. The data storage is less than 1MB in total, so there should be plenty of free space for data management and re-allocation, I think.

For this project, I use a classic Model (1)B RPi. It has a full-blown UPS (see some of my other posts on this Forum) to protect for power related issues.
...
Thanks!
Is it an early, 256MB B1 with the extra USB polyfuses? (See "raspiblack" here: http://www.cpmspectrepi.uk/raspberry_pi ... uePis.html or the recent blog: https://www.raspberrypi.org/blog/happy-birthday-2018/ ) Mine, which ran a (Python-based) MoinMoin wiki from a USB stick, was retired when said wiki became "corrupted" & crashed (although I was able to recover the wiki pages and rebuild it on a newer flash drive in a newer Pi. It has since been moved again and currently runs on a 32GB flash drive in my P2B).
Trev.
Still running Raspbian Jessie on some older Pi's (an A, B1, B2, B+, P2B, 3xP0, P0W) but Stretch on my 2xP3A+, P3B+, P3B, B+, A+ and a B2. See: https://www.cpmspectrepi.uk/raspberry_pi/raspiidx.htm

paulv

Re: Prevent/Minimize USB stick corruption

Fri Mar 02, 2018 7:57 pm

Model B Rev 2 with 512MB
Made in the UK

Code:

cat /proc/cpuinfo

processor       : 0
model name      : ARMv6-compatible processor rev 7 (v6l)
BogoMIPS        : 697.95
Features        : half thumb fastmult vfp edsp java tls
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xb76
CPU revision    : 7

Hardware        : BCM2708
Revision        : 000e
Serial          : 0000000094cbd6ea

paulv

Re: Prevent/Minimize USB stick corruption

Fri Mar 09, 2018 2:45 pm

After experiencing the same corruption problem on a regular basis while trying to prevent it, I did a couple of things that may be useful for others as well.

First of all, I used cron to run a shell file-check script every 6 hours. This script would terminate all the processes that accessed the USB stick, sync a few times, wait a few seconds and then unmount the drive, run a file check, remount the drive, and restart all the processes again.
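The cycle described above could look roughly like this in Python (the original was a shell script, which is not shown here). The unit names are hypothetical; DEVICE and MOUNT_POINT are taken from the fstab shown earlier. By default the sketch only prints the commands.

```python
import subprocess

# Hypothetical service names; the device and mount point match the fstab above.
SERVICES = ["temp_app.service", "cv_app.service", "qual_app.service"]
DEVICE = "/dev/sda1"
MOUNT_POINT = "/media/data"


def maintenance_plan(services=SERVICES, device=DEVICE, mount_point=MOUNT_POINT):
    """Return the commands of one check cycle, in the order they must run."""
    return ([["sudo", "systemctl", "stop"] + list(services)]
            + [["sync"]] * 3                       # "sync a few times"
            + [["sleep", "5"],                     # "wait a few seconds"
               ["sudo", "umount", mount_point],
               ["sudo", "fsck.ext4", "-yf", device],
               ["sudo", "mount", mount_point],     # uses the fstab entry
               ["sudo", "systemctl", "start"] + list(services)])


def run_plan(plan, dry_run=True):
    """Print (dry_run) or execute the plan, stopping at the first real failure."""
    for cmd in plan:
        if dry_run:
            print(" ".join(cmd))
            continue
        rc = subprocess.call(cmd)
        # fsck exits with 1 when it corrected errors; only treat >= 2 as fatal.
        if rc != 0 and not (cmd[1:2] == ["fsck.ext4"] and rc == 1):
            raise RuntimeError("command failed (%d): %s" % (rc, " ".join(cmd)))
```

Stopping at the first failure deliberately leaves the services stopped if, for example, umount fails or fsck reports uncorrected errors, so nothing keeps writing to a damaged filesystem.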

Terminating the applications properly was a little more complicated. The reason is that these processes use MQTT to receive data from remote sensors over WiFi. The RPi serves as the MQTT broker and also as a client. When data arrives for the client, there is an interrupt generated and the interrupt service routine processes the received data. This is a separate process, so I needed to gracefully halt this, to prevent data corruption while the main process is terminated.

This is what I did.
First of all, all these applications are Python scripts that are started at boot time by systemd .service units. Here is an example:

Code:

# This service installs a Python script that monitors the temperature, humidity and air pressure data coming from remote sensors.
# The script should never die, and if it does, it will be restarted.
# If the script is restarted 4 x within 180 seconds, the Pi is rebooted.

[Unit]
Description=Installing the temp_mon monitoring script
Requires=basic.target
After=multi-user.target

[Service]
ExecStart=/usr/bin/python /home/pi/temp_app.py
Restart=always

# The number of times the service is restarted within a time period can be set
# If that condition is met, the RPi is rebooted
StartLimitBurst=4
StartLimitInterval=180s
# actions can be none|reboot|reboot-force|reboot-immediate
StartLimitAction=none

[Install]
WantedBy=multi-user.target
While playing with my system, I did not want the process to reboot my RPi, so the StartLimitAction was set to "none", instead of "reboot".
To keep systemd from terminating the process by brute force, I added the following after Restart=always:

Code:

# wait for 10s before the SIGKILL gets sent
TimeoutStopSec=10s
Normally, systemd sends the SIGTERM signal to gracefully terminate the process. If the process has not stopped within a certain amount of time, SIGKILL is sent, and that will definitely kill it (equivalent to sudo kill -9). I needed to make sure that my application has a known amount of time to gracefully terminate on its own, so I set this time explicitly to 10 seconds. Note that systemd's built-in default (DefaultTimeoutStopSec) is 90 seconds, so this setting actually shortens the wait; what matters is that the signal handler finishes well within it.

In my Python script, I needed to catch the SIGTERM signal, so it could react and properly terminate. I added the following code to my applications. First you need to import the signal library. Then in main() I added this right at the start :

Code:

    # setup a catch for the following signals:
    for sig in (signal.SIGTERM, signal.SIGINT, signal.SIGHUP, signal.SIGQUIT):
        signal.signal(sig, sig_handler)
This catches the various signals and sends them on to a signal handler. This is my signal handler :

Code:

import os
import subprocess
import time

# write_log() is the application's own logging helper (not shown here).

def sig_handler(signum=None, frame=None):
    '''
    This function catches the most important system signals, but NOT a shutdown!
    During debugging, we need to be able to stop the execution to avoid file corruption
    and preserve the status so we can restart it again.

    This handler catches the following signals from the OS:
        SIGHUP  = (1) SSH terminal logout
        SIGINT  = (2) Ctrl-C
        SIGQUIT = (3) Ctrl-\\
        SIGTERM = (15) daemon terminate (daemon --stop): comes from the systemd manager

        Both SIGHUP and SIGTERM will also generate an IOError

        However, it cannot catch SIGKILL = (9), i.e. kill -9, which is what systemd
        sends if the service has not stopped within TimeoutStopSec, nor does it
        replace a proper shutdown: the UPS system must terminate the processes
        properly before invoking the shutdown process!

        To tell systemd to wait a little so the process can be halted and data stored,
        you need to add the following to the .service file:
            TimeoutStopSec=10s
        Everything below must finish well within that window, otherwise systemd
        sends SIGKILL before the sync has run.
    '''
    global stop_now

    try:
        write_log("system", "Signal handler called with signal : {0}".format(signum))
        # stop the main loop and the MQTT message processing
        stop_now = True

        write_log("system", "Sighandler is stopping MQTT processing")

        # give a message that is being processed time to finish; the pauses plus
        # the sync must add up to less than TimeoutStopSec (10 s)
        time.sleep(5)
        subprocess.call('sync; sync; sync', shell=True)
        time.sleep(1)
        os._exit(1)  # force the exit to the OS; nothing after this line runs

    except IOError as e:
        write_log("trace", "ignoring IOError : {0}".format(e))

    except Exception as e:
        write_log("error", "Unexpected Exception in sig_handler() : \n{0}".format(e))
You can easily decide to do different things based on the signal received. "signum" contains the decimal code. Because my applications run as daemons, I don't need to do different things for the other signals.
The global variable "stop_now" (True or False) is checked in the MQTT callback on_message(mqttc, userdata, msg); once it is True, received messages are no longer processed, because processing them causes write activity on the USB stick.
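An alternative pattern, shown here as a hypothetical sketch rather than the code used above, keeps the signal handler tiny: it only sets the flag, and the main loop notices it and shuts down in normal control flow (closing files, syncing) instead of sleeping and calling os._exit() inside the handler. The on_message argument list mirrors the paho-mqtt callback; paho itself is not imported, and the message processing is a stand-in.

```python
import signal
import time


class StopFlag(object):
    """Set by the signal handler; polled by the main loop and the MQTT callback."""

    def __init__(self):
        self.stop_now = False
        self.signum = None

    def handler(self, signum, frame):
        # Do as little as possible here: record the request and return.
        self.stop_now = True
        self.signum = signum

    def install(self):
        # signal.signal() may only be called from the main thread.
        for sig in (signal.SIGTERM, signal.SIGINT, signal.SIGHUP, signal.SIGQUIT):
            signal.signal(sig, self.handler)


flag = StopFlag()


def on_message(mqttc, userdata, msg):
    """Same argument list as a paho-mqtt on_message callback."""
    if flag.stop_now:
        return False      # drop the message: no new writes once stopping
    userdata.append((msg.topic, msg.payload))   # stand-in for the real processing
    return True


def main_loop(do_work, period=60.0, step=0.5):
    """Call do_work() every `period` seconds until a stop signal arrives."""
    while not flag.stop_now:
        do_work()
        waited = 0.0
        while waited < period and not flag.stop_now:
            time.sleep(step)      # short naps, so a signal is seen within `step` s
            waited += step
    # Normal control flow again: flush/close files and sync here, then exit(0).
```

With this split, the time the shutdown needs is bounded by `step` plus the cleanup itself, which makes it easier to stay inside TimeoutStopSec.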

This should have been sufficient to prevent data corruption by my processes in the termination phase. Unfortunately, I still experienced massive file-check errors that "repaired" the drive into oblivion. By now I had obviously started to suspect the USB stick. The one I used is one of those inexpensive mini sticks, an 8GB version made by Intenso. I have several of them, and never suspected or noticed issues. I used them to hold the rootfs and ran several of my RPis off them, rather than the SD card.

I replaced it with a 16 GB Cruzer one, a much larger stick, and probably using better Flash chips. It has been working fine for a few days now.

Investigating the USB stick some more
I checked the suspect drive with a destructive test:

Code:

sudo badblocks -w -s -o usbstick.log /dev/sdb
Strangely enough, it reported no errors. So rather than putting it into my RPi system again, I used another RPi to do some tests. I prepared the disk as usual again and wrote a file structure to it by running my backup script for source files:

Code:

sudo /usr/bin/rdiff-backup --force --exclude-globbing-filelist /home/pi/.exclude-list /home/pi /media/data
I then ran the file check again:

Code:

sudo fsck.ext4 -yf /dev/sdb1
No errors were reported. I told cron to run this backup and file-check every 10 minutes, and it did not take long for the file-check to fail massively again. I'll keep the stick for later testing. Maybe I can find an error later.
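When cron runs a check like this every 10 minutes, logging what the exit status means makes the results easier to follow. The bit values below are the ones documented for fsck(8) and e2fsck(8); the function names are hypothetical. check_device uses -n (report only, filesystem opened read-only) instead of the -y used above, so it changes nothing on the stick, but its results are still only meaningful on an unmounted device.

```python
import subprocess

# Bit values of the fsck(8) / e2fsck(8) exit status; the status is their bitwise OR.
FSCK_BITS = [
    (1, "file system errors corrected"),
    (2, "system should be rebooted"),
    (4, "file system errors left uncorrected"),
    (8, "operational error"),
    (16, "usage or syntax error"),
    (32, "checking canceled by user request"),
    (128, "shared-library error"),
]


def decode_fsck_status(code):
    """Translate an fsck exit status into a list of readable messages."""
    if code == 0:
        return ["no errors"]
    return [text for bit, text in FSCK_BITS if code & bit]


def check_device(device):
    """Forced (-f), report-only (-n) check; run it only on an UNMOUNTED device."""
    rc = subprocess.call(["fsck.ext4", "-f", "-n", device])
    return rc, decode_fsck_status(rc)
```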


Using RAID on Flash memory
While I was investigating my problems, it dawned on me that I could possibly set it up as a RAID drive. Google confirmed this, and so I used the following to create one. (look here for more information https://raid.wiki.kernel.org/index.php/RAID_setup)

Code:

sudo apt-get install mdadm
sudo gdisk /dev/sdb 
# setup 2 partitions. I used two 3GB ones, set the type to Linux Raid (fd00)
# better is to setup 3 partitions, 1 as a spare, so that one can be added in to replace a bad one
# in case of a failure (I did not do that yet)
# after I was done with gdisk, I looked at the partitions with 
sudo cat /proc/partitions
# They showed up as /dev/sdb1 and /dev/sdb2 in my case
# you can try to activate them without rebooting by:
sudo partprobe -s
# if that does not work, it'll tell you, and you still need to reboot
# the level 1 raid drive is setup as follows :
sudo mdadm --create --verbose /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdb2
# you can also use raid 5, 6, or 10 on 3 or more drives/partitions
# you can look at the process with:
cat /proc/mdstat
# save the configuration/layout for a reboot
sudo mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf 
# format the drive
sudo mkfs.ext4 /dev/md0
# setup a mount point (different because I'm playing with it)
sudo mkdir -p /mnt/raid
sudo chmod 777 /mnt/raid
# make it mount at boot time:
sudo nano /etc/fstab
# and add this to the file:
# my raid drive (2x3GB)
/dev/md0        /mnt/raid       ext4    defaults   0   0
# mount it with:
sudo mount -a    # or: sudo mount /dev/md0 /mnt/raid
I am now testing this drive and learning what it will and will not do. So far it works really well in my tests. I need to learn more about rescuing my data when one of the two mirrors fails, but you get the idea.
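For learning what happens when a mirror fails, /proc/mdstat shows the member status of each array: [2/2] [UU] for a healthy two-way mirror, with an underscore in place of each missing member. A minimal parser, as a sketch (the function names and the returned fields are my own choice); for real alerts, mdadm's own --monitor mode is the usual tool.

```python
import re

# "md0 : active raid1 sdb2[1] sdb1[0]"  -> array name and state
ARRAY_RE = re.compile(r"^(md\d+)\s*:\s*(\S+)")
# "... blocks super 1.2 [2/2] [UU]"      -> member count, active count, map
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def parse_mdstat(text):
    """Parse /proc/mdstat text into {array: {state, total, active, map, degraded}}."""
    arrays = {}
    current = None
    for line in text.splitlines():
        m = ARRAY_RE.match(line)
        if m:
            current = m.group(1)
            arrays[current] = {"state": m.group(2)}
            continue
        s = STATUS_RE.search(line)
        if current is not None and s:
            total, active = int(s.group(1)), int(s.group(2))
            arrays[current].update(total=total, active=active, map=s.group(3),
                                   degraded=active < total)
    return arrays


def degraded_arrays(path="/proc/mdstat"):
    """Names of arrays that are running with fewer members than they should."""
    with open(path) as f:
        return sorted(name for name, a in parse_mdstat(f.read()).items()
                      if a.get("degraded"))
```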

If you have more information, please chime in.

Enjoy!
Last edited by paulv on Sun Mar 11, 2018 7:41 am, edited 8 times in total.

paulv

Re: Prevent/Minimize USB (Flash) stick corruption

Sat Mar 10, 2018 4:42 am

It seems I’m not the only one with weird corruption issues on USB flash drives.
viewtopic.php?t=65108

paulv

Re: Prevent/Minimize USB (Flash) stick corruption

Sat Mar 10, 2018 4:44 am

Here is more information if you want to learn about RAID drives.
https://www.tecmint.com/understanding-r ... -in-linux/
Enjoy!

karrika
Posts: 1052
Joined: Mon Oct 19, 2015 6:21 am
Location: Finland

Re: Prevent/Minimize USB (Flash) stick corruption

Sat Mar 10, 2018 7:44 am

Thank you for this article. I have had this problem for years with the 8G company sticks we have been giving to our customers.
The idea was to use the sticks for transferring large archives of nautical charts. The stick works perfectly once. When you try to update the giant archive using rsync you have a 30% chance that the stick is broken beyond repair.

paulv

Re: Prevent/Minimize USB (Flash) stick corruption

Sat Mar 10, 2018 10:40 am

Karrika et al.,

The funny thing is, I have used these mini 8GB flash sticks for long periods as the rootfs for my RPis.
See my sticky post : viewtopic.php?f=29&t=44177

In any case, I have now configured a RAID 5 using three partitions of 2GB each on one 8GB stick, a good one.
I was pleasantly surprised that the impact on write performance was negligible.
As a simple test, I updated an RRD database with a size just below 1MB 50 times,
once on a normal flash drive and once on the RAID 5 drive. Virtually no difference.
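A rough way to repeat this comparison without rrdtool is to time small in-place writes into an existing file of about the same size, with an fsync after each one so the data really goes to the device. This is a hypothetical sketch, not the test that was used; the file name, the 4KB write size and the offsets are assumptions. Run it once with the directory on the plain stick (e.g. /media/data) and once on the RAID mount (e.g. /mnt/raid).

```python
import os
import time


def time_rewrites(directory, size=1000 * 1000, rounds=50, chunk=4096):
    """Create a `size`-byte file in `directory`, then `rounds` times overwrite
    `chunk` bytes in place and fsync. Returns the elapsed seconds for the
    rewrite phase and removes the file afterwards."""
    if size <= chunk:
        raise ValueError("size must be larger than chunk")
    path = os.path.join(directory, "bench.tmp")
    with open(path, "wb") as f:
        f.write(b"\0" * size)
        f.flush()
        os.fsync(f.fileno())
    start = time.time()
    with open(path, "r+b") as f:
        for i in range(rounds):
            # spread the writes over the file instead of hitting one spot
            f.seek((i * 7919 * chunk) % (size - chunk))
            f.write(os.urandom(chunk))
            f.flush()
            os.fsync(f.fileno())   # make sure each update reaches the device
    elapsed = time.time() - start
    os.remove(path)
    return elapsed
```

For example, `print(time_rewrites("/media/data"), time_rewrites("/mnt/raid"))` prints the two timings side by side.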

The next step is to use the bad/suspected stick, configure that for a raid5 and run my backup and file-check test for longer periods.
We'll see if mdadm will catch and report something. I do not have a spare drive assigned, so it cannot swap a bad one out. Nonetheless, I think it should start flagging one or more of the partitions as bad. Eventually...

More later...

hojnikb
Posts: 128
Joined: Mon Jun 04, 2012 3:59 pm
Location: @Home

Re: Prevent/Minimize USB (Flash) stick corruption

Sat Mar 10, 2018 7:49 pm

Use a good quality brand name flash drive (possibly one that uses 3D TLC or 2D MLC flash).

Most cheap flash drives use 2D TLC and very simplistic flash controllers that don't handle constant writes very well.

So the best bet is to have a quality drive and a proper PSU (so there are few voltage fluctuations). In this case, you might even consider routing power directly to the USB ports from the micro-USB input.

A 3rd option would be to use some sort of RAM caching and then flush the data to the flash drive periodically. This way you minimize block writes.
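For append-only data such as a log or a CSV file, the RAM-caching idea could look like this: lines are collected in memory and written in one go when enough have piled up or enough time has passed. This is a minimal sketch with illustrative names and limits; the trade-off is that up to max_lines lines, or max_age seconds of data, are lost if the Pi crashes before a flush.

```python
import os
import time


class BufferedAppender(object):
    """Collect lines in RAM and append them to `path` in one write when either
    `max_lines` lines are pending or `max_age` seconds have passed since the
    last flush. Call flush() from the shutdown path as well."""

    def __init__(self, path, max_lines=100, max_age=300.0, clock=time.time):
        self.path = path
        self.max_lines = max_lines
        self.max_age = max_age
        self.clock = clock          # injectable for testing
        self.pending = []
        self.last_flush = clock()

    def add(self, line):
        self.pending.append(line.rstrip("\n") + "\n")
        if (len(self.pending) >= self.max_lines
                or self.clock() - self.last_flush >= self.max_age):
            self.flush()

    def flush(self):
        if self.pending:
            with open(self.path, "a") as f:
                f.write("".join(self.pending))
                f.flush()
                os.fsync(f.fileno())   # one device write per batch
            self.pending = []
        self.last_flush = self.clock()
```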


Or just get a USB-to-mSATA SSD device. SSDs use much more advanced flash controllers that are better at handling errors.
+°´°+,¸¸,+°´°~ Everyone should have a taste of UK Raspberry Pie =D ~°´°+,¸¸,+°´°+
Rasberry Pi, SoC @ 1225Mhz :o, 256MB Ram @ 550Mhz, 16GB SD-Card, Raspbian

karrika

Re: Prevent/Minimize USB (Flash) stick corruption

Sat Mar 10, 2018 7:59 pm

hojnikb wrote:
Sat Mar 10, 2018 7:49 pm
Or just get an usb to msata ssd device. SSDs are using much more advanced flash controllers that are better at handling errors.
True. This is the route I took. No problems.

paulv

Re: Prevent/Minimize USB (Flash) stick corruption

Sun Mar 11, 2018 9:17 am

Hi hojnikb,

Thank you for your tips.
However, they are not bringing insight into what this post is all about.
In the first post, I specifically said :
Please don't tell me to use a rotating HDD or an SSD one, I know that much and I will go that route, but I just want to find out what tricks, if any, can be used on USB flash "drives".
Most of us have an abundance of these cheap and little devices. In most applications that are not super critical, they can be used to store files, and I even used them to hold the rootfs and ran my RPi's from them, rather than the SD card.

If, however, they hold somewhat more precious data, like a database of sensor data, you may want to have a bit of a safety net. That's what I'm after. Hence the title of this post.

We all (should) know that flash devices will, over time, start to fail due to degradation of the memory cells. The flash controller itself, however, is not subject to that degradation; only the memory cells are. In my simple-minded view, we should be able to partition the flash memory and use a software RAID to handle the eventual errors in those memory segments. I know that I'm going against the traditional RAID strategy by using one device. The traditional advice to use several separate physical devices in a RAID is meant to protect against physical hardware failure, typical for spinning drives: if the hardware fails, all the memory behind it is no longer available. My theory is that for flash devices, it is very unlikely that the flash controller itself fails. So, if that is indeed the case, using partitions on the same flash device is just about as safe as using physically separate spinning disks.

I am specifically not mentioning SSD technology because of its high price. For most RPi applications that's pricey overkill, although I use them too, for other applications.

The challenge to me therefore, and hence this post, is to get more life out of simple and inexpensive flash technology USB sticks and I'm hoping to get some experience from others or inputs on how to accomplish that.

karrika

Re: Prevent/Minimize USB (Flash) stick corruption

Sun Mar 11, 2018 9:43 am

Hi paulv,

I am really interested in any findings that could increase the reliability of the USB sticks.

I mount the stick automatically when it is inserted, with this udev rule:

Code:

KERNEL!="sd[b-z]1", GOTO="usbcheck_end"
IMPORT{program}="/sbin/blkid -o udev -p %N"
ACTION=="add", ENV{mount_options}="relatime"
ACTION=="add", ENV{ID_FS_TYPE}=="vfat|ntfs", ENV{mount_options}="$env{mount_opt>
ACTION=="add", \
 RUN+="/usr/local/bin/blink", \
 RUN+="/bin/mkdir -p /tmp/usb", \
 RUN+="/bin/chmod a+rwx /tmp/usb", \
 RUN+="/bin/mount -o $env{mount_options} /dev/%k /tmp/usb", \
 RUN+="/usr/local/bin/processusb", \
 RUN+="/usr/local/bin/light"
ACTION=="remove", RUN+="/bin/umount -l /tmp/usb", \
 RUN+="/bin/chmod a-w /tmp/usb", \
 RUN+="/usr/local/bin/dark"
LABEL="usbcheck_end"
The data is then transferred with this script:

Code:

#!/bin/bash
/bin/mount -o remount,async,noatime,norelatime /tmp/usb
if [ -e /tmp/usb/NM ]; then
    /usr/bin/rsync -rptuv --modify-window=1 --partial --force /tmp/usb/NM/ /opt/AVCS
fi
/bin/sync
/bin/mount -o remount,async,relatime /tmp/usb
This works for a while. You insert the stick and wait while 4.2GB of chart material is transferred, which usually takes 3 hours. When the light stops blinking, we have told the crew to wait at least a minute before they remove the stick.

Later, when you want to update the content of the stick the process takes only a few minutes.

After a few cycles, the USB stick is history and its content is corrupted.

paulv

Re: Prevent/Minimize USB (Flash) stick corruption

Sun Mar 11, 2018 10:15 am

Some more observations while using my "bad" USB flash "drive".

I created 3 partitions on the 8GB stick, each 2GB in size with type fd00 (Linux raid)
I created a RAID 5 array as follows:

Code:

sudo mdadm --create --verbose /dev/md0 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdb2 /dev/sdb3
Then I did the usual registration and formatted the RAID drive:

Code:

sudo mkfs.ext4 /dev/md0
No errors were reported.
I mounted the drive on /mnt/raid, put some data on it, unmounted the drive and ran a file check on it.

Code:

sudo fsck -yf /dev/md0
Lo and behold, it found a lot of errors, right after formatting. To eliminate issues caused by the data I put on the drive, I unmounted it and reformatted again. I then used the file-check, right after the formatting, and again, there were errors reported:

Code:

$ sudo fsck -yf /dev/md0
fsck from util-linux 2.25.2
e2fsck 1.43.3 (04-Sep-2016)
Pass 1: Checking inodes, blocks, and sizes
Inode 7 has illegal block(s).  Clear? yes

Illegal block #266508 (3924690433) in inode 7.  CLEARED.
Illegal block #266509 (855638067) in inode 7.  CLEARED.
Illegal block #266510 (790629379) in inode 7.  CLEARED.
Illegal block #266511 (2139127935) in inode 7.  CLEARED.
Illegal block #266512 (3369474086) in inode 7.  CLEARED.
Illegal block #266513 (3233857728) in inode 7.  CLEARED.
Illegal block #266514 (4240650255) in inode 7.  CLEARED.
Illegal block #266515 (3790203297) in inode 7.  CLEARED.
Illegal block #266516 (2390188185) in inode 7.  CLEARED.
Illegal block #266517 (858730291) in inode 7.  CLEARED.
Illegal block #266518 (1517601022) in inode 7.  CLEARED.
Too many illegal blocks in inode 7.
Clear inode? yes

Restarting e2fsck from the beginning...
Resize inode not valid.  Recreate? yes

Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong for group #0 (24024, counted=24025).
Fix? yes

Free blocks count wrong (1010609, counted=1010610).
Fix? yes


/dev/md0: ***** FILE SYSTEM WAS MODIFIED *****
/dev/md0: 11/262144 files (0.0% non-contiguous), 36942/1047552 blocks
A subsequent file check worked fine.

Originally, I did not do a file check after formatting the stick for my application; I guess this could be why the files started to get corrupted massively.

Question: Does the formatting process not catch certain file system errors, or more specifically not on Flash drives?

Maybe this is a good take-away: do a file check on a flash drive right after formatting and before you start to use it.

Running the file-check once more showed the expected results:

Code:

$ sudo fsck -yf /dev/md0
fsck from util-linux 2.25.2
e2fsck 1.43.3 (04-Sep-2016)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/md0: 11/262144 files (0.0% non-contiguous), 36942/1047552 blocks
Asking for a report from mdadm shows no issues:

Code:

$ sudo mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Sun Mar 11 09:21:21 2018
     Raid Level : raid5
     Array Size : 4190208 (4.00 GiB 4.29 GB)
  Used Dev Size : 2095104 (2046.34 MiB 2145.39 MB)
   Raid Devices : 3
  Total Devices : 3
    Persistence : Superblock is persistent

    Update Time : Sun Mar 11 11:10:36 2018
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           Name : raspi-dev:0  (local to host raspi-dev)
           UUID : dcfb5036:a08b07e3:a52bf993:31bc7c3b
         Events : 39

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8       18        1      active sync   /dev/sdb2
       3       8       19        2      active sync   /dev/sdb3
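The Array Size above follows directly from the Used Dev Size: RAID 5 spends one member's worth of space on parity, so three members of 2095104 KiB give 2 × 2095104 = 4190208 KiB. A small helper for the levels used in this thread, as a sketch (the function name is my own). Note that all three members sit on the same stick, so the parity only helps if individual regions go bad, which is exactly the assumption being tested here.

```python
def usable_kib(level, devices, member_kib):
    """Usable size of an md array built from `devices` equal members of `member_kib` KiB."""
    if devices < 2:
        raise ValueError("an array needs at least 2 members")
    if level == 1:
        return member_kib                   # every member holds a full copy
    if level == 5:
        return (devices - 1) * member_kib   # one member's worth of parity
    if level == 6:
        return (devices - 2) * member_kib   # two members' worth of parity
    raise ValueError("level not covered by this sketch: %r" % (level,))
```

`usable_kib(5, 3, 2095104)` reproduces the 4190208 reported by mdadm.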
I then started to run my test program again by putting a bunch of files on the raid every 10 min. and shortly after running my file-check routine. After the file check, the files are deleted. My hope is that the flash controller will allocate different cells on the stick all the time due to the wear levelling. Given enough time, my files will “travel” all over the available cells.

Right away, there were more errors reported, again on Inode 7 :

Code:

Sun Mar 11 11:15:01 CET 2018
fsck from util-linux 2.25.2
Pass 1: Checking inodes, blocks, and sizes
Inode 7 has illegal block(s).  Clear? yes

Illegal block #266258 (1074626820) in inode 7.  CLEARED.
Illegal block #266300 (134217728) in inode 7.  CLEARED.
Illegal block #266301 (134217728) in inode 7.  CLEARED.
Illegal block #266310 (67108864) in inode 7.  CLEARED.
Illegal block #266318 (536870912) in inode 7.  CLEARED.
Illegal block #266320 (268435456) in inode 7.  CLEARED.
Illegal block #266324 (33554432) in inode 7.  CLEARED.
Illegal block #266334 (1073741824) in inode 7.  CLEARED.
Illegal block #266358 (16777216) in inode 7.  CLEARED.
Illegal block #266361 (2147483648) in inode 7.  CLEARED.
Illegal block #266364 (1073741824) in inode 7.  CLEARED.
Too many illegal blocks in inode 7.
Clear inode? yes

Restarting e2fsck from the beginning...
Resize inode not valid.  Recreate? yes

Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong for group #0 (24024, counted=24025).
Fix? yes

Free blocks count wrong (1010421, counted=1010422).
Fix? yes


/dev/md0: ***** FILE SYSTEM WAS MODIFIED *****
/dev/md0: 49/262144 files (0.0% non-contiguous), 37130/1047552 blocks


I'll let this run and monitor the results for a few hours, then run the program without the file-check to see what mdadm will do.

UPDATE : no more errors after about 12 hours into the test cycle.

paulv

Re: Prevent/Minimize USB (Flash) stick corruption

Mon Mar 12, 2018 6:40 am

While my tests are running, I started to look for more information about what can/could go wrong when using flash drives.

Eventually, you’ll come across tools that try to separate the fake sticks from the real ones. Fake, because there are sticks that report a much higher capacity than is actually there. Several tools help determine that, and also try to investigate how good the cells are.

One of these tools is f3, which runs on Linux. Its usage documentation has some very interesting information in it; most notable, for me, is this:
The second assumption is troublesome because a fake card may be able to persuade dosfsck(8) to report it’s fine, or not report the whole problem, or give users the illusion the memory card was fixed when it wasn’t. I singled dosfsck(8) out because of the question about it, but those two assumptions are true for fsck software for other file systems and badblocks(8) as well.
This rings some alarm bells in my head.
It’s probably fair to say that most Linux tools are designed under the assumption that the underlying or interfacing hardware is developed by other engineers who design and document their products as honestly as they can. Never mind their marketing departments.

However, with devices that can be manufactured by the millions, are very inexpensive, and can be sold virtually anonymously, devious minds can create products that the tools can no longer handle. The tools are being misled, and so are we.

Here is a link to that information : https://fight-flash-fraud.readthedocs.i ... and-f3read

Enjoy!

paulv

Re: Prevent/Minimize USB (Flash) stick corruption

Mon Mar 12, 2018 7:27 pm

While trying to find tools that can identify potentially troublesome flash drives, I first tried the Windows tool H2testW (version 1.4), which reported a bunch of problems. Encouraged, I then downloaded and installed the F3 Linux tool suite, which is based on H2testW but developed further, and tried it on a few of my USB sticks.

Both tools work by filling the available space with 1GB files, which are verified in a second step.
Here is the writing result of filling the stick with 1GB files on the known bad Intenso stick:

Code:

$ sudo f3write /mnt/stick
F3 write 7.0
Copyright (C) 2010 Digirati Internet LTDA.
This is free software; see the source for copying conditions.

Free space: 7.46 GB
Creating file 1.h2w ... OK!
Creating file 2.h2w ... OK!
Creating file 3.h2w ... OK!
Creating file 4.h2w ... OK!
Creating file 5.h2w ... OK!
Creating file 6.h2w ... OK!
Creating file 7.h2w ... OK!
Creating file 8.h2w ... OK!
Free space: 0.00 Byte
Average writing speed: 4.47 MB/s 
The size is about right: 7.46GB of free space on a stick sold as 8GB. Eight files were written (seven full 1GB files plus a partial eighth one), after which no free space was left, so there was no attempt at a ninth file.
After that I ran the read test:
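The capacity arithmetic behind "about right" can be checked from the sector counts in the f3read run that follows. f3 labels binary units as GB and MB (1 GB = 2^30 bytes, as its sector counts show), while the 8GB on the package is decimal. A quick check using only numbers from this post:

```python
GIB = 2**30   # f3's "GB" is 2**30 bytes (and its "MB" is 2**20 bytes)
SECTOR = 512

# The "8GB" on the package is decimal: 8 * 10**9 bytes.
marketed_bytes = 8 * 10**9
print(f"8GB as sold = {marketed_bytes / GIB:.2f} GiB")  # 7.45 GiB

# Total sectors f3read checked on this stick ("Data OK" + "Data LOST").
total_sectors = 15504532 + 131916
sectors_per_file = GIB // SECTOR                        # 2,097,152 sectors per 1 GiB file
full_files, last_file = divmod(total_sectors, sectors_per_file)
print(full_files, last_file, f"{last_file * SECTOR / GIB:.2f} GiB")  # 7 956384 0.46 GiB
print(f"{total_sectors * SECTOR / GIB:.2f} GiB")        # 7.46 GiB, the free space f3write found
```

So the 7.46GB is seven full test files plus a partial eighth file of about 0.46GB, which matches the smaller sector count for file 8.h2w in the read results.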

Code: Select all

$ sudo f3read /mnt/stick
F3 read 7.0
Copyright (C) 2010 Digirati Internet LTDA.
This is free software; see the source for copying conditions.

                  SECTORS      ok/corrupted/changed/overwritten
Validating file 1.h2w ... 2082924/    14228/      0/      0
Validating file 2.h2w ... 2072358/    24794/      0/      0
Validating file 3.h2w ... 2070986/    26166/      0/      0
Validating file 4.h2w ... 2069708/    27444/      0/      0
Validating file 5.h2w ... 2073470/    23682/      0/      0
Validating file 6.h2w ... 2087140/    10012/      0/      0
Validating file 7.h2w ... 2096990/      162/      0/      0
Validating file 8.h2w ...  950956/     5428/      0/      0

  Data OK: 7.39 GB (15504532 sectors)
Data LOST: 64.41 MB (131916 sectors)
               Corrupted: 64.41 MB (131916 sectors)
        Slightly changed: 0.00 Byte (0 sectors)
             Overwritten: 0.00 Byte (0 sectors)
Average reading speed: 13.01 MB/s
Every test file reports corrupted sectors, for a total of 64.41MB. This stick is begging to be hit by a sledgehammer, although I will keep it for more testing.
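The summary lines can be reproduced by tallying the per-file table. A small sketch, assuming the row layout shown above (`ok/corrupted/changed/overwritten`); the regular expression and the `tally` helper are my own, not part of f3:

```python
import re

# Per-file rows copied from the f3read run above.
F3READ_ROWS = """\
Validating file 1.h2w ... 2082924/    14228/      0/      0
Validating file 2.h2w ... 2072358/    24794/      0/      0
Validating file 3.h2w ... 2070986/    26166/      0/      0
Validating file 4.h2w ... 2069708/    27444/      0/      0
Validating file 5.h2w ... 2073470/    23682/      0/      0
Validating file 6.h2w ... 2087140/    10012/      0/      0
Validating file 7.h2w ... 2096990/      162/      0/      0
Validating file 8.h2w ...  950956/     5428/      0/      0
"""

ROW_RE = re.compile(r"Validating file (\S+) \.\.\.\s*(\d+)/\s*(\d+)/\s*(\d+)/\s*(\d+)")


def tally(text):
    """Return ({file: [ok, corrupted, changed, overwritten]}, column totals)."""
    per_file = {}
    totals = [0, 0, 0, 0]
    for m in ROW_RE.finditer(text):
        counts = [int(g) for g in m.groups()[1:]]
        per_file[m.group(1)] = counts
        totals = [t + c for t, c in zip(totals, counts)]
    return per_file, totals


per_file, (ok, corrupted, changed, overwritten) = tally(F3READ_ROWS)
bad_files = [name for name, counts in per_file.items() if counts[1] > 0]
print(f"OK:        {ok} sectors = {ok * 512 / 2**30:.2f} GiB")               # 15504532 sectors = 7.39 GiB
print(f"Corrupted: {corrupted} sectors = {corrupted * 512 / 2**20:.2f} MiB")  # 131916 sectors = 64.41 MiB
print(f"{len(bad_files)} of {len(per_file)} files affected")                # 8 of 8
```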

Another Intenso stick, which seemed to work fine and which I did not suspect of having issues, was also reported bad. Another one ready to be hammered into oblivion.

Code: Select all

$ sudo f3read /mnt/stick
F3 read 7.0
Copyright (C) 2010 Digirati Internet LTDA.
This is free software; see the source for copying conditions.

                  SECTORS      ok/corrupted/changed/overwritten
Validating file 1.h2w ... 2096392/      760/      0/      0
Validating file 2.h2w ... 2096434/      718/      0/      0
Validating file 3.h2w ... 2096480/      672/      0/      0
Validating file 4.h2w ... 2095588/     1564/      0/      0
Validating file 5.h2w ... 2094938/     2214/      0/      0
Validating file 6.h2w ... 2095886/     1266/      0/      0
Validating file 7.h2w ... 2096882/      270/      0/      0
Validating file 8.h2w ...  509346/      670/      0/      0

  Data OK: 7.24 GB (15181946 sectors)
Data LOST: 3.97 MB (8134 sectors)
               Corrupted: 3.97 MB (8134 sectors)
        Slightly changed: 0.00 Byte (0 sectors)
             Overwritten: 0.00 Byte (0 sectors)
Average reading speed: 15.91 MB/s
A third stick, an 8GB one from SanDisk, reported no errors.

So both test tools (H2testW and F3) report problems that the more traditional programs I tried did not detect, or detected only poorly. It looks like I need to test every stick with one of these tools before actually using it.
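To make that routine, the two f3 steps could be wrapped in a small pass/fail script. This is a sketch under assumptions: `f3write` and `f3read` are on the PATH, the script is run with sudo as in the commands above, and f3read prints a `Data LOST: ... (N sectors)` summary line like the ones shown here. The helper names are hypothetical.

```python
import re
import subprocess
import sys

# Matches the summary line f3read printed above, e.g. "Data LOST: 64.41 MB (131916 sectors)".
LOST_RE = re.compile(r"Data LOST:.*?\((\d+) sectors\)")


def lost_sectors(f3read_output: str) -> int:
    """Extract the number of lost sectors from f3read's summary."""
    match = LOST_RE.search(f3read_output)
    if match is None:
        raise ValueError("no 'Data LOST' line in f3read output")
    return int(match.group(1))


def run(cmd):
    """Run a command and return its combined stdout/stderr as text."""
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
    return result.stdout


def check_stick(mount_point: str) -> bool:
    """Fill the stick with f3write, verify it with f3read; pass only if nothing was lost."""
    run(["f3write", mount_point])
    # Ideally unmount and remount the stick here, so f3read reads the flash, not the page cache.
    return lost_sectors(run(["f3read", mount_point])) == 0


if __name__ == "__main__" and len(sys.argv) == 2:
    passed = check_stick(sys.argv[1])
    print("PASS: no data lost" if passed else "FAIL: data lost, do not use this stick")
```

Usage would be something like `sudo python3 check_stick.py /mnt/stick`. Parsing the summary line rather than relying on the exit status avoids assuming anything about f3read's return codes.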

Enjoy!

karrika
Posts: 1052
Joined: Mon Oct 19, 2015 6:21 am
Location: Finland

Re: Prevent/Minimize USB (Flash) stick corruption

Mon Mar 12, 2018 8:31 pm

Thank you for the tests. The article and the last test sets seem to confirm that there are a lot of broken USB sticks out there that give you a false sense of security. I plan to run through all my sticks to see which ones to ditch right away. I suppose the answer will be that all the cheap company sticks are trash.

Here is the output of ./f3probe --destructive --time-ops /dev/sdb (note that with --destructive, f3probe does not preserve the data on the stick):

Code: Select all

Good news: The device `/dev/sdb' is the real thing

Device geometry:
*Usable* size: 7.47 GB (15669248 blocks)
Announced size: 7.47 GB (15669248 blocks)
Module: 8.00 GB (2^33 Bytes)
Approximate cache size: 0.00 Byte (0 blocks), need-reset=no
Physical block size: 512.00 Byte (2^9 Bytes)

Probe time: 8'36"
Operation: total time / count = avg time
Read: 851.3ms / 4813 = 176us
Write: 8'32" / 4192321 = 122us
Reset: 165.5ms / 1 = 165.5ms
So in my case, the stick is reported as good.
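The geometry and timing lines can be cross-checked with a little arithmetic. In this sketch the fake/real rule is a simplification based only on the two size lines (a counterfeit stick announces more blocks than it can actually store); f3probe itself does considerably more than compare two numbers.

```python
BLOCK = 512                  # "Physical block size: 512.00 Byte"
usable_blocks = 15669248     # "*Usable* size"
announced_blocks = 15669248  # "Announced size"

usable_bytes = usable_blocks * BLOCK
print(usable_bytes, f"{usable_bytes / 2**30:.2f} GiB")  # 8022654976 7.47 GiB

# Simplified rule: a counterfeit stick announces more blocks than it can really hold.
print("fake" if usable_blocks < announced_blocks else "real")  # real

# Average time per write operation: 8'32" spread over 4,192,321 writes.
write_avg_us = (8 * 60 + 32) / 4192321 * 1e6
print(f"{write_avg_us:.0f} us")  # 122 us
```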

paulv
Posts: 557
Joined: Tue Jan 15, 2013 12:10 pm
Location: Netherlands

Re: Prevent/Minimize USB (Flash) stick corruption

Tue Mar 20, 2018 9:11 am

Despite my recent efforts to minimize crashes, I had another catastrophic failure today.
In my latest setup, I used a 16GB Transcend USB stick that passed the destructive tests before I put it into service. This stick had been used for occasional temporary data storage for a couple of years without any problems.

I created 4 partitions of 3GB each and configured them as a RAID 5 array, using 3 active members and one spare.

An automatic file system check just after midnight found some issues, which were repaired automatically.
This morning, however, the system crashed because the RAID array collapsed. mdadm detected a failure on one member, and while it was switching in the spare and recovering, all four members were marked inactive and the array went offline.

I saw this happen myself while rebooting and trying to repair the array by hand. Eventually the stick became unusable; even a file system check would no longer run.

It seems that my theory, that partitions on the same flash device would behave roughly like physically separate drives, has been proven invalid, although I don't understand the root cause of why this is so.
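One way to see why RAID across partitions of a single stick cannot give the hoped-for protection is a simplified model, assuming (as seems to have happened here) that a fault in the stick's controller or flash affects every partition on it at once. The `Member` class, the `raid5_survives` helper and the partition names are hypothetical; this is not how mdadm decides anything internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    name: str    # hypothetical partition name
    device: str  # physical device the partition lives on


def raid5_survives(active, failed_devices):
    """RAID 5 keeps working with at most one failed active member.

    A spare only helps once a rebuild onto it has finished, so spares are ignored here.
    """
    lost = sum(1 for m in active if m.device in failed_devices)
    return lost <= 1


# Three active partitions on one stick (the spare is left out).
one_stick = [Member(f"sda{i}", "sda") for i in (1, 2, 3)]
# The same array built from three separate sticks.
three_sticks = [Member("sda1", "sda"), Member("sdb1", "sdb"), Member("sdc1", "sdc")]

partition_gb = 3
print((len(one_stick) - 1) * partition_gb)    # 6: usable GB, one member's worth goes to parity
print(raid5_survives(one_stick, {"sda"}))     # False: one stick fault takes out all three members
print(raid5_survives(three_sticks, {"sda"}))  # True: only one member is lost
```

Under this assumption, RAID 5 on one stick only protects against faults confined to a single partition, not against the stick itself failing.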

At this moment I'm running the destructive test on the Transcend stick again.

paulv
Posts: 557
Joined: Tue Jan 15, 2013 12:10 pm
Location: Netherlands

Re: Prevent/Minimize USB (Flash) stick corruption

Wed Jun 13, 2018 7:20 am

Despite minimizing the "disk" activity by reducing the logging in my applications, using logrotate and tmpfs, and a couple of other measures, it took only about 2 months before the stick developed issues. And this was with a brand-name USB flash stick, sigh...
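A further way to cut write activity from Python applications, different from the logrotate and tmpfs measures mentioned above, is to buffer log records in RAM with the standard library's `logging.handlers.MemoryHandler`, so the stick receives fewer, larger writes. A sketch under assumptions: the capacity, logger name and file path are arbitrary, and records still in the buffer are lost if the Pi crashes before they are flushed.

```python
import logging
import logging.handlers
import os
import tempfile


def make_buffered_logger(path, capacity=100, name="buffered"):
    """Return (logger, memory_handler, file_handler).

    Up to `capacity` records are kept in RAM; they are written to `path` when the
    buffer is full or when a record of level ERROR or higher arrives.
    """
    file_handler = logging.FileHandler(path)
    file_handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)-8s %(message)s"))
    memory_handler = logging.handlers.MemoryHandler(
        capacity, flushLevel=logging.ERROR, target=file_handler
    )
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(memory_handler)
    return logger, memory_handler, file_handler


def close_buffered_logger(logger, memory_handler, file_handler):
    """Flush what is left in RAM, then detach and close both handlers."""
    memory_handler.flush()
    logger.removeHandler(memory_handler)
    memory_handler.close()
    file_handler.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "app.log")
        log, mem, fh = make_buffered_logger(path, capacity=10)
        for i in range(1, 4):
            log.info("waiting for sensor replies : %d", i)
        print(os.path.getsize(path))      # 0: the three records are still in RAM
        log.error("no sensor replies")    # ERROR forces a flush to the file
        print(os.path.getsize(path) > 0)  # True
        close_buffered_logger(log, mem, fh)
```

With one entry every 5 seconds and a capacity of 100, the file would be written roughly every 8 minutes instead of every few seconds, at the cost of losing up to that much logging on a crash.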

I gave up on using a flash stick for the OS. I purchased a 2.5" (rotating) hard disk and connected it to the RPi with a USB adapter cable. My RPi has a UPS, so it's protected against sudden shutdowns and power failures. The hard disk is not powered by the RPi; it is powered directly from the 5V supply that also feeds the UPS, so it is always on. This prevents startup issues with the RPi: it can reboot freely because the drive is always there. I used my own procedure [ https://www.raspberrypi.org/forums/vie ... 9&t=44177 ] to move the OS to the hard disk, and this setup has been running for about a month now.

I don't expect issues anymore.
