Optimizing Linux for Flash memory.


by obarthelemy » Sun Sep 25, 2011 1:24 pm
The Pi will run mainly off flash memory, which works very differently than mechanical hard drives. Linux, by default, is tweaked for hard drives, which has a negative impact on the performance and longevity of flash-based Linux installs. Luckily, Linux is very flexible and we can re-configure it to work much better with flash.
The goal of this project is to
1- Inventory tweaks. There's a lot of info around the web, and even in the Pi forum. Let's gather it all here.
2- Test and validate those tweaks. Some may work only for SSDs, not for SD cards or USB sticks. Or only for particular versions of Linux/kernels/controllers...
3- Recap that knowledge in a noob-friendly tutorial
4- maybe produce a script to apply the tweaks automagically, if that can be done reliably.

I'm reserving the next 3 posts for
1- list of sources
2- tutorials
3- script
and will try to consolidate all relevant info (from the forum, this thread, and the web) in those 3 posts, to avoid the ever-growing, confusing thread of death syndrome.

More details on how Flash memory is different from a hard disk:
- while flash presents itself as ordinary sectors that can be read and written independently, internally it can only be erased and rewritten in "blocks" of typically 128 KB (larger on some cards). So changing a single byte can mean reading a full 128 KB block, changing the required byte, and writing the whole block back. Contrary to hard disks, flash knows of no "small" writes.
- on top of that, flash writes are fairly slow
- flash memory supports a somewhat limited number of write cycles, so limiting writes as much as possible increases longevity.
Hard disks have none of those 3 issues.
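
To put a number on the read-modify-write point above, here is a rough sketch of the worst-case arithmetic. It assumes the 128 KB block size used in this thread (not a measured value) and a write that does not straddle a block boundary; write_amplification is just an illustrative helper name.

```shell
#!/bin/sh
# Worst-case write amplification when a small write forces a whole
# erase block to be rewritten. 128 KB is the working assumption of
# this thread, not a measured figure.

# write_amplification BLOCK_BYTES WRITE_BYTES
# Prints how many bytes get rewritten per byte requested
# (integer division, rounded down).
write_amplification() {
    block_bytes=$1
    write_bytes=$2
    # Number of erase blocks touched by the write, rounded up.
    blocks=$(( (write_bytes + block_bytes - 1) / block_bytes ))
    echo $(( blocks * block_bytes / write_bytes ))
}

write_amplification 131072 1      # one byte into a 128 KB block
write_amplification 131072 4096   # one 4 KB filesystem block
```

Real controllers cache and remap writes, so actual figures are usually better than this worst case.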
Posts: 1399
Joined: Tue Aug 09, 2011 10:53 pm
by obarthelemy » Sun Sep 25, 2011 1:24 pm
by obarthelemy » Sun Sep 25, 2011 1:25 pm
2- Tutorials

Step 1: Theory

- SD or USB? Which specifications? Let's go for SD class 4, even though...
Jamesh's tests on an alpha board show that SD performance is bad compared to USB (thread here: http://www.raspberrypi.org/for.....38;t=499.0).
Even though the conclusion is that USB sticks are faster than SD cards, we'll still choose the SD card: things may get better in the final Pi; adapting SD tweaks for USB is fairly trivial (everything that works for SD also applies to USB); we need the USB ports for other things; SD cards are better protected (inside the Pi); and finally, since everything connected to USB or Ethernet competes for bandwidth, it's better to keep as much as we can away from the USB interface.
SD cards are available in different "classes" of speed. There seem to be compatibility issues with faster cards, so going for slowpoke class 4 cards seems the best choice at the moment. All tweaks work regardless of class/speed anyway.

- Choosing the right format and filesystem: ext4 > ext2, avoid others.
There are a number of flash-optimized formats and filesystems, but these are targeted at raw flash that has no controller. SD cards and USB sticks always include a controller, and are designed for use with regular formats/filesystems. In the Linux world, we can safely use the mainstays: ext2/3/4 or the upcoming btrfs. I could not find much info on the still-somewhat-new btrfs for flash, and ext3 is really an intermediate step between 2 and 4, so we can rule those 2 out. The main advantage of ext2 over ext4 is that it has no journal at all, which means fewer writes. But ext4 can turn journalling off and offers a number of small extras over ext2, so let's choose ext4, though ext2 would be OK, too.

- how bad are FAT and NTFS ? Pretty bad, avoid if possible
It would be nice to use FAT (FAT32, exFAT) or NTFS as much as possible on the Pi, especially since while Linux can read/write those, Windows can't really access ext partitions. Alas, these are non-native formats for Linux, inherently less speedy and reliable. Plus, optimizing them on Linux for flash use is undocumented, so we have to avoid them. The recommendation for easy data exchange with Windows is to have a separate USB stick, or at least a separate partition, and copy data to/from that as needed.
I've had very limited success with the ext2fsd ext driver for Windows (http://www.ubuntugeek.com/how-.....ows-7.html). Use it carefully and with backups ^^

Step 2: Prepare a Boot SD
IMPORTANT: You need working Pi boot files in a separate location to copy onto the SD once it has been formatted.
Small logistics problem here: we need to format the SD card, but can't format the one we're running the Pi from. Solutions:
- use another Linux PC
- boot the Pi off a USB stick (if that's possible) so the SD card can be messed with
- connect an SD card reader to your Pi and use a second SD card.

- swap or no swap ?
http://distilledb.com/blog/arc.....linux.page
Since the Pi doesn't have much RAM, I'm going to assume a swap partition is needed. Probably 128, 256 or 512 MB; it really depends on what software you're running and your usage patterns. Let's go for 256 MB.

- Create partitions aligned on Flash blocks
TBD: tool to confirm flash block size. Assuming 128K.
Issue: hard to describe precisely without a real Pi (drive letters, fdisk options)
unfinalized odds and ends:
http://www.styryx.com/en/compu.....-usb-flash
http://linux-howto-guide.blogs.....speed.html
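
Until the exact fdisk steps are pinned down, the alignment arithmetic itself can be sketched like this. It assumes 512-byte sectors and the 128K block size guessed above; align_up and is_aligned are illustrative helper names, not existing tools. On a running system, a partition's start sector can be read from /sys/block/<disk>/<partition>/start.

```shell
#!/bin/sh
# Partition alignment arithmetic. Assumptions: 512-byte sectors and a
# 128 KB flash block (the value assumed in this tutorial, unverified).

# align_up SECTOR BLOCK_BYTES [SECTOR_BYTES]
# Prints the first sector at or after SECTOR that starts a flash block.
align_up() {
    sector=$1
    block_bytes=$2
    sector_bytes=${3:-512}
    per_block=$(( block_bytes / sector_bytes ))
    echo $(( (sector + per_block - 1) / per_block * per_block ))
}

# is_aligned SECTOR BLOCK_BYTES [SECTOR_BYTES]
# Succeeds when the byte offset of SECTOR is a multiple of BLOCK_BYTES.
is_aligned() {
    [ $(( $1 * ${3:-512} % $2 )) -eq 0 ]
}

align_up 63 131072   # legacy DOS-style start sector, moved up to 256
is_aligned 2048 131072 && echo "sector 2048 is aligned"
```

Starting partitions on a multiple of the largest plausible block size (for example 1 MB, sector 2048) keeps them aligned for smaller block sizes as well.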

- Format partitions with journalling disabled
http://cptl.org/wp/index.php/2.....-in-linux/
This cannot be done on the system drive, so let's do it while we're booting off another drive, at the same time as we format it (replace /dev/sdx1 with the actual partition):
sudo mkfs.ext4 -O ^has_journal -L PiBoot /dev/sdx1
sudo fsck.ext4 -f /dev/sdx1
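
To double-check that the journal is really gone, the feature list printed by tune2fs -l can be inspected. A small sketch, assuming the usual "Filesystem features:" line in that output; has_journal is an illustrative helper that reads the listing on stdin, so it can be tried without touching a device.

```shell
#!/bin/sh
# has_journal: reads `tune2fs -l` output on stdin and succeeds if the
# "Filesystem features:" line still lists has_journal.
has_journal() {
    grep '^Filesystem features:' | grep -qw 'has_journal'
}

# Typical use (needs root and a real partition, so left commented out;
# /dev/sdx1 is the same placeholder as in the tutorial):
#   sudo tune2fs -l /dev/sdx1 | has_journal && echo "journal still enabled"

# Harmless demonstration on a canned feature line:
printf 'Filesystem features:      ext_attr resize_inode dir_index extent\n' |
    has_journal || echo "no journal"
```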


Step 3: Optimize your boot SD
At this point, copy the boot files to the SD, put it in the internal SD slot, disconnect all other mass storage for safety, and reboot.

- Disable superfluous access-time writes
By default, Linux keeps track of the last time a file was read, which generates a disk write for every file read. We want to disable that for all drives:
Open the filesystem configuration file with
sudo nano /etc/fstab

Add the noatime (no access time), nodiratime (same for directories instead of files; noatime already implies it, so it is harmless but redundant) and data=writeback options after the defaults parameter for each drive, except swap. The modified line should read something like:
/dev/sda2 / ext4 defaults,data=writeback,noatime,nodiratime 0 0

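Do that for each drive, save, and remount your drives with
sudo mount -o remount /


NOTE: the data=writeback option means that when you save/update a file, the OS will take a few seconds to update the directory to point to the new file. If your computer crashes/stops in the meantime, you'll lose your changes. Note also that it is a journalling option: ext4 refuses to change the data mode on a live remount, and on a partition formatted without a journal (as in Step 2) it brings nothing and the mount may be rejected. If the remount fails, drop data=writeback and keep noatime.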
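
For those who would rather not hand-edit, the noatime part of the fstab change can be sketched as a filter. This is only an illustration: add_noatime is a made-up helper, it reads fstab-style text on stdin and prints the result, and it collapses column spacing to single spaces. It deliberately leaves data=writeback alone.

```shell
#!/bin/sh
# add_noatime: reads fstab-style text on stdin, prints it with noatime
# appended to the options of every non-swap entry that lacks it.
add_noatime() {
    awk '
        # Pass comments, short or blank lines and swap entries through.
        /^[ \t]*#/ || NF < 4 || $3 == "swap" { print; next }
        {
            n = split($4, opts, ",")
            found = 0
            for (i = 1; i <= n; i++)
                if (opts[i] == "noatime") found = 1
            if (!found) $4 = $4 ",noatime"
            print
        }
    '
}

# Demonstration on a sample line (nothing on disk is modified):
echo '/dev/sda2 / ext4 defaults 0 0' | add_noatime
```

A cautious way to use it is to write the output to a scratch file (add_noatime < /etc/fstab > /tmp/fstab.new), compare it with diff, and only then copy it into place.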

- Change the disk scheduler http://www.redhat.com/magazine.....chedulers/
Add block/sda/queue/scheduler = noop to your /etc/sysfs.conf (requires the sysfsutils package), or elevator=noop to the kernel boot parameters in your /etc/default/grub (on systems that boot with GRUB).
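
Before and after changing the scheduler it helps to see which one is active. The kernel lists the available schedulers in /sys/block/<device>/queue/scheduler and marks the current one with square brackets. A small sketch for reading that; active_scheduler is an illustrative helper, and the device name sda is taken from the tutorial and may differ on your system.

```shell
#!/bin/sh
# active_scheduler: reads the contents of a queue/scheduler sysfs file
# on stdin (e.g. "noop deadline [cfq]") and prints the bracketed entry.
active_scheduler() {
    sed -n 's/.*\[\(.*\)\].*/\1/p'
}

# On a live system (device name is an assumption):
#   active_scheduler < /sys/block/sda/queue/scheduler
# Switching until the next reboot, as root:
#   echo noop | sudo tee /sys/block/sda/queue/scheduler

# Demonstration on a canned string:
echo 'noop deadline [cfq]' | active_scheduler
```

The runtime switch is handy for benchmarking a scheduler before making it permanent with sysfs.conf or the boot parameter.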

- Reduce swappiness http://community.linuxmint.com.....l/view/293
Reduce swappiness so that the operating system avoids the swap area and prefers to keep data in RAM. Open your /etc/sysctl.conf file in a text editor with root rights (e.g. sudo nano /etc/sysctl.conf), make a new line at the bottom of the file and add this:
vm.swappiness=10
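
If this ends up in the script, the edit should be repeatable without piling up duplicate lines. A possible sketch: set_sysctl is an illustrative helper that rewrites a sysctl.conf-style file so it holds exactly one uncommented assignment of the key. The demonstration runs on a scratch file, not on /etc/sysctl.conf.

```shell
#!/bin/sh
# set_sysctl FILE KEY VALUE
# Removes existing uncommented assignments of KEY from FILE and
# appends KEY=VALUE, so running it twice leaves a single line.
set_sysctl() {
    file=$1
    key=$2
    value=$3
    tmp=$(mktemp) || return 1
    if [ -f "$file" ]; then
        # Keep every line whose name part (left of "=") is not KEY.
        awk -F= -v k="$key" '{
            name = $1
            gsub(/[ \t]/, "", name)
            if (name != k) print
        }' "$file" > "$tmp"
    fi
    echo "$key=$value" >> "$tmp"
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Demonstration on a scratch file:
demo=$(mktemp)
printf 'vm.swappiness = 60\n' > "$demo"
set_sysctl "$demo" vm.swappiness 10
cat "$demo"
rm -f "$demo"
```

On the real file this would need root. The new value applies at the next boot, or right away with sudo sysctl -w vm.swappiness=10; the current value can be read from /proc/sys/vm/swappiness.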


- Disable transient logs and variables, or relocate them to RAM
http://tombuntu.com/index.php/.....te-drives/
http://www.styryx.com/en/compu.....-usb-flash

NOTE: This eats up a lot of RAM. It's not recommended for the Model A, nor for the Model B if your apps use up a lot of RAM. It is best for a "server" Model B with no X11.
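
As a rough idea of what relocating to RAM looks like in practice: the usual approach is a tmpfs entry in /etc/fstab for each directory. The sketch below only prints such lines; tmpfs_entry is an illustrative helper, and the directories and sizes are placeholders to be tuned to the RAM you can spare, not tested recommendations.

```shell
#!/bin/sh
# tmpfs_entry MOUNTPOINT SIZE
# Prints an fstab line that mounts a RAM-backed tmpfs of at most SIZE
# (e.g. 16m) on MOUNTPOINT. Contents are lost at every reboot.
tmpfs_entry() {
    echo "tmpfs $1 tmpfs defaults,noatime,size=$2 0 0"
}

# Placeholder choices, adjust or drop as needed:
tmpfs_entry /tmp 32m
tmpfs_entry /var/log 16m
```

The size= value is a ceiling rather than a reservation: tmpfs only uses RAM for what is actually stored in it.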

- Misc
Avoid logical volumes: hard to control alignment.
by obarthelemy » Sun Sep 25, 2011 1:25 pm
Reserved 3- Scripts
by emercer » Mon Sep 26, 2011 4:27 am
Obvious troll is obvious but... shouldn't this be wikified?
Posts: 165
Joined: Sun Aug 07, 2011 1:54 am
Location: Sao Paulo, Brazil
by obarthelemy » Mon Sep 26, 2011 6:35 am
It's work in progress right now, I'll see when it's somewhat finished.
by jamesh » Mon Sep 26, 2011 8:34 am
I did some tests last night with a noatime mounting of the rootfs.

(mount -o remount,noatime /)

I was doing some fairly unscientific testing (startx midori and using a stopwatch to first screen up), but my results were very inconsistent.

Boot
startx midori >>>>>>>>> 67s
mount -o remount,noatime /
startx midori >>>>>>>>> 38s
mount -o remount,atime /
startx midori >>>>>>>>> 15s

reboot
mount -o remount,noatime /
startx midori >>>>>>>>> 89s ??? Whuh??

So very odd results. I think I need a better test
Volunteer at the Raspberry Pi Foundation, helper at Picademy September, October, November 2014.
Forum Moderator
Posts: 15711
Joined: Sat Jul 30, 2011 7:41 pm
by obarthelemy » Mon Sep 26, 2011 9:55 am
Lol... mmm... thanks ? At least we now know these settings have a huge impact ^^
There's a whole lot of parameters to play with, variances with different kernel versions... this is getting complicated, especially with no way to actually play with a Pi.
I think the list of parameters to tweak is final.
Also, the actual values to set those to should be OK.
Precise, reliable, noob-friendly procedures are a bi***.

Pleaaaase do this for us :-p
by jamesh » Mon Sep 26, 2011 10:23 am
I think my test isn't consistent enough - needs internet access (which is wireless in my case), midori remembers pages opened so can change between runs etc.

I think some sort of app startup is the way to go, just need something more consistent. Might try GCompris.
by willjcroz » Mon Sep 26, 2011 10:44 am
Quote from jamesh on September 26, 2011, 09:34
So very odd results. I think I need a better test

It looks like the remount operation is not dropping the FS caches and the cache is speeding up the tests as the test is repeated. Try something like:

sync; echo 3 > /proc/sys/vm/drop_caches; time startx midori
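
Since single runs vary so much, it may also help to repeat each configuration several times, dropping the caches before every run, and look at the spread rather than one number. A small sketch for the summarizing part: summarize is an illustrative helper, and the input is the four timings quoted earlier in this thread, used purely as sample data (they were taken under different mount options, so their mean says nothing by itself).

```shell
#!/bin/sh
# summarize: reads one timing in seconds per line on stdin and prints
# "min max mean" for the whole series.
summarize() {
    awk '
        { v = $1 + 0 }
        NR == 1 { min = v; max = v }
        {
            if (v < min) min = v
            if (v > max) max = v
            sum += v
        }
        END { if (NR) printf "%s %s %.2f\n", min, max, sum / NR }
    '
}

# A measurement loop would look roughly like this (as root, because
# of drop_caches; left commented out on purpose):
#   for i in 1 2 3 4 5; do
#       sync; echo 3 > /proc/sys/vm/drop_caches
#       ... time one run and append the seconds to timings.txt ...
#   done

# Sample data: the four figures reported above.
printf '67\n38\n15\n89\n' | summarize
```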
Posts: 11
Joined: Thu Sep 01, 2011 1:26 pm
by jamesh » Mon Sep 26, 2011 11:05 am
I assumed there was some caching involved, but it's the tests after reboot that were odd. Setting noatime made the startup slower, which isn't right. At worst they should be the same times. I think Midori just needed to do more to start up.

Thanks for the info on clearing cache - that should be useful.
by asb » Mon Sep 26, 2011 12:02 pm
I wouldn't expect a major performance boost using noatime, as the kernel has defaulted to relatime since 2.6.30.

https://github.com/torvalds/linux/commit/0a1c01c9477602ee8b44548a9405b2c1d587b5a2
Forum Moderator
Posts: 851
Joined: Fri Sep 16, 2011 7:16 pm
by jamesh » Mon Sep 26, 2011 12:14 pm
Quote from asb on September 26, 2011, 13:02
I wouldn't expect a major performance boost using noatime, as the kernel has defaulted to relatime since 2.6.30.

https://github.com/torvalds/linux/commit/0a1c01c9477602ee8b44548a9405b2c1d587b5a2


Interesting. Think noatime would still give a slight increase in performance, but not much given the def. of relatime.
by asb » Mon Sep 26, 2011 12:29 pm
Quote from jamesh on September 26, 2011, 13:14
Interesting. Think noatime would still give a slight increase in performance, but not much given the def. of relatime.


Agreed. I have noatime in my fstab, but haven't done performance tests. Unless you're a mutt user there's no reason not to go with noatime (and even then there are workarounds I believe).
by obarthelemy » Tue Oct 11, 2011 11:05 pm
Good news: did everything in the tutorial, and nothing broke. Takes about 5 mins, gathering the info took 5 hrs... the famous linux ratio is alive and well ^^
by n1ywb » Mon Oct 17, 2011 2:37 pm
It would be a good idea to configure the system to minimize log verbosity to avoid thrashing the flash on every little log write. Or if you don't need to retain logs mount /var/log as a ramdisk.
Posts: 26
Joined: Mon Oct 10, 2011 6:54 pm
by alexleung » Mon Oct 17, 2011 3:56 pm
I format the SD card as if it were a RAID device. The latest SSDs use an 8K page size and a 2M erase block size.

mke2fs -t ext4 -E stride=2,stripe-width=512 /dev/sda1
# raid member - 8KB (-E stride=2), whole raid array - 2048KB (-E stripe-width=512), both counted in filesystem blocks, assuming 4KB blocks
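
For other cards, the two numbers follow from the page size and erase block size divided by the filesystem block size. A small sketch of that arithmetic, assuming 4 KB filesystem blocks unless told otherwise; ext4_raid_opts is an illustrative helper name.

```shell
#!/bin/sh
# ext4_raid_opts PAGE_BYTES ERASE_BYTES [FS_BLOCK_BYTES]
# Prints the -E argument for mke2fs: stride is the flash page size and
# stripe-width the erase block size, both in filesystem blocks.
ext4_raid_opts() {
    page=$1
    erase=$2
    fs_block=${3:-4096}
    echo "stride=$(( page / fs_block )),stripe-width=$(( erase / fs_block ))"
}

ext4_raid_opts 8192 2097152   # 8K pages, 2M erase blocks, as above
```

The output can be pasted after -E in the mke2fs command; the page and erase block sizes for a given card have to come from measurements such as the survey linked in this post.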

More information on the variety of page sizes and erase block sizes for SD cards and USB sticks:
https://wiki.linaro.org/WorkingGroups/Kernel/Projects/FlashCardSurvey?action=show&redirect=WorkingGroups%2FKernelConsolidation%2FProjects%2FFlashCardSurvey
Posts: 2
Joined: Fri Sep 16, 2011 3:20 pm
by mard0 » Sun Oct 30, 2011 11:14 am
What about using btrfs as the default filesystem? It still isn't as fast as ext4 but has a few optimizations for ssd's
Posts: 52
Joined: Wed Oct 26, 2011 4:23 pm
by richard77 » Sun Oct 30, 2011 11:51 am
In my experience, with slow media like SDs, a squashfs + aufs setup makes a noticeable difference.
Posts: 12
Joined: Fri Oct 28, 2011 7:35 pm
by mard0 » Sun Oct 30, 2011 12:11 pm
Quote from richard77 on October 30, 2011, 11:51
In my experience, with slow media like SDs, a squashfs + aufs setup makes a noticeable difference.


Aaah, you're just a little faster than me. This link talks about some optimizations for running Linux on flash memory that I have used before. One of them is using squashfs, but postponing writes to disk can also reduce wear.
http://stevehanov.ca/blog/inde......php?id=48
by NegentropicMan » Fri Nov 04, 2011 8:09 am
OK, as far as I can see, there is some useful information from the field of power saving which may be interesting for us:
As there are blocks (of 128k) which have to be rewritten each time something is written, it may be a good idea to lengthen the VM writeback interval, i.e. flush dirty data less often, in order to write possibly bigger chunks of data (which may be aligned). This is generally not a good idea without a fs journal, so we have to investigate further which method has fewer drawbacks. The writeback interval is set in
/proc/sys/vm/dirty_writeback_centisecs
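
The value is expressed in hundredths of a second, and the kernel default is 500 (5 seconds). A tiny sketch for turning a chosen interval into a line for /etc/sysctl.conf; writeback_sysctl is an illustrative helper and the 15 seconds is just an example value, not a tested recommendation.

```shell
#!/bin/sh
# writeback_sysctl SECONDS
# Prints the sysctl.conf line that sets the periodic writeback
# interval to SECONDS (the kernel counts in centiseconds).
writeback_sysctl() {
    echo "vm.dirty_writeback_centisecs=$(( $1 * 100 ))"
}

writeback_sysctl 15
```

A longer interval means more unwritten data is at risk if the power is pulled, so it is a trade-off between wear and safety.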


The syslog daemon can be told to sync less often by prefixing a log file name in its configuration with "-" (again giving fewer, but bulkier writes). The same caveat about journalling and lost data on crashes applies here.

These two tips come from http://www.lesswatts.org/tips/disks.php, bear in mind that our goal is not to save power but to optimize write access.

The next option may be to use a swap file instead of, or in addition to, a swap partition. The advantage is that the swap file lives in a file system, which may be optimized for flash media, so all optimizations for the file system take effect for the swap, too. See http://distilledb.com/blog/arc.....linux.page, section "Adding swap containers" for further information.
Posts: 2
Joined: Fri Nov 04, 2011 7:29 am
by asb » Mon Nov 07, 2011 11:07 pm
A great resource for understanding the limitations of SD cards and other cheap drives is this LWN article:
http://lwn.net/Articles/428584/

Also see the Linaro flash card survey:
https://wiki.linaro.org/WorkingGroups/Kernel/Projects/FlashCardSurvey

Finally, there's a rather excellent video from Embedded Linux Con Europe 2011 on the topic:
http://free-electrons.com/blog.....11-videos/
by lingon » Sat Jan 07, 2012 8:10 pm
Here is a table of fast SD-cards listed by their specified transfer speeds:

http://hjreggel.net/cardspeed/....._sdxc.html

Benchmark results for these fast cards would be even more interesting to see of course.
Posts: 115
Joined: Fri Aug 26, 2011 7:31 am
by Chromatix » Sat Jan 07, 2012 11:28 pm
I recommend using the "deadline" ioscheduler instead of "noop".  The advantage of "deadline" is that it includes the elevator algorithm, and therefore groups spatially nearby accesses whenever practical.  This means that several small accesses which lie in the same 128KB block might just result in only one big write.  The "noop" scheduler is only appropriate for true SSDs.

I've done some investigation of SD card and USB drive performance for my day job.  Generally read performance is quite good, in that there is a fairly consistent bandwidth (not necessarily high) and a reasonably low latency (less than hard disk).  Write performance with a Linux filesystem varies wildly, mostly predicated on write *latency* rather than bandwidth.  The write bandwidth of cards under ideal conditions is usually quite a lot higher than the guaranteed rate, but under non-ideal conditions is usually much lower.

Latency of Class 10 cards is not specified in the SDHC card standard, and is therefore often *much* worse than for Class 2, 4, 6 cards which *do* have a latency spec.  The best performing cheap SDHC cards we've found are the Transcend Class 6 4GB (€8 locally) and the SanDisk Class 2 4GB.  So far we have not found an 8GB or larger card with anywhere near comparable latency to these.

Bear in mind also that every read must wait while a write is in progress, therefore high write latency can be unacceptable for a system drive.

Among USB drives, the Kingston 410 series is the best that we have been able to obtain locally, with write latencies comparable to a hard disk's full-stroke seek.  There are some other brand lines which emphasise high performance but are not kept in stock locally.  The prices of these high-performance drives are considerably higher than for basic USB drives of the same capacity.  However, latencies of the cheaper drives have been observed in the one-second range which is unacceptable for a system drive.

USB-attached hard disks and SSDs are considerably better performing and have higher capacity, but are correspondingly more expensive.  I think it is wise to keep the projected cost of storage at most equal to the cost of the device.

The idea of using SquashFS and a union filesystem is a good one, especially if the OS is provided modularly as a series of SquashFS images (mounted by initrd) in lieu of packages. Tinycore is already halfway there, but it copies the contents into a tmpfs at runtime instead of using a union.  ArchLinux seems to use SquashFS images in the LiveCD, too.

Eliminating logfile writes is also good - perhaps configure the syslogd to dump them to a VT instead of to disk?  Noatime is fine, but I would advise caution about delaying general writes too much - children can be very impatient, which may result in the power being yanked (possibly not by the owner) before a clean shutdown can be achieved.
The key to knowledge is not to rely on people to teach you it.
Posts: 430
Joined: Mon Jan 02, 2012 7:00 pm
Location: Helsinki
by lingon » Mon Jan 09, 2012 7:23 pm
This is an interesting post at Phoronix about a new Linux I/O scheduler FIOPS under development for flash memory:

http://www.phoronix.com/scan.php?page=news_item&px=MTAzOTU

This might be useful once it is available in some future kernel version.