Whether you're a secretary or a scientist, the ability to sort files is about as fundamental as it gets in the world of computing. The problem with this "obscure but fundamental" topic is that the way computers sort files by name is COMPLETELY MESSED UP! You will not find a truly universal standard for sorting files and the Raspberry Pi is no exception. Instead, you will find a variety of different behaviors that make little or no intuitive sense to the vast majority of people. Even within the Linux world, you will find several different implementations that affect the order in which files are presented or processed. To be clear, I'm not suggesting there's any bug or mistake with the Raspberry – or any other computer. Like most things, if you take a "deep dive" into the peculiarities of file sorting on any system, you will usually find that it has some kind of "internal logic" that explains its behavior. That's what my tutorial explores. More importantly, I reveal a powerful new sorting technique that I developed after much experimentation and testing.
RASPBERRY PI FILE SORTING BEHAVIOR:
To make things even more confusing, sorting behavior doesn't just vary from one computing system to another – it can vary WITHIN a single system as well! This is very much the case for the Raspberry and its official operating system – Raspbian Stretch. Just to prove my point, I assembled a group of test files and ran a series of sorting experiments. The file names I chose begin with an assortment of numbers, letters, words, and symbols. In my chart, the first two columns are probably the most relevant to most users. The first column shows how File Manager (PCManFM) sorts files by ASCENDING NAME. The second column shows how Terminal sorts the same files when you run the extremely fundamental "ls" command – also by ASCENDING NAME. The ls command is also referred to as the "list command" (that's a lower-case "l" as in "list", not the number 1). I'll explain the other columns in a moment, but your key takeaway at this point should be how RADICALLY DIFFERENT the sorting behaviors are. Pick any color and follow it with your eyes. You will see that File Manager and the ls command – even though they are BOTH part of the same operating system – will sort the exact same files in very different ways:
[Chart: the same test files sorted by File Manager (PCManFM), by ls, by LC_ALL=C ls, by ls -v, and by ls -X]
IMPLICATIONS OF MY CHART:
As you can see in the above chart, Raspbian's File Manager does a fairly impressive job of sorting files (by name) in a rather intuitive, predictable and "human friendly" way. Unfortunately, File Manager is basically just a nice piece of custom code that rests on top of the "core" Raspbian operating system. Most programs and scripts you use will NOT rely on the kind of sorting behavior you see in File Manager. Instead, they will typically rely on the sorting behavior exhibited by the ls command, which is part of the GNU "coreutils". As the author of the gigantic FFmpeg / mpv tutorial, I first encountered this phenomenon when experimenting with numeric image sequences that need to be fed into FFmpeg to generate a movie – where the frames must be in the proper order. FFmpeg – like almost every piece of software – is simply not going to reinvent the wheel. In other words, if an operating system already has a built-in sorting mechanism, no rational programmer is going to waste their time creating an entirely new sorting algorithm on their own. In this regard, FFmpeg is no exception. But that can be a real problem if you don't understand how Raspbian's "internal" sorting mechanism behaves. And even if you do understand how it works, that alone is not enough – because you'll still need to know how to control it and make it bend to your will so that the sorting behavior conforms to your wishes!
If you ask any reasonably intelligent kindergartner to list, in order, the numbers 1, 2, and 10, almost all of them will give you the correct answer. They will say "1, 2, 10". But look at the results of Raspbian's internal "sorting engine" – which is reflected by the output of the ls command. It thinks the correct order is "10, 1, 2". If you were feeding an image sequence into FFmpeg, for example, you would end up with a movie where *ALL 3* frames would be in the wrong location. That would be quite the strange movie – a movie that begins in the future, jumps to the past, and ends in the middle! Now I'm not saying that "10, 1, 2" is wrong – but it's certainly not "human friendly" or even slightly intuitive to most people. There is, however, a certain kind of strange logic to it. If you think about it, the number 2 is the "odd man out" in that numeric sequence. If you look at them more as letters rather than numbers, only 10 and 1 have something in common. That commonality is that they both begin with the same character – the number "1"! In other words, if you think of "1" as being the equivalent of the letter "A", they both begin with "A"! As a result, from a grouping perspective, it makes sense to group the 10 and the 1 together before you get to the 2! I'll come right out and say it, however – no matter what the historical reasons or other explanations, I think that's a RIDICULOUS approach to sorting! Whenever possible, humans should be the ones training computers – not the other way around! If you have to completely rethink the rules of something as basic as sorting, you're allowing yourself to be trained by the computer! That would not necessarily be a bad thing if that new "training" led to a deeper, more enlightened understanding of the universe – but if it forces you to think in a "messed up" way, that's not good at all!
NAME AND TIME:
When all is said and done, there are really only 2 main ways to sort a file – by name or by time! In other words, you can either sort files by their file name or by their timestamp. Both of these are part of the file's "metadata", since they're not technically part of the file's contents. If you have a picture, for example, the picture itself obviously consists of "picture data" – the various colors and brightness levels of each pixel. If you have a picture of a tree, for instance, the file name and timestamp are clearly not part of the tree's photographic depiction – which is why they are considered to be "meta" to the image itself. Now, before anyone starts correcting me, I'm fully aware that you can also sort files by size and type – such as whether they are large or small or end with an extension like .jpg or .txt. That kind of sorting can be extremely useful in limited cases. But the focus of my tutorial is PRACTICAL file sorting – so although I won't ignore the sorting of files by size or type, I will only mention that in passing. Besides, those methods of sorting are quite self-explanatory!
If you don't like the default sorting behavior of Raspbian's "internal engine", you have only one realistic choice in most cases – YOU MUST RENAME THE FILES IN THE ORDER YOU WANT THEM! In theory, you might be able to change a program's internal file sorting order as it processes or displays files – but that capability is RARE! In a few cases, such as the excellent image-viewing program Feh, there is some limited flexibility in this regard – but it's certainly not a common feature!
For example, Raspbian's default image viewer – GPicView – does NOT display files in any recognizable sorting order. It clearly uses the file name to determine the order and it seems to have its own "method to the madness", but it's certainly not intuitive or useful to the average person. For instance, it will display 00 before 0 and 10 before 1 – so you might think it's following the same order as the standard ls command (without any options). But that is NOT the case – because it places the tilde and underscore characters at the top of the sorting list, whereas the ls command will put those toward the bottom. I won't bore you with more details – but suffice it to say that GPicView also does not match the behavior of File Manager, POSIX, or "natural" sorting! Clearly, in a case like this, if you want GPicView to flip through a bunch of images in a predictable manner, your only choice is to rename the files themselves – in a predictable order that you control!
Another theoretical approach would be to alter your system's environment variables so that the internal sorting behavior changes. I demonstrated that in my chart when I switched the behavior from "Raspbian style" sorting to POSIX-style sorting by invoking "LC_ALL=C" as a prefix to the ls command. But as you can see, POSIX-style sorting generated an equally absurd sequence of "1, 10, 2" for the numeric series of "1, 2, 10". That's hardly an improvement in my eyes! You can also trigger the ls command's "-v" option to generate a "natural style" sorting behavior. Basically, this attempts to understand version numbers inside file names and sort them accordingly. This is all well and good, but keep in mind that for 99% of software, such as FFmpeg, you're not going to be able to get "under the hood" and slip in a "-v" option to alter your software's sorting behavior! Instead, with almost all software, it's simply going to mirror the behavior exhibited by the ls command – without any options or alterations. Finally, just to complete the explanation of my chart, the "-X" option sorts by extension type – .bmp files come before .jpg files, for example. If you look carefully, however, you'll see that it precisely matches the sorting behavior of the standard ls command without any options – except that it also takes into account the alphabetical order of the file extension. No great revolution there! If you read the official manual pages for the ls command, you'll see that my chart pretty much reflects the sum total of name-based sorting options. [Yes, super nerds, there are also options to sort by ctime, atime and directory order. But since ctime is merely "change time" and not "creation time", it has very limited value. Separately, since Raspbian uses a "relatime" setting like most Linux systems do by default, atime is virtually useless. And directory order? Hardly worth talking about!]
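If you'd like to compare these behaviors without touching a single file, one harmless way is to feed a few names to the "sort" command, which ships in the same GNU coreutils package as ls. Treat this as a rough sketch rather than gospel: the three file names are made up, it assumes GNU sort (for the "-V" version-sort option), and I'm assuming that "sort -V" orders names the same way "ls -v" does. The first result depends on whatever locale your own system is using, so I'm not going to promise what it prints.

```shell
# Three made-up names, deliberately listed out of order.
names='2.jpg
10.jpg
1.jpg'

# Sort with whatever locale your system currently uses (comparable to plain "ls").
locale_order=$(printf '%s\n' "$names" | sort)

# Sort by raw byte values, POSIX style (comparable to "LC_ALL=C ls").
posix_order=$(printf '%s\n' "$names" | LC_ALL=C sort)

# Version sort understands the numbers inside the names (comparable to "ls -v").
natural_order=$(printf '%s\n' "$names" | sort -V)

printf 'Locale:\n%s\n\nPOSIX:\n%s\n\nNatural:\n%s\n' \
  "$locale_order" "$posix_order" "$natural_order"
```

The POSIX block should come out as 1.jpg, 10.jpg, 2.jpg and the natural block as 1.jpg, 2.jpg, 10.jpg.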
PADDING WITH LEADING ZEROS:
This, in my opinion, is the ultimate way to exercise total control over the sorting behavior of your files. Whether it's Linux, Windows or Mac, there is only one truly reliable, "universally recognized" file sorting scheme. If your file names begin with a numeric sequence that always consists of the exact same number of digits – a consistent format that is assured through the use of "leading" zeros that are "padded" to an underlying number series – your system will always recognize them in their intended order! So, for example, 001.jpg, 002.jpg, 010.jpg, will always be correctly interpreted in the proper "1, 2, 10" sequence by all modern computing systems. But what is the best way to do this?
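Before getting to the "how", here is a tiny sanity check of the idea itself. It's only a sketch with made-up numbers: it pads 1, 2 and 10 to three digits with printf and then runs them through the strictest byte-by-byte sort available (LC_ALL=C) to show that the padded names land in the human order.

```shell
# Pad the made-up numbers 10, 2 and 1 to three digits each.
padded=$(for n in 10 2 1; do printf '%03d.jpg\n' "$n"; done)

# Even a strict byte-by-byte POSIX sort now agrees with the human order.
sorted=$(printf '%s\n' "$padded" | LC_ALL=C sort)

printf '%s\n' "$sorted"
# Prints:
# 001.jpg
# 002.jpg
# 010.jpg
```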
MY UNIVERSAL FILE SORTING FORMULA:
I have personally struggled with this issue, off and on, for several months. Although I quickly came up with a variety of methods for specific sorting scenarios, my dream was to generalize them into a "universal" command line that could be applied to almost ALL sorting scenarios. To be clear, I am certainly not the Einstein of file sorting. My universal formula borrows ideas from more than a dozen Internet postings that I discovered through many hours of Googling. My unique contribution, however, is that I have taken all of those disparate ideas – often presented in completely different contexts – and synthesized them into one unified formula. Even that wasn't enough, however, to perfect my method. I also spent several frustrating hours tweaking the "equation" until it finally worked without throwing inexplicable error messages or behaving in unexpected ways. This was definitely a case of mining "undocumented obscurity" to exploit the hidden power of Raspbian Linux for all it is worth! So here it is – my universal file sorting command line:
NOTE: Several variations of my command line are available in copy-friendly plain text in the examples section that appears at the bottom of my tutorial.
As you may have noticed, I highlighted – in alternating blue and red – the 5 adjustable elements inside my file sorting command line. So let me start by explaining "start=1". What this does is very simple: It determines the "starting number" for your sorted file sequence. If, for example, you changed "start=1" to "start=300", your file names will begin with 0300, 0301, 0302, etc. If you need to insert files into an existing sequence – such as the frames of a movie – it can be very handy to have FULL CONTROL over the starting value – which also means, by extension, that you have full control over the ending value. In most cases, however, "start=1" is probably the most useful setting. If you set it to 1, your file names will begin with 0001, 0002, 0003, etc. That is certainly the most "universal" setting. But my command line's start value is extremely flexible. You can even set it to "start=0". If you do that, your file name sequence will go like this: 0000, 0001, 0002, etc.
In the world of Raspbian, the "-tr" option means "reverse time" order – the "t" for time, the "r" for reverse. But don't let that fool you! This is not a Raspbian quirk: the POSIX standard defines "-t" as "most recently modified first", so the ls command behaves the same way on other Linux distributions and Unix-like systems. As a result, "reverse time" counterintuitively means what I and most rational people would refer to as a "chronological sort" – NOT a "reverse chronological" sort. In my book, "reverse time" would always mean "reverse chronological" order – not "chronological" order. Here's a common sense example: If you asked the average person to list, in chronological order, the birth years of 1970, 1980, and 1990 – based on the times those people were "created" – almost all of them would say "1970, 1980, and 1990"! But if they thought like Raspbian's internal sorting engine, they instead would say "1990, 1980, and 1970"! That's probably because the people behind coreutils thought it would be more convenient to list "newest files first" as the default behavior. In other words, without adding the "-r" for reverse option, the default behavior is to place the newest files you created or edited at the top of the list. I actually agree with this approach on ONE BIG LEVEL – because it's my preferred sorting behavior as well. After all, most of the time when you're doing things on your computer, you want to see the most recent files you were working on at the top of the list! What I object to, however, is the decision to construe this as being "chronological" in time. It's not! It's definitely "reverse chronological" order no matter what they say!
RASPBIAN'S BIZARRE NOTION OF TIME – TEST IT YOURSELF:
Don't believe me? Try it yourself by creating a brand-new empty folder called "TEST" – or whatever name you wish. Then, open that folder and right-click inside it. Then click "Create New" and then click "Empty File". Using that method, create 3 empty files called 1970, 1980 and 1990. Be sure to create "1970" first and "1990" last. Then, open that folder in Terminal and run the following command. The "-1" at the end forces the ls command to list "one file per line" – a much more "human readable" format than having to read the file names horizontally across the screen:
ls -t -1
You will see that the output – although in theory being in "normal" time order – is actually what most people would call "reverse chronological" order. Think about the people who were born as being letters in the alphabet. Who is the equivalent of the letter "A" in the alphabet? In other words, who came first? It's obviously the person born in 1970! Ascending chronological order should clearly be A, B, C – not C, B, A! Anyway, here's the absurd output the "normal" time command produces:
1990
1980
1970
But if you add "r" for reverse and run this command line instead:
ls -tr -1
You will get the following output – which I consider to be the correct, perfectly normal "chronological" order – even though they consider it to be in "reverse" time order:
1970
1980
1990
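If you'd rather not click around in File Manager, the same experiment can be scripted from Terminal. This is just a sketch under a couple of assumptions of my own: it works in a throwaway folder made by "mktemp -d", and it uses "touch -t" to stamp the three empty files with made-up modification times (oldest first) so the result doesn't depend on how fast you type.

```shell
# Work in a throwaway folder so nothing real is touched.
demo=$(mktemp -d)

# Create three empty files with explicit, made-up modification times
# (format: YYYYMMDDhhmm), oldest first.
touch -t 200001010000 "$demo/1970"
touch -t 200101010000 "$demo/1980"
touch -t 200201010000 "$demo/1990"

newest_first=$(cd "$demo" && ls -t -1)    # "normal" time order
oldest_first=$(cd "$demo" && ls -tr -1)   # "reverse" time order

printf 'ls -t -1:\n%s\n\nls -tr -1:\n%s\n' "$newest_first" "$oldest_first"

# Clean up the throwaway folder.
rm -r "$demo"
```

The first list should read 1990, 1980, 1970 and the second 1970, 1980, 1990.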
DO YOU TAKE PICTURES?
Understanding Raspbian's odd notion of time is probably most important when dealing with pictures and images. If you took a bunch of pictures with your camera on a vacation, you'd probably like to have those pictures in the same order you took them! So if you took a picture on Monday, you'd probably like that to appear before another picture that you took on Tuesday. Likewise, if you've turned your Raspberry into a surveillance camera by using the outstanding motionEyeOS system, you'd probably want the image captures to be sorted in the same order they were acquired. The problem is that if Raspbian's internal sorting engine doesn't like the EXACT format of your file names – and it can be very picky – it will NOT sort them the way you want! Instead, they will end up completely out of sequence!
In Raspbian, therefore, it's important to appreciate the value of "reverse time" (-tr) sorting – because it will cause the oldest picture or file to be listed FIRST. If you think about it, the oldest picture is also the FIRST PICTURE taken by your camera! So for almost all image applications, you'll want Raspbian's "reverse time" sorting! Technical note: Sorting with the ls command is actually based on the most recent file modification time – not the "creation" time. But when you're dealing with image sequences, for example, modification time is usually the same as the creation time – unless of course the image was later edited and re-saved under the same file name.
The "*.jpg" item in my sorting command line is entirely optional – but it can be critical in certain cases. Under ideal circumstances, the contents of your folder would be entirely "pure". In other words, the only files inside it are the ones you wish to sort. This is certainly the cleanest way to approach things. But if you have other unrelated files – such as .txt or .wav files, for example – you need to either remove them from the folder first – or filter them out when you run the sort. That's because the sorting technique I developed is completely agnostic. From its standpoint, *ALL* files in a folder are fair game. That makes it both powerful and dangerous if you don't know what you're doing! So in this example, we've added "*.jpg" to ONLY sort files that end in ".jpg". That means it will ignore all .png files, .mp4 files, .txt files, etc. You can obviously change "*.jpg" to "*.png" or "*.txt" – or whatever you want. You can also completely remove that item from my command line if you don't need it or want it!
The "%04d" item controls the "digit format" of the sorted files. If you're certain that you will never have a need for more than 9,999 sorted files, "%04d" is perfect. But if you're doing an intense project that might involve several million files – a possibility I'm raising for the benefit of readers on computers more powerful than the Raspberry – you need to carefully consider how many "digits" you'll need. For example, standard 30 frame-per-second video – if run continuously for 24 hours – will generate almost 2.6 MILLION frames per day! That's 2,592,000 frames to be exact. If you notice, that's 7 digits long. However, in less than 4 days, it will break the 10 MILLION frame mark. At that point, you're now in 8-digit territory. So be sure to "plan ahead" when you select the digit value. It's always best to "over plan" and give yourself some extra wiggle room. So if you're planning to break 10 million files, be sure to set it to at least "%08d" – which runs out after about 38 days at that rate – or even better, "%09d", which gives you more than a year of headroom! For many common applications – like a brief list of a few dozen items in a folder – using "%02d" is probably the cleanest-looking format.
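If you don't feel like doing that arithmetic in your head, the shell can do it for you. Here is a minimal sketch: the 30 frames per second come from the example above, while the 30-day project length is simply a number I made up for illustration. The trick is that the number of characters in the total is the minimum number of digits you need.

```shell
fps=30                                   # frames per second (from the example above)
frames_per_day=$((fps * 60 * 60 * 24))   # 2592000
days=30                                  # made-up project length
total=$((frames_per_day * days))         # 77760000

# The number of characters in the total is the minimum digit count.
digits=${#total}

printf 'Frames per day: %d\n' "$frames_per_day"
printf 'Frames in %d days: %d\n' "$days" "$total"
printf 'Minimum format: %%0%dd\n' "$digits"
```

For these numbers the last line prints "Minimum format: %08d". Add at least one more digit if you want wiggle room.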
The " - $f" item includes the ORIGINAL file name in the newly renamed file. So instead of "My First Picture.jpg" being renamed to "0001.jpg", it will actually be renamed to the very clear and legible "0001 - My First Picture.jpg". I personally love this option because it's THE BEST OF BOTH WORLDS! You get all the sorting advantages of prefixing it with a properly padded number with leading zeros – but you also get to retain the original file name (which will have no impact on the sorting order). But if for some reason you wish to have "pure" numeric values for your newly sorted and renamed files, you can easily DELETE this item from the command line. But you must "delete" it in a very specific way. Please see my "CRITICAL NOTE" in EXAMPLE 2, below, for the simple change you need to make!
My command line has proven itself to be very reliable. But "external" events that have nothing to do with my command line can strike at any time – for example, your Raspberry could get hit with a power surge while it's actively sorting and renaming your files. That could permanently mangle them! So if the files you're sorting are of any great importance, there's only one way to protect yourself with 100% certainty: First make a backup copy of all the files you're about to sort and put them on a physically separate storage device! Just sayin'.
Make sure File Manager is CLOSED when you run the sorting command line. Otherwise, it will process the files much more slowly – because File Manager will be constantly updating the displayed contents of the open folder. As long as you follow that tip, the command line is very fast. It will sort and rename several thousand files per minute on the Raspberry Pi 3!
PASSIVELY TEST BEFORE YOU SORT & RENAME:
You should always passively and "non-destructively" test how different options with the ls command affect the sorting order BEFORE you commit to anything! In other words, open Terminal in your folder and run "ls -tr -1", for example, to see how "chronological" sorting order will behave with your files. If it happens to work well for your particular purpose, then you might as well "burn it in" to your actual file names. That way, a standard file name sort in either File Manager or Terminal with the ls command – without any options – will automatically list them in the order you want! It also means that any software that uses the Raspberry's "internal sorting engine" will process them in the correct order. Remember to always end your command line with a "-1" when you run your passive test – because that way it will list each file on easily readable separate lines!
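You can take the passive test one step further and preview the actual renames before anything is changed. What follows is a sketch, not a replacement for my command line: "preview_renames" is a hypothetical helper name I'm using for illustration, and instead of running mv it only PRINTS the mv commands that a "-tr" sort with a "%04d" prefix would lead to.

```shell
# Hypothetical helper: show what would be renamed in the current folder,
# oldest file first, without renaming anything.
preview_renames() {
  start=1
  ls -tr | while IFS= read -r f; do
    # Only print the planned command; no file is touched.
    printf 'mv "%s" "%s"\n' "$f" "$(printf '%04d - %s' "$start" "$f")"
    start=$((start + 1))
  done
}
```

To use it, open Terminal in your folder, paste the function, and then type "preview_renames". If the printed list is in the order you expected, you can go ahead with the real thing.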
EXAMPLE 1: YOU TOOK A SERIES OF PICTURES WHERE THE FILE NAMES NEED TO BE SORTED IN CORRECT CHRONOLOGICAL ORDER – WHILE ALSO RETAINING THE ORIGINAL FILE NAMES:
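In this example, the pictures were taken in this order: 1.jpg, 2.jpg, 3.jpg, 10.jpg, 11.jpg and 12.jpg. Unfortunately, Raspbian's internal "sorting engine" – including the ls command – will sequence the file names in this non-numerical, non-chronological order:
10.jpg
11.jpg
12.jpg
1.jpg
2.jpg
3.jpg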
To address that, run this command line – it's the same one that appears in my "universal sorting formula" graphic:
start=1; ls -tr *.jpg | cat -n | while read n f; do mv "$f" "`printf "%04d - $f" $start`"; ((start++)); done
The files will now be renamed to this:
0001 - 1.jpg
0002 - 2.jpg
0003 - 3.jpg
0004 - 10.jpg
0005 - 11.jpg
0006 - 12.jpg
Notice how it kept the original file names in perfect condition – but added a properly padded numeric sequence as a prefix! This now makes the pictures "universally sortable" by almost any computing system or software. For any of this to work, of course, an extremely modest assumption is being made – that when your camera takes a picture, the file receives a basic timestamp. And no – I'm not referring to the EXIF data that may also record the time the picture was taken. None of that is needed! All that's required is that the file itself has a simple time associated with it. What makes my technique even more flexible is that the time and date don't even have to be correct. Even the year could be wrong. All that matters is that the pictures were saved in chronological order. I'm not aware of any modern camera that doesn't behave in this manner.
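One caveat for file names with unusual characters: in the one-liner, the original name becomes part of the printf format, so a name containing a percent sign (say, "100% done.jpg") can be misread, and "read" without "-r" treats backslashes specially. Here is a more defensive sketch of the same idea. The helper name "rename_by_time" is hypothetical, and the changes are my own assumptions about what "safer" should mean: the name is passed to printf as data through "%s", "read -r" keeps backslashes intact, "--" protects names that begin with a dash, and "mv -n" refuses to overwrite an existing file. It still reads the output of ls line by line, so names containing line breaks are NOT handled.

```shell
# Hypothetical helper: rename every .jpg in the current folder, oldest first.
# Optional first argument: the starting number (defaults to 1).
rename_by_time() {
  start=${1:-1}
  ls -tr -- *.jpg | while IFS= read -r f; do
    # The file name is passed as data (%s), not as part of the printf format.
    new=$(printf '%04d - %s' "$start" "$f")
    mv -n -- "$f" "$new"    # -n refuses to overwrite an existing file
    start=$((start + 1))
  done
}
```

Run it from inside the folder, for example "rename_by_time 1" or "rename_by_time 300".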
EXAMPLE 2: YOU TOOK A SERIES OF PICTURES WHERE THE FILE NAMES NEED TO BE SORTED IN CORRECT CHRONOLOGICAL ORDER – WHILE ELIMINATING THE ORIGINAL FILE NAMES:
start=1; ls -tr *.jpg | cat -n | while read n f; do mv "$f" "`printf "%04d.jpg" $start`"; ((start++)); done
Using the same files listed in Example 1, the above command line will sort and rename the files like this:
0001.jpg
0002.jpg
0003.jpg
0004.jpg
0005.jpg
0006.jpg
CRITICAL NOTE: If you look carefully, it's not a completely simple matter of deleting the " - $f" part to get rid of the original file name. Instead, that part must be replaced with ".jpg" immediately after the "%04d" part with no spaces. Obviously, if you were renaming .png or .txt files, for example, you would have to change that part from "%04d.jpg" to "%04d.png" or "%04d.txt", etc.
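If your folder holds a mix of file types, one possible variation is to let the shell pick up each file's own extension instead of typing ".jpg" by hand. This is only a sketch built on assumptions of mine: "rename_numeric" is a hypothetical helper name, it processes EVERYTHING in the current folder, and it assumes every name contains a dot followed by an extension. A name without a dot would end up with its entire old name as the "extension", so test it on copies first.

```shell
# Hypothetical helper: rename everything in the current folder to pure
# numbers, oldest first, keeping each file's own extension.
rename_numeric() {
  start=1
  ls -tr | while IFS= read -r f; do
    ext=${f##*.}    # the text after the last dot, e.g. "jpg"
    mv -n -- "$f" "$(printf '%04d.%s' "$start" "$ext")"
    start=$((start + 1))
  done
}
```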
EXAMPLE 3: YOU TOOK A SERIES OF "HOW TO" PICTURES TO HELP A FRIEND REPLACE A CAR BATTERY – BUT YOU WANT THEM SORTED IN A PROPER CHRONOLOGICAL ORDER:
First, open the hood of the car to get access to the car battery.jpg
After you get access to the battery, disconnect the top wire.jpg
Carefully disconnect the bottom wire after you remove the top wire.jpg
Unfortunately, both File Manager AND Raspbian's internal sorting engine will sort them, by name, in perfect alphabetical order – which also happens to be the COMPLETELY WRONG order for our needs:
After you get access to the battery, disconnect the top wire.jpg
Carefully disconnect the bottom wire after you remove the top wire.jpg
First, open the hood of the car to get access to the car battery.jpg
But if you run the following command line, you'll get them in the perfect order shown below – with a very neat 2-digit prefix. This of course assumes that you took the pictures in a chronological, step-by-step order:
start=1; ls -tr *.jpg | cat -n | while read n f; do mv "$f" "`printf "%02d - $f" $start`"; ((start++)); done
01 - First, open the hood of the car to get access to the car battery.jpg
02 - After you get access to the battery, disconnect the top wire.jpg
03 - Carefully disconnect the bottom wire after you remove the top wire.jpg
EXAMPLE 4: BURN "NATURAL-STYLE" SORTING INTO YOUR FILE NAMES – SO THAT A STANDARD NAME SORT WILL AUTOMATICALLY REFLECT "NATURAL" NAME SORTING ORDER. FOR EXAMPLE, A SERIES OF FILES WITH "VERSION NUMBERS" INSIDE THEIR FILE NAMES:
By default, Raspbian's internal sort will place them in this unfriendly order:
Document v10.txt
Document v11.txt
Document v12.txt
Document v1.txt
Document v2.txt
Document v3.txt
The following command will burn the "natural sort" directly into the file names. Note that we are no longer sorting by timestamp through the use of the "-tr" option. Instead, we are now sorting by file name in a very specific way – by using the ls command's "natural" sort option. Note also that we have now changed the "*.jpg" to "*.txt" in order to selectively ignore all non-text files:
start=1; ls -v *.txt | cat -n | while read n f; do mv "$f" "`printf "%04d - $f" $start`"; ((start++)); done
The above command line will rename the files like this:
0001 - Document v1.txt
0002 - Document v2.txt
0003 - Document v3.txt
0004 - Document v10.txt
0005 - Document v11.txt
0006 - Document v12.txt
EXAMPLE 5: SORT FILES BY FILE SIZE (the largest files will appear at the top):
start=1; ls -S | cat -n | while read n f; do mv "$f" "`printf "%04d - $f" $start`"; ((start++)); done
NOTE: You can add the "r" option to reverse the file size order – in other words, you would change the "-S" to "-Sr". This will place the smallest files at the top.
NOTE: As you can see, we are no longer filtering files with *.jpg or *.txt extensions. In most cases, if you're sorting by file size, you would want to include ALL files in your sort. Every situation is different, of course – which is why my formula gives you maximum flexibility.
EXAMPLE 6: SORT FILES BY EXTENSION TYPE (an alphabetical sort that's derived from standard ls command sorting):
start=1; ls -X | cat -n | while read n f; do mv "$f" "`printf "%04d - $f" $start`"; ((start++)); done
NOTE: As you can see, we are not filtering files with *.jpg or *.txt extensions. In most cases, if you're sorting by extension type, you would want to include ALL files in your sort.
EXAMPLE 7: SORT FILES IN ACCORDANCE WITH POSIX STYLE:
start=1; LC_ALL=C ls | cat -n | while read n f; do mv "$f" "`printf "%04d - $f" $start`"; ((start++)); done
NOTE: POSIX style follows "traditional sort order" and uses "native byte values" to determine the sequence.
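To get a feel for what "native byte values" means in practice, here is a small sketch you can paste into Terminal. The five names are made up. In byte order, digits come first, then upper-case letters, then the underscore, then lower-case letters, and the tilde comes last.

```shell
# Made-up names that start with a lower-case letter, a tilde,
# an upper-case letter, an underscore and a digit.
names='apple
~temp
Banana
_draft
1st'

posix_order=$(printf '%s\n' "$names" | LC_ALL=C sort)
printf '%s\n' "$posix_order"
# Prints:
# 1st
# Banana
# _draft
# apple
# ~temp
```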