mtmcdonough wrote: ↑
Fri Nov 02, 2018 3:01 pm
The file names are extremely simple based on the page number:
HSWB bp p 2-3
HSWB bp p 4-5
HSWB bp p 6-7
HSWB bp p 8-9
HSWB bp p 10-11
HSWB bp p 12-13
HSWB bp p 14-15
I can remember that one person was using GPicView and was on page 686 and clicked next page and was on page 68.
I had to highlight your use of the term "extremely simple". To a human or perhaps a well-trained A.I., I agree – the file names you list are simple. But keep in mind that since you're using a computer, all that really matters is what the computer thinks is simple – not you!
I ALREADY EXPLAINED ALL OF THIS IN MY FILE SORTING TUTORIAL!
Yeah, I know that might sound obnoxious. But I wouldn't be the infamous RPi_Mike if I didn't speak directly. I have a reputation to live up to, after all. Haha.
I spent A LOT of time on this extremely comprehensive tutorial, so I can't resist pointing out that for those who read it carefully, it already addresses the basic issue you raise.
In fact, if you had read the 7th paragraph of my tutorial, you would have seen that I specifically discussed the major sorting limitations of GPicView – the exact same image viewer you're asking about! I even explained that resolving those issues with GPicView would typically require a complete renaming of the files.
Not only that, I also gave a specific example – along with a working command line that anyone can use – that automatically renames files based on "natural" sorting. That's listed in bold under "EXAMPLE 4" of my tutorial and completely solves the problem you're dealing with (based on the limited sample you provided).
Finally, I also created a detailed chart at the very top of my tutorial that lists the exact sorting behavior of the Raspberry under many different scenarios. Although that particular chart didn't specifically list the behavior of GPicView, it certainly shows the only kind of sorting that correctly handles the classic problem of "1, 2, 10" ending up as "1, 10, 2". So when you were surprised that "page 686" suddenly advanced to "page 68", it didn't have to be a surprise – at least not after reading my tutorial!
Nonetheless, I realize you may not have made the connection between "10 and 1" – which IS listed on my chart – and the examples you give of "686 and 68". Realize, of course, that there are literally an infinity of whole numbers – so I obviously couldn't list every possible whole number sequence in the mathematical universe. Nonetheless, the core issue – and underlying pattern – is EXACTLY the same.
[Hint: 10 and 1 – and 686 and 68 – both have something in common. Each pair shares the same initial numeral! In the first pair, the shared numeral is "1". In the second pair, the shared numeral is "6". So although it may seem nonsensical to most humans, there's a perfectly sound logic to that kind of sorting if you think of it in terms of "grouping". In other words, from the standpoint of many sorting algorithms, all numbers that begin with the same numeral are part of the same group – and therefore will be grouped together – even though most people will think "wait, why is this out of numerical order?" It's the same basic reason why aardvark is put next to ant – they both begin with "a". This was also addressed in my tutorial!]
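To see the "grouping" effect directly, here is a minimal sketch that sorts the same three names two ways. It assumes GNU sort (the version shipped with Raspbian), and the sample names are made up for illustration:

```shell
# Hypothetical sample names, one per line.
names='p 2-3.jpg
p 10-11.jpg
p 100-101.jpg'

# Character-by-character order: "1" sorts before "2", so 10 and 100 jump ahead of 2.
plain=$(printf '%s\n' "$names" | LC_ALL=C sort)

# Version ("natural") order: each run of digits is compared as a whole number.
natural=$(printf '%s\n' "$names" | LC_ALL=C sort -V)

printf 'plain:\n%s\n\nnatural:\n%s\n' "$plain" "$natural"
```

The plain list starts with "p 10-11.jpg" and the natural list starts with "p 2-3.jpg", which is the whole difference in a nutshell.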
OK, good – I'm glad I got all that off my chest!
I will now drill down to an even deeper level of insanity on the topic of file sorting and image display – and take it to a whole new level...
ANOTHER RPI_MIKE EXCLUSIVE – THE MOST COMMON TYPES OF FILE SORTING BEHAVIOR ON THE RASPBERRY PI:
Items in green reflect human-friendly "natural" order. Items in red and blue display other forms of order. Note also that the ls command in Terminal typically requires the "-1" option to generate the single-line list view that you see in my chart; that element has been omitted for visual clarity. To view this 1920x1080 image at full resolution, right-click and select "open image in new tab" – or on phones and tablets, "tap and hold" and save it to your pictures for full-size viewing.
CONCEPTUALLY, THERE ARE ONLY 3 FUNDAMENTAL WAYS TO CONTROL THE DISPLAYED ORDER OF IMAGES:
You can rename the files into a "computer-friendly" sorting order that your image viewer understands – by leveraging existing file name or timestamp metadata.
You can leave the files alone and change the internal sorting behavior of the image viewer itself – by leveraging existing file name or timestamp metadata.
You can manipulate and/or leverage various other metadata (such as exif data, if it exists) – but the internal sorting behavior of the image viewer would still need to correctly interpret this metadata in order for it to be of use. I will not be exploring this third possibility, but I wanted to mention it anyway.
OK, enough theory! I will now dive into 4 specific methods that have a high likelihood of accomplishing what you want. Methods 1 and 3 are completely different but are probably the most applicable for your particular use case.
Before you even think about using any of my suggestions or command lines, you need to BACKUP all your images. If a power surge or other random glitch occurred while the system was processing your files, you could lose EVERYTHING... FOREVER!
METHOD #1: RENAME ALL YOUR FILES WITH NATURAL SORTING:
Simply go to the folder that has all your images and right-click it. Then click "Open in Terminal" and run the following command line that I created back in June of 2018 (as shown in "Example 4" in my original tutorial at the top of this thread):
start=1; ls -v *.jpg | cat -n | while read n f; do mv "$f" "`printf "%08d - $f" $start`"; ((start++)); done
The file names will then be renamed as shown below (these are the same names used in my chart). All image viewers I'm aware of – including GPicView – will correctly display these file names in the proper order (thanks to the consistently "padded" numbers with leading zeros that appear at the beginning of each file name):
00000001 - DAVIS 1.jpg
00000002 - DAVIS 2.jpg
00000003 - DAVIS 10.jpg
00000004 - JONES p 1-2.jpg
00000005 - JONES p 3-4.jpg
00000006 - JONES p 5-6.jpg
00000007 - JONES p 10-11.jpg
00000008 - JONES p 30-31.jpg
00000009 - JONES p 50-51.jpg
00000010 - SMITH 6.jpg
00000011 - SMITH 68.jpg
00000012 - SMITH 686.jpg
CRITICAL NOTE #1:
The command line I used is identical to my tutorial's original command line with one tiny exception. Because you have 500,000 images, your file names need a minimum "padding" of 6 digits, not 4. However, with 6-digit padding, you can only hit a maximum of 999,999 images. Since you don't want to run out of digits in the future – which would create an entirely new sorting problem – I expanded it to 8 digits. That will give you "room for growth" and allow you to handle just under 100 million images (99,999,999). If you think your group may eventually handle up to a billion images, you can tweak my command line up to 9 digits. All of this is controlled by the part that currently says "%08d".
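As a quick illustration of what the padding width does, here is a small sketch using only the shell's printf. The counter values are arbitrary examples:

```shell
# Same counter, three padding widths: only the number inside %0Nd changes.
four=$(printf '%04d' 12)
six=$(printf '%06d' 12)
eight=$(printf '%08d' 12)
echo "$four  $six  $eight"

# A counter that outgrows its padding is NOT truncated; it just gets longer,
# which is what brings the sorting problem back.
over=$(printf '%04d' 12345)
echo "$over"

# The highest counter that still fits in N digits is 10^N - 1.
for width in 4 6 8; do
    max=1
    i=0
    while [ "$i" -lt "$width" ]; do
        max=$((max * 10))
        i=$((i + 1))
    done
    echo "$width digits: up to $((max - 1)) files"
done
```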
CRITICAL NOTE #2:
The ".jpg" part of my command line is case-sensitive. So if all your images end in JPG, for example, you will have to change that part accordingly. Also, of course, .JPG is definitely NOT the same as .JPEG. The command line is quite exacting in that regard. I deliberately added the file type to my command line in order to screen out unrelated files. For example, if some .txt files were accidentally inside your image folder and the ".jpg" part wasn't there, it would grab the .txt files and rename them too (thus becoming part of the numerical sequence). Conversely, you can eliminate the .jpg part completely and my command line will simply rename ALL FILES in the folder, regardless of extension. As long as the folder is "clean" and only has what you want in it, that works just fine. You can also change the .jpg to .png or .bmp or whatever you want. Finally, people sometimes have a mix of *.jpg and *.JPG and *.JPEG and *.jpeg extensions in the same folder. If that's the case, just make sure the folder only has your images in it and run my command line without any file type. If you pay attention to the details, you'll find my sorting technique is quite potent and flexible – because it fully leverages the extremely robust ls command from Linux itself.
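For a folder with mixed extensions that also contains a few stray files, one possible alternative is to let ls list everything and filter the names afterwards. This is only a sketch with made-up file names in a temporary folder. It assumes GNU ls and grep, and it only lists the names without renaming anything:

```shell
# Throwaway folder with mixed-case image extensions plus one stray text file.
demo=$(mktemp -d)
touch "$demo/a 1.jpg" "$demo/a 2.JPG" "$demo/a 10.jpeg" "$demo/a 3.JPEG" "$demo/notes.txt"

# ls -v lists everything in natural order; grep -iE keeps only names ending in
# .jpg or .jpeg, ignoring upper/lower case, so notes.txt is screened out.
images=$(cd "$demo" && ls -v | grep -iE '\.jpe?g$')
printf '%s\n' "$images"
```

Filtering the output of a bare `ls -v` also avoids handing ls a gigantic `*.jpg` argument list. On Linux, a wildcard that expands to hundreds of thousands of names can fail with "Argument list too long", so this form may be worth trying if that error ever shows up.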
CRITICAL NOTE #3:
My command line is quite fast for a $35 computer. It will process more than 5,000 images per minute on a Raspberry Pi 3. So it should only take about 100 minutes to rename half a million images. Maybe! See the next note for more details.
CRITICAL NOTE #4:
In the past, I've generated about 100,000 "surveillance" images on my Raspberry in various experiments (just birds and squirrels in my back yard, etc). And I had them all in one folder. So I do have experience with managing and manipulating large volumes of images on the Raspberry. But I've never gotten into 500,000 territory. My general sense is that things seem to slow down a bit once you break the 50,000 mark. And for all I know, the entire operating system might collapse once you get up to 200,000 or 500,000 files – especially if they're all in the same folder. Remember – the Raspberry is still a $35 computer, not a $3,500 computer. Such a vast number of images is unknown territory for me, so the behavior at those levels could be unpredictable.
But if I were you, I would just "go for it" at first. Let's say you have all 500,000 images in one folder. As long as you've backed everything up, you have nothing to lose! So I would say just run my command line and see what happens! Go have a sandwich and watch some TV – and then come back about 2 hours later and see what's up. Who knows – it might even take 4 hours, because things might slow down unexpectedly once you cross 230,000 images or some other arbitrary "threshold". Just be patient. If it works, it works – if it doesn't, it doesn't. But let's say your system chokes on 500,000 images in one folder. If that happens, I would simply break up the entire collection into 10 separate folders of 50,000 images each – and then run my command line on each folder individually. Each folder will take less than 10 minutes to process. Of course, each folder will then end up with duplicate number padding (00000001 to 00050000) – but for your particular use case, that shouldn't matter. Once someone gets to image 50,000 in folder 1, they can simply move on to image 1 in folder 2, etc. And remember: My technique still retains the original part of the file name as well – so all file names will still be unique!
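If it does come to splitting the collection, the move itself can be scripted. The following is a rough sketch rather than a tested production tool. It uses 25 empty stand-in files and batches of 10 so it finishes instantly, the folder names (batch_01, batch_02, ...) are invented for the example, and it assumes GNU ls:

```shell
# Throwaway folder with 25 empty stand-in files (the real case: 500,000 images).
demo=$(mktemp -d)
i=1
while [ "$i" -le 25 ]; do
    touch "$demo/page $i.jpg"
    i=$((i + 1))
done

batch=10   # files per folder; this would be 50000 in the scenario described above

(
    cd "$demo" || exit 1
    count=0
    # Walk the files in natural order and drop each one into its batch folder.
    ls -v | grep '\.jpg$' | while IFS= read -r f; do
        folder=$(printf 'batch_%02d' $((count / batch + 1)))
        mkdir -p "$folder"
        mv -- "$f" "$folder/"
        count=$((count + 1))
    done
)

ls "$demo"
```

Because the files are walked in natural order, batch_01 ends with page 10 and batch_02 picks up at page 11, so the page sequence stays intact across folders.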
CRITICAL NOTE #5:
Do NOT leave File Manager open when you execute the file renaming command in Terminal. In other words, Terminal should be the ONLY program running on your system when you follow my procedure. Having File Manager open will force the contents of the folder to constantly "refresh" inside the window, which will greatly bog down your system or even crash it during this intense process. You should also use common sense and do everything after a FRESH BOOT – and wait a full minute after that for your system to completely settle down.
Create a brand-new empty folder on a Raspberry and stick a few hundred SAMPLE images inside it. Carefully pick them out to find a truly representative sample to really test out the sorting. Then just run my command line and see if it produces the results you want. On a project like this, it's always best to first test things out for 15 minutes to see how the general procedure behaves before you waste several hours!
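Such a trial run might look like the sketch below. It uses empty files with names borrowed from the chart in place of real images, and it is a slightly more defensive variant of the rename command rather than the exact line from the tutorial: `read -r` keeps backslashes intact, and the file name is kept out of the printf format string so a stray % in a name can't confuse it. It assumes GNU ls:

```shell
# Throwaway folder with a handful of empty sample "images".
demo=$(mktemp -d)
touch "$demo/SMITH 686.jpg" "$demo/SMITH 68.jpg" "$demo/SMITH 6.jpg" \
      "$demo/DAVIS 10.jpg" "$demo/DAVIS 2.jpg" "$demo/DAVIS 1.jpg"

(
    cd "$demo" || exit 1
    n=1
    # Natural-order listing, then prefix each name with an 8-digit counter.
    ls -v -- *.jpg | while IFS= read -r f; do
        mv -- "$f" "$(printf '%08d' "$n") - $f"
        n=$((n + 1))
    done
)

# Show the result; the padded prefixes now sort correctly in any viewer.
result=$(ls -1 "$demo")
printf '%s\n' "$result"
```

If the listing comes out in the order you expect, the same approach should behave the same way on the full collection.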
WHAT IF IT DOESN'T WORK?
If my method doesn't work, I can already tell you that it isn't my fault. How can I be so confident? It's because my command line is thoroughly tested and proven to work – it properly leverages the built-in "natural sorting" algorithm provided by Linux itself. That means that if the images end up out of order, there is only one realistic possibility: The original file names are hopelessly non-conforming to "natural" order. As just a random example, if some of your existing image names are based on the names of ancient imaginary gods, there obviously is no way that the natural sorting algorithm will understand that – that Zaduwapa always comes before Pabawulu, for instance. Or let's say that some of the file names use hyphens in some cases – but at other times, they use a different symbol from the keyboard for the same purpose. In other words, if the existing names you have for your images are INTERNALLY INCONSISTENT, then all bets are off! Or what if some of the file names have typos in them? Or what if some of the people that created these images randomly used "non-standard" characters in the file names – like the $ or # symbol? If any of these things apply, you're almost certainly out of luck – because the natural sorting algorithm is definitely NOT magic or a mind reader! It still requires that every single image was carefully named in an appropriate and consistent and "natural" manner. And keep in mind, that's "natural" as defined by the Linux operating system (or Windows or Mac or almost every other operating system). It's certainly not going to be "natural" as a random human might define it!
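One way to get a feel for how consistent the names are before renaming anything is to list every name that does not fit the expected pattern. The sketch below is only an illustration: the pattern (capitalized name, " p ", two page numbers joined by a hyphen, ".jpg") is an assumption based on the sample names in this thread and would need adjusting for the real collection:

```shell
# Throwaway folder: two well-formed names and three deliberately odd ones.
demo=$(mktemp -d)
touch "$demo/JONES p 1-2.jpg" "$demo/JONES p 3-4.jpg" \
      "$demo/JONES p 5_6.jpg" "$demo/JONES pg 7-8.jpg" "$demo/JONES p 9-10 #2.jpg"

# Assumed naming pattern for this demo: NAME p N-M.jpg
pattern='^[A-Z]+ p [0-9]+-[0-9]+\.jpg$'

# grep -v prints every name that does NOT match the pattern.
odd=$(ls -1 "$demo" | LC_ALL=C grep -Ev "$pattern")
printf '%s\n' "$odd"
```

A short list of oddballs can be fixed by hand. A very long one is a sign that the names are too inconsistent for any automatic sort.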
GARBAGE IN, GARBAGE OUT:
To be honest, before this project was even started, all images should have been given "computer friendly" file names from the get-go. I realize that you personally may have been given these images by someone else, so I'm not saying it was your fault. But a computer, of course, doesn't care what someone's tale of woe may be. IT IS WHAT IT IS! And there's only one proper way to name large numbers of sequential images: They must all begin with an identically-formatted number that's "padded" with "leading zeros". In other words, 0001, 0002, 0003, etc. That's just how it is – and how it will always be! Even if it were a $10 billion project, if that basic consideration was not taken into account from the start and you ended up with trillions of images with inherently messed up file names, the entire project might literally be UNSALVAGEABLE. The entire $10 billion project might have to be thrown away in a dumpster and started all over again from scratch. Seriously – this one topic is that big of a deal! Which is partly why I wrote an entire tutorial on it.
WHAT ALTERNATIVES WOULD THERE BE IF THE FILES STILL PROVE TO BE UNSORTABLE IN A SATISFACTORY MANNER?
You would have to hire a programmer who could attempt to write a custom sorting algorithm that would somehow take into account Zaduwapa or any other "unorthodox" or inconsistent elements in your file names. Your only other option would be to hire an army of clerical staff (or volunteers) to manually go through all 500,000 file names and fix them by hand. In theory, a sophisticated A.I. might also be able to automate such a horrendously tedious task!
METHOD #2: RENAME ALL YOUR FILES BASED ON FILE MODIFICATION TIME:
In theory, if a single system was used to scan or retrieve all 500,000 images – and those scans or retrievals were done in the same order as the page order – you should be able to use the timestamps of the files to sort them without any regard to the actual file names!
If that is the case, you would use this command line:
start=1; ls -tr *.jpg | cat -n | while read n f; do mv "$f" "`printf "%08d - $f" $start`"; ((start++)); done
With this method, if page 1 were the first image generated or acquired, it would automatically become image number 00000001. However, if page 54 were the second image generated or acquired, it would automatically become image number 00000002. In other words, this method goes strictly by the timestamp – without any regard to what the existing file name is (although it will still retain the original file name inside the newly-created name).
I obviously don't know what the background of these genealogical images is or how they were acquired. But be aware that if you've transferred them from another system, for example, the timestamps could be all messed up or completely lost (all set to 12 midnight or whatever). Also know that if these images were generated or acquired OUT OF ORDER – and I suspect they probably were – then the timestamp method would obviously not work!
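Before relying on timestamps, it may be worth a quick check of whether they carry any useful information at all. Here is a minimal sketch that fakes three modification times in a temporary folder. It assumes GNU touch and stat, as found on Raspbian, and the file names and times are invented:

```shell
# Throwaway folder: "page 54" was (pretend) scanned second, before "page 2".
demo=$(mktemp -d)
touch -d '2018-01-01 09:00:00' "$demo/page 1.jpg"
touch -d '2018-01-01 10:00:00' "$demo/page 54.jpg"
touch -d '2018-01-01 11:00:00' "$demo/page 2.jpg"

# ls -tr lists the oldest file first, ignoring the names completely.
by_time=$(cd "$demo" && ls -tr)
printf '%s\n' "$by_time"

# Compare the number of files with the number of distinct modification times.
total=$(ls -1 "$demo" | wc -l)
distinct=$(cd "$demo" && stat -c %Y -- * | sort -u | wc -l)
echo "files: $total, distinct timestamps: $distinct"
```

If the number of distinct timestamps is tiny compared with the number of files, the original times were probably flattened during a copy, and the timestamp method is unlikely to help.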
METHOD #3: USE AUTOMATIC NATURAL SORTING IN FEH – THE BEST IMAGE VIEWER FOR LINUX:
For any serious image viewing, you need feh – not GPicView. And yes, it's spelled with a lower case "f". Feh is a free program that's available in the official Raspbian repository. But don't bother installing it! That version is two years old at this point and it's missing the one critical feature you need – automatic "natural sorting"! It's pretty amazing – it will automatically "play" your images in proper "natural" order without any file renaming whatsoever.
In other words, it's completely "non-destructive" in that it doesn't tamper with the files at all. This new feature became available only 8 months ago in March of 2018, so you definitely lucked out!
It's the 4th item in green on my chart – the item labeled "feh --sort name --version-sort". Once you have feh, that's the actual command line you would use to activate feh and place it in "natural sort" mode.
So to get that awesome new feature, you'll need the latest version of feh.
To do that, you'll need to build the program from raw source code.
Don't worry though – I wrote a software building script that will do everything for you automatically!
In fact, I communicated with the developer of feh just yesterday – we were able to resolve a bug that would have prevented you from using natural sorting. But that has now been fixed!
If anyone is curious to know more about that conversation, check out our exchange on feh's official GitHub site.
So here's what you need to do to build the latest version of feh:
Carefully copy the following script and simply paste the entire thing into Terminal and hit the Enter key! That's it! My script requires a basic Internet connection and will only take about 3 minutes to complete. On the remote chance that you've already installed an older version of feh, you first need to remove it from your system with this command line:
sudo apt-get --purge remove feh
NOTE: During the building process, you will probably see some "warnings" about unused variables, etc. They are developer-related notifications that have no relevance to the user.
Whatever you do, don't forget to include the large blue "curly braces" at the beginning and end of my script (they are a part of my script). So here it is – just paste this into Terminal and run it:
{
# INSTALL THE DEPENDENCIES:
sudo apt-get update
sudo apt-get install -y libcurl4-openssl-dev libx11-dev libxt-dev libimlib2-dev libxinerama-dev libjpeg-progs libpng-dev libexif-dev libexif12
# CREATE THE FEH BUILD FOLDER:
# DOWNLOAD AND UNZIP THE LATEST STABLE RELEASE TARBALL:
tar jxvf feh-2.28.1.tar.bz2
# COMPILE AND INSTALL FEH:
cd feh-2.28.1
make -j4 curl=0 xinerama=0 verscmp=1
sudo make install
# CONFIRM THE INSTALLED VERSION:
feh --version
}
If everything worked, after about 3 minutes, the last 2 lines of Terminal's output should say this:
feh version 2.28.1
To use feh in natural sorting mode, simply do this:
Right-click the folder that has your images. Then click "Open in Terminal". Then run the following command line and you're good to go. Use the left and right arrow keys to rapidly flip through the images:
feh --sort name --version-sort
An even slicker version of this command line, which the genealogists might especially like, is the following. It does the same natural sorting AND it automatically displays all the images FULL SCREEN. If the images are bigger than the display's resolution, it will shrink the image to fit the display. And if the image is smaller than the display, it will expand the image to fill up the display (while maintaining its original aspect ratio). On top of that, the "-d" in the line means it will also display the exact file name on-screen – so the genealogists can always know exactly what name and page number they're looking at:
feh -F -d --zoom max --sort name --version-sort
CRITICAL NOTE #1:
Remember – as explained earlier, if your files themselves have "unnatural" file names or have typos or are inconsistently named, there's nothing feh or any other program can do about it! They will be out of order!
CRITICAL NOTE #2:
Feh has TONS of other options. See the official manual.
CRITICAL NOTE #3:
Once you've installed the latest version of feh, you obviously don't need to build it again on that particular computer. If you need to set feh up on multiple computers, just run my 3-minute script on each one.
METHOD #4: USE AUTOMATIC TIMESTAMP SORTING IN FEH:
To sort by timestamp in feh, simply run the following command line – after making sure, of course, that you have Terminal open inside the right folder. The first image acquired – in other words, the oldest image – will be the first one displayed. That's the most common "timeline" people would want for their images. If you want the most recently acquired image to appear first instead, just remove the "--reverse" part:
feh -S mtime --reverse
And if you also want full-screen viewing with file name display, use this version instead:
feh -F -d --zoom max -S mtime --reverse