scruss
Posts: 1821
Joined: Sat Jun 09, 2012 12:25 pm
Location: Toronto, ON
Contact: Website

Re: TUTORIAL: File Sorting on the Raspberry Pi

Mon Nov 05, 2018 5:53 pm

GPicView is only following orders. What it finds logical isn't what we might.

This will fix it, but back up everything first¹ because this will automatically rename files and perhaps do bad things:

rename 's/(\d+)/sprintf("%04d", $1)/ge' *
This assumes that all files containing runs of numbers have numbers less than 10,000. It'll also rename files like SMWB bp5 p 20-21 to SMWB bp0005 p 0020-0021 because it doesn't know what it's doing.
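
If you'd like to see what that substitution would do to a name before letting it loose on real files, here is a rough sketch. It uses awk instead of the Perl-based rename, and pad_numbers is a helper name I made up for illustration; it only prints, it never renames anything.

```shell
# pad_numbers is a hypothetical helper (not part of rename or any other tool).
# It prints its argument with every run of digits zero-padded to 4 places,
# mimicking the s/(\d+)/sprintf("%04d", $1)/ge substitution.
pad_numbers() {
  printf '%s\n' "$1" | awk '{
    out = ""
    # Consume the name left to right, one run of digits at a time.
    while (match($0, /[0-9]+/)) {
      out = out substr($0, 1, RSTART - 1) sprintf("%04d", substr($0, RSTART, RLENGTH))
      $0 = substr($0, RSTART + RLENGTH)
    }
    print out $0
  }'
}

pad_numbers "SMWB bp5 p 20-21"   # prints: SMWB bp0005 p 0020-0021
```

Running it over a handful of real names first should show quickly whether the blanket padding does anything you don't want.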

If you've got thousands of scanned pages, a better way of storing them is as a PDF. img2pdf will store multiple pages as a ‘book’, leaving the files unchanged. Scan Tailor does much the same, but allows graphical rearranging. Scan Tailor may alter your scans a bit, so maybe you don't want that.

---
¹: and if these are out-of-copyright genealogical registers, please upload them to archive.org for safe(r) keeping. Files are always just one click away from oblivion.
‘Remember the Golden Rule of Selling: “Do not resort to violence.”’ — McGlashan.

RPi_Mike
Posts: 100
Joined: Sat Dec 09, 2017 12:57 am
Location: United States

Re: TUTORIAL: File Sorting on the Raspberry Pi

Mon Nov 05, 2018 6:12 pm

mtmcdonough wrote:
Fri Nov 02, 2018 3:01 pm
The file names are extremely simple based on the page number:

HSWB bp p 2-3
HSWB bp p 4-5
HSWB bp p 6-7
HSWB bp p 8-9
HSWB bp p 10-11
HSWB bp p 12-13
HSWB bp p 14-15

I can remember that one person was using GPicView and was on page 686 and clicked next page and was on page 68.

PROLOGUE:
I had to highlight your use of the term "extremely simple". To a human or perhaps a well-trained A.I., I agree – the file names you list are simple. But keep in mind that since you're using a computer, all that really matters is what the computer thinks is simple – not you!


I ALREADY EXPLAINED ALL OF THIS IN MY FILE SORTING TUTORIAL!
Yeah, I know that might sound obnoxious. But I wouldn't be the infamous RPi_Mike if I didn't speak directly. I have a reputation to live up to, after all. Haha.

I spent A LOT of time on this extremely comprehensive tutorial, so I can't resist pointing out that for those who read it carefully, it already addresses the basic issue you raise.

In fact, if you had read the 7th paragraph of my tutorial, you would have seen that I specifically discussed the major sorting limitations of GPicView – the exact same image viewer you're asking about! I even explained that resolving those issues with GPicView would typically require a complete renaming of the files.

Not only that, I also gave a specific example – along with a working command line that anyone can use – that automatically renames files based on "natural" sorting. That's listed in bold under "EXAMPLE 4" of my tutorial and completely solves the problem you're dealing with (based on the limited sample you provided).

Finally, I also created a detailed chart at the very top of my tutorial that lists the exact sorting behavior of the Raspberry under many different scenarios. Although that particular chart didn't specifically list the behavior of GPicView, it certainly shows the only kind of sorting that correctly handles the classic "1, 2, 10" ending up as "1, 10, 2". So when you were surprised that "page 686" suddenly advanced to "page 68", it didn't have to be a surprise – at least not after reading my tutorial!

Nonetheless, I realize you may not have made the connection between "10 and 1" – which IS listed on my chart – and the examples you give of "686 and 68". Realize, of course, that there are literally an infinity of whole numbers – so I obviously couldn't list every possible whole number sequence in the mathematical universe. Nonetheless, the core issue – and underlying pattern – is EXACTLY the same.

[Hint: 10 and 1 – and 686 and 68 – both have something in common. Each pair shares the same initial numeral! In the first pair, the shared numeral is "1". In the second pair, the shared numeral is "6". So although it may seem nonsensical to most humans, there's a perfectly sound logic to that kind of sorting if you think of it in terms of "grouping". In other words, from the standpoint of many sorting algorithms, all numbers that begin with the same numeral are part of the same group – and therefore will be grouped together – even though most people will think "wait, why is this out of numerical order?" It's the same basic reason why aardvark is put next to ant – they both begin with "a". This was also addressed in my tutorial!]
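
For anyone who wants to see the two behaviours side by side, here is a small sketch. The three file names are invented in the style of the ones quoted above, and the -V option assumes GNU sort, which is what Raspbian ships.

```shell
names='HSWB bp p 2-3.jpg
HSWB bp p 10-11.jpg
HSWB bp p 100-101.jpg'

# Plain character-by-character comparison: "1" sorts before "2",
# so both names starting with 1 land ahead of page 2.
lexical=$(printf '%s\n' "$names" | LC_ALL=C sort)

# "Natural" (version) comparison: each run of digits is compared as a number.
natural=$(printf '%s\n' "$names" | LC_ALL=C sort -V)

printf 'lexical:\n%s\n\nnatural:\n%s\n' "$lexical" "$natural"
```

The lexical list comes out as 10-11, 100-101, 2-3, while the natural list comes out as 2-3, 10-11, 100-101.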

OK, good – I'm glad I got all that off my chest!

I will now drill down to an even deeper level of insanity on the topic of file sorting and image display – and take it to a whole new level...





ANOTHER RPI_MIKE EXCLUSIVE – THE MOST COMMON TYPES OF FILE SORTING BEHAVIOR ON THE RASPBERRY PI:
[Image: Raspberry_Pi_Sorting_Behavior_RPi_Mike.png]
Items in green reflect human-friendly "natural" order. Items in red and blue display other forms of order. Note also that the ls command in Terminal typically requires the "-1" option to generate the one-name-per-line list view that you see in my chart; that element has been omitted for visual clarity. To view this 1920x1080 image at full resolution, right-click and select "open image in new tab" – or on phones and tablets, "tap and hold" and save it to your pictures for full-size viewing.





CONCEPTUALLY, THERE ARE ONLY 3 FUNDAMENTAL WAYS TO CONTROL THE DISPLAYED ORDER OF IMAGES:

1: You can rename the files into a "computer-friendly" sorting order that your image viewer understands – by leveraging existing file name or timestamp metadata.

2: You can leave the files alone and change the internal sorting behavior of the image viewer itself – by leveraging existing file name or timestamp metadata.

3: You can manipulate and/or leverage various other metadata (such as exif data, if it exists) – but the internal sorting behavior of the image viewer would still need to correctly interpret this metadata in order for it to be of use. I will not be exploring this third possibility, but I wanted to mention it anyway.

OK, enough theory! I will now dive into 4 specific methods that have a high likelihood of accomplishing what you want. Methods 1 and 3 are completely different but are probably the most applicable for your particular use case.





WARNING:
Before you even think about using any of my suggestions or command lines, you need to BACK UP all your images. If a power surge or other random glitch occurred while the system was processing your files, you could lose EVERYTHING... FOREVER!





METHOD #1: RENAME ALL YOUR FILES WITH NATURAL SORTING:
Simply go to the folder that has all your images and right-click it. Then click "Open in Terminal" and run the following command line that I created back in June of 2018 (as shown in "Example 4" in my original tutorial at the top of this thread):

start=1; ls -v *.jpg | cat -n | while read n f; do mv "$f" "`printf "%08d - $f" $start`"; ((start++)); done

The file names will then be renamed as shown below (these are the same names used in my chart). All image viewers I'm aware of – including GPicView – will correctly display these file names in the proper order (thanks to the consistently "padded" numbers with leading zeros that appear at the beginning of each file name):

00000001 - DAVIS 1.jpg
00000002 - DAVIS 2.jpg
00000003 - DAVIS 10.jpg
00000004 - JONES p 1-2.jpg
00000005 - JONES p 3-4.jpg
00000006 - JONES p 5-6.jpg
00000007 - JONES p 10-11.jpg
00000008 - JONES p 30-31.jpg
00000009 - JONES p 50-51.jpg
00000010 - SMITH 6.jpg
00000011 - SMITH 68.jpg
00000012 - SMITH 686.jpg
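
A cautious way to preview the result is a dry run that prints the new names instead of calling mv. The sketch below is a variation on the command above, not the tutorial's own code: it works in a scratch folder made with mktemp, and the variable names demo and plan are made up for the example. It also passes the file name to printf as a %s argument, so a stray % in a name is printed literally, and uses read -r so backslashes survive.

```shell
# Scratch folder with three of the sample names, so nothing real is touched.
demo=$(mktemp -d)
: > "$demo/DAVIS 1.jpg"
: > "$demo/DAVIS 2.jpg"
: > "$demo/DAVIS 10.jpg"

# Build the list of new names in natural order, but only print them.
plan=$(
  cd "$demo" || exit 1
  start=1
  ls -v -- *.jpg | while IFS= read -r f; do
    printf '%08d - %s\n' "$start" "$f"
    start=$((start + 1))
  done
)
printf '%s\n' "$plan"
rm -r "$demo"
```

If the printed list matches what you expect, the real command can be run with more confidence.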


CRITICAL NOTE #1: The command line I used is identical to my tutorial's original command line with one tiny exception. Because you have 500,000 images, your file names need a minimum "padding" of 6 digits, not 4. However, with 6-digit padding, you can only hit a maximum of 999,999 images. Since you don't want to run out of digits in the future – which would create an entirely new sorting problem – I expanded it to 8 digits. That will give you "room for growth" and allow you to handle just under 100 million images (up to 99,999,999). If you think your group may eventually approach a billion images, you can widen the padding to 9 digits. All of this is controlled by the part that currently says "%08d".
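
To make the padding width concrete, here is a tiny illustration of how printf treats the width. The numbers 701 and 12345 are arbitrary examples.

```shell
# The same number padded to 4 and to 8 digits.
four=$(printf '%04d' 701)
eight=$(printf '%08d' 701)

# A number wider than the pad is printed in full, not truncated.
# That is the point at which the names stop lining up and sorting breaks again.
overflow=$(printf '%04d' 12345)

printf '%s\n%s\n%s\n' "$four" "$eight" "$overflow"
```

This prints 0701, 00000701 and 12345, which is why the width should be chosen with future growth in mind.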

CRITICAL NOTE #2: The ".jpg" part of my command line is case-sensitive. So if all your images end in JPG, for example, you will have to change that part accordingly. Also, of course, .JPG is definitely NOT the same as .JPEG. The command line is quite exacting in that regard. I deliberately added the file type to my command line in order to screen out unrelated files. For example, if some .txt files were accidentally inside your image folder and the ".jpg" part wasn't there, it would grab the .txt files and rename them too (thus becoming part of the numerical sequence). Conversely, you can eliminate the .jpg part completely and my command line will simply rename ALL FILES in the folder, regardless of extension. As long as the folder is "clean" and only has what you want in it, that works just fine. You can also change the .jpg to .png or .bmp or whatever you want. Finally, people sometimes have a mix of *.jpg and *.JPG and *.JPEG and *.jpeg extensions in the same folder. If that's the case, just make sure the folder only has your images in it and run my command line without any file type. If you pay attention to the details, you'll find my sorting technique is quite potent and flexible – because it fully leverages the extremely robust ls command from Linux itself.
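
If the folder holds a mix of extensions plus a few unrelated files, one possible way to list only the images is a case-insensitive find. This is a sketch rather than part of the command line above; -iname and -maxdepth are extensions to POSIX find that GNU find supports, and the file names are invented.

```shell
demo=$(mktemp -d)
: > "$demo/a.jpg"
: > "$demo/b.JPG"
: > "$demo/c.JPEG"
: > "$demo/notes.txt"

# -iname matches regardless of upper or lower case, so .jpg, .JPG,
# .jpeg and .JPEG are all picked up while notes.txt is left out.
images=$(find "$demo" -maxdepth 1 -type f \( -iname '*.jpg' -o -iname '*.jpeg' \) | sort)
count=$(printf '%s\n' "$images" | grep -c .)

printf '%s\nmatched %s image file(s)\n' "$images" "$count"
rm -r "$demo"
```

The output of find is not in natural order, so it would still need to pass through something like sort -V before any numbering is applied.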

CRITICAL NOTE #3: My command line is quite fast for a $35 computer. It will process more than 5,000 images per minute on a Raspberry Pi 3. So it should only take about 100 minutes to rename half a million images. Maybe! See the next note for more details.

CRITICAL NOTE #4: In the past, I've generated about 100,000 "surveillance" images on my Raspberry in various experiments (just birds and squirrels in my back yard, etc). And I had them all in one folder. So I do have experience with managing and manipulating large volumes of images on the Raspberry. But I've never gotten into 500,000 territory. My general sense is that things seem to slow down a bit once you break the 50,000 mark. And for all I know, the entire operating system might collapse once you get up to 200,000 or 500,000 files – especially if they're all in the same folder. Remember – the Raspberry is still a $35 computer, not a $3,500 computer. Such a vast number of images is unknown territory for me, so the behavior at those levels could be unpredictable.

But if I were you, I would just "go for it" at first. Let's say you have all 500,000 images in one folder. As long as you've backed everything up, you have nothing to lose! So I would say just run my command line and see what happens! Go have a sandwich and watch some TV – and then come back about 2 hours later and see what's up. Who knows – it might even take 4 hours, because things might slow down unexpectedly once you cross 230,000 images or some other arbitrary "threshold". Just be patient. If it works, it works – if it doesn't, it doesn't. But let's say your system chokes on 500,000 images in one folder. If that happens, I would simply break up the entire collection into 10 separate folders of 50,000 images each – and then run my command line on each folder individually. Each folder will take less than 10 minutes to process. Of course, each folder will then end up with duplicate number padding (00000001 to 00050000) – but for your particular use case, that shouldn't matter. Once someone gets to image 50,000 in folder 1, they can simply move on to image 1 in folder 2, etc. And remember: My technique still retains the original part of the file name as well – so all file names will still be unique!
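
Splitting a huge folder into batches can be scripted too. The following is only a sketch under made-up assumptions: 7 dummy files and a batch size of 3 stand in for 500,000 images and 50,000 per folder, and the batch_NN folder names are invented for the example.

```shell
demo=$(mktemp -d)
for i in 1 2 3 4 5 6 7; do : > "$demo/page $i.jpg"; done

batch_size=3   # would be 50000 for the real collection
(
  cd "$demo" || exit 1
  count=0
  # Walk the files in natural order and move each into its batch folder.
  ls -v -- *.jpg | while IFS= read -r f; do
    batch=$(printf 'batch_%02d' $((count / batch_size + 1)))
    mkdir -p "$batch"
    mv -- "$f" "$batch/$f"
    count=$((count + 1))
  done
)

batches=$(ls "$demo" | grep -c '^batch_')
last_batch=$(ls "$demo/batch_03")
printf '%s folders; the last one holds: %s\n' "$batches" "$last_batch"
rm -r "$demo"
```

Because the files are moved in natural order, each batch folder holds a contiguous run of pages.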

CRITICAL NOTE #5: Do NOT leave File Manager open when you execute the file renaming command in Terminal. In other words, Terminal should be the ONLY program running on your system when you follow my procedure. Having File Manager open will force the contents of the folder to constantly "refresh" inside the window, which will greatly bog down your system or even crash it during this intense process. You should also use common sense and do everything after a FRESH BOOT – and a full minute after that for your system to completely settle down.

TEST FIRST: Create a brand-new empty folder on a Raspberry and stick a few hundred SAMPLE images inside it. Carefully pick them out to find a truly representative sample to really test out the sorting. Then just run my command line and see if it produces the results you want. On a project like this, it's always best to first test things out for 15 minutes to see how the general procedure behaves before you waste several hours!

WHAT IF IT DOESN'T WORK? If my method doesn't work, I can already tell you that it isn't my fault. How can I be so confident? It's because my command line is thoroughly tested and proven to work – it properly leverages the built-in "natural sorting" algorithm provided by Linux itself. That means that if the images end up out of order, there is only one realistic possibility: The original file names are hopelessly non-conforming to "natural" order. As just a random example, if some of your existing image names are based on the names of ancient imaginary gods, there obviously is no way that the natural sorting algorithm will understand that – that Zaduwapa always comes before Pabawulu, for instance. Or let's say that some of the file names use hyphens in some cases – but at other times, they use a different symbol from the keyboard for the same purpose. In other words, if the existing names you have for your images are INTERNALLY INCONSISTENT, then all bets are off! Or what if some of the file names have typos in them? Or what if some of the people that created these images randomly used "non-standard" characters in the file names – like the $ or # symbol? If any of these things apply, you're almost certainly out of luck – because the natural sorting algorithm is definitely NOT magic or a mind reader! It still requires that every single image was carefully named in an appropriate and consistent and "natural" manner. And keep in mind, that's "natural" as defined by the Linux operating system (or Windows or Mac or almost every other operating system). It's certainly not going to be "natural" as a random human might define it!

GARBAGE IN, GARBAGE OUT: To be honest, before this project was even started, all images should have been given "computer friendly" file names from the get-go. I realize that you personally may have been given these images by someone else, so I'm not saying it was your fault. But a computer, of course, doesn't care what someone's tale of woe may be. IT IS WHAT IT IS! And there's only one proper way to name large numbers of sequential images: They must all begin with an identically-formatted number that's "padded" with "leading zeros". In other words, 0001, 0002, 0003, etc. That's just how it is – and how it will always be! Even if it were a $10 billion project, if that basic consideration was not taken into account from the start and you ended up with trillions of images with inherently messed up file names, the entire project might literally be UNSALVAGEABLE. The entire $10 billion project might have to be thrown away in a dumpster and started all over again from scratch. Seriously – this one topic is that big of a deal! Which is partly why I wrote an entire tutorial on it.

WHAT ALTERNATIVES WOULD THERE BE IF THE FILES STILL PROVE TO BE UNSORTABLE IN A SATISFACTORY MANNER? You would have to hire a programmer who could attempt to write a custom sorting algorithm that would somehow take into account Zaduwapa or any other "unorthodox" or inconsistent elements in your file names. Your only other option would be to hire an army of clerical staff (or volunteers) to manually go through all 500,000 file names and fix them by hand. In theory, a sophisticated A.I. might also be able to automate such a horrendously tedious task!





METHOD #2: RENAME ALL YOUR FILES BASED ON FILE MODIFICATION TIME:
In theory, if a single system was used to scan or retrieve all 500,000 images – and those scans or retrievals were done in the same order as the page order – you should be able to use the timestamps of the files to sort them without any regard to the actual file names!

If that is the case, you would use this command line:

start=1; ls -tr *.jpg | cat -n | while read n f; do mv "$f" "`printf "%08d - $f" $start`"; ((start++)); done


NOTE #1: With this method, if page 1 were the first image generated or acquired, it would automatically become image number 00000001. However, if page 54 were the second image generated or acquired, it would automatically become image number 00000002. In other words, this method goes strictly by the timestamp – without any regard to what the existing file name is (although it will still retain the original file name inside the newly-created name).

NOTE #2: I obviously don't know what the background of these genealogical images is or how they were acquired. But be aware that if you've transferred them from another system, for example, the timestamps could be all messed up or completely lost (all set to 12 midnight or whatever). Also know that if these images were generated or acquired OUT OF ORDER – and I suspect they probably were – then the timestamp method would obviously not work!
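
To check what the timestamps say before trusting them, a small experiment like the one below may help. It is a sketch with invented names and times: touch -t sets each file's modification time by hand, and ls -tr then lists oldest first.

```shell
demo=$(mktemp -d)
# touch -t takes [[CC]YY]MMDDhhmm; these times are made up for the demo.
touch -t 201811050900 "$demo/page 1.jpg"
touch -t 201811050901 "$demo/page 54.jpg"
touch -t 201811050902 "$demo/page 2.jpg"

# Oldest first, regardless of what the names say.
by_time=$(cd "$demo" && ls -tr)
printf '%s\n' "$by_time"
rm -r "$demo"
```

Here page 54 comes second because it was "scanned" second, which is exactly the behaviour described in NOTE #1. Running a plain ls -tr on a sample of the real images shows whether their timestamps follow page order at all.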





METHOD #3: USE AUTOMATIC NATURAL SORTING IN FEH – THE BEST IMAGE VIEWER FOR LINUX:
For any serious image viewing, you need feh – not GPicView. And yes, it's spelled with a lower case "f". Feh is a free program that's available in the official Raspbian repository. But don't bother installing it! That version is two years old at this point and it's missing the one critical feature you need – automatic "natural sorting"! It's pretty amazing – it will automatically "play" your images in proper "natural" order without any file renaming whatsoever.

In other words, it's completely "non-destructive" in that it doesn't tamper with the files at all. This new feature became available only 8 months ago in March of 2018, so you definitely lucked out!

It's the 4th item in green on my chart – the item labeled "feh --sort name --version-sort". Once you have feh, that's the actual command line you would use to activate feh and place it in "natural sort" mode.

So to get that awesome new feature, you'll need the latest version of feh.

To do that, you'll need to build the program from raw source code.

Don't worry though – I wrote a software building script that will do everything for you automatically!

In fact, I communicated with the developer of feh just yesterday — we were able to resolve a bug that would have prevented you from using natural sorting. But that has now been fixed!

If anyone is curious to know more about that conversation, check out our exchange on feh's official GitHub site.

So here's what you need to do to build the latest version of feh:

Carefully copy the following script and simply paste the entire thing into Terminal and hit the Enter key! That's it! My script requires a basic Internet connection and will only take about 3 minutes to complete. On the remote chance that you've already installed an older version of feh, you first need to remove it from your system with this command line:

sudo apt-get --purge remove feh

NOTE: During the building process, you will probably see some "warnings" about unused variables, etc. They are developer-related notifications that have no relevance to the user.

Whatever you do, don't forget to include the large blue "curly braces" at the beginning and end of my script (they are a part of my script). So here it is – just paste this into Terminal and run it:


{
# INSTALL THE DEPENDENCIES:
sudo apt-get update
sudo apt-get install -y libcurl4-openssl-dev libx11-dev libxt-dev libimlib2-dev libxinerama-dev libjpeg-progs libpng-dev libexif-dev libexif12


# CREATE THE FEH BUILD FOLDER:
mkdir Feh_Build


# DOWNLOAD AND UNZIP THE LATEST STABLE RELEASE TARBALL:
cd Feh_Build
wget https://feh.finalrewind.org/feh-2.28.1.tar.bz2
tar jxvf feh-2.28.1.tar.bz2
cd feh-2.28.1


# COMPILE AND INSTALL FEH:
make -j4 curl=0 xinerama=0 verscmp=1
sudo make install
sudo ldconfig
feh --version
}


If everything worked, after about 3 minutes, the last 2 lines of Terminal's output should say this:

feh version 2.28.1
Compile-time switches:

To use feh in natural sorting mode, simply do this:

Right-click the folder that has your images. Then click "Open in Terminal". Then run the following command line and you're good to go. Use the left and right arrow keys to rapidly flip through the images:

feh --sort name --version-sort

An even slicker version of this command line, which the genealogists might especially like, is the following. It does the same natural sorting AND it automatically displays all the images FULL SCREEN. If the images are bigger than the display's resolution, it will shrink the image to fit the display. And if the image is smaller than the display, it will expand the image to fill up the display (while maintaining its original aspect ratio). On top of that, the "d" in the line means it will also display the exact file name on-screen – so the genealogists can always know exactly what name and page number they're looking at:

feh -F -d --zoom max --sort name --version-sort


CRITICAL NOTE #1: Remember – as explained earlier, if your files themselves have "unnatural" file names or have typos or are inconsistently named, there's nothing feh or any other program can do about it! They will be out of order!

CRITICAL NOTE #2: Feh has TONS of other options. See the official manual for details.

CRITICAL NOTE #3: Once you've installed the latest version of feh, you obviously don't need to build it again on that particular computer. If you need to set feh up on multiple computers, just run my 3-minute script on each one.





METHOD #4: USE AUTOMATIC TIMESTAMP SORTING IN FEH:
To sort by timestamp in feh, simply run the following command line – after making sure, of course, that you have Terminal open inside the right folder. The first image acquired – in other words, the oldest image – will be the first one displayed. That's the most common "timeline" people would want for their images. If you want the most recently acquired image to appear first instead, just remove the "--reverse" part:

feh -S mtime --reverse

And if you also want full-screen viewing with file name display, use this version instead:

feh -F -d --zoom max -S mtime --reverse
Last edited by RPi_Mike on Mon Nov 05, 2018 10:34 pm, edited 2 times in total.

RPi_Mike
Posts: 100
Joined: Sat Dec 09, 2017 12:57 am
Location: United States

RPi_Mike – King of File Sorting [LOL]

Mon Nov 05, 2018 8:05 pm

[Image: RPi_Mike_King_of_File_Sorting_LOL.png]
This chart, of course, only scratches the surface of RPi_Mike's file sorting majesty. He has several other equally impressive methods to choose from, as described in his file sorting magnum opus. To view this image at full resolution, right-click and select "open image in new tab" – or on phones and tablets, "tap and hold" and save it to your pictures for full-size viewing.

scruss
Posts: 1821
Joined: Sat Jun 09, 2012 12:25 pm
Location: Toronto, ON
Contact: Website

Re: TUTORIAL: File Sorting on the Raspberry Pi

Tue Nov 06, 2018 2:14 am

By sticking a serial number on the front of the file name, that's not preserving the original file name at all.

Also, now try using the feh options from the file manager. Without a shell script. Good luck there!

Also, your method 1 has unnecessary complexity. If you must do it that way, why not use the value that cat -n has already given you?

ls -v *.jpg | cat -n | while read n f; do mv "$f" "$(printf "%08d - $f" $n)"; done

RPi_Mike
Posts: 100
Joined: Sat Dec 09, 2017 12:57 am
Location: United States

Re: TUTORIAL: File Sorting on the Raspberry Pi

Tue Nov 06, 2018 3:29 am

scruss wrote:
Tue Nov 06, 2018 2:14 am
By sticking a serial number on the front of the file name, that's not preserving the original file name at all.

Also, now try using the feh options from the file manager. Without a shell script. Good luck there!

Someone dares to question the self-proclaimed King of File Sorting? I'm shocked!

Seriously though: I understand the semantic argument you're attempting to make, but the simple fact remains that my method fully preserves the entirety of the original file name — without the slightest alteration — down to every last character — inside the newly numbered file. The "name" portion of the file is completely preserved, so I'm very comfortable with my choice of words.

If I supplement my existing identity by adding sunglasses to the front of my face, has my face itself actually changed?

I suppose on some level it has changed, because it's now wearing glasses — but my unaltered face is still there. Your method, however, is the equivalent of altering the face itself.

But there's also a giant elephant in the room: The primary point of this entire thread is about renaming files to suit your sorting needs. So of course I'm renaming the files when I'm renaming the files! I certainly hope you're not trying to play "gotcha" on that.

As for using feh with File Manager, that's clearly your thought, not mine — because I never once said anything about it!

PS: While I was writing my response, I see you went back and edited your post to add a 3rd point. If you wish to research, develop and thoroughly test a more "elegant" or "less complex" command line that does everything mine does, I'm certainly not stopping you!

mtmcdonough
Posts: 5
Joined: Thu Nov 01, 2018 10:21 pm

Re: TUTORIAL: File Sorting on the Raspberry Pi

Wed Nov 07, 2018 8:59 am

Mike: Thank you very much for your clear, direct and exhaustive reply.

First, I am a new user of the Raspberry and one of the staff volunteers who noticed that files sort differently from what we see on our Windows 10 computers. I am not a technical person and have very little experience with the Raspberry Pi, Raspbian or GPicView. We have been using Windows computers for years and only recently decided to do a test with the Raspberry. We have a support person who did the install and supports us remotely.

Second, there is great reluctance of our Board to change our file names for many reasons which I won't go into here.

Third, it's possible that your method #3 will work for us and I will pass along your article to our support person. I very much appreciate all the work you put into addressing my questions. I guess the real issue is that we hope we can find a way to sort our files on the Raspberry in the same manner they are sorting in Windows. Maybe that's not possible.

Thanks, too, to all the others who have posted replies. Onward and upward.

scruss
Posts: 1821
Joined: Sat Jun 09, 2012 12:25 pm
Location: Toronto, ON
Contact: Website

Re: TUTORIAL: File Sorting on the Raspberry Pi

Thu Nov 08, 2018 3:05 pm

So if you can't rename the files, would it be okay if they had the modification times arranged in natural order so at least you could access them in order? To do that, you'd use touch instead of mv.
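
Roughly what I have in mind, as a sketch only: walk the files in natural order and give each one a modification time one second later than the one before. The base timestamp is arbitrary, the file names are invented, and touch -d "@N" (seconds since the epoch) assumes GNU touch, as on Raspbian. Try it on copies first.

```shell
demo=$(mktemp -d)
: > "$demo/p 10-11.jpg"
: > "$demo/p 2-3.jpg"
: > "$demo/p 100-101.jpg"

base=1514764800   # 2018-01-01 00:00:00 UTC, an arbitrary starting point
(
  cd "$demo" || exit 1
  n=0
  # Natural order in, strictly increasing modification times out.
  ls -v -- *.jpg | while IFS= read -r f; do
    n=$((n + 1))
    touch -d "@$((base + n))" -- "$f"
  done
)

# Sorting by time (oldest first) now reproduces the natural order.
by_time=$(cd "$demo" && ls -tr)
printf '%s\n' "$by_time"
rm -r "$demo"
```

The names stay exactly as they were; only the modification times change, so any viewer that can sort by time would then show the pages in order.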

I'm still open to helping with collating these in a book form using img2pdf: it keeps the files unchanged (they can be extracted identically with pdfimages), you can add helpful metadata (book title, author, etc) and they can be read on any device. Ping me at my username@my username.com if you're interested - it may be a hard sell to your directors, though.

RPi_Mike
Posts: 100
Joined: Sat Dec 09, 2017 12:57 am
Location: United States

Re: TUTORIAL: File Sorting on the Raspberry Pi

Thu Nov 08, 2018 10:04 pm

mtmcdonough wrote:
Wed Nov 07, 2018 8:59 am
I guess the real issue is that we hope we can find a way to sort our files on the Raspberry in the same manner they are sorting in Windows. Maybe that's not possible.

Big news! For image viewing, I figured out how to replicate the entire Microsoft Windows experience on the Raspberry Pi – complete with automatic natural file sorting (and no need to rename your files).

I suspect it's exactly what the genealogists want for their 500,000 images. Then again, one never knows what "the Board" might think!

I have fully tested my technique and it works perfectly, so this isn't vaporware. But I still need to write it up into a high-quality, easy-to-follow tutorial that anyone can use. That's what I'm famous for, after all. Haha.

So give me a couple days to get around to that.



scruss wrote:
Tue Nov 06, 2018 2:14 am
Also, your method 1 has unnecessary complexity. If you must do it that way, why not use the value that cat -n has already given you?

ls -v *.jpg | cat -n | while read n f; do mv "$f" "$(printf "%08d - $f" $n)"; done

I wish to elaborate on this comment – your "3rd point" that you slipped in while I was writing my last response to you.

Your characterization of "unnecessary complexity" in my command line was supremely ironic. Why? Because what you perceived as "unnecessary complexity" was actually a powerful feature you failed to grasp!

Limiting the command line to "n" – as you recommend – forces all renaming jobs to begin with 1 (such as 0001).

Such a limitation is a big problem in many instances. But before I continue, let me re-print one of the key versions of my command line so that other readers know what we're talking about:

start=1; ls -v *.jpg | cat -n | while read n f; do mv "$f" "`printf "%08d - $f" $start`"; ((start++)); done


Now, here's an example of why this matters:

Imagine that you have 700 images – or files of any kind – that you already renamed from 0001 to 0700.

But then, a month later, you have a new batch of images that you wish to seamlessly ADD to the existing sequence. If you limit yourself to "n", that would be a huge problem – because then you would end up with two sets of files that had duplicate 0001s, 0002s, etc.

But thanks to my extremely flexible command line, all you have to do is change the value of "start" from 1 to 701. And BOOM – all done! It will automatically rename all the new images, starting with 0701. At that point, you can safely add them to your first batch of images and the entire sequence will flow perfectly from 0699 to 0700 to 0701 to 0702 – with no gaps whatsoever.
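
If you don't remember where the last batch ended, the next value for "start" can be worked out from the existing names. This is a sketch that assumes the earlier files follow the "NNNN - original name" pattern; the two sample names are invented.

```shell
demo=$(mktemp -d)
: > "$demo/0699 - old a.jpg"
: > "$demo/0700 - old b.jpg"

# Take the highest existing prefix and strip its leading zeros, so the
# shell arithmetic below does not read a value like 0700 as octal.
last=$(ls "$demo" | sed -n 's/^\([0-9][0-9]*\) - .*/\1/p' | sort -n | tail -n 1 | sed 's/^0*//')
start=$(( ${last:-0} + 1 ))

printf 'next start value: %s (first new name begins %04d)\n' "$start" "$start"
rm -r "$demo"
```

With the sample names this reports 701, which is the value to plug into "start" for the new batch.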

(I should also stress that this major feature has no meaningful impact on compute time – my command line processes more than 5,000 files per minute on the Raspberry 3. So the "complexity" comes at no cost.)

This is not some made-up example either. I personally struggled with this issue while trying to feed an image sequence into FFmpeg for conversion to an MP4 video. FFmpeg is quite strict when it comes to inputting image sequences. It demands a properly sortable sequence with consistently formatted file names – as do many other programs.

In fact, it was that personal experience – and all the frustration that came with it – that led me to research and develop a truly UNIVERSAL command line that can handle almost any renaming task.

My command line fully leverages the potent ls command from Linux itself, so it wields a lot of power.

Finally, it's only fair to mention that I already explained the purpose of "start" in my original tutorial – the very first post in this entire thread! I worked very hard on my tutorial, so it's a bit vexing when someone tries to unjustifiably ding me on a matter I already explained. This is especially true when it's clear the person didn't even bother reading what I wrote – or did a quick scan and missed the important details.

In fact, check it out yourself. Visit my File Sorting Tutorial, press Ctrl+F, and do a text search for "start=". You'll see I already explained the so-called "unnecessary complexity" 5 months ago!

mtmcdonough
Posts: 5
Joined: Thu Nov 01, 2018 10:21 pm

Re: TUTORIAL: File Sorting on the Raspberry Pi

Thu Nov 08, 2018 11:33 pm

Mike: That's great news. I will be happy to forward your next post to our support person. He is currently studying your post about using feh.
Thanks for your continuing help and interest.

Scruss: I'm not sure if your posts are meant for me, but we cannot use modification dates/times to keep our books in order because we often photograph a book where some or all of the images need to be cropped, straightened or re-shot. Then they may be renamed one or more times. Also, we often find we need to shoot pages we missed because they were stuck together or our volunteers simply skipped them while using the camera. But thanks for your advice and suggestions.

scruss
Posts: 1821
Joined: Sat Jun 09, 2012 12:25 pm
Location: Toronto, ON
Contact: Website

Re: TUTORIAL: File Sorting on the Raspberry Pi

Fri Nov 09, 2018 3:49 am

RPi_Mike wrote:
Thu Nov 08, 2018 10:04 pm
start=1; ls -v *.jpg | cat -n | while read n f; do mv "$f" "`printf "%08d - $f" $start`"; ((start++)); done
Okay then: that code does allow you to restart. That's a feature someone might use. But can you explain what the bits highlighted below do?

start=1; ls -v *.jpg | cat -n | while read n f; do mv "$f" "`printf "%08d - $f" $start`"; ((start++)); done

The variable n is never used, so the command that generates it (cat -n) doesn't do anything except waste time. Might as well remove them.
… In fact, check it out yourself. Visit my File Sorting Tutorial, press Ctrl+F, and do a text search for "start=". You'll see I already explained the so-called "unnecessary complexity" 5 months ago!
TBH, I only read the code.

RPi_Mike
Posts: 100
Joined: Sat Dec 09, 2017 12:57 am
Location: United States

Re: TUTORIAL: File Sorting on the Raspberry Pi

Fri Nov 09, 2018 6:30 am

scruss wrote:
Fri Nov 09, 2018 3:49 am
The variable n

My command line is so robust – so powerful – so majestic – that even if some guy in Canada erroneously strips out the important "start" feature, it still maintains its core functionality.

THAT is the purpose of the "n"!

scruss
Posts: 1821
Joined: Sat Jun 09, 2012 12:25 pm
Location: Toronto, ON
Contact: Website

Re: TUTORIAL: File Sorting on the Raspberry Pi

Sat Nov 10, 2018 5:31 am

You don't actually know what it does, do you?

RPi_Mike
Posts: 100
Joined: Sat Dec 09, 2017 12:57 am
Location: United States

Re: TUTORIAL: File Sorting on the Raspberry Pi

Sat Nov 10, 2018 7:14 am

Wow, scruss! With blatantly over-the-top words like "robust, powerful and majestic", I thought it was obvious that I was joking!

So let me spell it out: Though harmless, the "cat -n" and "n" are not serving any purpose in the command line. As you suggest, they can be removed. But you may not want to do that (more on that in a moment).
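To see what those two pieces actually contribute, here is a small illustrative sketch (show_fields is a made-up name, and the two file names are sample data). cat -n puts a line number in front of each name, and read splits that number off into n, leaving the full name, spaces included, in f:

```shell
#!/usr/bin/env bash
# Illustrative only: show_fields is a hypothetical helper name.
show_fields() {
    printf '%s\n' 'b.jpg' 'a 1.jpg' | cat -n | while read -r n f; do
        # n receives the number added by cat -n; f receives the rest of the line.
        echo "n=$n f=$f"
    done
}
show_fields
# n=1 f=b.jpg
# n=2 f=a 1.jpg
```

Since the renaming loop numbers files from "start" and never reads n, dropping both cat -n and n (reading only f) should give the same file names.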

Many months ago, I had begun my experiments with a purely "cat -n" approach. But I was unsatisfied with that solution because it locked me in to starting all sequences with 0001. As I explained earlier, there are many instances where you need to start at 0700, for example.

So as I continued exploring different techniques, obsessing on how to control the start number and ignoring everything else, "cat -n" simply got left in! One reason I have such an excellent track record on here is because I test everything before I publish a single word. Because "cat -n" had no impact on function, my command line passed all tests with flying colors – and it still does!

But here's why I would just leave it in: Out of curiosity, I tested 2 versions of my command line on 10,000 test files: One was my original command line and the other was with "cat -n" and "n" removed (as you suggested). I even rebooted my system before each test and allowed everything to settle down for a full minute before I ran them: In other words, identical testing conditions each time.

I of course suspected that by eliminating the "dead wood" of "cat -n" and "n", it would probably be a bit faster – maybe 5 or 10%.

But to my surprise, my original command line with the harmless "cat -n" was 22% faster!

I won't claim this was intentional – because it wasn't! And who knows, it might have been a random testing fluke. But if it wasn't a fluke, one purely speculative theory is that triggering "cat -n" accidentally wakes something up inside coreutils and makes everything go faster! Very few people know what goes on deep inside the bowels of Linux, so this will probably remain a mystery.
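For anyone who wants to repeat the comparison, a minimal harness might look like the sketch below. The function names, the "img N.jpg" test names and the file count are all assumptions for illustration, and GNU ls is assumed for -v. It makes no claim about which variant wins. A single run can easily be dominated by disk caching, so several runs of each are worth doing before reading much into the numbers.

```shell
#!/usr/bin/env bash
# Hypothetical benchmark harness; all names here are illustrative.
make_files() {                      # make_files DIR COUNT
    mkdir -p "$1"
    local i
    for ((i = 1; i <= $2; i++)); do : > "$1/img $i.jpg"; done
}

with_cat() {                        # original form: cat -n feeds an unused n
    local start=1 n f
    ls -v -- *.jpg | cat -n | while read -r n f; do
        mv -- "$f" "$(printf '%08d - %s' "$start" "$f")"
        start=$((start + 1))
    done
}

without_cat() {                     # same loop with cat -n and n removed
    local start=1 f
    ls -v -- *.jpg | while IFS= read -r f; do
        mv -- "$f" "$(printf '%08d - %s' "$start" "$f")"
        start=$((start + 1))
    done
}

# Example timing run (uncomment to use):
# a=$(mktemp -d); b=$(mktemp -d)
# make_files "$a" 10000; make_files "$b" 10000
# time (cd "$a" && with_cat)
# time (cd "$b" && without_cat)
```

Both functions should leave identical file names behind, so any difference between them is purely one of speed.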

But yeah, I'm personally going to leave everything "as is" for one simple reason: My command line works perfectly, so I'm not going to mess with success.

Sometimes, the best discoveries are the accidental ones.

RPi_Mike
Posts: 100
Joined: Sat Dec 09, 2017 12:57 am
Location: United States

Re: TUTORIAL: File Sorting on the Raspberry Pi

Sat Nov 10, 2018 7:34 am

MAGIC UNDO:
In the very near future, I'll be posting my brand-new image viewing solution – with automatic natural sorting! It will be in the "Graphics, sound and multimedia" forum, the home of my gigantic FFmpeg / mpv tutorial. That already covers video and audio, so this new tutorial should complement things quite nicely by providing a quality solution for images.

In the meantime, I've come up with an entirely new thought on file sorting and renaming.

I realize it may not satisfy the board of directors at the genealogical society – since they may not want their 500,000 files "touched" in any manner. [Although, as long as the files have been properly backed up, it shouldn't necessarily matter if people do things to them – even if they destroy or mutilate them beyond repair! However, I'm fully aware that may not be the point: Some people, for example, may simply not want to see even the slightest addition to the core file name. That of course is an entirely separate matter that would not be addressed by a backup. These potential issues will soon be irrelevant anyway: My new image viewing solution will address all of that without any "touching" of the files whatsoever!]

Nonetheless, for the benefit of others, I'd like to introduce my "magic undo" command line. It uses a regular expression I wrote to search for the padded number prefixes and replaces them with nothing – thus completely reversing the renaming done by my original command line:
[Attached image: Magic_Undo_RPi_Mike.png]

NAME PRESERVATION: I also can't resist pointing out that my "magic undo" command line brings my claim of "name preservation" to life in a truly sweeping and literal way – because it provides a simple and rapid mechanism to restore the completely original file names without any qualification or nuance. And when I say rapid, I mean rapid: In a test I conducted on 100,000 JPG files on a Raspberry Pi 3B+, it renamed them at a rate of more than 18,000 files per minute. That's quite astonishing for a $35 computer!

CRITICAL NOTE #1: I've discovered that the rename command is perfectly willing to "lie" to you! Even when processing thousands of files, it will return you to the command prompt in about 1 second – thus creating the false impression that it's completed the task. Unless it's just a few hundred files, it has certainly NOT completed the task in that time! So you need to look at the CPU monitor instead. You'll see that it's maxed out on 1 of 4 cores (25% of total CPU activity). The rename command is not finished until your system settles back down to around 0 or 1% activity. So leave your system alone until then, or you might ruin your files!

CRITICAL NOTE #2: As I've mentioned before, if you're processing more than a thousand files, make sure that File Manager is NOT open when you run any of these renaming commands. Otherwise, it will force the displayed contents of the folder to constantly "refresh" inside File Manager, thus bogging things down. On a giant batch of files, it could even crash your system! And of course use common sense: No other programs should be running either!

CRITICAL NOTE #3: As with renaming the files in the first place, the folder within which you run this command line should be "clean". In other words, it should only contain the "padded number" files you wish to process – not other unrelated files that might fit the same prefix pattern. I could certainly spend more time making the regular expression's pattern matching even more targeted and sophisticated, but there's no reason for me to do all that when this basic "best practice" should be observed anyway!

GENERAL WARNING: Always back up your files before you do anything to them!

MAGIC UNDO COMMAND LINE: So here it is! It's also, of course, the place where scruss will start reading – since he claims, with professed honesty, to "only read the code":

rename 's/^\w+ - //' *.jpg
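For a folder that is not perfectly "clean", a stricter variant can be sketched in plain bash. This is an assumption-laden alternative, not the command above: undo_prefix is a hypothetical name, and it assumes the prefix is exactly eight digits followed by " - ", as produced by the %08d format quoted earlier in the thread. Files whose names merely start with a word and " - " are left alone, and nothing is overwritten.

```shell
#!/usr/bin/env bash
# Sketch of a stricter undo; undo_prefix is a hypothetical helper name.
undo_prefix() {
    local f orig
    # Exactly eight digits, then " - ", then the original name.
    local re='^[0-9]{8} - (.+)$'
    for f in *.jpg; do
        if [[ $f =~ $re ]]; then
            orig=${BASH_REMATCH[1]}
            if [[ -e $orig ]]; then
                # Never clobber an existing file with the same original name.
                echo "skip (target exists): $f" >&2
            else
                mv -- "$f" "$orig"
            fi
        fi
    done
}
```

Run `undo_prefix` from inside the folder in question. A file such as "notes - keep.jpg" would not be touched, because its prefix is not eight digits.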
