RPi_Mike
Posts: 85
Joined: Sat Dec 09, 2017 12:57 am
Location: United States

TUTORIAL: The True Nature of a Linux File – And its Timestamps

Sun Jun 24, 2018 4:58 am

INTRODUCTION: If you do a Google Images search for Linux "file structure" or "inodes" or "directory" – or any combination of these or similar terms – you'll see a whole bunch of diagrams that explain things from the vantage point of the Linux ext2, ext3, or ext4 FILESYSTEM. However, when you use a computer, you're rarely ever thinking about a "filesystem". Instead, you're thinking about FILES. Simple files! You might be working on a text file or a video file. But whatever you're doing, your primary "cognitive model" is of files and folders – not a file "system"!

Due to the lack of good explanations on the Internet, I decided to explain things from a "file-centric" standpoint instead of explaining the entire filesystem! Once you understand the true nature of a Linux file, you will also understand its timestamp behavior. This is the dual purpose of my tutorial.

CONCEPTUAL MODEL: Let me be clear that my diagram, which appears below, is a conceptual model – it's not intended to be a thoroughgoing, technically complete description of the entire filesystem. But it does reveal the basic nature of a file:
Parts_of_a_File_Linux_ext4_RPi_Mike.png
To view this image at full resolution, right-click and select "open image in new tab" – or on phones and tablets, "tap and hold" and save it to your pictures for full-size viewing.

ILLUSION OF A FILE: Your biggest takeaway should be that a file, as you perceive it with your eyes, is an ILLUSION. In a very real sense, every time you look at a file inside a folder on your Raspberry, you're being tricked. I say that because it's visually presented to you as ONE DISCRETE THING. That's done on purpose – because psychologically, it's much easier for a human to think of a single file as a single thing. Why? Because that's how things work in everyday life! For example, if you're looking around your home for a physical piece of paper with a bunch of text on it, that piece of paper exists as a single physical object. It certainly doesn't have a ghost-like "trinity" existence where different aspects of the paper exist in three different realms. And you certainly don't need a "filesystem" to retrieve and combine those different ghosts – real ghosts and metaghosts – before the piece of paper finally materializes!

Yet that is EXACTLY the nature of every file on your Raspberry!

PARTS OF A FILE: In simple conceptual terms, every file on your Raspberry has 4 separate "parts":

File Name: This is stored inside a directory (which itself is a special kind of file, even though it presents itself as a folder). The file name is a form of metadata.

File Path: This is defined by its very presence inside a directory.

File Properties: These are stored inside the "inode" (a special kind of data "structure"). The contents of the inode are also a form of metadata.

File Data: This is the actual contents of your file. The data is broken up into pieces and stored inside a series of "data blocks".

NOTE: Technically speaking, the path of a file – in other words, the "folder" it's located in – such as /home/pi or /home/pi/downloads – is NOT actually part of the file. If you're in the living room right now, that's simply your location – it's not actually part of you. It's simply where you happen to be! But for the purpose of this tutorial, that distinction unnecessarily confuses things without adding any real value. So I'm unilaterally declaring the path of a file to be part of the file! In fact, I'll refer to it as part of the file's metadata – even though it's not really that either.

METADATA: Let me point out an important concept. The so-called "file" is actually 3 parts metadata and only 1 part data! The file's name, path, and properties are all forms of METADATA. Only the data is the actual data! For example, if you have a picture of a tree stored on your computer that's called Tree.jpg, the file's name has absolutely nothing to do with the photographic depiction of the tree itself. The file name contains no photographic data. So that's why we call it "meta" data – because it's a special kind of data ABOUT the "real" data! Same thing goes for the path: The specific location of the file – the folder it happens to be in – has absolutely nothing to do with the appearance of the tree! Same thing for the file's properties – such as who "owns" the file or what its "permissions" are. Those things also have nothing to do with what the tree actually looks like. Those elements are still very important – but they are all "meta" to the data. They are metadata! Think of metadata as the shipping label on the outside of a box. It may have all kinds of information about the contents of the box – but it is definitely NOT the contents of the box! The stuff inside the box – the contents of the box – is the actual data!

DATA: A file's data can be thought of as the file's CONTENTS – as opposed to its metadata. But what about the data itself – the actual pixels of brightness and color that capture the appearance of the tree? That is also an illusion of sorts – because that data is not stored in one place. It's also not even "inside" the "file" – because as I just explained, there's no such thing as a "file" in any normal sense of the word! Instead, the tree's picture data is broken up into separate chunks called "data blocks". The inode, among other things, contains a means of pointing to the locations of these blocks. When you open a file to look at its contents, the data in those blocks is assembled to present a cohesive image.
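To make the four "parts" concrete, here is a minimal Python sketch (my own illustration, not part of the original tutorial) that gathers each one for a single file: the name and inode number from the directory entry via `os.scandir`, the path, a few inode properties via `os.stat`, and the data by reading the file. The file name `Tree.jpg` and its contents are hypothetical placeholders.

```python
import os
import tempfile


def describe_file(directory: str, name: str) -> dict:
    """Gather the four conceptual 'parts' of one file."""
    # Name + inode number: this pair is what the directory entry stores.
    entry = next(e for e in os.scandir(directory) if e.name == name)
    # Path: simply where that directory entry happens to live.
    path = os.path.join(directory, name)
    # Properties: metadata kept in the inode (size, permissions, owner, ...).
    st = os.stat(path)
    # Data: the contents, which the filesystem assembles from data blocks.
    with open(path, "rb") as f:
        data = f.read()
    return {
        "name": entry.name,
        "inode": entry.inode(),
        "path": path,
        "properties": {
            "size": st.st_size,
            "mode": oct(st.st_mode & 0o777),
            "uid": st.st_uid,
        },
        "data": data,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "Tree.jpg"), "wb") as f:
            f.write(b"pretend pixel data")
        info = describe_file(d, "Tree.jpg")
        print(info["name"], info["inode"], info["properties"])
```

Note that the name and the inode number come back from the directory listing alone; only the properties and the data require following that number to the inode.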

EXPLAINING TIMESTAMP BEHAVIOR: Now that we've established the basic concept of a "file", we can finally turn our attention to the timestamps and explain their behavior – why they change or don't change under different circumstances! Raspbian Linux records a total of 4 timestamps for almost every file: Access Time (atime), Modification Time (mtime), Change Time (ctime), and Creation Time (crtime). The first 3 timestamps are revealed by the standard stat command. For example, running "stat Tree.jpg" in Terminal will tell you those timestamps. But if you'd also like to see the Creation Time – which is completely hidden – please see my other tutorial on that subject.
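If you'd rather read the three standard timestamps from a program than from `stat`, a small Python sketch can do it. As far as I know, `os.stat()` on Linux does not report the ext4 creation time (crtime), so only atime, mtime, and ctime appear. The dates used below are arbitrary illustrative values. The sketch also shows one useful quirk: explicitly setting atime and mtime is itself a change to the inode, so ctime jumps to "now".

```python
import os
import tempfile
from datetime import datetime, timezone


def timestamps(path: str) -> dict:
    """Return atime, mtime and ctime (seconds since the epoch) via stat(2)."""
    st = os.stat(path)
    return {"atime": st.st_atime, "mtime": st.st_mtime, "ctime": st.st_ctime}


def as_utc(ts: float) -> str:
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "Tree.jpg")
    open(path, "wb").close()
    # Explicitly set atime and mtime (similar to `touch -a -d` / `touch -m -d`).
    # Writing new timestamps is an inode change, so the kernel sets ctime to now.
    os.utime(path, (1_425_168_000, 1_422_748_800))
    ts = timestamps(path)

for label, value in ts.items():
    print(f"{label}: {as_utc(value)}")
```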

But when it comes to interpreting timestamp behavior, I've now authored an extremely comprehensive chart. The chart, which appears below, explains everything you need to know. It's got a lot of important details, so be sure to open it in a new tab and look at it full size – preferably on a 1080p monitor. Once you've fully absorbed my chart and the implications of the atime mount settings – which I've explained further below – you can probably consider yourself a timestamp expert!
Timestamp_Events_Linux_ext4_RPi_Mike.png
To view this image at full resolution, right-click and select "open image in new tab" – or on phones and tablets, "tap and hold" and save it to your pictures for full-size viewing.

ATIME, NOATIME, AND RELATIME – PARTITION MOUNT SETTINGS EXPLAINED: Consider this optional reading – it's for those who want a complete understanding of atime recording behavior. As my first example, let's assume the partition on your storage device is configured to mount with a full "atime" setting – meaning that access times are recorded EVERY SINGLE TIME you or your computer reads or "accesses" a file. (On modern Linux kernels, the mount option that explicitly requests this behavior is "strictatime"; the plain "atime" option just means "use the kernel's default", which is relatime.) If your partition has that setting, the atime behavior is easy to explain: If you or your computer reads the CONTENTS of a file, that means the file was "accessed". As a result, an atime event will be recorded. In other words, the atime timestamp will be updated.

However, very few Linux computers use full atime recording as their default partition configuration. That's because it would force the file's inode to be rewritten every single time the file is merely read or looked at. This can slow a computer down noticeably by greatly increasing the input/output activity for no particularly good reason. It also increases the wear and tear on your storage device.

As a result, a "noatime" setting is often used for partitions. In fact, on my NOOBS-based copy of Raspbian Stretch, I noticed that the Raspberry Pi Foundation chose a default setting of "noatime" for the internal SD card's "root" partition – the main partition most people actually use (it's where /home/pi is located, for example). That means atime recording is completely switched off with just one exception: At the moment a file is created, the system counts that as an "access" event. In effect, therefore, with a "noatime" setting, a file's creation time becomes the file's permanent "access" time.

Perhaps the most popular setting for partitions is the "relatime" setting, which indicates "relative" timestamping behavior. It's often seen as a good balance between the "overkill" of the full atime mount setting and the completely "dead" noatime setting. Relatime means your system will only update the access time in a manner that's "relative" to other factors. In other words, it will only bother updating the access time if one of two specific conditions is met:

A: The modify time (mtime) or change time (ctime) is the same as or NEWER than the last access time (atime)

OR:

B: The last access time (atime) is more than 24 hours old

If neither of the above criteria is met, then the access time will simply NOT be updated – even if you personally accessed the file just 5 seconds ago! Your system will pretend it never happened! In effect, it will act as though it's operating under a noatime mount setting.
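The three mount behaviors can be modelled as a tiny decision function. This is a minimal Python sketch of the rules as described above, not kernel code; the function and mode names are my own. One assumption to flag: a tie (mtime or ctime equal to atime) is treated as "newer", which is how I understand the kernel's relatime check behaves.

```python
DAY_SECONDS = 24 * 60 * 60


def relatime_should_update(atime: float, mtime: float, ctime: float, now: float) -> bool:
    """Return True if a read should refresh atime under the relatime rule."""
    # Condition A: the file was modified or changed since the last recorded access.
    # (Ties count as "newer" here; treat that detail as an assumption.)
    if mtime >= atime or ctime >= atime:
        return True
    # Condition B: the recorded access time is at least 24 hours old.
    if now - atime >= DAY_SECONDS:
        return True
    # Otherwise the read is silently not recorded.
    return False


def atime_after_read(mount_mode: str, atime: float, mtime: float, ctime: float, now: float) -> float:
    """Model the atime a file would have after being read at time `now`."""
    if mount_mode == "strictatime":   # full atime: every read is recorded
        return now
    if mount_mode == "noatime":       # reads are never recorded
        return atime
    if mount_mode == "relatime":
        return now if relatime_should_update(atime, mtime, ctime, now) else atime
    raise ValueError(f"unknown mount mode: {mount_mode!r}")


# A file last accessed at t=1000, not edited since, read again 5 seconds later:
print(atime_after_read("relatime", atime=1000, mtime=500, ctime=500, now=1005))
```

Under relatime, that second read leaves atime at 1000, exactly the "pretend it never happened" case.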

Although my SD card's main partition uses a noatime setting by default, I noticed that when I did a standard formatting of a USB thumb drive – using fdisk to create a Linux partition and then formatting it with an ext4 filesystem – the partition was automatically mounted with "relatime"! (Relatime has been the Linux kernel's default since version 2.6.30.)

NOTE: To find out the mount settings for your various partitions, run this in Terminal:

cat /proc/mounts

For your internal SD card's main partition – the one where /home/pi is located – you will probably see this output:

/dev/root / ext4 rw,noatime,data=ordered 0 0

Be aware that the beginning of this line, "/dev/root", is a quirk of Raspbian: It is not an explicit expression of the partition's device path. Instead, it's more like an "alias". If you have NOOBS, "/dev/root" actually means /dev/mmcblk0p7. If you have "pure" Raspbian, it means /dev/mmcblk0p2. Most other versions of Linux do not do this. Anyway, as you can see, my main partition has a "noatime" setting. I did not set this – it was the default configuration that came with NOOBS-based Raspbian. That means the recording of access times is completely switched off.

However, my USB thumb drive says this:

/dev/sda1 /media/pi/THUMB_DRIVE ext4 rw,relatime,data=ordered 0 0

As you can see, my thumb drive is mounted with "relatime" – so access times will be updated only under the two conditions described above (A or B).
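If you'd like to check the atime setting of every partition at once, a short Python sketch can parse /proc/mounts. It assumes the six whitespace-separated fields shown above (device, mount point, filesystem type, options, and two numbers) and that the kernel lists "noatime" or "relatime" explicitly among the options; when neither appears, the sketch assumes full atime recording, which you should treat as an assumption. The `mount_for` helper (a hypothetical name) finds which mount a given path lives on by picking the longest matching mount point.

```python
import os
import re


def _unescape(field: str) -> str:
    # /proc/mounts writes spaces, tabs, etc. as octal escapes such as \040.
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def parse_mount_line(line: str) -> dict:
    device, mount_point, fstype, options, _dump, _passno = line.split()
    return {
        "device": _unescape(device),
        "mount_point": _unescape(mount_point),
        "fstype": fstype,
        "options": options.split(","),
    }


def atime_mode(options: list) -> str:
    if "noatime" in options:
        return "noatime"
    if "relatime" in options:
        return "relatime"
    # Assumption: with neither flag listed, every access is recorded.
    return "strictatime"


def mount_for(path: str, mounts: list) -> dict:
    """Pick the mount whose mount point is the longest prefix of `path`."""
    best = None
    for m in mounts:
        mp = m["mount_point"]
        if mp == "/" or path == mp or path.startswith(mp.rstrip("/") + "/"):
            if best is None or len(mp) >= len(best["mount_point"]):
                best = m
    return best


if __name__ == "__main__" and os.path.exists("/proc/mounts"):   # Linux only
    with open("/proc/mounts") as f:
        mounts = [parse_mount_line(line) for line in f if line.strip()]
    for m in mounts:
        print(m["mount_point"], m["fstype"], atime_mode(m["options"]))
```

Fed the two lines above, it reports "noatime" for / (so /home/pi is covered by the noatime root partition) and "relatime" for /media/pi/THUMB_DRIVE.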
Last edited by RPi_Mike on Sun Jun 24, 2018 9:17 am, edited 1 time in total.

jahboater
Posts: 2858
Joined: Wed Feb 04, 2015 6:38 pm

Re: TUTORIAL: The True Nature of a Linux File – And its Timestamps

Sun Jun 24, 2018 7:25 am

Very interesting - great explanation of noatime and relatime.

I think of files/directories conceptually as:

The i-node is a "structure" containing the meta-data, and pointers to the actual data (it is not a file).
These i-nodes are kept in a table.
Each i-node is referred to by its index in the table - a simple number.
A directory is the mapping between filenames and i-node table indexes, that is, a simple list of pairs (name, inode number).

A directory is a file with its own i-node number (and therefore may be referred to by other directories - forming the tree).

A directory may have as many entries as wished with the same i-node number. That is, a file may have multiple names (called links).
You can delete one of the entries and it will have no effect on the others. Only when the last entry with that i-node number is deleted are the actual i-node and data removed.
Further, multiple directories may have entries with the same i-node number.
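The "multiple names, one i-node" behavior is easy to watch from Python. This is a minimal sketch using hard links in a temporary directory; the file names are hypothetical. `os.link` adds a second directory entry pointing at the same i-node, `st_nlink` counts the names, and deleting one name leaves the i-node and its data untouched.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    first = os.path.join(d, "makefile")
    second = os.path.join(d, "build-rules")      # hypothetical second name
    with open(first, "w") as f:
        f.write("all:\n\techo hello\n")

    os.link(first, second)                       # new directory entry, same i-node
    same_inode = os.stat(first).st_ino == os.stat(second).st_ino
    links_before = os.stat(first).st_nlink       # two names point at one i-node

    os.remove(first)                             # delete just one of the names
    links_after = os.stat(second).st_nlink       # the i-node still has one name
    with open(second) as f:
        survivor = f.read()                      # ...and its data is intact

print(same_inode, links_before, links_after, repr(survivor))
```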

Actually beautifully simple.

For interest, modern file systems (ext4) have "extents" - the i-node does not contain pointers to blocks scattered all over the disk. It will contain a pointer to the first block and a count of the number of blocks. This requires much less space in the i-node structure and means that the data is contiguous (great for speed).
There may be multiple extents, depending on how the file was written.
See "filefrag -v <filename>"

RPi_Mike
Posts: 85
Joined: Sat Dec 09, 2017 12:57 am
Location: United States

Re: TUTORIAL: The True Nature of a Linux File – And its Timestamps

Sun Jun 24, 2018 9:28 am

jahboater wrote:
Sun Jun 24, 2018 7:25 am
Very interesting - great explanation of noatime and relatime.

I appreciate your feedback. To the other readers out there, jahboater is definitely a "top 10" poster in terms of his technical sophistication. Given this website has nearly a quarter-million registered members, that literally puts him above the 99.99 percentile.

As I mentioned in my tutorial, my goal was to keep things at a mostly conceptual level that a large number of people could actually understand – not to provide a "thoroughgoing, technically complete description of the entire filesystem."

Nonetheless, I have now changed my textual reference of "pointers" to the more generic expression "means of pointing". In my research, I saw exactly what you're alluding to – that the ext4 filesystem uses a nested hierarchy of pointers involving direct blocks, indirect blocks, double indirect blocks, triple indirect blocks, etc. – but I thought that was wayyyyyy too complicated given the "file-centric" goals of my tutorial. So I just went with "pointers"!

I've also changed my passing reference to the inode as a file to the more abstract (but technically accurate) concept of it being a "structure".

Glad you liked my discourse on noatime and relatime. My timestamp chart is the real "meat" of my tutorial – so hopefully it will let people understand the seemingly mysterious behavior of atime, mtime, ctime, and crtime!
Last edited by RPi_Mike on Sun Jun 24, 2018 10:03 am, edited 1 time in total.

jahboater
Posts: 2858
Joined: Wed Feb 04, 2015 6:38 pm

Re: TUTORIAL: The True Nature of a Linux File – And its Timestamps

Sun Jun 24, 2018 9:41 am

RPi_Mike wrote:
Sun Jun 24, 2018 9:28 am
I saw exactly what you're alluding to – that the ext4 filesystem uses a nested hierarchy of pointers involving direct blocks, indirect blocks, double indirect blocks, triple indirect blocks, etc. – but I thought that was wayyyyyy too complicated given the "file-centric" goals of my tutorial. So I just went with "pointers"!
"pointers" is fine I think.
(They were of course just block numbers, but now I presume they are a block number and a count).

For interest ...
Looking around my disk I can't find any regular file that has more than one or two pointers, even files with many megabytes size.

~ $ filefrag -v makefile
Filesystem type is: ef53
File size of makefile is 12166 (3 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..       2:    9063523..   9063525:      3:             last,eof
makefile: 1 extent found
One pointer (a single extent) for a three-block file.

~ $ filefrag -v glibc-2.27.tar.xz
Filesystem type is: ef53
File size of glibc-2.27.tar.xz is 15395316 (3759 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..    2047:    3606528..   3608575:   2048:            
   1:     2048..    3758:    3638192..   3639902:   1711:    3608576: last,eof
glibc-2.27.tar.xz: 2 extents found
You can see that this last one has two pointers for 3759 blocks. Older file systems like ext3 would have had 3759 pointers and possibly indirect ones as well. That is, a pointer to each individual block. Horrific but it worked for decades!
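The arithmetic in these listings is easy to check with a short Python sketch. It rounds the file size up to whole 4096-byte blocks and parses the extent rows of `filefrag -v` output. The regular expression assumes the column layout shown above (other filefrag versions may format rows differently), and the per-block pointer count is simply the total number of blocks, which is what an older block-map scheme would need.

```python
import re

# Matches filefrag -v extent rows such as:
#    0:        0..       2:    9063523..   9063525:      3:             last,eof
EXTENT_ROW = re.compile(r"^\s*(\d+):\s*(\d+)\.\.\s*(\d+):\s*(\d+)\.\.\s*(\d+):\s*(\d+):")


def blocks_needed(size_bytes: int, block_size: int = 4096) -> int:
    """Number of whole blocks needed to hold `size_bytes` (rounded up)."""
    return (size_bytes + block_size - 1) // block_size


def parse_extents(filefrag_output: str) -> list:
    """Return (logical_start, physical_start, length) for each extent row."""
    extents = []
    for line in filefrag_output.splitlines():
        m = EXTENT_ROW.match(line)
        if m:
            _ext, log_start, _log_end, phys_start, _phys_end, length = map(int, m.groups())
            extents.append((log_start, phys_start, length))
    return extents


glibc_rows = """\
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..    2047:    3606528..   3608575:   2048:
   1:     2048..    3758:    3638192..   3639902:   1711:    3608576: last,eof
"""

extents = parse_extents(glibc_rows)
per_block_pointers = sum(length for _, _, length in extents)   # block-map style
print(len(extents), "extents vs", per_block_pointers, "individual block pointers")
```

The "expected" value on the second row (3608576) is just the block after the first extent's last physical block (3608575), which is where a perfectly contiguous file would have continued.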

The nested hierarchy must be for truly monstrous files ...

Heater
Posts: 9719
Joined: Tue Jul 17, 2012 3:02 pm

Re: TUTORIAL: The True Nature of a Linux File – And its Timestamps

Sun Jun 24, 2018 12:29 pm

What I find amazing is that there is even a need to explain, in layman's terms, what a "file" or "directory" is on a computer.

Computers and their operating systems have had the notion of files and directories etc since forever.

The general public has had access to computers since the microprocessor arrived on the scene in the late 1970's. Not long after that every kid knew about "files".

Then came the IBM PC and MS-DOS which was all about "files". That is pretty much all it did.

What on Earth happened to cause our whole civilization to forget such a thing?

As it happens, I have never paid much attention to the various timestamps an OS might maintain on its files. They are kind of useless. They don't survive moving the file from place to place, OS to OS, etc.

jahboater
Posts: 2858
Joined: Wed Feb 04, 2015 6:38 pm

Re: TUTORIAL: The True Nature of a Linux File – And its Timestamps

Sun Jun 24, 2018 1:58 pm

Heater wrote:
Sun Jun 24, 2018 12:29 pm
As it happens, I have never paid much attention to the various timestamps an OS might maintain on its files. They are kind of useless. They don't survive moving the file from place to place, OS to OS, etc.
"ls -lt | head" is pretty handy to find recently modified files in a large directory
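For scripts, roughly the same "newest first" listing can be produced in Python. This is a minimal sketch, not a reimplementation of ls: it skips dot-files as ls does by default, sorts by mtime using each entry's own (non-followed) stat like `ls -l`, and the function name `recently_modified` is hypothetical. Note that `ls -lt | head` actually shows a "total" line plus nine entries, since head prints ten lines.

```python
import os
import tempfile


def recently_modified(directory: str, count: int = 10) -> list:
    """Names in `directory`, newest mtime first, roughly like `ls -t | head`."""
    entries = [e for e in os.scandir(directory) if not e.name.startswith(".")]
    # follow_symlinks=False mirrors ls, which reports a symlink's own mtime.
    entries.sort(key=lambda e: e.stat(follow_symlinks=False).st_mtime, reverse=True)
    return [e.name for e in entries[:count]]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name, mtime in [("old.txt", 100), ("newest.txt", 300), ("middle.txt", 200)]:
            path = os.path.join(d, name)
            open(path, "w").close()
            os.utime(path, (mtime, mtime))
        print(recently_modified(d))
```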

RPi_Mike
Posts: 85
Joined: Sat Dec 09, 2017 12:57 am
Location: United States

Re: TUTORIAL: The True Nature of a Linux File – And its Timestamps

Sun Jun 24, 2018 2:58 pm

Heater wrote:
Sun Jun 24, 2018 12:29 pm
What I find amazing is that there is even a need to explain, in layman's terms, what a "file" or "directory" is on a computer.

Computers and their operating systems have had the notion of files and directories etc since forever.

The general public has had access to computers since the microprocessor arrived on the scene in the late 1970's. Not long after that every kid knew about "files".

What on Earth happened to cause our whole civilization to forget such a thing?

What a gross mischaracterization of my tutorial! And what a flight of historical fantasy!

The great majority of people, both 20 years ago and today, understand a "file" and "directory" exactly the way it is visually presented to them – as the electronic equivalent of a piece of paper inside a manila folder.

To suggest that most people understand the true nature of these things – that they're aware of the underlying structure and that they know it's all just a carefully constructed illusion – doesn't even pass the giggle test.

To suggest that my in-depth exploration of timestamps – and its relation to the true nature of a Linux file – somehow amounts to a Kindergarten lesson is just... mind boggling!

I'm aware that you're also at the 99.99 percentile, along with jahboater.

He, however, took the time to absorb what I actually wrote and clearly appreciated it.

You, however, were probably so appalled by the perceived absurdity of my title that you quickly skimmed through it and saw the whole thing as a joke.

Steven Pinker, the eminent psychologist at Harvard, recently wrote an entire book about "the curse of knowledge". Part of this "curse" is that many highly intelligent people – especially those who excel in technical fields like computing – simply fail to grasp a very basic concept: Out of 10,000 people, 9,999 of them are not going to know what they know! Despite their great intelligence, many therefore struggle to communicate effectively – because part of the "curse" is that everything is sooooo "obvious" to them.

If their underlying personality is pleasant, it just means they're incapable of writing a quality tutorial that most people would understand.

If they're not so pleasant, however, they find themselves seduced by the idea that almost everyone else is worthy of contempt.

Heater
Posts: 9719
Joined: Tue Jul 17, 2012 3:02 pm

Re: TUTORIAL: The True Nature of a Linux File – And its Timestamps

Sun Jun 24, 2018 3:30 pm

RPi_Mike,
What a gross mischaracterization of my tutorial! And what a flight of historical fantasy!
I did not mean to mischaracterize anything. You were talking about:

'keep things at a mostly conceptual level that a large number of people could actually understand – not to provide a "thoroughgoing, technically complete description of the entire filesystem."'

I said "...in layman's terms..."

I think we are on the same page there.
The great majority of people, both 20 years ago and today, understand a "file" and "directory" exactly the way it is visually presented to them – as the electronic equivalent of a piece of paper inside a manila folder.
I'm glad to hear it.
To suggest that most people understand the true nature of these things – that they're aware of the underlying structure and that they know it's all just a carefully constructed illusion – doesn't even pass the giggle test.
Why are you giggling? I did not suggest that. I did say "...in layman's terms...".
To suggest that my in-depth exploration of timestamps – and its relation to the true nature of a Linux file – somehow amounts to a Kindergarten lesson is just... mind boggling!
I did not suggest that either.
He, however, took the time to absorb what I actually wrote and clearly appreciated it.
I do appreciate it. There are things in there I did not know.

Yes, yes, we are all aware of "the curse of knowledge". If you ever follow my replies to questions here I try to pitch them at what I perceive as the level of knowledge of the poster.

No, my question was more general. I find myself in a world where people, young and old, ask me the most basic questions about using a computer every day. I just wonder where these people have been, given that computers have been mixing with us humans since 1976 or so.

RPi_Mike
Posts: 85
Joined: Sat Dec 09, 2017 12:57 am
Location: United States

Re: TUTORIAL: The True Nature of a Linux File – And its Timestamps

Sun Jun 24, 2018 5:15 pm

Heater wrote:
Sun Jun 24, 2018 3:30 pm
Why are you giggling? I did not suggest that. I did say "...in layman's terms...".

Let me remind the readers of what actually happened.

Your very first sentence on this thread said the following:

"What I find amazing is that there is even a need to explain, in layman's terms, what a "file" or "directory" is on a computer."

If you had written that in a poem five years ago, that would be one thing.

But everything comes down to CONTEXT: You wrote that as your very first reply to my tutorial! So it's reasonable to think your words might have something to do with... uh... my tutorial??

Any fair-minded reader would interpret your first sentence as (1) This guy is actually telling people what a "file" and "directory" is! [Even though my presentation was several levels above anything so simple.] And (2) How sad is that – that there is even a need to explain such ridiculously simple things! [Even though, again, what I was actually explaining went wayyyyy beyond anything so simple.]

You then used that as a launching pad to spin your tale of the 'decline and fall of the Roman Empire' – that people are getting dumber than ever when it comes to computers. Again, with my tutorial acting as the backdrop the entire time.

NOTE TO LITERALISTS: My reference to Rome is called "using my own adjectives and analogies to accurately and fairly describe what I'm hearing." It does not mean the original poster said anything about Rome – just as he had not used the word Kindergarten. Just as my reference to the "giggle test" did not mean I was giggling (which, ironically, IS an unfair, inaccurate distortion of what I said).

Then you ended by saying how "useless" timestamps are – even though more than half of my tutorial dealt with... uh... timestamps??

The bottom line is that my reaction was not even slightly off-base. If we fed your first post into a completely neutral semantic and syntactic analyzer, it would conclude that your tone was entirely non-positive from start to finish.

Keep in mind that an actual human being spent several hours creating that professional-quality tutorial as a FREE contribution to the Raspberry Pi community. Did you have even one positive or constructive thing to say in your initial post?

No, you did not.

Heater
Posts: 9719
Joined: Tue Jul 17, 2012 3:02 pm

Re: TUTORIAL: The True Nature of a Linux File – And its Timestamps

Sun Jun 24, 2018 5:31 pm

OK, whatever it was I wrote has not been received the way I intended. Clearly I did not express myself well enough. It certainly was not intended as a slight of you or your tutorial here.

Heater
Posts: 9719
Joined: Tue Jul 17, 2012 3:02 pm

Re: TUTORIAL: The True Nature of a Linux File – And its Timestamps

Sun Jun 24, 2018 5:34 pm

RPi_Mike,
If we fed your first post into a completely neutral semantic and syntactic analyzer, it would conclude that your tone was entirely non-positive from start to finish.
Out of curiosity, do you have such a thing or a link to one? I'd like to try it out.

scruss
Posts: 1774
Joined: Sat Jun 09, 2012 12:25 pm
Location: Toronto, ON

Re: TUTORIAL: The True Nature of a Linux File – And its Timestamps

Sun Jun 24, 2018 8:00 pm

Okay, very nicely done, but: does it matter that the file/folder paradigm is an “illusion”? Computers are built on abstractions, and a filesystem is one of many. What's more interesting to me are the various ways that filesystems can be implemented, all with various strengths and weaknesses, and yet all (nearly all?) can be mapped to the file/folder concept. I can't think of any system that uses flat files any more, though I'm sure there must be some.
‘Remember the Golden Rule of Selling: “Do not resort to violence.”’ — McGlashan.

ejolson
Posts: 1842
Joined: Tue Mar 18, 2014 11:47 am

Re: TUTORIAL: The True Nature of a Linux File – And its Timestamps

Mon Jun 25, 2018 6:04 am

scruss wrote:
Sun Jun 24, 2018 8:00 pm
I can't think of any system that uses flat files any more, though I'm sure there must be some.
The built-in file manager on my Android mobile flattens the filesystem and displays all pictures in one list, all audio files in another, all videos, all zip files, all apps and finally a list of all documents without regard for where they are in the hierarchical filesystem. I think some mp3 music players do the same thing, but maybe allow sorting on additional metadata such as album, artist and year, provided anyone bothered to specify those things.
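A "flattened" view like that can be built on top of an ordinary hierarchical filesystem by walking the whole tree and grouping files by type. Here is a minimal Python sketch; the extension-to-category table is a hypothetical stand-in, and a real file manager may well classify files differently.

```python
import os
import tempfile
from collections import defaultdict

# Hypothetical extension-to-category table, for illustration only.
CATEGORIES = {
    ".jpg": "pictures", ".jpeg": "pictures", ".png": "pictures",
    ".mp3": "audio", ".ogg": "audio",
    ".mp4": "videos", ".mkv": "videos",
    ".zip": "archives",
    ".pdf": "documents", ".odt": "documents", ".txt": "documents",
}


def flat_view(root: str) -> dict:
    """Group every file under `root` by category, ignoring which folder it is in."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            groups[CATEGORIES.get(ext, "other")].append(os.path.join(dirpath, name))
    return {category: sorted(paths) for category, paths in groups.items()}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        os.makedirs(os.path.join(d, "DCIM", "Camera"))
        open(os.path.join(d, "DCIM", "Camera", "Tree.jpg"), "w").close()
        open(os.path.join(d, "notes.txt"), "w").close()
        for category, paths in flat_view(d).items():
            print(category, [os.path.relpath(p, d) for p in paths])
```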

I asked the person who made this tutorial in a different thread whether they were planning to write a book. The answer was no, but it seems to me that the writing style and topics could make a good book, especially if the stuff about Harvard psychologists is omitted.

The members of this forum live in many different countries. Even if they know English well, they write and speak differently with words and phrases whose meanings reflect different cultures. Body language can also be confusingly different between cultures. Without body language to rely on, even two people from the same culture may have difficulty distinguishing the attitudes and feelings behind written words such as those which appear on this forum.

RPi_Mike
Posts: 85
Joined: Sat Dec 09, 2017 12:57 am
Location: United States

Re: TUTORIAL: The True Nature of a Linux File – And its Timestamps

Tue Jun 26, 2018 12:44 am

scruss wrote:
Sun Jun 24, 2018 8:00 pm
Okay, very nicely done, but: does it matter that the file/folder paradigm is an “illusion”? Computers are built on abstractions, and a filesystem is one of many.

On 99 out of 100 levels, the "illusion" of a file does NOT matter. For the great majority of people, it has no practical implication.

But there is one giant exception: file creation time – crtime!

Some might think "who cares?" But it is what it is – it happens to be the rabbit hole I'm exploring right now.

Without understanding the "true nature" of a Linux file and its illusions, it would be impossible to grasp what "creation time" actually means. That's why I felt it necessary in my tutorial to first explain the "trinity" of the "file" – the file's name, the file's data, and the file's inode. And yes, there's also the file's path – though as I explained, that's not technically "part" of a file. Understanding all of this is absolutely necessary to have a complete grasp of timestamps.

Once you have that understanding, a kind of false labeling is revealed – that file "creation time" is NOT necessarily "file creation time". It's actually "inode creation time"! Of course, if the file still happens to be on the same partition as when you first created it, it actually would be the correct "creation time" – but only by coincidence!
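The "inode creation time" point can be demonstrated indirectly. On Linux, moving a file within the same filesystem is a rename, which keeps the same inode; moving it to another partition amounts to copying the data into a brand-new inode and deleting the old one. The Python sketch below shows that a rename keeps the inode number while a copy gets a new one, even though `shutil.copy2` carries the mtime across. It cannot show crtime directly, since as far as I know `os.stat()` on Linux does not expose it; on ext4 the new inode's crtime would be the time of the copy. The file name is hypothetical.

```python
import os
import shutil
import tempfile

FEB_1_2015 = 1_422_748_800   # 2015-02-01 00:00:00 UTC, used as the "last modification"

with tempfile.TemporaryDirectory() as d:
    original = os.path.join(d, "philosophy_of_mike.odt")   # hypothetical file
    with open(original, "w") as f:
        f.write("Chapter 1")
    os.utime(original, (FEB_1_2015, FEB_1_2015))
    ino_original = os.stat(original).st_ino

    # Moving within the same filesystem is a rename: the inode survives.
    moved = os.path.join(d, "moved.odt")
    os.rename(original, moved)
    ino_moved = os.stat(moved).st_ino

    # Copying (what a move to another partition amounts to) creates a new inode.
    # copy2 copies the mtime, but on ext4 the new inode gets its own crtime.
    copied = os.path.join(d, "copied.odt")
    shutil.copy2(moved, copied)
    st_copied = os.stat(copied)

print(ino_original == ino_moved, st_copied.st_ino != ino_moved, st_copied.st_mtime == FEB_1_2015)
```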

In fact, I would argue that all of this reveals a conceptual gap in the ext4 filesystem. It's not a bug – but at the very least, it reveals the overly "filesystem-centric" attitude of ext4.

Most computer scientists would agree that in an ideal world, a filesystem should be the SERVANT OF DATA, not the other way around. In other words, it should be as unobtrusive as possible and fully "respect the data" – both the file's content data and, whenever possible, its metadata. After all, a filesystem is only a means to an end. Therefore, "the data" should not be forced to conform to the filesystem unless there's a very good reason.

Let me prove my point with a little story:

When did work begin on The New York Times bestseller, The Philosophy of Mike?

[Don't bother looking it up on Amazon – this is a thought experiment!]

The answer is January 1, 2015. That's the day I started working on my book and saved the file for the very first time. That was my book's original "creation time"!

NOTE ON THE SUBTLE, BUT IMPORTANT, CHOICE OF TERMINOLOGY: I honestly don't like the term "creation time" for any file-related purpose. Why? Because it raises a fundamental question about the ultimate meaning of "creation". Is a book "created" when you only completed the first page and then saved the file for the first time? Or is it only truly "created" when you completed the book and saved the file for the very last time? In the real world, almost everyone would agree that a book isn't truly "created" until the book, in its final published form, actually exists! Think about the famous Michelangelo sculpture, David. Most people would agree that David wasn't "created" when all that existed was his big toe. That literally was NOT David – it was simply a toe! Perhaps it was the beginning of the David "origin story", but it certainly wasn't David! So by any meaningful standard, David was not "created" at that point. But if it were based on the Linux ext4 notion of "creation", David was indeed "created" when his toe first appeared! As a result, I prefer the term "BIRTH" to sidestep the misleading implications of "creation" time. The term "birth" is conceptually superior and closer to reality because it implies that it's "just the beginning" of the file – just as a person's final adult form is NOT "created" at the moment of birth. In the real world, as a baby grows to a child and then adulthood, it's continuously "modified" – exactly what happens to a typical file as you work on it over a period of months or years. But knowing the birth time is still essential if you ever want to answer a very basic question: "hmmmmm... when did I first start writing my book?" That, to me, is asking when the book was "born". It wasn't completed or "created" at that point – but it was born!

OK, so back to my story...

Because I'm such a speedy writer, I completed the entire book only a month later – on February 1, 2015. Hence, that was the "last modification" time.

Then, a month later on March 1, 2015, I opened the file, hit the print button, and placed the printed copy in my bookcase. That would be the "last access" time.
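For readers who want to map the story onto a real Linux system: the "last modification" and "last access" dates live in the inode as mtime and atime. They can be read with os.stat and set with os.utime. Below is a minimal Python sketch, assuming a throwaway file with the placeholder name book.odt in a temporary directory. Note that it cannot set ctime; the kernel sets that to the current time itself whenever the inode changes, including when utime is called.

```python
import os
import tempfile
from datetime import datetime, timezone

# Dates from the story, as POSIX timestamps (UTC midnight).
FINISHED = datetime(2015, 2, 1, tzinfo=timezone.utc).timestamp()  # last modification
PRINTED = datetime(2015, 3, 1, tzinfo=timezone.utc).timestamp()   # last access


def stamp_story_dates(path):
    """Give `path` the story's atime and mtime.

    os.utime takes (atime, mtime). There is no way to pass ctime: the
    kernel sets it to the current time because the inode just changed.
    """
    os.utime(path, (PRINTED, FINISHED))


def show_times(path):
    """Print the three classic inode timestamps in UTC and return the stat result."""
    st = os.stat(path)
    for label, value in (("Access", st.st_atime),
                         ("Modify", st.st_mtime),
                         ("Change", st.st_ctime)):
        when = datetime.fromtimestamp(value, timezone.utc)
        print(f"{label}: {when:%Y-%m-%d %H:%M:%S} UTC")
    return st


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        book = os.path.join(tmp, "book.odt")  # placeholder file name
        with open(book, "w") as f:
            f.write("Chapter 1\n")
        stamp_story_dates(book)
        show_times(book)
```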

Then, three years later in 2018, a neighbor of mine asked if he could borrow my printed copy and keep it in HIS bookcase for a while. I said "no problem dude – anything for a neighbor!"

A few months later, he returns the book to me and I happen to notice that it now says "Creation Date: June 25, 2018".

How could that be? Did he somehow rewrite my book and create a new philosophy? It's a 2015 book and I personally wrote it – so I know it wasn't created in 2018. To resolve the mystery, I use an OCR scanner to capture all "the data" – both the raw data contents of the book AND the "metadata" of the book's cover.

After running a binary compare, I confirm that all "the data" has remained unchanged – except for the creation date!

So I confront him about this oddity.

He then explains to me that the entire town has slipped into a parallel universe. Everything's the same except for one thing: All bookcases now have a hidden "booksystem". He explains that just as computer files require a "filesystem" for storage, bookcases now require a "booksystem". He then explains the hidden architecture of this booksystem. He says that one of the core components of the booksystem is the "bnode". He says it's just like the inode on Linux systems – except the "b" stands for book!

Apparently, whenever a book is placed inside a new bookcase – the equivalent of a new partition – the book must be stripped of its original "creation time" and reassigned a new, arbitrary "creation time" that's based on the arbitrary time that someone placed the book in the new bookcase!

He says this must happen because that's simply the nature of the booksystem – the book must be assigned a new bnode!

I tell the guy I don't care what the "internal logic" of this ridiculous booksystem is! Everyone knows that my famous philosophy book was created in 2015 – not 2018. But because he drank the Kool-Aid of the new universe, he insists that it makes perfect sense – because all that matters is the date of "bnode creation"!

I tell the guy that's insane – but then he claims that it's a "technical limitation" of the bnode system.

First of all, WHO CARES! If that indeed is a technical limitation of the booksystem, then all that means is that the booksystem itself is bogus and needs to be fixed!

But even that doesn't make sense. You see, before I handed it to my neighbor, I used 2 separate ink stamps to place a "last accessed" and "last modified" timestamp on the "metadata" of the outside cover.

Both the atime and mtime timestamps had been completely preserved – exactly what happens on a Linux system when you move a file from one partition to another! Both timestamps still said "2015". [Yes, that's right – it means the neighbor just wanted to impress his friends by displaying my book in his bookcase. He never actually opened it or read it.]

This definitely reveals a conceptual flaw in the booksystem that CANNOT be excused by any technical explanation. Why? Because the preservation of the atime and mtime metadata proves that the booksystem IS capable of transferring metadata from the bnode of my bookcase to the bnode of my neighbor's bookcase!

Now, back to the world of Linux:

Maybe there is some value in recording the inode creation time – I'm not necessarily against that if there's at least some valid "use case".

But if there is a valid use for it, it should not be given a misleading label. Instead, if it does have some value, I propose that it be called itime – "inode creation time" – the time the inode on the current partition was "created" to accept the new file. Sound familiar? It is! It's exactly what crtime means today!

For backward compatibility, if there is a future Linux filesystem such as ext5, the current crtime metadata could be mapped to itime. In other words, itime in the future would simply mean what crtime means today.

And then there would be "birth time" – btime – the time when the file was originally "born". In other words, btime would be when the newborn file was first saved by the author on its original partition. It would always answer a very basic question at any time in the future, no matter what partition it found itself on: "When did the author start working on the book – when was the file first born?"

In case Theodore Ts'o or some other Linux luminary stumbles across this post some day, here is my proposal for how the stat command could behave in a future ext5 filesystem – and how it compares to the current ext4 implementation.

NOTE 1: The stat command currently displays atime as "Access" and ctime as "Change". To remain consistent with that convention, I've displayed btime as "Birth" and itime as "Inode".

NOTE 2: The timestamps in the following graphic are based on the fanciful thought experiment you just read. It's what the stat command would generate AFTER my neighbor moved the book file to his "partition". In other words, this is what HIS system would say when he ran the stat command on my book's file. Except this version is my ext5 proposal – we are no longer in the bizarre parallel universe – so the timestamps actually make sense. Birth is btime – the time I first started working on my book and initially saved the file. And Inode is itime – the time my neighbor moved the file to his system (and thus the time when a new inode was created on his computer). Note also that Change Time (ctime) reflects the time my neighbor received the file – since any movement of a file is also considered to be a change in the file's status (as is currently the case with ext4):
[Image: Linux_ext5_Timestamp_Proposal_RPi_Mike.png]
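To make the proposal concrete, here is a small Python model of it. Everything in it is hypothetical: btime and itime are not fields of ext4 or of any real filesystem, and the class and function names are invented for illustration. It captures the two rules the proposal boils down to (a move to a new partition resets ctime and itime but leaves btime alone) and prints stat-style lines using the dates from the story.

```python
from dataclasses import dataclass, replace
from datetime import datetime


@dataclass(frozen=True)
class ProposedTimes:
    """Hypothetical ext5 timestamps from the proposal; btime and itime
    are not fields of ext4 or of any real filesystem."""
    atime: datetime  # Access: last read of the contents
    mtime: datetime  # Modify: last write of the contents
    ctime: datetime  # Change: last change to the inode itself
    btime: datetime  # Birth: first save by the author, kept across moves
    itime: datetime  # Inode: creation of the inode on this partition
                     #        (what ext4's crtime records today)


def move_to_new_partition(t: ProposedTimes, now: datetime) -> ProposedTimes:
    """A move creates a new inode on the destination, so the inode-level
    times become `now`; the file's own history (atime, mtime, btime) carries over."""
    return replace(t, ctime=now, itime=now)


def stat_lines(t: ProposedTimes) -> list[str]:
    """Render the times with right-aligned labels, loosely like GNU stat."""
    rows = [("Access", t.atime), ("Modify", t.mtime), ("Change", t.ctime),
            ("Birth", t.btime), ("Inode", t.itime)]
    return [f"{label:>6}: {when:%Y-%m-%d}" for label, when in rows]


if __name__ == "__main__":
    mine = ProposedTimes(
        atime=datetime(2015, 3, 1),   # printed the book
        mtime=datetime(2015, 2, 1),   # finished writing
        ctime=datetime(2015, 2, 1),
        btime=datetime(2015, 1, 1),   # first save
        itime=datetime(2015, 1, 1),
    )
    neighbors = move_to_new_partition(mine, now=datetime(2018, 6, 25))
    print("\n".join(stat_lines(neighbors)))
```

One design wrinkle: GNU stat already prints a field called "Inode" (the inode number, on its Device line), so a real implementation would probably want a different label for itime.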

PS: For those reading this in the present day, check out my implementation of the "xstat" command – it provides everything the stat command currently does PLUS "creation time" – which as you know, at the moment, really means "inode creation time"!
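A rough Python sketch of the portable part of that idea (this is not the xstat script itself): it asks os.stat for st_birthtime and returns None when the platform doesn't provide it. As far as I know, Python's os.stat exposes st_birthtime on macOS and the BSDs (and on Windows from Python 3.12) but not on Linux, where the birth time has to be fetched with the statx() system call; recent versions of GNU stat do that for you through the %w and %W format sequences.

```python
import os
from datetime import datetime, timezone
from typing import Optional


def birth_time(path: str) -> Optional[datetime]:
    """Return the file's birth time if os.stat exposes it, else None.

    On Linux this normally returns None, because os.stat() does not
    report the birth time there; use statx() (or `stat -c %w`) instead.
    """
    st = os.stat(path)
    ts = getattr(st, "st_birthtime", None)  # attribute may simply not exist
    if ts is None:
        return None
    return datetime.fromtimestamp(ts, timezone.utc)
```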

ejolson
Posts: 1842
Joined: Tue Mar 18, 2014 11:47 am

Re: TUTORIAL: The True Nature of a Linux File – And its Timestamps

Tue Jun 26, 2018 4:05 am

RPi_Mike wrote:
Tue Jun 26, 2018 12:44 am
here is my proposal for how the stat command could behave in a future ext5 filesystem.
Would it be possible to store birth times as extended attributes?
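One way this could look in practice, as a minimal Python sketch (Linux only, since os.setxattr and os.getxattr are Linux-specific): store the birth time as text in an extended attribute. The attribute name user.btime is purely hypothetical; no standard tool reads or writes it. Whether it survives a move depends on the copying tool and the destination: Python's shutil.copy2 copies extended attributes on Linux where possible, and GNU cp -a preserves them, but a copy onto a filesystem without xattr support drops them.

```python
import os
from datetime import datetime, timezone
from typing import Optional

# Hypothetical attribute name: a private convention, not a Linux standard.
# The "user." prefix is the xattr namespace intended for ordinary user data.
BTIME_ATTR = "user.btime"


def encode_btime(when: datetime) -> bytes:
    """Serialize as ISO 8601 in UTC (naive datetimes are taken as local time)."""
    return when.astimezone(timezone.utc).isoformat().encode("ascii")


def decode_btime(raw: bytes) -> datetime:
    return datetime.fromisoformat(raw.decode("ascii"))


def set_btime(path: str, when: datetime) -> None:
    """Record a birth time on `path` (Linux only; needs user xattr support)."""
    os.setxattr(path, BTIME_ATTR, encode_btime(when))


def get_btime(path: str) -> Optional[datetime]:
    """Read the recorded birth time, or None if there isn't one."""
    try:
        return decode_btime(os.getxattr(path, BTIME_ATTR))
    except OSError:  # attribute absent, or filesystem without user xattrs
        return None
```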

scruss
Posts: 1774
Joined: Sat Jun 09, 2012 12:25 pm
Location: Toronto, ON

Re: TUTORIAL: The True Nature of a Linux File – And its Timestamps

Wed Jun 27, 2018 2:57 pm

Thanks for the explanation, RPi_Mike. I can see where you're coming from.
RPi_Mike wrote:
Tue Jun 26, 2018 12:44 am
Most computer scientists would agree that in an ideal world, a filesystem should be the SERVANT OF DATA, not the other way around. In other words, it should be as unobtrusive as possible and fully "respect the data" – both the file's content data and, whenever possible, its metadata.
Filesystems primarily respect the contents of files. If you need metadata to stick around, you have to write it to the file itself. Anything not written in the file will go the way of Classic Mac OS resource forks, AmigaDOS file comments and the writerly intent of deceased authors.

Your choice of an OpenDocument Text (.odt) file in your thought experiment was a good one: did you know that ODT files have a very robust creation date attribute that even survives being saved as a docx file, edited on a Windows machine, and then reopened in LibreOffice? OpenDocument Text files are really zip archives of XML files. Stored in the meta.xml file of every odt file is an element, meta:creation-date, that stores the timestamp of when the document was created, before it's even saved as a file. This field makes it through to a docx file (which is also a zip archive of XML documents) as the dcterms:created element in the file docProps/core.xml.
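A minimal Python sketch of pulling those embedded dates back out, using only the standard library. It assumes the usual layouts described above (meta.xml for .odt, docProps/core.xml for .docx) and returns the raw timestamp string without validating it.

```python
import zipfile
import xml.etree.ElementTree as ET
from typing import Optional

# Namespaces of the two elements mentioned above.
ODF_META_NS = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
DCTERMS_NS = "http://purl.org/dc/terms/"


def embedded_creation_date(path_or_file) -> Optional[str]:
    """Return the creation timestamp stored inside an .odt or .docx, or None.

    .odt:  meta.xml           -> <meta:creation-date>
    .docx: docProps/core.xml  -> <dcterms:created>
    `path_or_file` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(path_or_file) as zf:
        names = set(zf.namelist())
        if "meta.xml" in names:
            member, tag = "meta.xml", f"{{{ODF_META_NS}}}creation-date"
        elif "docProps/core.xml" in names:
            member, tag = "docProps/core.xml", f"{{{DCTERMS_NS}}}created"
        else:
            return None
        root = ET.fromstring(zf.read(member))
        node = root.find(f".//{tag}")  # search all descendants of the root
        if node is None or not node.text:
            return None
        return node.text.strip()
```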

Since ext4 is very much in the minority in storing creation time, if that field is valuable to you, then find and use file formats that retain it explicitly. Getting Ted to add your proposed fields to ext5 might not be enough, as even he wrote in 2006 that “… it may be quite a while before [the creation time field] will be easily available to user programs …”

Even if the information is written to a file perfectly, format obsolescence can still get in the way. Ask me how much fun it was to try (and currently fail) to recover the code for a half-remembered flight simulator that was stored in an archaically compressed (sq) tokenized MBASIC (MS's BASIC for CP/M, the 8-bit predecessor to MS-DOS) file in a Commodore 64 emulator disk image format (d64) specially adapted for running on a C128's CP/M mode …
‘Remember the Golden Rule of Selling: “Do not resort to violence.”’ — McGlashan.

Heater
Posts: 9719
Joined: Tue Jul 17, 2012 3:02 pm

Re: TUTORIAL: The True Nature of a Linux File – And its Timestamps

Wed Jun 27, 2018 3:41 pm

RPi_Mike,
Most computer scientists would agree that in an ideal world, a filesystem should be the SERVANT OF DATA, not the other way around. In other words, it should be as unobtrusive as possible and fully "respect the data" – both the file's content data and, whenever possible, its metadata.
Can you link us to a quote of even one computer scientist who has ever expressed such an opinion?

I want a file system to preserve my data as best it can. Be that one single byte or billions. Of course it had better give me a filename or some other identity such that I can get my data back again.

Any meta-data saved with the file may be a bonus. Generally not much use beyond knowing the file size. And if it's an executable or sensitive information I need meta-data that tells me who can run it on my system or read/change its content.

After all, when you download some data from a file I have on my web server or whatever, all that meta-data is gone. You just get the bytes I put in the file.

Meta-data is a bottomless pit. You talk about creation and modification times etc. What about, who created it? What were the geo-coordinates of its creation? What program created or modified it? How many versions of it have there been over time? Who made those edits, where, when? And so on. And so on.

If I'm serious about a file's meta-data, for example when dealing with a project's source code, I use a version management system. For example git. Then all that lovely meta-data is stored in the repo forever and travels with the repo when anyone clones it here and there.
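As a sketch of how that answers the "when did I first start writing my book?" question from the original post: assuming the file is tracked in a git repository, the author date of the commit that first added it can be requested with git log --follow --diff-filter=A --format=%aI. The Python wrapper below is illustrative only, and the date it returns is that of the first commit, which may be later than the first save.

```python
import subprocess
from datetime import datetime
from typing import Optional


def parse_git_dates(output: str) -> list[datetime]:
    """Parse one strict ISO 8601 date per line (git's %aI format)."""
    dates = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Older Pythons' fromisoformat() does not accept a trailing "Z".
        if line.endswith("Z"):
            line = line[:-1] + "+00:00"
        dates.append(datetime.fromisoformat(line))
    return dates


def first_commit_date(path: str, repo: str = ".") -> Optional[datetime]:
    """Author date of the commit that first added `path`, following renames.

    Raises subprocess.CalledProcessError if `repo` is not a git repository.
    """
    out = subprocess.run(
        ["git", "log", "--follow", "--diff-filter=A", "--format=%aI", "--", path],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    dates = parse_git_dates(out)
    return min(dates) if dates else None
```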

Storing such change information in the file format itself is great and all. Until two people edit the same file thus producing two new versions that may disagree. At which point they start to realize they need git :)

And, well, storing such meta-data in the file format itself is a really bad idea. What if you don't want the recipient of your document to know what was in the old versions? This "feature" has gotten people into hot water many times.

scruss
Posts: 1774
Joined: Sat Jun 09, 2012 12:25 pm
Location: Toronto, ON

Re: TUTORIAL: The True Nature of a Linux File – And its Timestamps

Thu Jun 28, 2018 1:36 am

Heater wrote:
Wed Jun 27, 2018 3:41 pm
And, well, storing such meta-data in the file format itself is a really bad idea. What if you don't want the recipient of your document to know what was in the old versions? This "feature" has gotten people into hot water many times.
But most times, you do want that preserved. The same would happen if someone cloned one of your git repos: all old changes can be dug up. Git doesn't even capture a useful subset of Dublin Core, a metadata standard I used to use a lot.

The legal world manages fine without git, and even MS Word's basic track changes feature makes diffs look like a toy.

RPi_Mike
Posts: 85
Joined: Sat Dec 09, 2017 12:57 am
Location: United States

Re: TUTORIAL: The True Nature of a Linux File – And its Timestamps

Thu Jun 28, 2018 2:05 am

scruss wrote:
Wed Jun 27, 2018 2:57 pm
If you need metadata to stick around, you have to write it to the file itself. Anything not written in the file will go the way of Classic Mac OS resource forks, AmigaDOS file comments and the writerly intent of deceased authors.

I have no issue with your excellent observations. On the topic of timestamp metadata, many of our differences come down to "is" vs. "ought" – how things are vs. how I might like them to be. We have no dispute on how things actually are!

NOTE TO READERS: If we zoom into the details, there are of course all kinds of metadata – ownership, permissions, timestamps, file size, etc. But if we keep things at the 40,000-foot level, there are only 2 kinds of metadata – "external" metadata and "embedded" metadata. In the case of Linux, all "external" metadata, except for the file name, is stored inside the inode – whereas "embedded" metadata is stored inside the data of the file itself. A great example of this is EXIF metadata for JPEG images (and ODT files, as scruss points out!). The advantage of embedded metadata is that it's automatically retained inside the file's data. So if you send a JPEG file from your Raspberry to someone on a Windows machine, for example, they will still receive the EXIF metadata – because it's actually inside the data portion of the file. This means that "rival" computers do NOT have to use the same filesystem technology in order for embedded metadata to survive. External metadata, however, is a very different thing – such as the timestamp metadata stored in the Linux ext4 inode.
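A small Python demonstration of the difference, under simplified assumptions: the "embedded" date here is a made-up text tag written at the start of the file, standing in for a real EXIF block (it is not the EXIF format), and the "transfer" copies only the bytes, as a download or a copy to a foreign system would. The embedded date survives; the external mtime does not.

```python
import os
import shutil
import tempfile

# Made-up embedded tag standing in for real EXIF data (not the EXIF format).
EMBEDDED_TAG = b"DateTimeOriginal=2015:01:01 09:00:00\n"
JAN_1_2015 = 1420070400  # 2015-01-01 00:00:00 UTC as a POSIX timestamp


def copy_bytes_only(src: str, dst: str) -> None:
    """Transfer just the file's data, like a download or a foreign-system copy."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)


def demo() -> tuple[bool, bool]:
    """Return (embedded_survived, external_mtime_survived)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "photo.jpg")
        dst = os.path.join(tmp, "copy.jpg")
        with open(src, "wb") as f:
            f.write(EMBEDDED_TAG + b"...image data...")
        os.utime(src, (JAN_1_2015, JAN_1_2015))  # external metadata in the inode
        copy_bytes_only(src, dst)
        with open(dst, "rb") as f:
            embedded_survived = f.read().startswith(EMBEDDED_TAG)
        external_survived = os.stat(dst).st_mtime == os.stat(src).st_mtime
        return embedded_survived, external_survived


if __name__ == "__main__":
    print(demo())
```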

I'm sure you and I would agree that transferring files from an ext-based filesystem to a different filesystem that does not support ext-based metadata is a lost cause when it comes to preserving non-embedded timestamps.[*] I don't see that ever changing, and I don't really have a problem with it. After all, it seems hopelessly unrealistic that there will ever be an "international treaty" that requires all rival filesystem technologies to handle external metadata in the same way.

[*] Note: This does not include the ability of some data-transfer software, such as youtube-dl, to retrieve a remote filesystem's timestamp metadata and then incorporate that into the newly downloaded file's metadata (that feature, however, is provided by the software – not the filesystem itself).
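As I understand it, youtube-dl does this by reading the HTTP Last-Modified header and applying it to the downloaded file (its --no-mtime option turns that off). A minimal Python sketch of the same idea, with a hypothetical helper name, shows why this is the software's doing: the timestamp arrives as text in the HTTP response, and the program writes it onto the new inode itself.

```python
import os
from email.utils import parsedate_to_datetime


def apply_last_modified(path: str, header_value: str) -> float:
    """Set `path`'s mtime from an HTTP Last-Modified value; leave atime as is.

    Assumes the header carries a zone such as GMT, as HTTP dates do.
    The filesystem only learns this date because this code writes it.
    """
    mtime = parsedate_to_datetime(header_value).timestamp()
    os.utime(path, (os.stat(path).st_atime, mtime))
    return mtime
```

For example, apply_last_modified("video.mp4", "Thu, 01 Jan 2015 09:00:00 GMT") would give video.mp4 a modification time of 2015-01-01 09:00 UTC.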

However, when it comes to moving files from one ext4 partition to another ext4 partition – even when it involves two physically separate storage devices – atime and mtime are completely preserved. This demonstrates that it is indeed possible to transfer external timestamp metadata from one Linux system to another Linux system. In fact, it's not only possible – it happens all the time! This of course is because both partitions are using the exact same filesystem technology – so it's clearly performing a partial "inode to inode" metadata transfer in addition to the transfer of the file's raw data. If that wasn't happening, the metadata would not be preserved.
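The mechanism behind this is worth spelling out. As far as I know, the preservation is done by the mv program rather than by the filesystem: within one partition, mv simply renames, and the original inode (crtime included) stays put; across partitions a rename is impossible, so mv copies the data into a brand-new inode and then writes the old atime and mtime onto it with the utimensat() system call. Linux has no system call for setting a file's birth time, which is why crtime is the one timestamp a move cannot carry over. A minimal Python sketch of that copy path (the function name is hypothetical, not a real tool):

```python
import os
import shutil


def copy_then_delete(src: str, dst: str) -> None:
    """Model of a cross-partition move: new inode, old atime/mtime, no old birth time.

    Permissions and ownership are omitted for brevity.
    """
    st = os.stat(src)            # read the times BEFORE reading the data
    shutil.copyfile(src, dst)    # data only; dst is a new inode if it didn't exist
    os.utime(dst, ns=(st.st_atime_ns, st.st_mtime_ns))  # carry atime/mtime over
    # Nothing here can copy the birth time: there is no utime-style call for it,
    # so dst's crtime stays at the moment of the copy.
    os.remove(src)
```

The times are read before the data is copied because reading the source can itself update its atime under the default relatime mount option.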

Therefore, I feel my "proposal" is quite modest. In other words, in addition to atime and mtime preservation – which has already been a reality for many years – I'm simply suggesting that the file's "birth time" (btime) should also be preserved. And yes – it would be nice if crtime were given a non-misleading label in the future – such as itime (inode creation time) instead of crtime (creation time) – to reflect that it's not really the file's "creation time", but is actually "the time of most recent inode creation".

Let me explicitly mention to the readers, by the way, that I'm under no delusion whatsoever that my so-called "proposal" will have any impact on any future system. I simply felt like "going off" on this particular topic on this particular week!

Finally, on the subject of format obsolescence, I agree completely. 50 years from now, for unknown cultural, technological and other reasons that none of us can anticipate, there may be very few ext-compatible computers in existence. That would certainly cause serious problems for those who wish to read our file-related activities of today – data or metadata!
