@maldus: thank you very much, I'm glad to hear my tutorials were useful to you!
To the OP: basically, when you say loading an executable, you have 4 layers to deal with.
1. reading sectors into memory (this is done by programming the eMMC to read the SD card)
2. interpreting the file system (understanding the FAT is essential to figure out which sectors belong to the file you want to load)
3. interpreting the executable binary (creating segments filled up with contents provided in the executable)
4. creating the proper environment for the process (which could include dynamic run-time linking as well)
...and then you can jump to the entry point.
Layers 1 and 2 are covered in my tutorials, so I'll focus on 3 and 4 to answer your question.
It's important to understand that executable files are not memory images, therefore you can't load them as-is. I'll use the term "binary" to refer to the executable on disk, and "process" to refer to the executable in memory.
There are many binary formats, and I'll cover only a few to help you understand. What all of them have in common is that a binary is one contiguous chunk of bytes, the content of a single file if you like.
A process in memory, on the other hand, consists of distinct, possibly non-contiguous memory areas, usually called segments. There's one for the machine code (called text), one for read-only data (rodata), one for initialized data (data), one for uninitialized data (bss, short for "block started by symbol"), and one or more stacks (one per thread, so every process must have at least one). For the 4th step it is essential to understand these segments, as the loaded code makes assumptions about them which your loader must satisfy.
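To make the segments concrete, here is a small C translation unit annotated with where a typical GCC/Clang toolchain places each object. The exact section names depend on the compiler and flags, so treat the comments as the usual case, not a guarantee. The names `greeting`, `counter`, `scratch` and `bump` are just for illustration.

```c
/* Where things usually end up with a typical GCC/Clang toolchain. */
const char greeting[] = "hello";   /* read-only data: .rodata */
int counter = 42;                  /* initialized data: .data (value stored in the binary) */
int scratch[256];                  /* uninitialized data: .bss (NOT stored, zeroed by the loader) */

/* The machine code of this function goes to .text */
static int bump(int by)
{
    int local = by * 2;            /* automatic variable: lives on the stack */
    counter += local;
    return counter;
}
```

If the loader forgets to zero the bss, `scratch` would start out with whatever garbage was in that memory, which is exactly the kind of assumption the compiled code makes about its environment.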
Layer 3, file formats
Raw binary: this is the simplest. The raw binary contains both machine code and initialized data (there is no way to describe separate rodata or bss segments, nor relocation records). Consider it a combined text+data segment image. No parsing is needed, but you must load the binary at a specific location, and you should set up some system registers (like the stack pointer, for example). Execution starts at the very first byte of the segment. It is called a flat binary because the process' properties are not stored in the file; they are agreed upon in advance. This is the format of kernel7.img and kernel8.img, for example.
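A minimal sketch of loading a flat binary, assuming physical memory is simulated by a plain `ram[]` array and that the agreed-upon load address is 0x80000 (where the Raspberry Pi firmware typically puts kernel8.img; kernel7.img typically goes to 0x8000). `load_flat` and `FLAT_LOAD_ADDR` are hypothetical names, and setting up the stack pointer is not shown.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Assumed, agreed-upon load address (nothing in the file tells us). */
#define FLAT_LOAD_ADDR 0x80000u

/* Copy a flat binary into ram[] at the fixed address.
   Returns the entry point, which for a flat binary is simply the
   load address (first byte of the image), or 0 if it does not fit. */
static size_t load_flat(uint8_t *ram, size_t ram_size,
                        const uint8_t *img, size_t img_size)
{
    if (img_size > ram_size || FLAT_LOAD_ADDR > ram_size - img_size)
        return 0;                          /* image would overflow memory */
    memcpy(ram + FLAT_LOAD_ADDR, img, img_size);
    return FLAT_LOAD_ADDR;                 /* execution starts at the first byte */
}
```

There is no header to validate, which is both the appeal and the danger of this format: any file will "load".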
a.out: the original executable format of UNIX. It is very simple: the raw binary is prefixed by a small header and suffixed by some linker information (relocation records and the symbol table). That header (called struct exec) contains all the necessary information, like the size of each segment, the entry point etc. The segments are then stored in a fixed order in the binary. While this format is now obsolete, it's very popular among hobby OS developers for its simplicity and because it's still supported by many toolchains. In the worst case you can even create this header in Assembly with a few "dw" or "dd" directives.
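Here is a sketch of parsing the classic 32-bit `struct exec` header. It assumes the header is in the host's byte order (as on old Linux, which calls the first field `a_info`; the BSDs call it `a_midmag` and may store it differently), and it only accepts OMAGIC/NMAGIC files, where text immediately follows the 32-byte header. ZMAGIC/QMAGIC file offsets vary between systems and are deliberately rejected. `aout_layout` and `aout_parse` are hypothetical names.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Classic 32-bit a.out header: eight 32-bit words, 32 bytes in total. */
struct exec {
    uint32_t a_info;    /* magic in the low 16 bits (BSDs: a_midmag) */
    uint32_t a_text;    /* size of the text segment in bytes */
    uint32_t a_data;    /* size of the initialized data segment */
    uint32_t a_bss;     /* size of the bss (not stored in the file) */
    uint32_t a_syms;    /* size of the symbol table */
    uint32_t a_entry;   /* entry point address */
    uint32_t a_trsize;  /* size of text relocation records */
    uint32_t a_drsize;  /* size of data relocation records */
};

#define OMAGIC 0407     /* text and data contiguous, text not write-protected */
#define NMAGIC 0410     /* "pure" executable: read-only text */

/* Where the segments are in the file, plus what to create in memory. */
struct aout_layout {
    uint32_t text_off, text_size;
    uint32_t data_off, data_size;   /* data directly follows text */
    uint32_t bss_size, entry;
};

/* Returns 0 and fills *out on success, -1 on a bad or unsupported header. */
static int aout_parse(const uint8_t *file, size_t size, struct aout_layout *out)
{
    struct exec h;
    if (size < sizeof h) return -1;
    memcpy(&h, file, sizeof h);             /* copy to avoid unaligned reads */
    uint32_t magic = h.a_info & 0xffff;
    if (magic != OMAGIC && magic != NMAGIC)
        return -1;                          /* ZMAGIC/QMAGIC not handled here */
    if ((uint64_t)sizeof h + h.a_text + h.a_data > size)
        return -1;                          /* truncated file */
    out->text_off  = sizeof h;
    out->text_size = h.a_text;
    out->data_off  = sizeof h + h.a_text;
    out->data_size = h.a_data;
    out->bss_size  = h.a_bss;               /* must be zeroed by the loader */
    out->entry     = h.a_entry;
    return 0;
}
```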
ELF: the Executable and Linkable Format is the successor of a.out. It is highly complex, because it has to support all kinds of machines and many different memory addressing schemes. But in short, it has a header (which includes the entry point) and a list of segment descriptors (called program headers). Program headers do not name the segments; instead each one contains flags, a file offset, a memory address and sizes. Your loader should copy each segment's contents from the ELF file to the given address. Program header flags, like executable or writable, allow you to decide which one is the text, which one is the data etc. The segments can be in any order in the binary; there's absolutely no restriction. Execution starts at the entry point specified in the ELF header. (Just a side note: this is called the "execution view". ELF also has a "linking view", which uses sections instead of segments. Section headers describe the same binary as the program headers, but in a different way, creating a view only needed by linkers. There's a relation: a section belongs to at most one loadable segment, but one segment may contain several sections.) This format is used by the BSDs, Linux and many non-Unix-like OSes (like VMS); it's safe to say by almost everything that's not a Microsoft product.
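The program-header walk above can be sketched as follows. This is a minimal illustration under several assumptions: a 64-bit ELF whose byte order matches the host, no MMU (each `p_vaddr` is treated as an offset into a simulated `ram[]` array), and no protection or relocation handling. `Ehdr64`/`Phdr64` are my own minimal copies of the standard `Elf64_Ehdr`/`Elf64_Phdr` layouts, and `elf_load` is a hypothetical name.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

typedef struct {                    /* same layout as Elf64_Ehdr (64 bytes) */
    unsigned char e_ident[16];
    uint16_t e_type, e_machine;
    uint32_t e_version;
    uint64_t e_entry, e_phoff, e_shoff;
    uint32_t e_flags;
    uint16_t e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx;
} Ehdr64;

typedef struct {                    /* same layout as Elf64_Phdr (56 bytes) */
    uint32_t p_type, p_flags;
    uint64_t p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_align;
} Phdr64;

#define PT_LOAD 1

/* Copy every PT_LOAD segment into ram[] and zero its bss tail.
   Returns 0 and stores the entry point on success, -1 on error. */
static int elf_load(const uint8_t *file, size_t size,
                    uint8_t *ram, size_t ram_size, uint64_t *entry)
{
    Ehdr64 eh;
    if (size < sizeof eh) return -1;
    memcpy(&eh, file, sizeof eh);
    if (memcmp(eh.e_ident, "\x7f" "ELF", 4) != 0 || eh.e_ident[4] != 2 /* ELFCLASS64 */)
        return -1;
    if (eh.e_phentsize != sizeof(Phdr64) || eh.e_phoff > size) return -1;
    for (uint16_t i = 0; i < eh.e_phnum; i++) {
        uint64_t off = eh.e_phoff + (uint64_t)i * sizeof(Phdr64);
        if (off > size || sizeof(Phdr64) > size - off) return -1;
        Phdr64 ph;
        memcpy(&ph, file + off, sizeof ph);
        if (ph.p_type != PT_LOAD) continue;          /* only loadable segments */
        if (ph.p_filesz > ph.p_memsz) return -1;
        if (ph.p_offset > size || ph.p_filesz > size - ph.p_offset) return -1;
        if (ph.p_vaddr > ram_size || ph.p_memsz > ram_size - ph.p_vaddr) return -1;
        memcpy(ram + ph.p_vaddr, file + ph.p_offset, ph.p_filesz);
        /* memsz > filesz: the rest is bss, which is not in the file */
        memset(ram + ph.p_vaddr + ph.p_filesz, 0, ph.p_memsz - ph.p_filesz);
    }
    *entry = eh.e_entry;
    return 0;
}
```

Note that the loader never looks at section headers: the execution view alone is enough to load the binary.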
PE: Windows uses yet another structure, the Portable Executable format, to describe the process' segments. This is similar to a.out, but contains much more information. It has two headers (the MZ header at the start of the file, and the actual PE/COFF header, whose offset is stored in a field of the MZ header). It is not as flexible as the ELF format, but it can still store all the segments a process would ever need. Being less flexible is a good thing, because it also means it is less complex, and therefore PE is an order of magnitude easier to implement. Used by Windows and EFI.
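Following the MZ header to the PE/COFF header can be sketched like this. The offsets come from Microsoft's PE format documentation as I understand it: `e_lfanew` at 0x3C, the 4-byte "PE\0\0" signature, a 20-byte COFF file header (NumberOfSections at offset 2), then the optional header (Magic at offset 0, AddressOfEntryPoint at offset 16). `pe_entry` is a hypothetical helper; it only reads the entry point RVA and section count and does not walk the section table.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Little-endian readers, so this works regardless of host alignment. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | p[1] << 8); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Returns 0 and fills *rva (entry point relative to the image base) and
   *nsections on success, -1 if this does not look like a PE file. */
static int pe_entry(const uint8_t *file, size_t size,
                    uint32_t *rva, uint16_t *nsections)
{
    if (size < 0x40 || file[0] != 'M' || file[1] != 'Z') return -1;
    uint32_t pe = rd32(file + 0x3C);               /* e_lfanew: offset of PE header */
    /* signature (4) + COFF header (20) + optional header up to entry (20) */
    if (pe > size || size - pe < 4 + 20 + 20) return -1;
    if (memcmp(file + pe, "PE\0\0", 4) != 0) return -1;
    const uint8_t *coff = file + pe + 4;           /* COFF file header */
    const uint8_t *opt  = coff + 20;               /* optional header */
    uint16_t magic = rd16(opt);                    /* 0x10b = PE32, 0x20b = PE32+ */
    if (magic != 0x10b && magic != 0x20b) return -1;
    *nsections = rd16(coff + 2);                   /* NumberOfSections */
    *rva = rd32(opt + 16);                         /* AddressOfEntryPoint */
    return 0;
}
```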
Layer 4, process image
Now we have the memory images for each segment and have loaded them at their corresponding locations in memory, but we're not done yet. You should set up memory protection too (so that data cannot be executed and text cannot be modified, for example). What's more, the machine code could have been linked for a different address, therefore the text and data segments may need relocation. For this you have to know that there are two basic memory addressing modes: absolute addresses and relative addresses. Absolute addresses certainly need relocation if the binary is not loaded at the address it was linked for, but relative addressing (also called position independent) only needs it if the executable refers to external segments. That is, segments loaded from a different file, called shared objects (.so) in UNIX terminology and dynamic-link libraries (.dll) in the Windows world. If you have everything statically linked into one binary and load it at its link address, it's safe to assume you won't need run-time linking (resolving relocation records), because both AArch32 and AArch64 code mostly uses PC-relative addressing. On the other hand, if the executable refers to external resources, then you must load other files into the process' memory as well, and resolve the references by iterating over the relocation records and modifying bytes in the text and data segments accordingly.
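The simplest relocation pass to look at is the one for a position-independent AArch64 executable. The sketch below assumes the image was linked at address 0, so an `R_AARCH64_RELATIVE` record (type 1027 in the Arm ELF ABI) just means "store load base + addend at this offset". `Rela64` mirrors the standard `Elf64_Rela` layout, `apply_relative` is a hypothetical name, and symbol-based relocations against shared objects (which need a symbol lookup) are intentionally rejected rather than handled.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

typedef struct {                 /* same layout as Elf64_Rela */
    uint64_t r_offset;           /* where to patch, relative to the image */
    uint64_t r_info;             /* symbol index (high 32) and type (low 32) */
    int64_t  r_addend;           /* value to add */
} Rela64;

#define R_AARCH64_RELATIVE 1027

/* Patch every RELATIVE record: *(image + r_offset) = load_base + r_addend.
   Returns 0 on success, -1 on an unsupported type or out-of-range offset. */
static int apply_relative(uint8_t *image, size_t image_size, uint64_t load_base,
                          const Rela64 *rela, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t type = (uint32_t)(rela[i].r_info & 0xffffffffu);  /* ELF64_R_TYPE */
        if (type != R_AARCH64_RELATIVE)
            return -1;           /* needs symbol resolution: not handled here */
        if (rela[i].r_offset > image_size || image_size - rela[i].r_offset < 8)
            return -1;           /* the 64-bit slot must lie inside the image */
        uint64_t value = load_base + (uint64_t)rela[i].r_addend;
        memcpy(image + rela[i].r_offset, &value, sizeof value);  /* may be unaligned */
    }
    return 0;
}
```

A fully statically linked, non-PIE binary loaded at its link address has no such records at all, which is why that case needs no run-time linking.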
Okay, we have discussed the text and data segments. The rodata segment is the same as data, but you should write-protect it (you can use the MMU for that). The text segment should be read-only too, and it should be the only executable segment. Regardless of the binary format, the bss is NOT stored in the binary at all: you MUST create it by zeroing out that memory. It's worth mentioning that unlike PE, the ELF format does not have a bss segment description per se. It has only one data segment, whose file size can be smaller than its memory size, and the difference is the bss. In other words, with ELF the bss always has to directly follow the initialized data. Finally, it's not mandatory but recommended to zero out the stack segment too.
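The protection rules above boil down to a small policy on the ELF program header flags. This sketch only decides what each segment is and how it should be mapped; the actual MMU programming is platform specific and not shown. `seg_kind` and `classify_segment` are hypothetical names, while the `PF_X`/`PF_W`/`PF_R` values are the standard ELF flag bits.

```c
#include <stdint.h>

/* Standard ELF p_flags bits */
#define PF_X 0x1u
#define PF_W 0x2u
#define PF_R 0x4u

enum seg_kind { SEG_TEXT, SEG_RODATA, SEG_DATA, SEG_REJECT };

/* Only text may be executable, and nothing may be writable and
   executable at the same time (W^X). */
static enum seg_kind classify_segment(uint32_t p_flags)
{
    if ((p_flags & PF_X) && (p_flags & PF_W))
        return SEG_REJECT;        /* refuse W+X mappings */
    if (p_flags & PF_X)
        return SEG_TEXT;          /* map read + execute */
    if (p_flags & PF_W)
        return SEG_DATA;          /* map read + write; includes the bss tail */
    return SEG_RODATA;            /* map read-only */
}
```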
If you have done all of this, so that every segment is in memory at its specified location and has been altered properly (relocated or zeroed out), then, and only then, can you jump to the entry point.