Well, that's not entirely true: malloc is not independent of the MMU, not at all.
Without MMU: you use the free memory between EndOfProgram and MMIO_BASE. You have a fixed amount of memory to play with, and you just record which memory areas are allocated and which are free. The linked code examples are good for this.
Keywords: fixed-size memory buffer, single-level allocator
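To make the MMU-less case concrete, here is a minimal first-fit sketch over a fixed buffer. Everything in it is an assumption for illustration: the names fixed_malloc/fixed_free, the 4 KiB static array standing in for the EndOfProgram..MMIO_BASE region, and the 16 byte alignment. It is not the code from Circle or from any of the linked examples.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed heap; on real hardware this would be the region
   between the end of the program and MMIO_BASE. */
#define HEAP_SIZE 4096u
#define ALIGNMENT 16u

typedef struct block {
    size_t size;          /* payload size in bytes */
    int free;             /* 1 if the block is available */
    struct block *next;   /* next block in address order */
} block_t;

static _Alignas(16) unsigned char heap[HEAP_SIZE];
static block_t *head = NULL;

static size_t align_up(size_t n)
{
    return (n + ALIGNMENT - 1) & ~(size_t)(ALIGNMENT - 1);
}

/* Header size rounded up so payloads stay 16 byte aligned. */
#define HDR align_up(sizeof(block_t))

void *fixed_malloc(size_t size)
{
    if (head == NULL) {               /* lazy init: one big free block */
        head = (block_t *)heap;
        head->size = HEAP_SIZE - HDR;
        head->free = 1;
        head->next = NULL;
    }
    if (size == 0 || size > HEAP_SIZE)
        return NULL;
    size = align_up(size);
    for (block_t *b = head; b != NULL; b = b->next) {
        if (!b->free || b->size < size)
            continue;
        /* Split when the remainder can hold a header plus a minimal payload. */
        if (b->size >= size + HDR + ALIGNMENT) {
            block_t *rest = (block_t *)((unsigned char *)b + HDR + size);
            rest->size = b->size - size - HDR;
            rest->free = 1;
            rest->next = b->next;
            b->size = size;
            b->next = rest;
        }
        b->free = 0;
        return (unsigned char *)b + HDR;
    }
    return NULL;                      /* out of memory: the buffer cannot grow */
}

void fixed_free(void *p)
{
    if (p == NULL)
        return;
    assert((unsigned char *)p >= heap + HDR && (unsigned char *)p < heap + HEAP_SIZE);
    block_t *b = (block_t *)((unsigned char *)p - HDR);
    b->free = 1;
    /* Merge neighbouring free blocks with one pass over the list. */
    for (block_t *c = head; c != NULL; ) {
        if (c->free && c->next != NULL && c->next->free) {
            c->size += HDR + c->next->size;
            c->next = c->next->next;
        } else {
            c = c->next;
        }
    }
}
```

The important property is the last return in fixed_malloc: when no block fits, there is nothing to fall back on, which is exactly the difference from the MMU case.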
With MMU: you'll need at least two memory allocators. One is responsible for allocating and freeing page frames (4K, 64K etc., depending on your MMU configuration), often called the PMM (Physical Memory Manager). The other is a higher-level allocator, often called the VMM (Virtual Memory Manager), which can allocate as little as 8 or 16 bytes and starts with an empty memory buffer. This one is responsible for keeping track of allocated and free memory only within this resizable buffer.
When it needs memory and the buffer is not sufficient (too small, full, or fragmented), the higher-level allocator must call the page frame allocator to allocate one or more frames and map them at the end of the buffer. Typically this is done via an 'sbrk' or 'mmap' syscall, but in a bare metal app it could be a direct function call as well. This is how dlmalloc, jemalloc and others work. Most of them rarely give memory back, in the sense that freed memory is usually only recorded as free in the buffer (at VMM level) and is not immediately unmapped and returned as page frames (at PMM level). This can of course waste RAM, but it also means that subsequent malloc calls do not need to call the PMM allocator again.
It's also worth mentioning that on systems that implement supervisor and user space, the PMM always runs in supervisor mode (EL1) and needs exclusive access to the RAM allocation data, while the VMM can run in user space (at EL0), possibly as several instances, each using its own buffer allocation data.
Keywords: dynamically sized buffers (one for each address space), multi-level allocators (PMM + VMM)
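The two-level split can be sketched roughly like this. It is only an illustration under loud assumptions: a bitmap PMM managing 64 frames of 4K, a made-up RAM_BASE and HEAP_VIRT_BASE, and a map_page() stub that just counts calls where a real system would write page table entries. The names pmm_alloc_frame, pmm_free_frame and vmm_grow are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE      4096u
#define NUM_FRAMES     64u           /* hypothetical: 256 KiB of managed RAM */
#define RAM_BASE       0x100000u     /* hypothetical physical base address */
#define HEAP_VIRT_BASE 0x40000000u   /* hypothetical virtual heap start */

/* PMM level: one bit per page frame, 1 = in use. */
static uint8_t frame_bitmap[NUM_FRAMES / 8];

/* Returns the physical address of a free frame, or 0 if none is left. */
uintptr_t pmm_alloc_frame(void)
{
    for (unsigned i = 0; i < NUM_FRAMES; i++) {
        if (!(frame_bitmap[i / 8] & (1u << (i % 8)))) {
            frame_bitmap[i / 8] |= (uint8_t)(1u << (i % 8));
            return RAM_BASE + (uintptr_t)i * PAGE_SIZE;
        }
    }
    return 0;
}

void pmm_free_frame(uintptr_t phys)
{
    if (phys < RAM_BASE)
        return;
    uintptr_t i = (phys - RAM_BASE) / PAGE_SIZE;
    if (i < NUM_FRAMES)
        frame_bitmap[i / 8] &= (uint8_t)~(1u << (i % 8));
}

/* Stand-in for the MMU code: a real version would fill in page tables. */
static unsigned mapped_pages;
static void map_page(uintptr_t virt, uintptr_t phys)
{
    (void)virt;
    (void)phys;
    mapped_pages++;
}

/* VMM level: the heap buffer starts empty and grows page by page. */
static uintptr_t heap_end = HEAP_VIRT_BASE;

/* Grows the buffer by at least `bytes`. Returns the old end of the buffer
   (like sbrk), or 0 if the PMM ran out of frames. Pages mapped before a
   failure stay part of the buffer. */
uintptr_t vmm_grow(size_t bytes)
{
    size_t pages = (bytes + PAGE_SIZE - 1) / PAGE_SIZE;
    uintptr_t old_end = heap_end;
    for (size_t n = 0; n < pages; n++) {
        uintptr_t phys = pmm_alloc_frame();
        if (phys == 0)
            return 0;
        map_page(heap_end, phys);
        heap_end += PAGE_SIZE;
    }
    return old_end;
}
```

A malloc built on top of this would only call vmm_grow when its own free list cannot satisfy a request, which is the 'sbrk'/'mmap' role described above.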
Now it's possible to convert a higher level memory allocator from one scenario to the other, and it's rather easy:
MMU-less allocator to MMU-aware allocator: start with an empty memory buffer, and when it runs out of memory, call 'sbrk'/'mmap' to make it grow. (Like modifying the one in Circle)
MMU-aware to MMU-less allocator: instead of an empty buffer, set up a fixed-size memory buffer at start (like heap_low - heap_end), and replace all 'sbrk'/'mmap' calls with code that reports an 'out of memory' error. (Like modifying dlmalloc, jemalloc etc.)
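For the MMU-aware to MMU-less direction, one common trick is to keep the allocator untouched and hand it an sbrk-like function that serves a fixed region and fails once that region is used up. A minimal sketch, where fixed_sbrk, heap_area and the 8 KiB size are assumptions for illustration (on real hardware the region would come from the linker script):

```c
#include <stddef.h>
#include <stdint.h>

#define HEAP_BYTES 8192u   /* hypothetical heap size */

static _Alignas(16) unsigned char heap_area[HEAP_BYTES];
static unsigned char *heap_brk = heap_area;   /* current end of the heap */

/* sbrk-style interface: moves the break by `increment` bytes and returns
   the previous break, or (void *)-1 when the request does not fit. */
void *fixed_sbrk(ptrdiff_t increment)
{
    unsigned char *old = heap_brk;
    ptrdiff_t used = heap_brk - heap_area;

    if (increment > (ptrdiff_t)HEAP_BYTES - used || increment < -used)
        return (void *)-1;   /* this is the 'out of memory' case */

    heap_brk += increment;
    return old;
}
```

If I remember the dlmalloc sources correctly, it can be pointed at such a function through its MORECORE macro with mmap support switched off (HAVE_MMAP set to 0), but check the configuration comments at the top of its malloc.c before relying on that.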
Porting dlmalloc (either in MMU-aware form as-is, or modified to an MMU-less version) to a bare metal application should be straightforward, as portability is one of its main goals. As far as I can see, Circle's allocator basically uses very similar concepts to dlmalloc. More information on Doug Lea's allocator (outdated, but it's a good start for understanding what it does).
Now dlmalloc is a small, portable, general-purpose allocator, and of course as such it can perform really poorly under certain circumstances. Therefore many game developers hate it, and they implement their own special allocators in their game engines. Another alternative is jemalloc: robust, fast, multi-threaded, a bit harder to port to a bare metal application, but maybe needed if those special circumstances apply.
Oh, almost forgot to mention: both dlmalloc and jemalloc support aligned allocations.
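As a small illustration of what aligned allocation looks like from the caller's side, here is a sketch that assumes your C library or allocator port provides the standard C11 aligned_alloc. The wrapper name alloc_aligned and the helper round_up are made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Rounds n up to a multiple of align (align must be a power of two). */
static size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/* Returns a buffer whose address is a multiple of `align`, for example
   for cache-line aligned or DMA-friendly buffers. Release it with free(). */
void *alloc_aligned(size_t align, size_t size)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    /* C11 asks for the size to be a multiple of the alignment. */
    return aligned_alloc(align, round_up(size, align));
}
```

Usage would be something like `void *buf = alloc_aligned(64, 100);` for a buffer aligned to a 64 byte cache line.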