justinvoss
Posts: 3
Joined: Tue Jun 11, 2019 9:14 pm

Changing TTBR1_EL1 while memory caching enabled?

Tue Jun 11, 2019 10:02 pm

I'm working on writing a kernel for my Raspberry Pi 3 (Model B), and things are going fairly well, except that I cannot figure out how to swap the TTBR1_EL1 register without causing an exception when memory caching is enabled.

Here's my setup:

  1. My kernel is an ELF file, which expects to be loaded in the higher-half (i.e. the entry point of the ELF is at 0xffff800000100000)
  2. In order to load the kernel code where it expects to run, I have a "bootloader" that runs first, parses the ELF headers, and configures an initial set of page tables that loads the kernel into the higher-half, then jumps into the kernel entry point. (This bootloader and the kernel ELF are concatenated together to form the kernel8.img that's actually booted)
  3. The kernel then creates its own set of page tables that maps itself into the higher-half (this winds up basically creating a copy of the existing page tables, but in memory that's controlled by the kernel, not the bootloader)
  4. The kernel then sets TTBR1_EL1, causing the MMU to use the new, kernel-defined tables.

This worked fine when I had all of memory marked as Device nGnRnE, but that caused things to run slowly (and things got really slow when I started writing text to the framebuffer).
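
For reference, here is a minimal sketch of how the higher-half entry point from step 1 splits into translation table indices. It assumes a 4 KiB granule and 48-bit virtual addresses (which is what the TCR_EL1 value quoted later in this post implies); the helper names `table_index` and `uses_ttbr1` are made up for illustration and are not from the kernel in question.

```c
#include <stdint.h>

/* Index into the translation table at a given level (0..3), assuming a
 * 4 KiB granule and 48-bit virtual addresses: each level consumes 9 bits
 * of the address, and the low 12 bits are the offset within the page. */
static unsigned table_index(uint64_t va, unsigned level)
{
    unsigned shift = 12 + 9 * (3 - level);  /* L0: 39, L1: 30, L2: 21, L3: 12 */
    return (unsigned)((va >> shift) & 0x1ff);
}

/* Assuming T1SZ = 16, addresses whose top 16 bits are all ones are
 * translated through TTBR1_EL1; the rest of the valid range uses TTBR0_EL1. */
static int uses_ttbr1(uint64_t va)
{
    return (va >> 48) == 0xffff;
}
```

Under those assumptions, 0xffff800000100000 goes through TTBR1_EL1 and lands at index 256 of the level 0 table, index 0 at levels 1 and 2, and index 256 of the level 3 table.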

So I read about MAIR_EL1 and how I should probably be setting most memory to be Normal, but leave the peripheral IO area as Device nGnRnE. I modified the bootloader to do that, and things do in fact run much faster, but now when the kernel attempts to set TTBR1_EL1, the CPU thinks that nothing is mapped, and I get an endless loop of exceptions (since the exception handler itself is now no longer mapped).

My initial suspicion was that I was creating the new page tables incorrectly, so I added some code to walk the entire hierarchy of tables and print out the entries. Printing all the entries is slow, but just printing them caused the problem to go away!

That made me think that this is actually a caching issue, and that the writes to the new page tables haven't actually reached RAM before the TTBR gets updated. In that case, I just needed to make sure the writes to the new page tables were flushed before swapping the page tables, but that doesn't seem to help. Here's my most recent attempt:

Code: Select all

dsb ish           // Ensure writes to tables have completed.
msr ttbr1_el1, %0 // Set the root table.
tlbi vmalle1      // Flush TLB.
dsb ish           // Ensure TLB flush has completed.
isb               // Re-fetch instructions using new page tables.

I thought the first "dsb ish" instruction would make sure that any pending writes in the cache were written out to main memory before the "msr" instruction could update the TTBR, but it doesn't seem to help. I also tried "dsb ishst", but that didn't seem to make a difference.

Here are the values of some other relevant registers:

Code: Select all

TCR_EL1   = 0x80100010
MAIR_EL1  = 0xff00
SCTLR_EL1 = 0x501805

Can someone with more experience working with the MMU help me figure out what I'm doing wrong?

LdB
Posts: 1143
Joined: Wed Dec 07, 2016 2:29 pm

Re: Changing TTBR1_EL1 while memory caching enabled?

Wed Jun 12, 2019 1:19 am

After setting it, all it really should need is an "isb".
As for the table, TTBR1_EL1 is the easy one, because you usually have the 1:1 mapping on TTBR0_EL1, so you can initially use an all-zero top-level table and try bringing it online. You don't actually need a table in the virtual space at all until you want to set up a virtual block; typically you allocate blocks and grow the virtual space as you go. So your kernel will typically allocate the first virtual block and then shove itself into that block.

So just take a table of 512 uint64_t aligned on a 4K boundary, zero it, and you should be able to set it as TTBR1_EL1.
So create this and try setting it as your TTBR1_EL1:

Code: Select all

static uint64_t __attribute__((aligned(4096))) page_table_virtualmap[512] = { 0 };

My gut feel is that your TTBR0_EL1 map will be the problem, not TTBR1_EL1. The fact that you can traverse the table with code makes me even more suspicious that your 1:1 map is the problem, or that the virtual table overlaps the 1:1 map. If that works, just allocate a block (or blocks) for the kernel and try throwing it in.

I extended bzt's original memory sample with caching to allow synchronization primitives (LDREX/STREX):
https://github.com/LdB-ECM/Raspberry-Pi ... tualmemory
It is a little bit messy because I merged the AARCH32 onto it as well.

It covers everything you are dealing with, including the MAIR register value and the mapping.
These are the 5 types of memory it sets up, and you can see the MAIR1VAL it creates:

Code: Select all

.equ MT_DEVICE_NGNRNE, 0
.equ MT_DEVICE_NGNRE,  1
.equ MT_DEVICE_GRE,    2
.equ MT_NORMAL_NC,     3
.equ MT_NORMAL,        4
.equ MAIR1VAL, ( (0x00ul << (MT_DEVICE_NGNRNE * 8)) |\
                 (0x04ul << (MT_DEVICE_NGNRE * 8))  |\
                 (0x0cul << (MT_DEVICE_GRE * 8))    |\
                 (0x44ul << (MT_NORMAL_NC * 8))     |\
                 (0xfful << (MT_NORMAL * 8)) )
Last edited by LdB on Wed Jun 12, 2019 2:03 am, edited 3 times in total.
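
For anyone who wants to check the resulting register value, here is a rough C equivalent of the assembler constants above. It is only a sketch: the `MAIR_FIELD` macro is a helper invented for this example, and the attribute bytes are copied from the listing rather than from the linked repository.

```c
#include <stdint.h>

/* Attribute indices, mirroring the .equ definitions above. */
enum {
    MT_DEVICE_NGNRNE = 0,
    MT_DEVICE_NGNRE  = 1,
    MT_DEVICE_GRE    = 2,
    MT_NORMAL_NC     = 3,
    MT_NORMAL        = 4
};

/* Place an 8-bit attribute into slot idx of a 64-bit MAIR value
 * (hypothetical helper, not part of the original sample). */
#define MAIR_FIELD(attr, idx) ((uint64_t)(attr) << ((idx) * 8))

static const uint64_t MAIR1VAL =
    MAIR_FIELD(0x00, MT_DEVICE_NGNRNE) |  /* Device-nGnRnE */
    MAIR_FIELD(0x04, MT_DEVICE_NGNRE)  |  /* Device-nGnRE */
    MAIR_FIELD(0x0c, MT_DEVICE_GRE)    |  /* Device-GRE */
    MAIR_FIELD(0x44, MT_NORMAL_NC)     |  /* Normal, non-cacheable */
    MAIR_FIELD(0xff, MT_NORMAL);          /* Normal, write-back cacheable */
```

With those five attributes the constant works out to 0xff440c0400.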

justinvoss
Posts: 3
Joined: Tue Jun 11, 2019 9:14 pm

Re: Changing TTBR1_EL1 while memory caching enabled?

Wed Jun 12, 2019 1:57 am

LdB wrote: As for table the TTBR1_EL1 is the easy one because usually you have the 1:1 mapping on TTBR0_EL1 so you can just initially use a top level table all zeroed and try bringing it online. You don't actually need a table in the virtual space at all until you want to setup a virtual block, and typically you allocate them and grow the virtual space. So typically your kernel will allocate the first virtual block and then shove itself into the block.

I am actually doing this: when the device starts up, I set up a 1:1 mapping in TTBR0_EL1, then enable the MMU. Then, as I parse the ELF headers of the kernel, I set up entries in TTBR1_EL1. That part isn't the issue.

The issue is when, after I'm already running code up in the area mapped by TTBR1_EL1, I want to change the value of TTBR1_EL1. That doesn't work if I have certain memory caching settings enabled, and I'm increasingly suspicious that it's related to having data stuck in the cache and not in RAM.

As an experiment, I changed the value I'm writing in MAIR_EL1: I originally was using 0xff00 (which caused the Normal memory to use write-back, meaning writes stay in the cache until the cache is cleaned), so instead I tried 0xbb00 (which causes Normal memory to use write-through, which means that data is written to both the cache and RAM at the same time).
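
To make the difference between the two MAIR_EL1 values concrete, here is a small sketch that pulls out an attribute byte and classifies its inner cache policy. The function and enum names are invented for this example, and the classification is deliberately simplified (it ignores the allocation hints and the transient/non-transient distinction).

```c
#include <stdint.h>

enum cache_policy { DEVICE, NON_CACHEABLE, WRITE_THROUGH, WRITE_BACK };

/* Extract the 8-bit attribute for index n (0..7) from a MAIR_EL1 value. */
static uint8_t mair_attr(uint64_t mair, unsigned n)
{
    return (uint8_t)(mair >> (8 * n));
}

/* Classify the inner (low nibble) policy of one attribute byte. */
static enum cache_policy inner_policy(uint8_t attr)
{
    if ((attr & 0xf0) == 0x00)
        return DEVICE;            /* upper nibble 0 selects Device memory */
    uint8_t inner = attr & 0x0f;
    if (inner == 0x4)
        return NON_CACHEABLE;     /* 0b0100: Normal, inner non-cacheable */
    /* In the remaining encodings, bit 2 of the nibble separates
     * write-back (set) from write-through (clear). */
    return (inner & 0x4) ? WRITE_BACK : WRITE_THROUGH;
}
```

Attribute index 1 of 0xff00 comes out as write-back and index 1 of 0xbb00 as write-through, while index 0 is Device memory in both.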

That 0xbb00 value actually works (although it's noticeably slower than 0xff00), so that makes me even more confident that I need to somehow get the cache to flush its contents to RAM before setting TTBR1_EL1 for the second time. I was under the impression that a "dsb ..." instruction would cause any pending cache activity to complete, but that doesn't seem to be the case.

I wonder if I need to use some form of the "dc ..." instructions to manually clean the data cache?
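
A rough sketch of what cleaning a page table by address could look like, using "dc cvac" on every cache line that overlaps the table. Everything here is an assumption rather than tested kernel code: the 64-byte line size is hard-coded (real code should derive it from CTR_EL0), and `clean_line` and `clean_dcache_range` are names made up for this example. Off AArch64 the instructions compile away, so the loop logic can be checked on a host machine.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u  /* assumption: 64-byte lines; read CTR_EL0 in real code */

/* Clean one data cache line by virtual address to the point of coherency.
 * Does nothing when built for a non-AArch64 host. */
static void clean_line(uintptr_t addr)
{
#if defined(__aarch64__)
    __asm__ volatile("dc cvac, %0" : : "r"(addr) : "memory");
#else
    (void)addr;
#endif
}

/* Clean every cache line overlapping [start, start + len).
 * Returns the number of lines visited. */
static size_t clean_dcache_range(uintptr_t start, size_t len)
{
    if (len == 0)
        return 0;
    uintptr_t addr = start & ~(uintptr_t)(CACHE_LINE - 1);  /* round down */
    uintptr_t end = start + len;
    size_t lines = 0;
    for (; addr < end; addr += CACHE_LINE) {
        clean_line(addr);
        lines++;
    }
#if defined(__aarch64__)
    __asm__ volatile("dsb ish" : : : "memory");  /* wait for the cleans to finish */
#endif
    return lines;
}
```

The idea would be to call this on each newly written table before the "msr ttbr1_el1" write; a single 4 KiB table is 64 lines under the assumed line size.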

LdB
Posts: 1143
Joined: Wed Dec 07, 2016 2:29 pm

Re: Changing TTBR1_EL1 while memory caching enabled?

Wed Jun 12, 2019 2:13 am

I agree with what you are saying ... it has to be a cache issue.

However, what you are doing is slightly strange. Usually you just add and delete blocks from the virtual table, because you have a live kernel running in the virtual memory. I don't think I have ever changed TTBR1_EL1 away from the initial table on the fly, because it would crash my kernel if I did that. Just shutting the kernel down so I could jump out and reset the TTBR1_EL1 virtualization is a lot of work for zero gain when I can just map and unmap a block of the initial table. I'm not saying you can't do it, just that it leads to some interesting problems, because nothing can run in the virtualization until the change is completed.

To me it comes across a bit like manually moving code and data around to create a malloc function rather than just allocating memory on the heap. You can do it ... I just don't get why.

So I am going to have to leave it with you, because I don't think I can help: it is not something I have ever coded or even wanted to do. I do see the problem (nothing can run in the virtualization until the full cache flush completes), but how you do that I have no idea.

justinvoss
Posts: 3
Joined: Tue Jun 11, 2019 9:14 pm

Re: Changing TTBR1_EL1 while memory caching enabled?

Wed Jun 12, 2019 7:32 pm

I think I solved it! The trick was revealed by this sentence in the ARM Programmer's Guide:

"The attributes specified in the TCR_EL1 must be the same as those specified for the virtual memory region in which the translation tables are stored. Caching the translation tables is the normal default behavior."

I think what happened here was that the TCR was telling the MMU to look in non-cached RAM, but the writes to the page tables were getting cached due to the MAIR settings. But it's possible to change the TCR to tell the MMU to also use the cache, in which case everything is in sync and the code works.

Here are the values I used:

Code: Select all

TCR_EL1  = 0xb5103510
MAIR_EL1 = 0xff00

This tells the MMU to use inner-shareability for table walks, and to use write-back for both the inner and outer caches for table walks.
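
To show which bits changed, here is a minimal sketch that decodes the table-walk fields of a TCR_EL1 value. The struct and function names are invented for this example; only the bit positions come from the architecture (the TTBR1 fields sit 16 bits above the matching TTBR0 fields).

```c
#include <stdint.h>

struct tcr_walk_attrs {
    unsigned tsz;   /* TnSZ: the region covers 2^(64 - tsz) bytes */
    unsigned irgn;  /* inner cacheability used for table walks */
    unsigned orgn;  /* outer cacheability used for table walks */
    unsigned sh;    /* shareability used for table walks */
};

/* Decode the fields for TTBR0_EL1 (which = 0) or TTBR1_EL1 (which = 1). */
static struct tcr_walk_attrs tcr_decode(uint64_t tcr, int which)
{
    unsigned base = which ? 16 : 0;
    struct tcr_walk_attrs a;
    a.tsz  = (unsigned)((tcr >> base) & 0x3f);
    a.irgn = (unsigned)((tcr >> (base + 8)) & 0x3);   /* 0 = non-cacheable, 1 = write-back write-allocate */
    a.orgn = (unsigned)((tcr >> (base + 10)) & 0x3);
    a.sh   = (unsigned)((tcr >> (base + 12)) & 0x3);  /* 0 = non-shareable, 3 = inner shareable */
    return a;
}
```

Decoded this way, the old value 0x80100010 gives IRGN = ORGN = SH = 0 for both halves (non-cacheable, non-shareable walks), while 0xb5103510 gives IRGN = ORGN = 1 and SH = 3. T0SZ and T1SZ are 16 in both.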

LdB
Posts: 1143
Joined: Wed Dec 07, 2016 2:29 pm

Re: Changing TTBR1_EL1 while memory caching enabled?

Thu Jun 13, 2019 5:09 am

That is a hideously big virtualization space for the little PI ... 2^48 :-)

I am not sure whether to be impressed or scared with what you are doing.
