dpotop
Posts: 78
Joined: Mon Nov 24, 2014 2:14 pm

MMU config of RPi3 - problem with ARMv8 ARM

Tue Feb 26, 2019 4:56 pm

Hello,

I am trying to reconfigure the MMU beyond what is done in LdB's examples.
I am doing this by building on an example from the ARMv8 ARM, which can be
found here:
https://static.docs.arm.com/ddi0487/da/ ... v8_arm.pdf

The example is the one in section K7.1.2, fig. K7-11, page 7293.

I find the information in the ARMv8 ARM conflicting. In fig. K7-11 the
level 3 page descriptor uses bits 12-16 for the OA (output address). But
in fig. D5-15, section D5.3.1, page 2445, these bits are used for something
else (the nT field and a 4-bit RES0).

Please tell me which one is correct. The example seems more logical, because
using bits 12-16 gives the OA the right size, whereas using them for nT and
RES0 requires quite complex workarounds.

Regards,
Dumitru

Paeryn
Posts: 2735
Joined: Wed Nov 23, 2011 1:10 am
Location: Sheffield, England

Re: MMU config of RPi3 - problem with ARMv8 ARM

Wed Feb 27, 2019 12:09 am

dpotop wrote:
Tue Feb 26, 2019 4:56 pm
I find the information in ARMv8 ARM to be conflicting. In fig. K7-11 I can see the level 3 page descriptor using bits 12-16 for the OA (output address). But in fig. D5-15, section D5.3.1, page 2445, these bits are used for something else (field nT and a 4-bit RES0).
Diagram D5-15 is for the level 0, 1 & 2 descriptor formats; the level 3 page descriptor format is diagram D5-17 on page 2448.
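
To illustrate why the level matters when reading a descriptor, here is a minimal Python sketch that classifies a 64-bit descriptor from bits [1:0]. The function name `descriptor_kind` is hypothetical, and the sketch ignores which levels actually permit block descriptors for a given granule, so treat it as an illustration rather than a full decoder.

```python
def descriptor_kind(desc: int, level: int) -> str:
    """Classify a translation table descriptor using bits [1:0] and the lookup level."""
    if desc & 0b1 == 0:
        # Bit 0 clear: the entry is invalid at every level.
        return "invalid"
    bit1 = (desc >> 1) & 0b1
    if level == 3:
        # At level 3, 0b11 is a page descriptor; 0b01 is reserved (treated as invalid).
        return "page" if bit1 else "invalid"
    # At levels 0-2, 0b11 is a table descriptor and 0b01 is a block descriptor.
    return "table" if bit1 else "block"
```

The same low bits (0b11) therefore mean "table" at levels 0-2 but "page" at level 3, which is why the two diagrams lay out the remaining bits differently.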
She who travels light — forgot something.

dpotop
Posts: 78
Joined: Mon Nov 24, 2014 2:14 pm

Re: MMU config of RPi3 - problem with ARMv8 ARM

Wed Feb 27, 2019 7:28 am

Paeryn wrote:
Wed Feb 27, 2019 12:09 am
Diagram D5-15 is for level 0, 1 & 2 descriptor formats, the level 3 page descriptor format is diagram D5-17 on page 2448.
Excellent, thanks a lot!

Can you also point me to the place where they explain how the initial
translation level is chosen for Stage 1 translation? I only found a way to
set it for Stage 2.

As far as I understand, for Stage 1 it is implicit: the hardware derives it
from the VA size and the granule size.

For instance, if I set the VA size to 36 (instead of the usual 48) and the
granule to 16k, does the system automatically know it should start at
Level 2?

Is this correct?

Also, if VA size is 32 and the granule is 16k, which level will it start on? Is it
3?

Dumitru

bzt
Posts: 393
Joined: Sat Oct 14, 2017 9:57 pm

Re: MMU config of RPi3 - problem with ARMv8 ARM

Wed Feb 27, 2019 10:24 am

Hi,
dpotop wrote:
Wed Feb 27, 2019 7:28 am
Can you also point me to the place where they explain how the initial
translation level is chosen for Stage 1 translation? I only found a way to
set it for Stage 2.
You are running at an exception level reported by the CurrentEL system register; let's call it X. When address translation is enabled (the M bit of sctlr_elX is set), the ttbr0_elX or ttbr1_elX register supplies the root of the translation tables, and tcr_elX supplies their characteristics.
dpotop wrote:
Wed Feb 27, 2019 7:28 am
For instance, if I set the VA size to 36 (instead of the usual 48) and the
granule to 16k, does the system automatically know it should start at
Level 2?
...and the TnSZ fields (T0SZ/T1SZ) of the aforementioned tcr_elX register determine the table walk. The DDI0487 document has a nice comparison table, D5-13 on page D5-2419, which lists the TnSZ values and input address ranges for each starting level and so tells you the initial lookup level. In short, with 16k pages, if you want to start at translation level 2, you have to set TnSZ between 28 and 38 (depending on the VA size you want).
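
The rule behind that table can be sketched in a few lines of Python. This is a simplified model for Stage 1 only, assuming 8-byte descriptors (so each full level resolves granule bits minus 3) and ignoring the architectural limits on TnSZ and any optional extensions; the names `GRANULE_BITS` and `initial_lookup_level` are made up for this illustration.

```python
# Page-offset width for each translation granule.
GRANULE_BITS = {"4K": 12, "16K": 14, "64K": 16}

def initial_lookup_level(tnsz: int, granule: str) -> int:
    """Return the Stage 1 initial lookup level for a given TnSZ and granule."""
    offset_bits = GRANULE_BITS[granule]
    va_bits = 64 - tnsz                      # input address size
    bits_per_level = offset_bits - 3         # a full table holds granule/8 entries
    bits_to_resolve = va_bits - offset_bits  # bits the table walk must translate
    # Ceiling division: number of lookup levels needed.
    levels = -(-bits_to_resolve // bits_per_level)
    return 4 - levels                        # the walk always ends at level 3

print(initial_lookup_level(28, "16K"))  # 36-bit VA, 16k granule
print(initial_lookup_level(32, "16K"))  # 32-bit VA, 16k granule
```

Under these assumptions a 36-bit VA and a 32-bit VA with the 16k granule both start at level 2; only TnSZ=39 (25-bit VA) starts at level 3 for that granule.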

Cheers,
bzt

dpotop
Posts: 78
Joined: Mon Nov 24, 2014 2:14 pm

Re: MMU config of RPi3 - problem with ARMv8 ARM

Wed Feb 27, 2019 11:58 am

bzt wrote:
Wed Feb 27, 2019 10:24 am
The DDI0487 document has a nice comparison table, D5-13 on page D5-2419, which lists the TnSZ values and input address ranges for each starting level and so tells you the initial lookup level. In short, with 16k pages, if you want to start at translation level 2, you have to set TnSZ between 28 and 38 (depending on the VA size you want).
Thanks bzt!

Now I seem to have only one question left before re-starting to code:
In the table you mention (D5-13), assume a granule size of 64k.

1. If T0SZ=35, I am directly in Level 3, and VAs have 29 bits. When
building the PA, the bits [28:16] are provided by the page descriptor.
The bits [11:0] are always taken unchanged from the VA.
How about bits [15:12]? Are they also taken from the VA?

2. If T0SZ=39, I am again at L3, VAs have 25 bits. Then, bits
[11:0] are always taken unchanged from the VA, and bits
[24:12] are taken from the page descriptor. Right?

Best regards,
Dumitru

Paeryn
Posts: 2735
Joined: Wed Nov 23, 2011 1:10 am
Location: Sheffield, England

Re: MMU config of RPi3 - problem with ARMv8 ARM

Wed Feb 27, 2019 12:36 pm

dpotop wrote:
Wed Feb 27, 2019 11:58 am
bzt wrote:
Wed Feb 27, 2019 10:24 am
The DDI0487 document has a nice comparison table, D5-13 on page D5-2419, which lists the TnSZ values and input address ranges for each starting level and so tells you the initial lookup level. In short, with 16k pages, if you want to start at translation level 2, you have to set TnSZ between 28 and 38 (depending on the VA size you want).
Thanks bzt!

Now I seem to have only one question left before re-starting to code:
In the table you mention (D5-13), assume a granule size of 64k.

1. If T0SZ=35, I am directly in Level 3, and VAs have 29 bits. When
building the PA, the bits [28:16] are provided by the page descriptor.
The bits [11:0] are always taken unchanged from the VA.
How about bits [15:12]? Are they also taken from the VA?

2. If T0SZ=39, I am again at L3, VAs have 25 bits. Then, bits
[11:0] are always taken unchanged from the VA, and bits
[24:12] are taken from the page descriptor. Right?

Best regards,
Dumitru
With 64K granularity then PA[15:0] = VA[15:0]
With 16K granularity then PA[13:0] = VA[13:0]
With 4K granularity then PA[11:0] = VA[11:0]
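
As a small illustration of these three rules, the following Python sketch combines a page descriptor's output address with the low VA bits. The names `GRANULE_OFFSET_BITS` and `page_physical_address` are hypothetical, and the output address is assumed to be passed in as a full address with its low bits already zero.

```python
# Number of low VA bits copied straight into the PA for each granule.
GRANULE_OFFSET_BITS = {"4K": 12, "16K": 14, "64K": 16}

def page_physical_address(output_address: int, va: int, granule: str) -> int:
    """Build the PA for a level 3 page: upper bits from the descriptor, offset from the VA."""
    offset_mask = (1 << GRANULE_OFFSET_BITS[granule]) - 1
    # The upper bits come from the descriptor's OA, the page offset from the VA.
    return (output_address & ~offset_mask) | (va & offset_mask)

print(hex(page_physical_address(0x12340000, 0x1ABCDEF, "64K")))  # 0x1234cdef
```

With the 64K granule all of VA[15:0] (0xCDEF here) passes through unchanged, so bits [15:12] are never taken from the descriptor.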

dpotop
Posts: 78
Joined: Mon Nov 24, 2014 2:14 pm

Re: MMU config of RPi3 - problem with ARMv8 ARM

Wed Feb 27, 2019 1:15 pm

Paeryn wrote:
Wed Feb 27, 2019 12:36 pm
dpotop wrote:
Wed Feb 27, 2019 11:58 am
2. If T0SZ=39, I am again at L3, VAs have 25 bits. Then, bits
[11:0] are always taken unchanged from the VA, and bits
[24:12] are taken from the page descriptor. Right?
With 64K granularity then PA[15:0] = VA[15:0]
Ok, but then, in the case mentioned above, the pages pointed to
by two successive descriptors can overlap, because the 13 bits
used as table index (VA[24:12]) and the 16 bits you mention
overlap. Right?

In fact, you end up pointing to pages smaller than the 64k max.
In this case, the pages will have size 2^12=4kB.

If this is true, then I finally understood the way it works. My mistake
was to assume that pages have a fixed size (e.g. 64k for a 64k
granule).

Can you please tell me if I got it right this time?

Dumitru

PS: If I use a block descriptor at a level higher than L3, is the
part of VA copied to PA increased to cover the missing bits?

Paeryn
Posts: 2735
Joined: Wed Nov 23, 2011 1:10 am
Location: Sheffield, England

Re: MMU config of RPi3 - problem with ARMv8 ARM

Wed Feb 27, 2019 1:53 pm

dpotop wrote:
Wed Feb 27, 2019 1:15 pm
Paeryn wrote:
Wed Feb 27, 2019 12:36 pm
dpotop wrote:
Wed Feb 27, 2019 11:58 am
2. If T0SZ=39, I am again at L3, VAs have 25 bits. Then, bits
[11:0] are always taken unchanged from the VA, and bits
[24:12] are taken from the page descriptor. Right?
With 64K granularity then PA[15:0] = VA[15:0]
Ok, but then, in the case mentioned above, the pages pointed to
by two successive descriptors can overlap, because the 13 bits
used as table index (VA[24:12]) and the 16 bits you mention
overlap. Right?

In fact, you end up pointing to pages smaller than the 64k max.
In this case, the pages will have size 2^12=4kB.

If this is true, then I finally understood the way it works. My mistake
was to assume that pages have a fixed size (e.g. 64k for a 64k
granule).

Can you please tell me if I got it right this time?
Dumitru
No, with 64k granularity pages are on a 64k boundary (by definition).

VA[47:29] is used as an index in the level 2 lookup (if used) to give the address of the level 3 table,
VA[28:16] is used as an index in the level 3 lookup to give you PA[47:16],
VA[15:0] gives you PA[15:0] directly.

To have VA[24:12] used in the lookup would require 4k granularity and levels 2 & 3 (level 2 resolves VA[29:21], level 3 resolves VA[20:12] which would give you PA[47:12], then VA[11:0] is used directly for PA[11:0]).

dpotop
Posts: 78
Joined: Mon Nov 24, 2014 2:14 pm

Re: MMU config of RPi3 - problem with ARMv8 ARM

Wed Feb 27, 2019 4:49 pm

Paeryn wrote:
Wed Feb 27, 2019 1:53 pm
No, with 64k granularity pages are on a 64k boundary (by definition).

VA[47:29] is used as an index in the level 2 lookup (if used) to give the address of the level 3 table,
VA[28:16] is used as an index in the level 3 lookup to give you PA[47:16],
VA[15:0] gives you PA[15:0] directly.
I guess the first index is VA[41:29], not VA[47:29] (so that it's exactly 13 bits).

Paeryn wrote:
Wed Feb 27, 2019 1:53 pm
To have VA[24:12] used in the lookup would require 4k granularity and levels 2 & 3 (level 2 resolves VA[29:21], level 3 resolves VA[20:12] which would give you PA[47:12], then VA[11:0] is used directly for PA[11:0]).
The issue is that I don't really want the PA to be 48-bit.
Ideally, it would be 32-bit (4GB, like the physical memory
of the Pi3). But here I am trying to understand the actual rules, so I'm
assuming T0SZ=39, hence 25-bit VAs.

According to table D5-15 (page 2423 of ARMv8 ARM) I can have 25-bit VAs
with a 64kB granule. The question is how it works.

Dumitru

Paeryn
Posts: 2735
Joined: Wed Nov 23, 2011 1:10 am
Location: Sheffield, England

Re: MMU config of RPi3 - problem with ARMv8 ARM

Wed Feb 27, 2019 7:43 pm

dpotop wrote:
Wed Feb 27, 2019 4:49 pm
Paeryn wrote:
Wed Feb 27, 2019 1:53 pm
No, with 64k granularity pages are on a 64k boundary (by definition).

VA[47:29] is used as an index in the level 2 lookup (if used) to give the address of the level 3 table,
VA[28:16] is used as an index in the level 3 lookup to give you PA[47:16],
VA[15:0] gives you PA[15:0] directly.
I guess the first index is VA[41:29], not VA[47:29] (so that it's exactly 13 bits).
Oops, yes you're right, I re-wrote that part several times and managed to combine the two. [47:42] at level 1, [41:29] at level 2.
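
For reference, the corrected 64K split of a 48-bit VA can be written out as a short Python sketch. The function name `split_va_64k` and the dictionary keys are only illustrative.

```python
def split_va_64k(va: int) -> dict:
    """Split a 48-bit VA into table indices and page offset for the 64K granule."""
    return {
        "l1_index": (va >> 42) & 0x3F,    # VA[47:42], 6 bits
        "l2_index": (va >> 29) & 0x1FFF,  # VA[41:29], 13 bits
        "l3_index": (va >> 16) & 0x1FFF,  # VA[28:16], 13 bits
        "offset": va & 0xFFFF,            # VA[15:0], copied into PA[15:0]
    }

example_va = (5 << 42) | (7 << 29) | (9 << 16) | 0x1234
print(split_va_64k(example_va))
```

With a smaller VA size the upper fields simply are not used: a 29-bit VA needs only `l3_index` and `offset`.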
dpotop wrote:
Wed Feb 27, 2019 4:49 pm
Paeryn wrote:
Wed Feb 27, 2019 1:53 pm
To have VA[24:12] used in the lookup would require 4k granularity and levels 2 & 3 (level 2 resolves VA[29:21], level 3 resolves VA[20:12] which would give you PA[47:12], then VA[11:0] is used directly for PA[11:0]).
The issue is that I don't really want the PA to be 48-bit.
Ideally, it would be 32-bit (4GB, like the physical memory
of the Pi3). But here I am trying to understand the actual rules, so I'm
assuming T0SZ=39, hence 25-bit VAs.

According to table D5-15 (page 2423 of ARMv8 ARM) I can have 25-bit VAs
with a 64kB granule. The question is how it works.

Dumitru
You always end up with a 48-bit address, as the final level 3 descriptor provides all but the low 12/14/16 bits, depending on granularity. For 32-bit PAs, just have the level 3 entries all with the top 16 bits of the OA clear (so the final address will have bits 47:32 = 0).

With a VA of 25 bits you'll have a 32MB virtual address space; at a granularity of 64K only level 3 is used. Level 3 can map up to 512MB of virtual address space at 64K: it contains 8192 page entries, one for each value of VA[28:16]. If you only use a 25-bit VA then only the first 512 are needed (32MB / 64KB = 512 pages), so set the other 7680 with bit 0 clear (to mark them invalid).
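
To check the entry counts for the 25-bit, 64K case, here is a rough Python sketch that builds such a level 3 table as a list of integers. It assumes an identity mapping and leaves out every attribute bit (access flag, permissions, memory type), so the descriptors are not usable as-is; `build_l3_table` and the constants are hypothetical names.

```python
GRANULE_BITS_64K = 16
FULL_L3_ENTRIES = 1 << 13  # one entry per value of VA[28:16]

def build_l3_table(va_bits: int) -> list:
    """Return a full-size 64K level 3 table covering only the first 2**va_bits bytes."""
    used = 1 << (va_bits - GRANULE_BITS_64K)  # pages needed for the VA range
    table = []
    for index in range(FULL_L3_ENTRIES):
        if index < used:
            # Identity-mapped page: OA = index * 64K, bits [1:0] = 0b11 (page).
            table.append((index << GRANULE_BITS_64K) | 0b11)
        else:
            table.append(0)  # bit 0 clear marks the entry invalid
    return table

table = build_l3_table(25)
used_entries = sum(1 for entry in table if entry & 1)
print(used_entries, len(table) - used_entries)
```

Because every output address stays below 2^25, bits 47:32 of each OA are zero, which matches the advice above for 32-bit PAs.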

dpotop
Posts: 78
Joined: Mon Nov 24, 2014 2:14 pm

Re: MMU config of RPi3 - problem with ARMv8 ARM

Wed Feb 27, 2019 8:04 pm

Ok, so with your help (thank you again) I am starting to understand more of the
ARM documents, and can contribute productively to this discussion.

I'm now looking into the "Programmer’s Guide for ARMv8-A", which can be found here:
http://infocenter.arm.com/help/topic/co ... ure_PG.pdf

In section 12.3, page 170, the figure shows a scenario with a 64kB granule and 42 VA bits.
You can see that for a block descriptor at level 2 the PA takes more than the standard
16 bits from the VA. So I guess this answers my question of where the missing bits
come from.

In other words, the 16 bits seem to be a minimum for the number of bits taken from
the VA into the PA for 64kB granule.
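
A quick way to see where the extra bits come from: a block descriptor at a given level skips the remaining lookups, so the VA bits those lookups would have resolved are copied into the PA instead. The Python sketch below assumes 8-byte descriptors and does not check whether a block is actually permitted at that level for that granule; `va_bits_copied_to_pa` is a hypothetical name.

```python
def va_bits_copied_to_pa(level: int, granule_bits: int) -> int:
    """Number of low VA bits passed through to the PA by a page/block at this level."""
    bits_per_level = granule_bits - 3  # index bits resolved by one full table
    # Level 3 copies just the page offset; each skipped level adds one index width.
    return granule_bits + (3 - level) * bits_per_level

print(va_bits_copied_to_pa(3, 16))  # 16: a 64kB page
print(va_bits_copied_to_pa(2, 16))  # 29: a 512MB level 2 block
```

So for the 64kB granule, 16 bits is indeed the minimum, and a level 2 block takes VA[28:0].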

In the same section, page 171, there's an example that shows how to deal with cases
where the number of decode bits is not a multiple of the decode bits per level. The
first level is then incomplete (it uses a shorter table).
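
The size of that shorter first table can be estimated with the sketch below. It is written under the same simplifying assumptions as the rest of this thread (Stage 1, 8-byte descriptors, no concatenated tables), and `initial_table_entries` is a made-up name.

```python
def initial_table_entries(va_bits: int, granule_bits: int) -> int:
    """Number of entries in the table used at the initial lookup level."""
    bits_per_level = granule_bits - 3
    bits_to_resolve = va_bits - granule_bits
    # Ceiling division: number of lookup levels in the walk.
    levels = -(-bits_to_resolve // bits_per_level)
    # Lower levels each resolve a full index; the first level gets what is left.
    top_bits = bits_to_resolve - (levels - 1) * bits_per_level
    return 1 << top_bits

print(initial_table_entries(42, 16))  # 8192: a full level 2 table
print(initial_table_entries(25, 16))  # 512: a partial level 3 table
```

For the 25-bit VA with the 64kB granule discussed earlier, this gives a 512-entry level 3 table, one entry per 64kB page of the 32MB space.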

This answers my questions.

Dumitru
