krom
Posts: 61
Joined: Wed Dec 05, 2012 9:12 am
Contact: Website

Trying Bare Metal on Raspberry Pi 2

Fri Feb 06, 2015 12:26 pm

Hi guys,

I am lucky enough to have my RPi 2 already, & love the challenge of getting my code to work bare-metal on any new system...
I really want to get all the bare metal code I have already written for the original RPi to work on the RPi 2, and improve it etc.
e.g. I have some fractal demos I would like to optimize using the NEON instructions, then try to get that same code to work across all 4 cores =D

So I started with a simple blinking LED program:

Code: Select all

PERIPHERAL_BASE = $3F000000 ; Raspberry Pi 2 Peripheral Base Address

GPBASE  = $200000 ; $3F200000
GPFSEL1 =      $4 ; $3F200004
GPSET0  =     $1C ; $3F20001C
GPCLR0  =     $28 ; $3F200028

org $8000

mov r0,PERIPHERAL_BASE
orr r0,GPBASE ; R0 = GPBASE
ldr r1,[r0,GPFSEL1] ; R1 = GPFSEL1
mov r2,7
and r1,r2,lsl 18 ; &= 7 << 18
mov r2,1
orr r1,r2,lsl 18 ; |= 1 << 18
str r1,[r0,GPFSEL1]

mov r2,r2,lsl 16 ; 1 << 16
Loop:
  str r2,[r0,GPSET0]
  mov r1,$100000
  WaitA:
    subs r1,1
    bne WaitA
  str r2,[r0,GPCLR0]
  mov r1,$100000
  WaitB:
    subs r1,1
    bne WaitB

  b Loop
This code runs as a kernel.img on the original Raspberry Pi (256MB) to blink the LED.
Along with the latest bootcode.bin & start.elf files.

All I have changed is the Peripheral Base Address from $20000000 to $3F000000, but unfortunately it does not work atm :(

I have some questions to ask, in case my understanding of ARM Cortex is not correct:

1. Do I need to change the origin "ORG $8000", e.g. does the multi core ARM CPU start code from a different offset?
2. Have any of the GPIO register locations changed from the original locations?
3. Do I need any special config.txt or cmdline.txt options to get the RPi 2 to swing into action?
4. As the Peripheral Base Address has changed to $3F000000
does this mean the Raspberry Pi 2 can not access the whole 1GB region: $00000000..$40000000?

Any help would be much appreciated =D

rst
Posts: 410
Joined: Sat Apr 20, 2013 6:42 pm
Location: Germany

Re: Trying Bare Metal on Raspberry Pi 2

Fri Feb 06, 2015 12:46 pm

Not sure but I suppose the Act LED on the Raspberry Pi 2 is connected to GPIO47 (not 16) as it is on Pi 1 Model A+ and B+.

ShiftPlusOne
Raspberry Pi Engineer & Forum Moderator
Posts: 6031
Joined: Fri Jul 29, 2011 5:36 pm
Location: The unfashionable end of the western spiral arm of the Galaxy

Re: Trying Bare Metal on Raspberry Pi 2

Fri Feb 06, 2015 12:49 pm

Have you saved it as kernel7.img for pi 2?

krom
Posts: 61
Joined: Wed Dec 05, 2012 9:12 am
Contact: Website

Re: Trying Bare Metal on Raspberry Pi 2

Fri Feb 06, 2015 1:19 pm

Thanks rst for the quick response.
Using your advice, I got my 1st Raspberry Pi 2 Bare metal demo to run!
It now blinks the LED perfectly using GPIO47 for the Green LED =D

Here is the full working source:

Code: Select all

format binary as 'img'

PERIPHERAL_BASE = $3F000000 ; Raspberry Pi 2 Peripheral Base Address

GPBASE  = $200000 ; $3F200000
GPFSEL4 =     $10 ; $3F200010 (function select for GPIO 40..49)
GPSET1  =     $20 ; $3F200020
GPCLR1  =     $2C ; $3F20002C

org $8000

mov r0,PERIPHERAL_BASE
orr r0,GPBASE ; R0 = GPBASE
ldr r1,[r0,GPFSEL4] ; R1 = GPFSEL4
mov r2,7
bic r1,r2,lsl 21 ; &= ~(7 << 21), clear the GPIO47 function field
mov r2,1
orr r1,r2,lsl 21 ; |= 1 << 21, GPIO47 = output
str r1,[r0,GPFSEL4]

mov r2,r2,lsl 15 ; 1 << 15
Loop:
  str r2,[r0,GPSET1]
  mov r1,$100000
  WaitA:
    subs r1,1
    bne WaitA
  str r2,[r0,GPCLR1]
  mov r1,$100000
  WaitB:
    subs r1,1
    bne WaitB

  b Loop
**EDIT
Oh ShiftPlusOne, is it meant to be kernel7.img? I have it named kernel.img and it still works, I had no idea kernel7.img is what is used by the RPi2!!

rpdom
Posts: 15385
Joined: Sun May 06, 2012 5:17 am
Location: Chelmsford, Essex, UK

Re: Trying Bare Metal on Raspberry Pi 2

Fri Feb 06, 2015 1:36 pm

I *think* it will look for a kernel7.img on the Pi2 and if it isn't there it will load a kernel.img instead. I could be wrong, I haven't got a 2 to try it on.

Note that the action of GPIO 47 is inverted compared to GPIO 16. ie. the LED is lit when the GPIO is high, rather than when it is low.

ShiftPlusOne
Raspberry Pi Engineer & Forum Moderator
Posts: 6031
Joined: Fri Jul 29, 2011 5:36 pm
Location: The unfashionable end of the western spiral arm of the Galaxy

Re: Trying Bare Metal on Raspberry Pi 2

Fri Feb 06, 2015 1:44 pm

Here's how it works:

At first, the firmware will set kernel to kernel.img.
If you have a kernel specified in config.txt, it will overwrite it.
If kernel==kernel.img && pi2, load kernel7.img. If that fails (or doesn't happen at all), load kernel.

krom
Posts: 61
Joined: Wed Dec 05, 2012 9:12 am
Contact: Website

Re: Trying Bare Metal on Raspberry Pi 2

Fri Feb 06, 2015 1:45 pm

Cheers rpdom, good to know about the exact high/low state for GPIO 47 =D

Also to confirm, if the kernel is saved as kernel.img it does find it on the RPi2.
I have tested using kernel7.img, and will use this as the correct name for all my Raspberry Pi 2 demos, so as to help people know it is a RPi2 program =D

Thanks for all the help guys, I'll continue on making more demos, and upload them all to my github.

dpotop
Posts: 78
Joined: Mon Nov 24, 2014 2:14 pm

Re: Trying Bare Metal on Raspberry Pi 2

Fri Feb 06, 2015 2:39 pm

Thanks krom for your example. It seems that your code is single-core.
Can one make the assumption that only core 0 is active when the RPi
starts executing kernel.img?

Yours,
Dumitru
dpotop

krom
Posts: 61
Joined: Wed Dec 05, 2012 9:12 am
Contact: Website

Re: Trying Bare Metal on Raspberry Pi 2

Fri Feb 06, 2015 3:17 pm

Hi dpotop,
Yes I assume the ARM Cortex starts in single core mode when it boots, & would require extra setup to make it execute programs across all 4 cores...
I will try to make multi core demo examples asap once I can figure out how to do it =D

dpotop
Posts: 78
Joined: Mon Nov 24, 2014 2:14 pm

Re: Trying Bare Metal on Raspberry Pi 2

Fri Feb 06, 2015 8:19 pm

Ok, so if **all** other peripherals function in the same way,
the "cortex a programmer manual" and "cortex a7 TRM"
should be enough to do bare metal. Cool!

Still, your question 4 (on the peripherals that overlap with
RAM) remains unanswered.

Dumitru
dpotop

ShiftPlusOne
Raspberry Pi Engineer & Forum Moderator
Posts: 6031
Joined: Fri Jul 29, 2011 5:36 pm
Location: The unfashionable end of the western spiral arm of the Galaxy

Re: Trying Bare Metal on Raspberry Pi 2

Fri Feb 06, 2015 8:34 pm

4) yes.

rpdom
Posts: 15385
Joined: Sun May 06, 2012 5:17 am
Location: Chelmsford, Essex, UK

Re: Trying Bare Metal on Raspberry Pi 2

Fri Feb 06, 2015 9:22 pm

As the ARM will never get all of the RAM for itself anyway (seeing as the GPU will always have at least that missing 16MB of it), is that a problem? Seems that is not the case.
Last edited by rpdom on Wed Feb 11, 2015 4:25 pm, edited 1 time in total.

PlutoniumBob
Posts: 16
Joined: Sun Feb 17, 2013 1:24 pm
Location: 1313 Mockingbird Lane

Re: Trying Bare Metal on Raspberry Pi 2

Fri Feb 06, 2015 9:35 pm

ShiftPlusOne wrote:4) yes.
So we lose the top 16M page, no probs, I can see how keeping everything in the
same 1G memory frame would be a good idea and it's still over 1,000,000,000 bytes...

Hmmm, 16M, you do realize that is over 8000 Arduino Due's. (ATmega328's) :D

And we can build two versions of our stuff and put RPi1x code in kernel.img
and RPi2x code in kernel7.img.

Bill

krom
Posts: 61
Joined: Wed Dec 05, 2012 9:12 am
Contact: Website

Re: Trying Bare Metal on Raspberry Pi 2

Mon Feb 09, 2015 1:16 am

I have converted all my demos on my github (apart from the GB emu) to Raspberry Pi 2 compatible kernel7.img files:
https://github.com/PeterLemon/RaspberryPi

All the demos are exactly the same code with the only change being the Raspberry Pi 2 peripheral base address $3F000000.
Raspberry Pi uses kernel.asm & assembles to a kernel.img binary file.
Raspberry Pi 2 uses kernel7.asm & assembles to a kernel7.img binary file.

I am using the newest firmware bootcode.bin & start.elf files from:
https://github.com/raspberrypi/firmware ... aster/boot

The only other file I use is this config.txt file:

Code: Select all

disable_overscan=1
framebuffer_swap=0
I kept a log of the things I came across in the conversion process:

1. DMA does not seem to work correctly atm (Single Shot DMA, DMA + DREQ, & DMA + Stride Works).
Seems to be a problem issuing another DMA after the 1st has been started.
This means that all my Sound DMA + DREQ demos, only play the 1st buffer of sample data,
And all my GFX printing demos using DMA + Stride only display the 1st tile drawn, atm on the Raspberry Pi 2

2. Old Frame Buffer code does not work, only MailBox Tags Frame Buffer works.
The older way of setting the frame buffer, does not seem to exist on the Raspberry Pi 2.
So I now use the newer Mailbox Property Interface Tags way of setting up the frame buffer in all of my demos.

3. ARM Cortex A7 VFP Setup has changed.
I converted my VFP Fractal demos to run on the Raspberry Pi 2, & noticed I needed to turn on the VFP unit in a different way.
I found this url to help me: http://infocenter.arm.com/help/index.js ... 01s02.html

Code: Select all

; 1. Set the CPACR for access to CP10 and CP11, and clear the ASEDIS and D32DIS bits:
LDR r0, =(0xF << 20)
MCR p15, 0, r0, c1, c0, 2

; 2. Set the FPEXC EN bit to enable the NEON MPE:
MOV r3, #0x40000000 
VMSR FPEXC, r3
**Note** I found the VFP execution on the Raspberry Pi 2 extremely slow :(
This same Julia animation demo seems to perform like 8..10X faster on the old Raspberry Pi:
https://github.com/PeterLemon/Raspberry ... ctal/Julia
I really hope it is something simple I am doing wrong that is making it this slow...
Do I need to set a 900MHz clock for the CPU, or is it something strange to do with Double VFP calculations on this new HW?

4. ARM Branch Prediction & L1 Cache Setup has changed.
I have not worked out howto turn this on yet...

5. V3D_RFC (Render Frame Count) Register does not seem to be updating correctly at the end of a full frame.
All of my single frame V3D demos work, apart from the bouncing triangle refresh multiple frame demo here:
https://github.com/PeterLemon/Raspberry ... st/Refresh
It is very strange that this is not working, and feels like the same thing going wrong with DMA...

Also I could not run any code from the new uncached SDRAM region which I think is at $C0000000.

I hope this helps anyone getting started on bare metal programming on the Raspberry Pi 2.
Any help would be appreciated with any of my findings, I really want to get all my Raspberry Pi 2 demos working stable & fast =D

rst
Posts: 410
Joined: Sat Apr 20, 2013 6:42 pm
Location: Germany

Re: Trying Bare Metal on Raspberry Pi 2

Mon Feb 09, 2015 11:44 am

Thank you for posting your porting experiences. Some notes:
krom wrote:2. Old Frame Buffer code does not work, only MailBox Tags Frame Buffer works.
The older way of setting the frame buffer, does not seem to exist on the Raspberry Pi 2.
For me the older method works well. There must be another influence.
I found the VFP execution on the Raspberry Pi extremely slow :(
This same Julia animation demo seems to perform like 8..10X faster on the old Raspberry Pi:
This may have to do with the cache setup. I found the Pi 2 on bare metal without any cache setup about 6 times slower than the Pi 1 in a simple delay loop.
4. ARM Branch Prediction & L1 Cache Setup has changed.
I have not worked out howto turn this on yet...
But with the L1 instruction cache and branch prediction turned on it's getting more than 30 times faster in this delay loop. This can be done by:

Code: Select all

#define ARM_AUX_CONTROL_SMP	(1 << 6)

#define ARM_CONTROL_BRANCH_PREDICTION	   (1 << 11)
#define ARM_CONTROL_L1_INSTRUCTION_CACHE   (1 << 12)

	u32 nAuxControl;
	asm volatile ("mrc p15, 0, %0, c1, c0,  1" : "=r" (nAuxControl));
	nAuxControl |= ARM_AUX_CONTROL_SMP;
	asm volatile ("mcr p15, 0, %0, c1, c0,  1" : : "r" (nAuxControl));   // SMP bit must be set according to ARM TRM

	u32 nControl;
	asm volatile ("mrc p15, 0, %0, c1, c0,  0" : "=r" (nControl));
	nControl |= ARM_CONTROL_BRANCH_PREDICTION | ARM_CONTROL_L1_INSTRUCTION_CACHE;
	asm volatile ("mcr p15, 0, %0, c1, c0,  0" : : "r" (nControl) : "memory");
Some things to add:
  • The interrupt controller works as expected from Pi 1.
  • The caching system has heavily changed. At least there is no function to clear and/or invalidate the whole data cache anymore. I think this has to be done by enumerating over the targeted memory range to clear/invalidate. This should also be better for the performance. A cache line of the L1 data cache is 64 byte (16 words) long.
  • There are also changes to the MMU.

tvjon
Posts: 710
Joined: Mon Jan 07, 2013 9:11 am

Re: Trying Bare Metal on Raspberry Pi 2

Mon Feb 09, 2015 4:15 pm

krom wrote:...
It is very strange that this is not working, and feels like the same thing going wrong with DMA...

Also I could not run any code from the new uncached SDRAM region which I think is at $C0000000.

I hope this helps anyone getting started on bare metal programming on the Raspberry Pi 2.
Any help would be appreciated with any of my findings, I really want to get all my Raspberry Pi 2 demos working stable & fast =D
It does help, thank you (& of course, rst)

This may help you, hopefully.

http://www.raspberrypi.org/forums/viewt ... 15#p689689

krom
Posts: 61
Joined: Wed Dec 05, 2012 9:12 am
Contact: Website

Re: Trying Bare Metal on Raspberry Pi 2

Mon Feb 09, 2015 5:48 pm

Hi rst, thanks for your help on this =D
rst wrote:For me the older method works well. There must be another influence.
O.k I'll have to look back into this, if the old framebuffer code works for you, I must be doing something wrong that still worked on the old Raspberry Pi.

I have updated my fractal demos to use L1 cache, & finally the Raspberry Pi 2 is running VFP Double calculation code noticeably faster than the old Raspberry Pi:
https://github.com/PeterLemon/Raspberry ... FP/Fractal
So I am happy to start my NEON multi-core development work =D

I did stumble across another problem, now with standard ARM Integer calculation speeds, I noticed this when making a quick mock up Video Codec:
https://github.com/PeterLemon/Raspberry ... GRBLZVideo
On the old Raspberry this video plays back at ~80 frames per second, on the Raspberry Pi 2 it plays back at ~12 FPS!
I am using the cache & it does speed it up from ~7FPS, but it seems so slow, as if the ARM is only clocked at ~100MHz...

I am gonna make a quick test to print on screen the MHz speed of the ARM Core & SDRAM Memory speeds, cause it really does seem slow to me atm using my current bare-metal setup.

rst
Posts: 410
Joined: Sat Apr 20, 2013 6:42 pm
Location: Germany

Re: Trying Bare Metal on Raspberry Pi 2

Mon Feb 09, 2015 10:58 pm

Hi krom,
krom wrote:I did stumble across another problem, now with standard ARM Integer calculation speeds, I noticed this when making a quick mock up Video Codec:
https://github.com/PeterLemon/Raspberry ... GRBLZVideo
On the old Raspberry this video plays back at ~80 frames per second, on the Raspberry Pi 2 it plays back at ~12 FPS!
I am using the cache & it does speed it up from ~7FPS, but it seems so slow, as if the ARM is only clocked at ~100MHz...
I think this is caused by the data cache, which does not work without enabling the MMU because all memory is "Strongly-ordered" in this case, even if bit 2 of the "System Control Register" is set. I suppose your Video Codec uses many more data loads and stores, so this has a greater influence on it than on the fractal demo. (?)

I did some measurements with this simple GPIO sampling routine:

Code: Select all

fastloop:
	ldr	r6, [r5]             @read GPLEV0
	str	r6, [r0], #4
	subs	r1, r1, #1
	bhi	fastloop
On Pi 2 with instruction cache and branch prediction enabled (no MMU) the sampling rate is about 2 MHz, on Pi 1 without any cache (no MMU) it is about 6 MHz and on Pi 1 with MMU (all caches enabled) it is over 12 MHz.

This should be caused by the more complex memory architecture of the Pi 2. I guess that should not be a problem on the Pi 2 with MMU enabled. I'm working on it.
So I am happy to start my NEON multi-core development work =D
Have fun!

krom
Posts: 61
Joined: Wed Dec 05, 2012 9:12 am
Contact: Website

Re: Trying Bare Metal on Raspberry Pi 2

Tue Feb 10, 2015 6:51 pm

Hi rst I got some clock rates from the Raspberry Pi 2 initial State:

EMMC Max Clock = $EE6B280 (250MHz) Default = $EE6B280 (250MHz)
UART Max Clock = $3B9ACA00 (1GHz) Default = $2DC6C0 (3MHz)
ARM Max Clock = $35A4E900 (900MHz) Default = $23C34600 (600MHz)
CORE Max Clock = $EE6B280 (250MHz) Default = $EE6B280 (250MHz)
V3D Max Clock = $EE6B280 (250MHz) Default = $0 (0MHz)
H264 Max Clock = $EE6B280 (250MHz) Default = $0 (0MHz)
ISP Max Clock = $EE6B280 (250MHz) Default = $0 (0MHz)
SDRAM Max Clock = $1AD27480 (450MHz) Default = $17BF1A00 (398.4MHz)
PIXEL Max Clock = $8F0D1800 (2.4GHz) Default = $92DDA80 (154MHz)
PWM Max Clock = $1DCD6500 (500MHz) Default = $0 (0MHz)

So this shows the Video Core & V3D still run at 250MHz.
rst wrote:On Pi 2 with instruction cache and branch prediction enabled (no MMU) the sampling rate is about 2 MHz, on Pi 1 without any cache (no MMU) it is about 6 MHz and on Pi 1 with MMU (all caches enabled) it is over 12 MHz.
This should be caused by the more complex memory architecture of the Pi 2. I guess that should not be a problem on the Pi 2 with MMU enabled. I'm working on it.
Cheers for these figures, this makes lots of sense now, I wish you luck in being able to get the MMU setup correctly =D
Would be great if you could tell me howto do it if you do manage it =D

I'll stick to doing NEON demos for now, as I am very happy with the speed of my VFP Fractal demos atm, so it should speed them up greatly =D

mimi123
Posts: 583
Joined: Thu Aug 22, 2013 3:32 pm

Re: Trying Bare Metal on Raspberry Pi 2

Tue Feb 10, 2015 6:53 pm

krom wrote:Hi rst I got some clock rates from the Raspberry Pi 2 initial State:
[...]
ARM Max Clock = $35A4E900 (900MHz) Default = $23C34600 (600MHz)
[...]
You can get the CPU to run at 900MHz instead of 600MHz. :-)

rst
Posts: 410
Joined: Sat Apr 20, 2013 6:42 pm
Location: Germany

Re: Trying Bare Metal on Raspberry Pi 2

Wed Feb 11, 2015 7:24 am

Hi krom,
krom wrote:Hi rst I got some clock rates from the Raspberry Pi 2 initial State:
[...]
ARM Max Clock = $35A4E900 (900MHz) Default = $23C34600 (600MHz)
CORE Max Clock = $EE6B280 (250MHz) Default = $EE6B280 (250MHz)
V3D Max Clock = $EE6B280 (250MHz) Default = $0 (0MHz)
[...]
So This shows the Video Core & V3D still run at 250Mhz.
That's interesting. Thanks for the figures. Does it mean we could raise the ARM clock to 900 MHz as mimi123 suggested (by using the "Set clock rate" mailbox property function) and why is it not 900 MHz by default? Maybe I will try this later.
I wish you luck in being able to get the MMU setup correctly =D
Would be great if you could tell me howto do it if you do manage it =D
I have got it running now. The MMU has not changed as much as I thought before. Most changes can be ignored for our bare metal cases.

But I'm not satisfied with it yet. I have to set the memory region attributes to "outer and inner write through" so that it works with memory mapped functions (like the mailbox property tags). That's slower than expected. Perhaps this can also be ensured by using the cache maintenance operations correctly, but that doesn't work for me so far. Another possibility could be a special "device memory pool" to be used by those functions, which requires some extra memory management.

If you want to try it anyway here is the C code:

Code: Select all

void EnableMMU (void)     // not fully optimized
{
  static volatile __attribute__ ((aligned (0x4000))) unsigned PageTable[4096];

  unsigned base;
  for (base = 0; base < 1024-16; base++)
  {
    // section descriptor (1 MB)
    // outer and inner write back, write allocate, not shareable (fast but unsafe)
    //PageTable[base] = base << 20 | 0x0140E;
    // outer and inner write through, no write allocate, shareable (safe but slower)
    PageTable[base] = base << 20 | 0x1040A;
  }
  for (; base < 4096; base++)
  {
    // shared device, never execute
    PageTable[base] = base << 20 | 0x10416;
  }

  // set SMP bit in ACTLR
  unsigned auxctrl;
  asm volatile ("mrc p15, 0, %0, c1, c0,  1" : "=r" (auxctrl));
  auxctrl |= 1 << 6;
  asm volatile ("mcr p15, 0, %0, c1, c0,  1" :: "r" (auxctrl));

  // set domain 0 to client
  asm volatile ("mcr p15, 0, %0, c3, c0, 0" :: "r" (1));

  // always use TTBR0
  asm volatile ("mcr p15, 0, %0, c2, c0, 2" :: "r" (0));

  // set TTBR0 (page table walk inner and outer non-cacheable, non-shareable memory)
  asm volatile ("mcr p15, 0, %0, c2, c0, 0" :: "r" (0 | (unsigned) &PageTable));

  asm volatile ("isb" ::: "memory");

  // enable MMU, caches and branch prediction in SCTLR
  unsigned mode;
  asm volatile ("mrc p15, 0, %0, c1, c0, 0" : "=r" (mode));
  mode |= 0x1805;
  asm volatile ("mcr p15, 0, %0, c1, c0, 0" :: "r" (mode) : "memory");
}

mimi123
Posts: 583
Joined: Thu Aug 22, 2013 3:32 pm

Re: Trying Bare Metal on Raspberry Pi 2

Wed Feb 11, 2015 12:30 pm

rst wrote:[...]
Does it mean we could raise the ARM clock to 900 MHz as mimi123 suggested (by using the "Set clock rate" mailbox property function) and why is it not 900 MHz by default?
[...]
600MHz because of power management. (You can set arm_freq_min to 900MHz in config.txt, or use the mailbox call.)

krom
Posts: 61
Joined: Wed Dec 05, 2012 9:12 am
Contact: Website

Re: Trying Bare Metal on Raspberry Pi 2

Wed Feb 11, 2015 1:06 pm

Hi rst,
rst wrote:That's interesting. Thanks for the figures. Does it mean we could raise the ARM clock to 900 MHz as mimi123 suggested (by using the "Set clock rate" mailbox property function) and why is it not 900 MHz by default? Maybe I will try this later.
Yep, I managed to force it to 900MHz on initialization, by using the config.txt option: force_turbo=1
Which disables the dynamic cpufreq driver and minimum settings, but **WARNING** enabling this may set the warranty bit!!
But I am pretty sure setting the config.txt options: arm_freq=900 & arm_freq_min=900
Would force the ARM CPU to 900MHz in a more safe way too =D

I use the resource here to help me with config.txt options:
http://www.raspberrypi.org/documentatio ... fig-txt.md

I think it is set to 600MHz as default to save power, as under heavy CPU load it will climb up to 900MHz.
I just like to force it to the max rate, so I can be sure stuff is running as fast as possible for my tests =D
rst wrote:If you want to try it anyway here is the C code:
Thanks so much for this, I'll def give it a go & tell you if it improves my data heavy integer video codec code execution speed =D

Thanks again for all your help on this =D

rst
Posts: 410
Joined: Sat Apr 20, 2013 6:42 pm
Location: Germany

Re: Trying Bare Metal on Raspberry Pi 2

Wed Feb 11, 2015 4:10 pm

Thanks krom and mimi123 for explaining this. I didn't know before that there is dynamic CPU clock management.

I tried arm_freq_min=900 and both arm_freq=900 & arm_freq_min=900 but there was no change.

In the end I gave the warranty bit a go and tried arm_freq=900 & force_turbo=1 and it worked. I had no warranty issues so far and do not expect one. So this should be no problem for me. But thanks for the warning! Good to know this.

I'm not sure whether the dynamic CPU clock management only works on Linux, with a supporting driver on the ARM side. How would the GPU otherwise know the current system load? That would mean it can't easily be used for bare metal.

OK, good to know how to get the full speed. Because I do not need it at the moment I will stay at 600 MHz to give the CPU a chill. :)
krom wrote:
rst wrote:If you want to try it anyway here is the C code:
Thanks so much for this, I'll def give it a go & tell you if it improves my data heavy integer video codec code execution speed =D
I am happy if I could help. I'm interested in your results. Let me know!

mimi123
Posts: 583
Joined: Thu Aug 22, 2013 3:32 pm

Re: Trying Bare Metal on Raspberry Pi 2

Wed Feb 11, 2015 4:14 pm

rst wrote:[...]
I'm not sure if the dynamic CPU clock management does only work on Linux with a supporting driver on the ARM side? How does the GPU otherwise know the current system load?
[...]
Linux just uses the mailbox interface
