rst
Posts: 404
Joined: Sat Apr 20, 2013 6:42 pm
Location: Germany

Re: Trying Bare Metal on Raspberry Pi 2

Wed Feb 11, 2015 5:06 pm

mimi123 wrote:Linux just uses the mailbox interface
So for bare metal one has to determine the "system load" and the SoC temperature and switch the CPU clock as needed via the mailbox. Thanks for the info!

mimi123
Posts: 583
Joined: Thu Aug 22, 2013 3:32 pm

Re: Trying Bare Metal on Raspberry Pi 2

Wed Feb 11, 2015 5:24 pm

rst wrote:
mimi123 wrote:Linux just uses the mailbox interface
So for bare metal one has to determine the "system load" and the SoC temperature and switch the CPU clock as needed via the mailbox. Thanks for the info!
The SoC temp check is done in the start.elf change frequency code.
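
A clock change request over the mailbox property interface might be assembled roughly as in the sketch below. It only builds the message words and can be checked on a host machine. The tag value 0x00038002 (set clock rate) and clock ID 3 (ARM) are assumptions taken from the firmware's mailbox property documentation and should be verified there; make_set_arm_clock_msg is a hypothetical helper name, not something from this thread.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Assumed values from the firmware mailbox property documentation;
// verify them against the current firmware wiki before relying on them.
constexpr uint32_t kTagSetClockRate = 0x00038002;
constexpr uint32_t kClockIdArm      = 3;

// Hypothetical helper: builds a "set clock rate" property message for the
// ARM clock. On real hardware the buffer would also have to be 16-byte
// aligned before its address is posted to the property channel.
std::array<uint32_t, 9> make_set_arm_clock_msg(uint32_t rate_hz) {
    assert(rate_hz != 0);
    return {
        9 * 4,             // total buffer size in bytes
        0,                 // buffer request code (0 = request)
        kTagSetClockRate,  // tag identifier
        12,                // value buffer size in bytes
        0,                 // tag request code
        kClockIdArm,       // clock id
        rate_hz,           // requested rate in Hz
        0,                 // skip-turbo flag (0 = do not skip)
        0,                 // end tag
    };
}
```

Calling make_set_arm_clock_msg(900000000) yields a 36-byte message; since the temperature check is said to live in start.elf, a bare metal program would only need to decide when to send it.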

rst
Posts: 404
Joined: Sat Apr 20, 2013 6:42 pm
Location: Germany

Re: Trying Bare Metal on Raspberry Pi 2

Wed Feb 11, 2015 5:34 pm

mimi123 wrote:The SoC temp check is done in the start.elf change frequency code.
OK. Thanks!

krom
Posts: 61
Joined: Wed Dec 05, 2012 9:12 am
Contact: Website

Re: Trying Bare Metal on Raspberry Pi 2

Thu Feb 12, 2015 5:55 pm

I have made my 1st NEON optimized Fractal demos (Raspberry Pi 2 Only):
https://github.com/PeterLemon/Raspberry ... ON/Fractal

These are much faster than the original non optimized VFP fractal demos on the Raspberry Pi 2 =D
I am gonna try to get Multi Core (SMP) stuff working now, to get a 400% speed increase in these same demos.

Also I have still not tried the MMU setup code by rst yet, but I'll update you guys when I do.

mimi123
Posts: 583
Joined: Thu Aug 22, 2013 3:32 pm

Re: Trying Bare Metal on Raspberry Pi 2

Thu Feb 12, 2015 6:11 pm

krom wrote:I have made my 1st NEON optimized Fractal demos (Raspberry Pi 2 Only):
https://github.com/PeterLemon/Raspberry ... ON/Fractal

These are much faster than the original non optimized VFP fractal demos on the Raspberry Pi 2 =D
I am gonna try to get Multi Core (SMP) stuff working now, to get a 400% speed increase in these same demos.

Also I have still not tried the MMU setup code by rst yet, but I'll update you guys when I do.
Get MMU support working. If you want, you can try running that code on the VC4 VPU (it has a 16-way SIMD unit and is dual-core at 250MHz) for better perf on older Pis.

krom
Posts: 61
Joined: Wed Dec 05, 2012 9:12 am
Contact: Website

Re: Trying Bare Metal on Raspberry Pi 2

Thu Feb 12, 2015 8:00 pm

mimi123 wrote:Get MMU support working. If you want, you can try running that code on the VC4 VPU (it has a 16-way SIMD unit and is dual-core at 250MHz) for better perf on older Pis.
Cheers mimi123, yep I have made an LED blink using the VC4 & assembling my own bootcode.bin file, but I have never setup the Frame Buffer using that VC4 CPU...
Is it possible to still use MailBox Property Interface to set up the Frame Buffer, using just VC4 code in a bootcode.bin?
Or is that whole interface only available from the ARM side, after booting from bootcode.bin & start.elf...
Do you know a way to setup the screen Frame Buffer using low level GPU commands in VC4 mode, if so please share the code as I would love it =D

mimi123
Posts: 583
Joined: Thu Aug 22, 2013 3:32 pm

Re: Trying Bare Metal on Raspberry Pi 2

Fri Feb 13, 2015 7:04 am

krom wrote:
mimi123 wrote:Get MMU support working. If you want, you can try running that code on the VC4 VPU (it has a 16-way SIMD unit and is dual-core at 250MHz) for better perf on older Pis.
Cheers mimi123, yep I have made an LED blink using the VC4 & assembling my own bootcode.bin file, but I have never setup the Frame Buffer using that VC4 CPU...
Is it possible to still use MailBox Property Interface to set up the Frame Buffer, using just VC4 code in a bootcode.bin?
Or is that whole interface only available from the ARM side, after booting from bootcode.bin & start.elf...
Do you know a way to setup the screen Frame Buffer using low level GPU commands in VC4 mode, if so please share the code as I would love it =D
You can run VC4 code from Linux (github.com/freeblob/samples). You can see that it only uses the mailbox interface.

mimi123
Posts: 583
Joined: Thu Aug 22, 2013 3:32 pm

Re: Trying Bare Metal on Raspberry Pi 2

Fri Feb 13, 2015 11:40 am

AFAIK, you can generate a start.elf (github.com/freeblob/freeblob does that).
No HDMI, and probably never HDMI...

mrvn
Posts: 58
Joined: Wed Jan 09, 2013 6:50 pm

Re: Trying Bare Metal on Raspberry Pi 2

Tue Feb 17, 2015 11:33 pm

krom wrote:Here is the full working source:

Code:

format binary as 'img'

PERIPHERAL_BASE = $3F000000 ; Raspberry Pi 2 Peripheral Base Address

GPBASE  = $200000 ; $3F200000
GPFSEL1 =      $4 ; $3F200004
GPSET1  =     $20 ; $3F200020
GPCLR1  =     $2C ; $3F20002C

org $8000

mov r0,PERIPHERAL_BASE
orr r0,GPBASE ; R0 = GPBASE
ldr r1,[r0,GPFSEL1] ; R1 = GPFSEL1
mov r2,7
and r1,r2,lsl 18 ; &= 7 << 18
mov r2,1
orr r1,r2,lsl 18 ; |= 1 << 18
str r1,[r0,GPFSEL1]

mov r2,r2,lsl 15 ; 1 << 15
Loop:
  str r2,[r0,GPSET1]
  mov r1,$100000
  WaitA:
    subs r1,1
    bne WaitA
  str r2,[r0,GPCLR1]
  mov r1,$100000
  WaitB:
    subs r1,1
    bne WaitB

  b Loop
I think you have a bug in there. You load GPFSEL1 into R1, then you mask out all the bits that aren't for the LED and OR in one bit for the LED function selector. This a) resets the functions of all the other pins and b) does not reset the top two function bits for the LED. I noticed this because it breaks the UART. You have to negate your mask.

I changed the code to load the shifted constants directly instead of shifting them on use. This allows me to include the negation of the mask (just the init code):

Code:

	mov	r0,#PERIPHERAL_BASE
	orr	r0,#GPBASE		// R0 = GPBASE
	ldr	r1,[r0,#GPFSEL1]	// R1 = GPFSEL1
	mov	r2,#~(7 << 18)
	and	r1,r2			// &= ~(7 << 18)
	mov	r2,#(1 << 18)
	orr	r1,r2			// |= 1 << 18
	str	r1,[r0,#GPFSEL1]
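
The same read-modify-write can be sketched in C++ as a pure function, which makes the effect of the negated mask easy to check on a host machine. set_fsel_field is a hypothetical name; the 3-bit field width and the shift of 18 (field index 6) come from the listing above.

```cpp
#include <cassert>
#include <cstdint>

// Returns the new GPFSEL value with one 3-bit function field replaced and
// every other pin's field left untouched.
// 'field' is the pin's index within the register (0..9), 'func' is 0..7.
uint32_t set_fsel_field(uint32_t fsel, unsigned field, uint32_t func) {
    assert(field < 10 && func < 8);
    const unsigned shift = field * 3;
    fsel &= ~(UINT32_C(7) << shift);  // clear only this pin's three bits
    fsel |= func << shift;            // then set the requested function
    return fsel;
}
```

With every bit set on entry, set_fsel_field(0xFFFFFFFF, 6, 1) returns 0xFFE7FFFF: only bits 18..20 change, so functions already selected for other pins (such as the UART) survive.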

krom
Posts: 61
Joined: Wed Dec 05, 2012 9:12 am
Contact: Website

Re: Trying Bare Metal on Raspberry Pi 2

Wed Feb 18, 2015 3:55 am

Hi guys, here is an update on my Raspberry Pi 2 work:

I have optimized all my fractal demos for the Raspberry Pi & Raspberry Pi 2, including the NEON demos:
https://github.com/PeterLemon/Raspberry ... FP/Fractal
https://github.com/PeterLemon/Raspberry ... ON/Fractal
The NEON demos are really fast now compared to the equivalent scalar VFP demos on the Raspberry Pi 2.
I am very happy with the performance on a single ARM Cortex A7 core =D

I have started on my Multi Core (SMP) work, there is very little help on this but I did find a few helpful URLs:
http://stackoverflow.com/questions/2005 ... tart-addre
http://www.carbondesignsystems.com/virt ... A15-System

I have checked out that all my bare-metal code is running on CPU ID "0" (Core 0) atm,
using the Multiprocessor Affinity Register (MPIDR) system register:

Code:

; Return CPU ID (0..3) Of The CPU Executed On
mrc p15,0,r0,c0,c0,5 ; R0 = Multiprocessor Affinity Register (MPIDR)
and r0,3 ; R0 = CPU ID (Bits 0..1)
I can also read the CLUSTERID using the same Multiprocessor Affinity Register (MPIDR) system register
(this field identifies the cluster a core belongs to; it is not a count of the CPU cores):

Code:

; Return Value In CLUSTERID Configuration Pin
mrc p15,0,r0,c0,c0,5 ; R0 = Multiprocessor Affinity Register (MPIDR)
lsr r0,8 ; R0 = Cluster ID (Bits 8..11)
and r0,$F
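
The two field extractions above can be sketched in C++ and checked on a host against a raw register value. decode_mpidr and MpidrFields are hypothetical names; the bit positions (0..1 and 8..11) are the ones used in the assembly snippets.

```cpp
#include <cassert>
#include <cstdint>

// The two MPIDR fields read by the snippets above.
struct MpidrFields {
    uint32_t cpu_id;      // bits 0..1
    uint32_t cluster_id;  // bits 8..11
};

// Splits a raw MPIDR value into CPU ID and cluster ID.
MpidrFields decode_mpidr(uint32_t mpidr) {
    return { mpidr & 0x3u, (mpidr >> 8) & 0xFu };
}
```

For example, a raw value of 0x80000F02 decodes to CPU ID 2 and cluster ID 0xF.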
There are 3 possible states that the Raspberry Pi 2's extra CPU cores might be in,
when booted up into a bare-metal state from the official firmware bootcode.bin & start.elf files:
A) The extra CPU Cores are all Powered ON and are all booting from the same start offset of code (0x8000).

B) The extra CPU Cores are all Powered ON but are in a WFI (Wait For Interrupt) state to wake them up, & boot from a specified Jump Address.

C) The extra CPU Cores are all Powered OFF, and need powering on to even start from a state like A or B.

I think that each of the 4 CPU Cores, have an individual Jump Address that are set to Null when waiting to be woken up.
I would like to think these offsets are within the system area (0x0000..0x7FFF), but I have not found where they are yet.

I also think that to wake any CPU cores up, we need to use the ARM instruction SEV (Send EVent), causing an event to be signalled to all processors in the multiprocessor system.

As a 1st test, I am not concerned atm about any scheduling, I just want the 4 cores to do some work & infinite loop when they have finished their respective code blocks.

If anyone has any idea or hints as to how the Raspberry Pi 2 handles SMP I would love any help on this =D

rst
Posts: 404
Joined: Sat Apr 20, 2013 6:42 pm
Location: Germany

Re: Trying Bare Metal on Raspberry Pi 2

Wed Feb 18, 2015 10:58 am

There is a SMP initialization example for the Cortex-A9 and -A5 in the ARM information center:
http://infocenter.arm.com/help/index.js ... 13675.html

One problem is that there seems to be no GIC (interrupt controller) in the BCM2836.

Maybe one can only look into the Linux source. I have found something SMP related here:
https://github.com/raspberrypi/linux/bl ... /bcm2709.c

According to this I suppose inter-CPU-communication works via mailboxes. But not sure.

I think that's a real challenge.

Julien_Nantes
Posts: 2
Joined: Wed Feb 18, 2015 4:08 pm
Location: Nantes, France
Contact: Website

Re: Trying Bare Metal on Raspberry Pi 2

Wed Feb 18, 2015 4:14 pm

krom wrote:Hi guys, here is an update on my Raspberry Pi 2 work:

I have optimized all my fractal demos for the Raspberry Pi & Raspberry Pi 2, including the NEON demos:
https://github.com/PeterLemon/Raspberry ... FP/Fractal
https://github.com/PeterLemon/Raspberry ... ON/Fractal
The NEON demos are really fast now compared to the equivalent scalar VFP demos on the Raspberry Pi 2.
I am very happy with the performance on a single ARM Cortex A7 core =D
Hi Krom,

Thanks for sharing this code! :)

I am compiling it with the win32 fasmarm, running under wine on my OS X computer, and this little "toolchain" works perfectly to produce working kernel images for my Raspberry Pi 2.

But not all your sources are working. The VFP Mandelbrot fractal works very well, for instance. The NEON one just gives me a black screen. And the simple "Hello World" only displays a white "H"...

Any idea what's wrong?
Raspberry Pi A and 2.

krom
Posts: 61
Joined: Wed Dec 05, 2012 9:12 am
Contact: Website

Re: Trying Bare Metal on Raspberry Pi 2

Wed Feb 18, 2015 8:50 pm

Hi Julien_Nantes, thanks for your feedback =D
Julien_Nantes wrote:The VFP Mandelbrot fractal works very well, for instance. The NEON one just gives me a black screen.
Heh, I just tried the NEON Mandelbrot Fractal again, and it was a black screen like you said, I then did a few more power ups & it does work sometimes!
I'll try to find out what is going on here & fix it up, my NEON Julia demo works every time, so very strange!
Julien_Nantes wrote:And the simple "Hello World" only displays a white "H"...
This is a known bug in all my DMA demos running on the Raspberry Pi 2, where only the 1st DMA ever gets written, and all subsequent ones do not work :(
It also affects my DMA + DREQ Sound playback demos, where only the 1st sample buffer plays.
I will try to do a fix for this, as I think the Raspberry Pi 2 needs correct MMU setup to run DMA properly...

If you run my hello world CPU demo here:
https://github.com/PeterLemon/Raspberry ... oWorld/CPU
You can see what the DMA demo should look like, as this is running correctly on the Raspberry Pi 2.
rst wrote:I think that's a real challenge.
Hi rst, cheers for all the links you provided, tis a great help =D
I will continue plodding on & update here if I find out how to do it.

k2tom
Posts: 10
Joined: Thu Feb 19, 2015 8:39 pm
Location: Cape Cod, MA

Re: Trying Bare Metal on Raspberry Pi 2

Thu Feb 19, 2015 9:32 pm

For what it's worth.. I just got my RPi2 ("6.28") Tuesday.. and finally got to play with it today (Thursday). Here's what I did:

(I've been working on "bare metal" projects (mostly of the RYO OS flavor) on the RPi (A, A+ and B) for a while.)

I copied bootcode.bin and start.elf from the raspbian download to the uSD. My config.txt is literally a one-liner: "kernel=mykernel.img"

I have a define (BASE) in my rpi.h file. It was "# define BASE 0x20000000", now it's conditionally defined to 0x20000000 (for the RPi) or 0x3f000000 (for the RPi2).
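
The same board switch can be sketched with a constexpr function instead of the preprocessor define described above (a deliberate swap, not what rpi.h does). Board, peripheral_base and gpio_base are hypothetical names; the two base addresses are the ones quoted in this post, and the GPIO offset 0x200000 is taken from the blink listing earlier in the thread.

```cpp
#include <cassert>
#include <cstdint>

enum class Board { RPi1, RPi2 };

// Peripheral base address as seen by the ARM on each board.
constexpr uint32_t peripheral_base(Board b) {
    return b == Board::RPi2 ? 0x3F000000u : 0x20000000u;
}

// Offset of the GPIO block inside the peripheral window.
constexpr uint32_t kGpioOffset = 0x200000;

constexpr uint32_t gpio_base(Board b) {
    return peripheral_base(b) + kGpioOffset;
}

// Everything is evaluated at compile time, so a wrong constant fails the build.
static_assert(gpio_base(Board::RPi2) == 0x3F200000u, "RPi2 GPIO base");
```

Because only the base differs, drivers written against gpio_base(board) stay identical for both boards, which matches the "it JUST WORKS" experience described here.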

My bootloader (mykernel.img) sets up the interrupt controller (IC), GPIO (for the UART pins), UART (with TX and RX interrupts), TIMER (set to 1ms, with interrupts) and SYSTIMER (with an interrupt for C1). I just simply recompiled it (with the RPi2 definition).. copied it to the uSD card.. and it JUST WORKS (interrupts and all)!

The bootloader (among other things) supports xmodem, so I'm able to simply download a compressed copy of my hobby OS (which has a lot more "stuff," but from a hardware and interrupts standpoint, it's only slightly more sophisticated than the bootloader).. uncompress it.. and jump to it. And it JUST WORKS!

(Obviously, this is a very simple configuration, but still..) I don't have separate projects (makefiles) for the RPi vs the Pi2.. When I need to "switch horses," I just do a "make clean" and then a make with one of RPi or RPi2 defined. The same uSD works on all the RPi(s) AND the RPi2, I just need to change the kernel= line in config.txt.

Regards,
Tom

p.s.: Having "progressed" through several ARM cores: ARM7TDMI, ARM926EJ-S, ARM1176JZF-S and Cortex-M3 (CM3), I was really nervous that I was going to be in deep, deep trouble with the Cortex-A7 (CA7). I was expecting the worst. If you've ever worked with the CM3, you know that it DOESN'T support the ARM instruction set.. only the Thumb2 instruction set. And the CM3 integrates the interrupt controller (NVIC) and OS timer (SysTick) (and a lot more!) into the core "proper." So I was worried that the CA7 core would be even worse. To my CONSIDERABLE surprise (shock even), that's not the case at all. The regular ARM instruction set is supported (and unless I have a compelling reason to do otherwise, I usually compile with GNUARM's (and/or YAGARTO's) defaults, which in my case means the ARM7TDMI (core) (the ARMv4T architecture)), and all the external-to-the-core peripherals (IC, TIMER, SYSTIMER, ...) are the same as they are on the RPi (except for the BASE).

rst
Posts: 404
Joined: Sat Apr 20, 2013 6:42 pm
Location: Germany

Re: Trying Bare Metal on Raspberry Pi 2

Fri Feb 20, 2015 5:14 pm

Reading this posting by dom, the referenced boot code and some Linux source files I understood what the CPU cores 1-3 are doing after system boot: http://www.raspberrypi.org/forums/viewt ... 74#p697474.

You only need to write a physical ARM address to:

0x4000008C + 0x10 * core // core := 1..3

and the respective core jumps to this address. Core 0 jumps to 0x8000 by default. This should only be valid without kernel_old=1 in config.txt.
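
As a small host-checkable sketch, the address formula above works out as follows. core_start_mailbox is a hypothetical helper name; the base 0x4000008C and the stride 0x10 are the values given in this post.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kCoreStartBase = 0x4000008C;

// Address of the register that a parked core (1..3) watches for its
// start address.
uint32_t core_start_mailbox(unsigned core) {
    assert(core >= 1 && core <= 3);
    return kCoreStartBase + 0x10u * core;
}
```

This gives 0x4000009C, 0x400000AC and 0x400000BC for cores 1, 2 and 3. On the hardware, starting a core would then be a single volatile 32-bit store of the entry point to that address.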

I'm not sure so far if the Snoop Control Unit is on by default so that the memory is coherent between the cores without intervention.

mimi123
Posts: 583
Joined: Thu Aug 22, 2013 3:32 pm

Re: Trying Bare Metal on Raspberry Pi 2

Sat Feb 21, 2015 4:37 pm

k2tom wrote:If you've ever worked with the CM3, you know that it DOESN'T support the ARM instruction set.. only the Thumb2 instruction set.
It is worth noting that Windows RT only uses Thumb2 instructions, and you can't use ARM instructions at all (the CPU is invariably switched back to Thumb2 mode after each context switch). Try ARM, and your program will crash :lol:

rst
Posts: 404
Joined: Sat Apr 20, 2013 6:42 pm
Location: Germany

Re: Trying Bare Metal on Raspberry Pi 2

Sat Feb 21, 2015 6:15 pm

Hi krom,
krom wrote:The NEON demos are really fast now compared to the equivalent scalar VFP demos on the Raspberry Pi 2.
In fact your NEON Mandelbrot demo is so fast that the fractal picture is already there when my display comes up. It seems you need a new task for multi-core. :)
There are 3 possible states that the Raspberry Pi 2's extra CPU cores might be in,
when booted up into a bare-metal state from the official firmware bootcode.bin & start.elf files:
A) The extra CPU Cores are all Powered ON and are all booting from the same start offset of code (0x8000).

B) The extra CPU Cores are all Powered ON but are in a WFI (Wait For Interrupt) state to wake them up, & boot from a specified Jump Address.

C) The extra CPU Cores are all Powered OFF, and need powering on to even start from a state like A or B.
There is a slightly different state they are in:

B2) The extra CPU Cores are all Powered ON but are waiting in a program loop, continuously reading a mailbox register until it becomes non-zero. When that happens, the core takes the value it read as the Jump Address and goes there. Please see my previous posting in this topic for information on where to write this address.
I also think that to wake any CPU cores up, we need to use the ARM instruction SEV (Send EVent), causing an event to be signalled to all processors in the multiprocessor system.
SEV is not needed in this case. Maybe it can be useful later.
As a 1st test, I am not concerned atm about any scheduling, I just want the 4 cores todo some work & infinite loop when they have finished their respective code blocks.
I did this too. I let cores 1-3 each write the contents of their affinity register to a fixed memory location and dumped it to the screen with core 0. They did it as expected.
Hi rst, cheers for all the links you provided, tis a great help =D
Thank you. I can give this back to you and all others who gave information on the Raspberry Pi (1 and 2) here before and after. I think we have to work together to understand this great machine and to get it running on bare metal.

krom
Posts: 61
Joined: Wed Dec 05, 2012 9:12 am
Contact: Website

Re: Trying Bare Metal on Raspberry Pi 2

Sat Feb 21, 2015 8:00 pm

Hi rst,
Wow thanks so much for your help, this is really amazing stuff, & I am so happy you have shared your SMP findings with me here =D
I am gonna try this all out and update here with any SMP demos I make.

Also I would like to state that you are my biggest Raspberry Pi 2 hero rst, because of all the help you have given me, you are really great.
Once my Raspberry Pi 2 SMP demos are up, I'll give you rst a special thanks in my readme.md file on my Raspberry Pi github page, for helping me out =D

krom
Posts: 61
Joined: Wed Dec 05, 2012 9:12 am
Contact: Website

Re: Trying Bare Metal on Raspberry Pi 2

Sun Feb 22, 2015 4:31 am

Thanks to rst it was easy for me to make my 1st SMP test demo:
https://github.com/PeterLemon/Raspberry ... MP/SMPINIT
It shows all 4 cores running code & printing info to the same frame buffer video memory, at the same time =D

All I had todo was follow exactly what rst said:
I wrote each core's program code address to the memory location $4000008C + ($10 * Core) // Core := 1..3
And it automatically boots & runs my code exactly as expected on each core =D

Next up, I want to make an optimized NEON fractal julia animation demo, that uses all 4 cores for a hefty speed boost.
Another thing I want to try in this demo, is to use a single block of code for all 4 cores, but using the CPU ID from my demo above, I want to make it calculate the correct pixel offsets accordingly.
It will be interesting to see if this works =D
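
One way the shared code could derive its share of the picture from the CPU ID is to give each core a contiguous band of rows, as in the sketch below. This is only an assumption about how the work might be divided, not how the demo is written; RowRange and rows_for_core are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Half-open range of image rows [first, end) handled by one core.
struct RowRange {
    uint32_t first;
    uint32_t end;
};

// Splits 'height' rows into 'cores' contiguous bands. When the height does
// not divide evenly, the lowest-numbered cores get one extra row each.
RowRange rows_for_core(uint32_t height, uint32_t core, uint32_t cores) {
    assert(cores > 0 && core < cores);
    const uint32_t base  = height / cores;
    const uint32_t extra = height % cores;
    const uint32_t first = core * base + std::min(core, extra);
    const uint32_t count = base + (core < extra ? 1u : 0u);
    return { first, first + count };
}
```

Each core would pass the low two bits of its MPIDR as 'core' and multiply 'first' by the frame buffer pitch to get its starting byte offset. Since the bands do not overlap, the cores never write the same pixels.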

So a big thank you to rst for helping me unlock this huge speed-up for the Raspberry Pi 2 =D

rst
Posts: 404
Joined: Sat Apr 20, 2013 6:42 pm
Location: Germany

Re: Trying Bare Metal on Raspberry Pi 2

Sun Feb 22, 2015 9:24 am

Thank you very very much, krom and congrats for getting the first SMP demo running!

DexOS
Posts: 876
Joined: Wed May 16, 2012 6:32 pm
Contact: Website

Re: Trying Bare Metal on Raspberry Pi 2

Mon Feb 23, 2015 5:39 pm

Great work krom, I am looking forward to my Raspberry Pi 2 coming, so I can test your demos.
Batteries not included, Some assembly required.

mrvn
Posts: 58
Joined: Wed Jan 09, 2013 6:50 pm

Problem with the MMU in multi-core mode

Mon Feb 23, 2015 7:10 pm

Thanks for the SMP info. Turns out it is really easy to start cores.

But that brings me to the next problem: MMU/Caches with multi core. I have the MMU and caches all working in single core mode. Here is what I do for SMP:

Code:

/**********************************************************************
 * MMU                                                                *
 **********************************************************************/
namespace MMU {
#define CACHED_TLB
//#undef CACHED_TLB

    static volatile __attribute__ ((aligned (0x4000))) uint32_t page_table[4096];
    static volatile __attribute__ ((aligned (0x400))) uint32_t leaf_table[256];

    struct page {
	uint8_t data[4096];
    };

    extern "C" {
	extern page _mem_start[];
	extern page _mem_end[];
    }

    void init_page_table() {
	uint32_t base;
	// initialize page_table
	// 1024MB - 16MB of kernel memory (some belongs to the VC)
	for (base = 0; base < 1024 - 16; base++) {
	    // section descriptor (1 MB)
#ifdef CACHED_TLB
	    // outer and inner write back, write allocate, not shareable (fast
	    // but unsafe)
	    page_table[base] = base << 20 | 0x0140E;
	    // outer and inner write back, write allocate, shareable (fast but
	    // unsafe)
	    //page_table[base] = base << 20 | 0x1140E;
#else
	    // outer and inner write through, no write allocate, shareable
	    // (safe but slower)
	    page_table[base] = base << 20 | 0x1040A;
#endif
	}

	// unused up to 0x3F000000
	for (; base < 1024 - 16; base++) {
	    page_table[base] = 0;
	}

	// 16 MB peripherals at 0x3F000000
	for (; base < 1024; base++) {
	    // shared device, never execute
	    page_table[base] = base << 20 | 0x10416;
	}

	// 1 MB mailboxes
	// shared device, never execute
	page_table[base] = base << 20 | 0x10416;
	++base;
	
	// unused up to 0x7FFFFFFF
	for (; base < 2048; base++) {
	    page_table[base] = 0;
	}

	// one second-level page table (leaf table) at 0x80000000
	page_table[base++] = (intptr_t)leaf_table | 0x1;

	// 2047MB unused (rest of address space)
	for (; base < 4096; base++) {
	    page_table[base] = 0;
	}

	// initialize leaf_table
	for (base = 0; base < 256; base++) {
	    leaf_table[base] = 0;
	}
    }
    
    void init() {
	// set SMP bit in ACTLR
	uint32_t auxctrl;
	asm volatile ("mrc p15, 0, %0, c1, c0,  1" : "=r" (auxctrl));
	auxctrl |= 1 << 6;
	asm volatile ("mcr p15, 0, %0, c1, c0,  1" :: "r" (auxctrl));

        // setup domains (CP15 c3)
	// Write Domain Access Control Register
        // use access permissions from TLB entry
	asm volatile ("mcr     p15, 0, %0, c3, c0, 0" :: "r" (0x55555555));

	// set domain 0 to client
	asm volatile ("mcr p15, 0, %0, c3, c0, 0" :: "r" (1));

	// always use TTBR0
	asm volatile ("mcr p15, 0, %0, c2, c0, 2" :: "r" (0));

#ifdef CACHED_TLB
	// set TTBR0 (page table walk inner and outer write-back,
	// write-allocate, cacheable, shareable memory)
	asm volatile ("mcr p15, 0, %0, c2, c0, 0"
		      :: "r" (0b1001010 | (unsigned) &page_table));
	// set TTBR0 (page table walk inner and outer write-back,
	// write-allocate, cacheable, non-shareable memory)
	//asm volatile ("mcr p15, 0, %0, c2, c0, 0"
	//	      :: "r" (0b1101010 | (unsigned) &page_table));
#else
	// set TTBR0 (page table walk inner and outer non-cacheable,
	// non-shareable memory)
	asm volatile ("mcr p15, 0, %0, c2, c0, 0"
		      :: "r" (0 | (unsigned) &page_table));
#endif
	asm volatile ("isb" ::: "memory");

	/* SCTLR
	 * Bit 31: SBZ     reserved
	 * Bit 30: TE      Thumb Exception enable (0 - take in ARM state)
	 * Bit 29: AFE     Access flag enable (1 - simplified model)
	 * Bit 28: TRE     TEX remap enable (0 - no TEX remapping)
	 * Bit 27: NMFI    Non-Maskable FIQ (read-only)
	 * Bit 26: 0       reserved
	 * Bit 25: EE      Exception Endianness (0 - little-endian)
	 * Bit 24: VE      Interrupt Vectors Enable (0 - use vector table)
	 * Bit 23: 1       reserved
	 * Bit 22: 1/U     (alignment model)
	 * Bit 21: FI      Fast interrupts (probably read-only)
	 * Bit 20: UWXN    (Virtualization extension)
	 * Bit 19: WXN     (Virtualization extension)
	 * Bit 18: 1       reserved
	 * Bit 17: HA      Hardware access flag enable (0 - enable)
	 * Bit 16: 1       reserved
	 * Bit 15: 0       reserved
	 * Bit 14: RR      Round Robin select (0 - normal replacement strategy)
	 * Bit 13: V       Vectors bit (0 - remapped base address)
	 * Bit 12: I       Instruction cache enable (1 - enable)
	 * Bit 11: Z       Branch prediction enable (1 - enable)
	 * Bit 10: SW      SWP/SWPB enable (maybe RAZ/WI)
	 * Bit 09: 0       reserved
	 * Bit 08: 0       reserved
	 * Bit 07: 0       endian support / RAZ/SBZP
	 * Bit 06: 1       reserved
	 * Bit 05: CP15BEN DMB/DSB/ISB enable (1 - enable)
	 * Bit 04: 1       reserved
	 * Bit 03: 1       reserved
	 * Bit 02: C       Cache enable (1 - data and unified caches enabled)
	 * Bit 01: A       Alignment check enable (1 - fault when unaligned)
	 * Bit 00: M       MMU enable (1 - enable)
	 */
	
	// enable MMU, caches and branch prediction in SCTLR
	uint32_t mode;
	asm volatile ("mrc p15, 0, %0, c1, c0, 0" : "=r" (mode));
	// mask: 0b0111 0011 0000 0010 0111 1000 0010 0111
	// bits: 0b0010 0000 0000 0000 0001 1000 0010 0111
#ifdef CACHED_TLB
	mode &= 0x73027827;
	mode |= 0x20001827;
#else
	// no caches
	mode &= 0x73027827;
	mode |= 0x20000023;
#endif
	asm volatile ("mcr p15, 0, %0, c1, c0, 0" :: "r" (mode) : "memory");

	// instruction cache makes delay way faster, slow panic down
#ifdef CACHED_TLB
	panic_delay = 0x2000000;
#endif
    }
} // namespace MMU

/**********************************************************************
 * SMP                                                                *
 **********************************************************************/
namespace SMP {
    // Setup SMP (Boot Offset = $4000008C + ($10 * Core), Core = 1..3)
    enum {
	CORE_BASE = 0x4000008C,

	Core1Boot = 0x10, // Core 1 Boot Offset
	Core2Boot = 0x20, // Core 2 Boot Offset
	Core3Boot = 0x30, // Core 3 Boot Offset
    };

    typedef void (*fn)(void);

#define CORE_REG(x) ((volatile fn *)(CORE_BASE + (x)))

    void core_wakeup(void) {
	puts("core is up\n");
	MMU::init();
	puts("core is virtual\n");
	while(true) { }
    }
    
    void init() {
	puts("starting core 1\n");
	blink(panic_delay * 0x10);
	*CORE_REG(Core1Boot) = core_wakeup;
	blink(panic_delay * 0x10);
	puts("started core 1\n");
	blink(panic_delay * 0x10);

    	puts("starting core 2\n");
	blink(panic_delay * 0x10);
	*CORE_REG(Core2Boot) = core_wakeup;
	blink(panic_delay * 0x10);
	puts("started core 2\n");
	blink(panic_delay * 0x10);

    	puts("starting core 3\n");
	blink(panic_delay * 0x10);
	*CORE_REG(Core3Boot) = core_wakeup;
	blink(panic_delay * 0x10);
	puts("started core 3\n");
	blink(panic_delay * 0x10);
}

}

void kernel_main(uint32_t r0, uint32_t model_id, void *atags) {
    UNUSED(r0);
    UNUSED(model_id);
    UNUSED(atags);
    
    LED::init();
    for(int i = 0; i < 3; ++i) {
	blink(0x100000);
    }

    UART::init();
    puts("\nHello\n");
    delay(0x100000);

    MMU::init_page_table();
    MMU::init();
    SMP::init();
    puts("\ndone\n");
    panic();
}
As you can see I first initialize the page tables with the caches still turned off to avoid problems with the cache snooping. Then I switch the MMU and caching on on core0 and one after the other on the other cores. Here is the output I get:

Code:

Hello
starting core 1
core is up
core is virtual
started core 1
starting core 2
started core 2
starting core 3
core is up
core is virtual
started core 3
done
So what happened to core2 there? Why isn't it printing anything? Note: If I don't enable the MMU in /core_wakeup()/ then all cores print their texts.
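
The section descriptor constants in the page table code above (0x0140E, 0x1140E, 0x1040A and 0x10416) can be taken apart field by field. The sketch below rebuilds them from named fields of the ARMv7 short-descriptor section format so the comments in the listing can be checked on a host. section_descriptor is a hypothetical helper, and the domain, AP[2] and nG fields are assumed to be zero, as they are in all four constants.

```cpp
#include <cassert>
#include <cstdint>

// Builds a 1 MB section descriptor (ARMv7 short-descriptor format).
//   mb        : section number, i.e. physical address >> 20
//   tex, c, b : memory type and cache policy bits
//   xn        : execute never
//   ap        : bits 11:10 (bit 10 acts as the access flag when SCTLR.AFE = 1)
//   shareable : S bit
constexpr uint32_t section_descriptor(uint32_t mb, uint32_t tex, bool c, bool b,
                                      bool xn, uint32_t ap, bool shareable) {
    return (mb << 20)
         | (shareable ? 1u << 16 : 0u)
         | ((tex & 7u) << 12)
         | ((ap & 3u) << 10)
         | (xn ? 1u << 4 : 0u)
         | (c ? 1u << 3 : 0u)
         | (b ? 1u << 2 : 0u)
         | 0x2u;  // descriptor type: section
}

// Write-back, write-allocate, not shareable: the CACHED_TLB value.
static_assert(section_descriptor(0, 1, true, true, false, 1, false) == 0x0140Eu,
              "cached section");
```

The only difference between 0x0140E and 0x1140E is the S bit, so switching between the "not shareable" and "shareable" variants is a one-argument change with this helper.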

mrvn
Posts: 58
Joined: Wed Jan 09, 2013 6:50 pm

Solved: MMU problems with multi-core

Mon Feb 23, 2015 8:06 pm

Ok, very stupid mistake here. I didn't set any stack for the other cores. So as soon as the code got complex enough to need stack, core 2 crashed.
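
The stack pointer value that each core's startup stub loads could be computed along the following lines. This is a sketch under assumptions: the 16 KiB stack size and the name stack_top_for_core are hypothetical, and the actual asm stub is not shown in this thread.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kStackSize = 0x4000;  // hypothetical: 16 KiB per core

// ARM stacks grow downwards, so each core starts at the top of its own
// slice of the stack area and never runs into its neighbour's slice
// unless it overflows.
uint32_t stack_top_for_core(uint32_t stack_area_base, unsigned core) {
    assert(core < 4);
    assert(stack_area_base % 8 == 0);  // keep the stack 8-byte aligned
    return stack_area_base + (core + 1) * kStackSize;
}
```

With a stack area at 0x100000, for example, cores 0..3 would start at 0x104000, 0x108000, 0x10C000 and 0x110000; the stub would load that value into sp before branching to the C code.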

I've added a little asm stub to set up a stack and then call the C code now and that works. I've also added

Code:

	int id = get_mpidr() & 3;
	while(true) { count[id]++; }
to run on each core and print count[1..3] from core 0 now. That gives me:

Code:

Hello
starting core 1
core is up: MPIDR = 0x80000F01
core is virtual
started core 1
starting core 2
core is up: MPIDR = 0x80000F02
core is virtual
started core 2
starting core 3
core is up: MPIDR = 0x80000F03
core is virtual
started core 3
counts = 0x64424C16 0x3DDC4110 0x18F1777A
counts = 0x650DC696 0x3EA7BD58 0x19BCF21D
counts = 0x65D94226 0x3F733725 0x1A886DD0
counts = 0x66A4BD74 0x403EB2B2 0x1B53E86D
counts = 0x67703B42 0x410A2B95 0x1C1F628E
counts = 0x683BB707 0x41D5A7F1 0x1CEADAF9
counts = 0x69073487 0x42A12288 0x1DB6555B
counts = 0x69D2B05C 0x436C9FAA 0x1E81CDBD
counts = 0x6A9E2A32 0x44381CA0 0x1F4D47D1
counts = 0x6B69A5DD 0x450399F1 0x2018C0CC
Next step, Framebuffer. Jippey.

krom
Posts: 61
Joined: Wed Dec 05, 2012 9:12 am
Contact: Website

Re: Trying Bare Metal on Raspberry Pi 2

Tue Feb 24, 2015 8:22 am

DexOS wrote:Great work krom, i am looking forward to my raspberry pi 2 coming, so i can test your demos.
Hi DexOS, this is wonderful news, I am so glad you are getting a Raspberry Pi 2 =D
I'll try to get some more cool stuff done in time, for when it arrives for you!

rst
Posts: 404
Joined: Sat Apr 20, 2013 6:42 pm
Location: Germany

Re: Solved: MMU problems with multi-core

Tue Feb 24, 2015 10:50 am

mrvn wrote:Turns out it is really easy to start cores.
Yes, it's not difficult. I think it will become more challenging when it comes to interrupts and synchronizing the cores.
Then I switch the MMU and caching on on core0 and one after the other on the other cores.
So the MMU is also running on multi-core. Well done!
