dpotop
Posts: 78
Joined: Mon Nov 24, 2014 2:14 pm

RPi3 - start cores and spinlock sync code needed

Mon Feb 04, 2019 8:10 pm

Hello everybody,

After writing a small OS on the RPi1 (open-sourced here https://gforge.inria.fr/projects/rpi653), I'm now trying to make it work on the RPi3 multi-core.

I'm starting small, to understand how the multi-core works. I am using this tutorial to get things working: https://github.com/bztsrc/raspi3-tutorial .

Up to now I have managed to start some code and print on the UART.
I've been trying to run all the cores and synchronize them,
but not only does synchronization not work, I'm not even
sure that all cores start executing.

Here is my assembly file:

Code:

.global _start
.global _sync
	
_start:
	// read 2-bit #cpuid into x0
	mrs     x0, mpidr_el1
	and     x0, x1, #3

	// set stack before our code at address
	// 0x8000-256*#cpuid
	mov	 x3, #256
	mul	 x1, x0, x3
	ldr     x2, =_start
	sub	x2, x2, x1
	mov     sp, x2

	// clear bss, if on core #1
	sub	x5, x0, 1
	cbnz	x5, 4f
	ldr     x1, =__bss_start
	ldr     w2, =__bss_size
3:  	cbz     w2, 4f
	str     xzr, [x1], #8
	sub     w2, w2, #1
	cbnz    w2, 3b

	// jump to C code, should not return
4:  	bl      main
	// for failsafe, halt this core too
5:	wfe
	b       5b

// global barrier semaphore, initially 1 to be outside
// .bss
_sync:	.dword 1
And here is my C code, which no longer prints anything (it did print
something when all the work was done on CPU0):

Code:

#include "uart.h"

extern volatile int _sync ;

void main(int i)
{
  switch(i) {
  case 1:
    // set up serial console
    uart_init();
    uart_puts("Hello World(CPU1)!\n");
    _sync = 0 ;

    for(;;) ;
    
  case 0:
    for(;_sync==1;) ;
    uart_puts("Hello World(CPU0)!\n");
    for(;;) ;
  default:
    for(;;) ;
  }
}
Can someone help me understand what I am doing wrong?
First, why does this code not print anything?
Then, how should I perform synchronization (I could not find
any spinlock code)?

Best regards,
Dpotop

Paeryn
Posts: 2704
Joined: Wed Nov 23, 2011 1:10 am
Location: Sheffield, England

Re: RPi3 - start cores and spinlock sync code needed

Mon Feb 04, 2019 11:06 pm

dpotop wrote:
Mon Feb 04, 2019 8:10 pm

Code:

_start:
	// read 2-bit #cpuid into x0
	mrs     x0, mpidr_el1
	and     x0, x1, #3
Can someone help me understand what I am doing wrong?
First, why does this code not print anything?
Then, how should I perform synchronization (I could not find
any spinlock code)?

Best regards,
Dpotop
The second instruction is wrong. You start by copying mpidr_el1 into x0 and then immediately overwrite x0 with x1 & 3, so x0 no longer contains the core number. I assume you meant:

Code:

    and x0, x0, #3
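
To see why the wrong source register matters, here is a small C sketch of what the start-up code is trying to compute. The function names are made up for illustration, and it assumes (as on the Pi 3) that the low two bits of MPIDR_EL1 identify the core and that each core gets 256 bytes of stack below a base address, as in the comments of the posted assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: extract the core number from an MPIDR_EL1 value.
 * Assumes the low two bits identify the core, as on the Pi 3. */
static inline unsigned core_id_from_mpidr(uint64_t mpidr)
{
    return (unsigned)(mpidr & 0x3u);
}

/* Hypothetical helper: stack top for a core, following the scheme in the
 * posted assembly (base minus 256 bytes per core number). */
static inline uint64_t stack_top_for_core(uint64_t base, unsigned core)
{
    assert(core < 4u);
    return base - 256u * core;
}
```

With the original instruction the mask is applied to whatever happens to be in x1, so both the core number and the stack pointer derived from it are unpredictable.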

LdB
Posts: 1283
Joined: Wed Dec 07, 2016 2:29 pm

Re: RPi3 - start cores and spinlock sync code needed

Tue Feb 05, 2019 2:50 am

Aside from your programming error, cores 1, 2 and 3 are already parked in a spin loop, sleeping and reading a memory mailbox.
That is done by the Pi startup stub, which runs before your code ever gets control; core 0 is the only core you ever see straight away.
https://github.com/raspberrypi/tools/bl ... armstub8.S

I don't think bzt's tutorial ever touches on that detail. From memory, it is set up more for QEMU, where it puts the other
cores to sleep, which you would never see on the real hardware. If you look at the stub you will also see that all the cores arrive in EL2,
not EL3 as QEMU would do.

You need to write the address you want each core sent to and then fire a sev instruction to wake them.
In your case the address will be 0x80000 if you want them to enter like core 0.

Core 1 jump address mailbox is at 0xe0
Core 2 jump address mailbox is at 0xe8
Core 3 jump address mailbox is at 0xf0
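
The three addresses follow a simple pattern: consecutive 8-byte slots, with core 0's slot at 0xd8 in the stock stub. A minimal C sketch of that arithmetic (the macro and function names are invented for illustration, and the layout is only what armstub8.S uses, so check it against the stub you actually boot with):

```c
#include <assert.h>
#include <stdint.h>

#define SPIN_TABLE_BASE 0xd8u  /* assumed slot for core 0, from armstub8.S */
#define NUM_CORES       4u

/* Hypothetical helper: address of the jump-address slot for a given core. */
static inline uintptr_t spin_slot_address(unsigned core)
{
    assert(core < NUM_CORES);
    return (uintptr_t)SPIN_TABLE_BASE + 8u * core;
}
```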

There is actually a hardware mailbox you can use once you have set up the MMU; it is described in the QA7 document
https://www.raspberrypi.org/documentati ... rev3.4.pdf

So I basically do the same thing: bring the other cores through, set them up and send them into the scheduler.

Also be aware that unless bzt has fixed his MMU code you won't be able to do primitive synchronization between cores: the ldrex instruction (ldxr on AArch64) will lock up (he concentrated on the virtualization, not the caching, and he only tested the virtualization).
I patched it at one point but I know he has made some more changes since.
https://github.com/LdB-ECM/Raspberry-Pi ... tualmemory
You will see the test code does a full semaphore suspend from core 0 of the other 3 cores for 10 seconds to check it.
https://github.com/LdB-ECM/Raspberry-Pi ... ory/main.c
So basically what I am saying is: make sure you check your sync primitives with your MMU code.

dpotop
Posts: 78
Joined: Mon Nov 24, 2014 2:14 pm

Re: RPi3 - start cores and spinlock sync code needed

Tue Feb 05, 2019 1:50 pm

Thanks a lot to both of you.
Dpotop

dpotop
Posts: 78
Joined: Mon Nov 24, 2014 2:14 pm

Re: RPi3 - start cores and spinlock sync code needed

Fri Feb 08, 2019 9:04 am

Hello LdB,

Thanks again for your reply and for the link to your source code.
I'm now trying to understand it, and I have a question related to
memory mapping and mailboxes, which I don't fully understand.

The code that I don't understand is in your file:
https://github.com/LdB-ECM/Raspberry-Pi ... tStart64.S

There, I think I understand the SecondarySpin code - each of
the cores 1, 2, and 3 waits on its mailbox 3 read/clear register
for the address of the function it has to branch to. The addresses
of these mailbox registers can be found on page 15 of:
https://www.raspberrypi.org/documentati ... rev3.4.pdf
This code is paired with that of CoreExecute, which uses the
corresponding mailbox 3 write registers.

But I do not understand the writing protocol:
Core 1 jump address mailbox is at 0xe0
Core 2 jump address mailbox is at 0xe8
Core 3 jump address mailbox is at 0xf0
These addresses seem to overlap with read/clear registers 2 and 3
of core 2 and with read/clear register 0 of core 3.

I am assuming here the memory map of the Raspberry Pi 1, where the
RAM appears once at address 0, then again (L2 cache coherent) at 0x80000000,
then again (uncached) at 0xC0000000. I took this from
https://www.raspberrypi.org/app/uploads ... herals.pdf

I also have a second question concerning the boot code: I saw you
don't initialize the vector table for cores 1, 2, and 3. In this small
application I guess it's OK, but what happens in general? Should one
create 1 vector table per core, or one vector table for all cores, with
test code to choose?

BTW: beautiful assembly code!

Best regards,
Dpotop

LdB
Posts: 1283
Joined: Wed Dec 07, 2016 2:29 pm

Re: RPi3 - start cores and spinlock sync code needed

Fri Feb 08, 2019 2:17 pm

dpotop wrote:
Fri Feb 08, 2019 9:04 am
But I do not understand the writing protocol
Core 1 jump address mailbox is at 0xe0
Core 2 jump address mailbox is at 0xe8
Core 3 jump address mailbox is at 0xf0
Okay, in 64-bit mode the cores don't get parked on the physical mailbox .. I have no idea why; only the Pi devs could answer that.

Look at the bootstub
https://github.com/raspberrypi/tools/bl ... armstub8.S
follow what it does

Code:

/* read the cpu core id and reduce it to a number 0-3 in x6 */
	mrs x6, MPIDR_EL1
	and x6, x6, #0x3
	cbz x6, primary_cpu

/* load the address 0xd8, which is the core0 mailbox, into x5 */
	adr x5, spin_cpu0
secondary_spin:

/* core is sent to sleep */
	wfe

/* when the core wakes it loads the value stored at x5 + 8*x6 */
/* so each core reads a different memory location: 0xd8, 0xe0, 0xe8, 0xf0 */
	ldr x4, [x5, x6, lsl #3]

/* if the value read is zero the core loops back to sleep */
	cbz x4, secondary_spin

/* x0 is set to zero and the core jumps down to boot_kernel */
	mov x0, #0
	b boot_kernel
primary_cpu:
	ldr w4, kernel_entry32
	ldr w0, dtb_ptr32

boot_kernel:
/* the core zeroes these registers and then branches to the address read */
	mov x1, #0
	mov x2, #0
	mov x3, #0
	br x4
Those addresses are just normal memory addresses .. I have no idea why they selected those.
Okay, so core1, core2 and core3 are basically asleep, waiting for an address to be written and then to be woken up.

So to send core1 to an address, here is the code ... it is that simple

Code:

	mov x1, #0xe0			// Spin core1 jump address
	ldr x2, =address_you_want	// Address where you want the core to go
	str x2, [x1]			// Store it at the memory address core1 reads
	sev				// Now wake the core up
Use the other cores' addresses to make the other cores jump.
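
For anyone who prefers to drive this from C, here is a rough sketch of both sides of the handshake. The function names are hypothetical, and the table pointer is passed in so the logic can be exercised on a desktop with an ordinary array; on the Pi 3 with the stock stub it would point at 0xd8. The barrier before sev is my own cautious addition, not something the stub requires.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: publish an entry point for a parked core.
 * `table` points at the slot for core 0 (0xd8 with the stock stub,
 * or any 4-entry array when trying this out on a host machine). */
static void release_core(volatile uint64_t *table, unsigned core, uint64_t entry)
{
    assert(core >= 1u && core < 4u && entry != 0u);
    table[core] = entry;
#if defined(__aarch64__)
    /* Complete the store, then wake any core sleeping in wfe. */
    __asm__ volatile("dsb sy\n\tsev" ::: "memory");
#endif
}

/* Model of what a parked core does after each wfe: read its slot and
 * return the address to branch to, or 0 meaning "go back to sleep". */
static uint64_t poll_slot(const volatile uint64_t *table, unsigned core)
{
    assert(core < 4u);
    return table[core];
}
```

If the data cache is already enabled on core 0, the write may also need to be cleaned to memory before the parked core can see it; that depends on your MMU setup and is not shown here.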
dpotop wrote:
Fri Feb 08, 2019 9:04 am
I also have a second question concerning the boot code: I saw you
don't initialize the vector table for cores 1, 2, and 3. In this small
application I guess it's OK, but what happens in general? Should one
create 1 vector table per core, or one vector table for all cores, with
test code to choose?
Correct, it was a small app so I didn't bother.

Remember there isn't just one vector table: there is one table for each EL level that can take exceptions on the core.
So EL1, EL2, EL3 (there is no VBAR for EL0, exceptions are never taken to it) ... 3 tables x 4 cores ... so up to 12 tables in total are possibly needed

This sample runs a timer on core3 at EL1 .. I simply share the same EL1 table between all the cores.
https://github.com/LdB-ECM/Raspberry-Pi ... 3Interrupt

You set the vector table address via the VBAR register for each EL ... so setting the EL1 vector table is something like this

Code:

ldr x0, =VectorTable						
msr vbar_el1,x0
All VBAR registers start as 0 here (so all overlapping), though the architecture does not promise a reset value, so set each one you use. Obviously you could have up to 12 different tables as needed.

dpotop
Posts: 78
Joined: Mon Nov 24, 2014 2:14 pm

Re: RPi3 - start cores and spinlock sync code needed

Fri Feb 08, 2019 10:39 pm

Hello LdB,

Thanks again.
Starting from your code I managed to make critical sections work,
after enabling the caches.

Do you have some reference for the synchronization code (the
semaphores)? The semantics of the non-blocking "dec" look
a bit weird. I spent some time understanding it before I
was able to make it work.

Also, why did you say that synchronization interferes with memory
allocation ? If everything is cached, it should be OK, or is it
something I don't yet see ?

Best,
Dpotop

LdB
Posts: 1283
Joined: Wed Dec 07, 2016 2:29 pm

Re: RPi3 - start cores and spinlock sync code needed

Sat Feb 09, 2019 5:04 am

dpotop wrote:
Fri Feb 08, 2019 10:39 pm
Also, why did you say that synchronization interferes with memory
allocation ? If everything is cached, it should be OK, or is it
something I don't yet see ?
The L1 internal state machine locks up if you don't adhere to the restrictions, and it just throws a data abort exception.
http://infocenter.arm.com/help/index.js ... IBJGE.html

There is a white paper from ARM on 32-bit primitives but not on 64-bit; I believe all ARM has for 64-bit are these :-)
https://static.docs.arm.com/100934/0100 ... 100_en.pdf
https://static.docs.arm.com/ddi0487/da/ ... v8_arm.pdf section B2
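
On the non-blocking "dec" asked about above: one common reading is "take a unit if there is one, otherwise report failure without waiting". Below is a sketch of that reading using C11 atomics. It is not the code from my repository, and the type and function names are invented for the example. On AArch64 the compiler turns these operations into exclusive or atomic instructions, so on the Pi 3 the same MMU and cache requirements apply.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { atomic_int count; } sem_sketch_t;

static void sem_sketch_init(sem_sketch_t *s, int initial)
{
    assert(s != NULL && initial >= 0);
    atomic_init(&s->count, initial);
}

/* Non-blocking decrement: take one unit if available, never wait. */
static bool sem_sketch_try_dec(sem_sketch_t *s)
{
    int seen = atomic_load_explicit(&s->count, memory_order_relaxed);
    while (seen > 0) {
        if (atomic_compare_exchange_weak_explicit(&s->count, &seen, seen - 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return true;
        /* A failed exchange reloads `seen`; retry while it is positive. */
    }
    return false;
}

/* Increment: give one unit back. */
static void sem_sketch_inc(sem_sketch_t *s)
{
    atomic_fetch_add_explicit(&s->count, 1, memory_order_release);
}
```

A blocking "dec" can then be built by calling the try version in a loop, optionally sleeping with wfe between attempts.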

This is worth a read
https://www.slideshare.net/vh21/pre-kno ... oncurrency

Other than that, for the standard Linux primitives, read the code
https://elixir.bootlin.com/linux/v4.9/s ... spinlock.h
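
Since spinlock code was the original request, here is a deliberately simple test-and-set spinlock in portable C11. It is only a sketch: it is not the ticket lock used in the Linux source linked above, the names are made up, and on the Pi 3 it relies on the MMU and caches being configured so that the underlying exclusive accesses work.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { atomic_flag locked; } spinlock_sketch_t;

#define SPINLOCK_SKETCH_INIT { ATOMIC_FLAG_INIT }

/* Try once: returns true if the lock was free and is now ours. */
static bool spin_try_lock(spinlock_sketch_t *l)
{
    assert(l != NULL);
    /* test_and_set returns the previous state; false means it was free. */
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

/* Spin until the lock is acquired. */
static void spin_lock(spinlock_sketch_t *l)
{
    while (!spin_try_lock(l)) {
        /* Busy-wait; a bare-metal build might sleep with wfe here. */
    }
}

/* Release the lock so another core can take it. */
static void spin_unlock(spinlock_sketch_t *l)
{
    assert(l != NULL);
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

Usage is the usual pattern: declare `spinlock_sketch_t lock = SPINLOCK_SKETCH_INIT;`, then wrap the critical section (for example the UART print) in `spin_lock(&lock)` and `spin_unlock(&lock)`.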

dpotop
Posts: 78
Joined: Mon Nov 24, 2014 2:14 pm

Re: RPi3 - start cores and spinlock sync code needed

Sat Feb 09, 2019 1:41 pm

I have no idea why only the Pi devs could answer that.
Because the professionals make their living from it, whereas Pi developers are
hobbyists and can afford to be open. :) And the Pi is the only
platform where such a community has developed.

Too bad the information is not even more easily accessible.
I wonder how complicated it would be to write a book about it
(are there IP issues)...

Best regards,
Dumitru

PS: I am now writing a simple bootloader to avoid using the SD
card, and I wonder why the boot address changed from 0x8000 to
0x80000 when moving from Pi1 to Pi3. Why on Earth do they
leave 512kB before the load address?

bzt
Posts: 393
Joined: Sat Oct 14, 2017 9:57 pm

Re: RPi3 - start cores and spinlock sync code needed

Sat Feb 09, 2019 2:27 pm

dpotop wrote:
Sat Feb 09, 2019 1:41 pm
PS: I am now writing a simple bootloader to avoid using the SD
card
Probably already written :-) Take a look at raspbootin and my 64 bit rewrite for booting over serial. Or you can boot from USB stick as well.
and I wonder why the boot address changed from 0x8000 to
0x80000 when moving from Pi1 to Pi3. Why on Earth do they
leave 512kB before the load address?
That's not accurate. The load address changes if you move from AArch32 to AArch64 (the Pi3 is able to load AArch32 kernels). Before the boot loader, start.elf places ATAGS. I suppose they thought it'd be better to have more space for them on 64 bit. Who knows? According to https://www.raspberrypi.org/forums/viewtopic.php?t=6685 the load address was chosen because that's the typical address for the Linux kernel (which implies that 0x80000 is the typical one for 64 bit kernels). I'm not sure, but tbh, it doesn't really matter. Use the given address and that's all :-)

Cheers,
bzt

LdB
Posts: 1283
Joined: Wed Dec 07, 2016 2:29 pm

Re: RPi3 - start cores and spinlock sync code needed

Sun Feb 10, 2019 2:50 am

Why it is where it is comes from the Linux AArch64 boot specification: the decompressed kernel is expected to fill the space.

https://github.com/torvalds/linux/blob/ ... ooting.txt

dpotop
Posts: 78
Joined: Mon Nov 24, 2014 2:14 pm

Re: RPi3 - start cores and spinlock sync code needed

Fri Feb 15, 2019 5:06 pm

Dear LdB and bzt,

Thanks a lot, your insight into Linux booting is quite useful!

If the decompressed Linux kernel can go between 0x8000 and 0x80000,
that also means I can load my code there, doesn't it? I'll probably give it a try.

Best regards,
Dpotop
