phil95
Posts: 141
Joined: Wed Sep 12, 2012 8:10 am
Location: Paris

RPI2 speed

Fri Feb 19, 2016 10:00 am

Hello
I have been working on the RPI 1 and I'm testing the RPI 2 today.
I wrote a very simple program to test the performance
of my boards: an assembly routine of 1000 NOPs, built dynamically,
that is called 1000 times.
The test measures how long it takes to execute the 1,000,000 NOPs.
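Here is a minimal C sketch of building such a routine. It assumes the routine is a flat array of 32-bit ARM words made of mov r0, r0 (a classic ARM NOP encoding, 0xe1a00000) and ending in bx lr. The post does not say how the routine returns, and the function name is hypothetical. Calling the buffer means casting it to a function pointer, which is implementation-defined in C. Once caches are on, the new code must also be cleaned from the D-cache and invalidated from the I-cache before it runs. Neither step is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Classic ARM "NOP": mov r0, r0 (cond = AL). */
#define ARM_MOV_R0_R0 0xe1a00000u
/* bx lr: return to the caller (ARMv4T and later). */
#define ARM_BX_LR     0xe12fff1eu

/* Hypothetical helper: fill buf with n NOP words followed by a return.
   Returns the number of words written. */
static size_t build_nop_routine(uint32_t *buf, size_t cap, size_t n)
{
    assert(cap >= n + 1);            /* room for the NOPs plus bx lr */
    for (size_t i = 0; i < n; i++)
        buf[i] = ARM_MOV_R0_R0;
    buf[n] = ARM_BX_LR;
    return n + 1;
}
```

For the test in the post, you would declare `uint32_t code[1001];`, call `build_nop_routine(code, 1001, 1000);`, and then call the buffer 1000 times.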
Results are:
RPI1: 9212 microsec (100 NOPs / microsec)
After enabling the cache (mrc p15,0,r0,c1,c0,0 / orr r0,#0x1800 / mcr p15,0,r0,c1,c0,0),
which sets the I and Z bits of the control register:
RPI1 with cache: 1662 microsec (600 NOPs / microsec)
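The 0x1800 mask is bit 12 (I, instruction cache enable) plus bit 11 (Z, branch prediction enable) of the System Control Register. Below is a small C sketch with hypothetical helper names. The inline assembly is compiled only for ARMv7 targets because it uses the isb mnemonic; on the ARMv6 RPI1 the barrier is a CP15 operation instead. One caveat applies if the core is in Hyp mode, which the thread later finds to be the case on the RPi2. There, Hyp-mode execution is governed by HSCTLR (mrc p15, 4, <Rt>, c1, c0, 0), not SCTLR, so setting SCTLR bits from Hyp mode would not affect the code that is running.

```c
#include <assert.h>
#include <stdint.h>

/* SCTLR enable bits (I, C and M sit at the same positions in HSCTLR). */
#define SCTLR_M (1u << 0)    /* MMU enable */
#define SCTLR_C (1u << 2)    /* data/unified cache enable (needs the MMU) */
#define SCTLR_Z (1u << 11)   /* branch prediction enable */
#define SCTLR_I (1u << 12)   /* instruction cache enable */

/* Same effect as "orr r0, #0x1800" in the post. */
static uint32_t sctlr_enable_icache(uint32_t sctlr)
{
    return sctlr | SCTLR_I | SCTLR_Z;
}

#if defined(__arm__) && defined(__ARM_ARCH) && __ARM_ARCH >= 7
/* PL1 (e.g. SVC mode): read-modify-write SCTLR. */
static void enable_icache_pl1(void)
{
    uint32_t v;
    __asm__ volatile("mrc p15, 0, %0, c1, c0, 0" : "=r"(v));
    v = sctlr_enable_icache(v);
    __asm__ volatile("mcr p15, 0, %0, c1, c0, 0\n\tisb" :: "r"(v) : "memory");
}

/* Hyp mode: the equivalent register is HSCTLR; only the I bit is set here. */
static void enable_icache_hyp(void)
{
    uint32_t v;
    __asm__ volatile("mrc p15, 4, %0, c1, c0, 0" : "=r"(v));
    v |= SCTLR_I;
    __asm__ volatile("mcr p15, 4, %0, c1, c0, 0\n\tisb" :: "r"(v) : "memory");
}
#endif
```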

RPI2: 37529 microsec (30 NOPs / microsec)
I don't know how to activate the cache on the RPI2.
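The rates above are rounded. The exact figures follow from 1,000,000 NOPs divided by the elapsed time, as in this trivial sketch (hypothetical function name):

```c
#include <assert.h>

/* NOPs executed per microsecond for one timed run. */
static double nops_per_us(double total_nops, double elapsed_us)
{
    assert(elapsed_us > 0.0);
    return total_nops / elapsed_us;
}
```

`nops_per_us(1e6, 9212)` is about 108.6, `nops_per_us(1e6, 1662)` about 601.7 and `nops_per_us(1e6, 37529)` about 26.6. The uncached RPI2 therefore takes roughly 4.1 times as long as the uncached RPI1.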

I'm using a Jessie SD card, and I have replaced both kernel.img and
kernel7.img with my generated kernel.img.

Under these conditions, the processor is in SECURE MONITOR mode.

If someone has the right instruction sequence...
Thank you

Philippe
Last edited by phil95 on Sat Feb 20, 2016 9:51 am, edited 1 time in total.

dwelch67
Posts: 966
Joined: Sat May 26, 2012 5:32 pm

Re: RPI2 speed

Fri Feb 19, 2016 9:24 pm

an arm nop might be encoded as an add r0,r0,#0 or something like that, string those together and there are pipeline dependencies from one instruction to the next unless the core is smart enough to recognize that r0 is not being modified.

shorter answer: you eventually want to mix instructions together that don't depend on each other (the result of one is not needed as an input to the next).
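To illustrate, here is a hypothetical generator in the spirit of the dynamically built NOP routine from the first post. It emits add rN, rN, #1 while rotating through r0-r3, so each instruction's input was produced four instructions earlier rather than by the one just before it. Only caller-saved registers are used, on the assumption that the routine is called from AAPCS code. Whether this actually runs faster depends on the core's pipeline. It is a sketch, not a measured result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* add rN, rN, #1 (ARM A1 encoding, cond = AL): Rn in bits 19:16, Rd in 15:12. */
static uint32_t arm_add_imm1(unsigned reg)
{
    assert(reg <= 3);                                 /* r0-r3: caller-saved */
    return 0xe2800001u | (reg << 16) | (reg << 12);
}

/* Emit n adds rotating through r0..r3, then bx lr. Returns words written. */
static size_t build_independent_adds(uint32_t *buf, size_t cap, size_t n)
{
    assert(cap >= n + 1);
    for (size_t i = 0; i < n; i++)
        buf[i] = arm_add_imm1((unsigned)(i % 4));
    buf[n] = 0xe12fff1eu;                             /* bx lr */
    return n + 1;
}
```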

https://github.com/dwelch67/raspberrypi, or actually there is a sticky link at the top of the baremetal forum with lots of good websites. a number of them can show you how to enable the instruction cache. the data cache is tricky because you need to set up the mmu first, but the icache is fairly trivial.

David

rpdom
Posts: 17214
Joined: Sun May 06, 2012 5:17 am
Location: Chelmsford, Essex, UK

Re: RPI2 speed

Sat Feb 20, 2016 7:49 am

dwelch67 wrote:an arm nop might be encoded as an add r0,r0,#0 or something like that, string those together and there are pipeline dependencies from one instruction to the next unless the core is smart enough to recognize that r0 is not being modified.

The ARM core is a lot more complicated than when I first started using it (which is a good thing). In those days, if we wanted a NOP instruction we used the "Never" condition on any instruction. It gave a one-cycle delay while the instruction was ignored completely. Obviously the pipeline and instruction set are a lot more complicated than they used to be :)
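Here is a small C illustration of that condition field, with hypothetical helper names. Bits 31:28 of every ARM instruction hold the condition: 0xE means "always" and 0xF was the old "never". From ARMv5 on, the 0xF space is used for unconditional instructions, so the "never" trick is no longer a safe NOP on either Pi. ARMv6K, ARMv6T2 and later define a dedicated NOP encoding, 0xe320f000.

```c
#include <assert.h>
#include <stdint.h>

#define COND_AL 0xEu            /* "always" */
#define COND_NV 0xFu            /* old "never"; reused from ARMv5 on */
#define ARM_NOP_HINT 0xe320f000u /* architected NOP (ARMv6K/ARMv6T2+) */

/* Condition field of an ARM-state instruction. */
static unsigned arm_cond(uint32_t insn)
{
    return insn >> 28;
}

/* Same instruction with a different condition field. */
static uint32_t arm_with_cond(uint32_t insn, unsigned cond)
{
    assert(cond <= 0xFu);
    return (insn & 0x0fffffffu) | ((uint32_t)cond << 28);
}
```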

phil95
Posts: 141
Joined: Wed Sep 12, 2012 8:10 am
Location: Paris

Re: RPI2 speed

Sat Feb 20, 2016 9:54 am

Many thanks for your answers.
I have tried with:
0xe1a00000 mov r0, r0
0xeaffffff b . + 4 (branch to the next instruction)
0xe2800001 add r0, #1
And the problem is the same.
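For reference, here is a C sketch (hypothetical helper names) of how the 0xeaffffff branch above decodes. In ARM state the PC reads as the instruction's address plus 8. The 24-bit signed word offset here is -1, so the target is the next instruction.

```c
#include <assert.h>
#include <stdint.h>

/* ARM-state B (not BL, not the cond=0xF BLX form): bits 27:24 == 0b1010. */
static int arm_is_b(uint32_t insn)
{
    return (insn >> 28) != 0xFu && ((insn >> 24) & 0xFu) == 0xAu;
}

/* Target = address + 8 + sign-extended imm24 * 4. */
static uint32_t arm_b_target(uint32_t insn_addr, uint32_t insn)
{
    assert(arm_is_b(insn));
    uint32_t imm24 = insn & 0x00ffffffu;
    int32_t off = (int32_t)imm24;
    if (imm24 & 0x00800000u)
        off -= 0x01000000;              /* sign-extend from 24 bits */
    return insn_addr + 8u + (uint32_t)(off * 4);
}
```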
I think the ARM is in SECURE MONITOR mode when starting kernel7.img,
and it is not possible to alter some P15 registers.
- What do you think about it?
- Which SD card software version are you using?
(I have corrected the original post to NOPs/microsec instead of NOPs/sec.)
There is a gap between the RPI1 without cache and the RPI2 without cache or MMU...
Philippe

rst
Posts: 453
Joined: Sat Apr 20, 2013 6:42 pm
Location: Germany

Re: RPI2 speed

Sat Feb 20, 2016 11:54 am

Have a look at this posting.

Please also note that the default CPU clock for bare metal on the RPi2 is 600 MHz while it is 700 MHz on the RPi1.
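Normalizing phil95's timings by the clock turns the comparison into a per-cycle one. The sketch below assumes the 700 MHz and 600 MHz defaults above were actually in effect during those runs, which the thread does not confirm. The function name is hypothetical.

```c
#include <assert.h>

/* Average core cycles per NOP = clock (cycles per us) * elapsed us / NOPs. */
static double cycles_per_nop(double clock_mhz, double total_nops, double elapsed_us)
{
    assert(total_nops > 0.0 && elapsed_us > 0.0);
    return clock_mhz * elapsed_us / total_nops;
}
```

`cycles_per_nop(700, 1e6, 9212)` is about 6.4, `cycles_per_nop(700, 1e6, 1662)` about 1.2 and `cycles_per_nop(600, 1e6, 37529)` about 22.5. The clock difference alone accounts for a factor of 700/600 ≈ 1.17, which leaves roughly a 3.5× per-cycle gap between the uncached boards to explain.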

phil95
Posts: 141
Joined: Wed Sep 12, 2012 8:10 am
Location: Paris

Re: RPI2 speed

Wed Feb 24, 2016 3:34 pm

@rst:
Thanks for your answer. I have tried
arm_control=0x1000
but there is no change in performance,
and the performance difference is not 15 % but more than 600 %.

After some investigation, I have found:

1/ The processor is in SECURE MONITOR mode.
I cannot change that mode using msr CPSR...
That worked on the RPI1.

2/ I have tested SWI 0 to jump to address 8.
After 9 sec (!!!) the processor resets and restarts.
That worked on the RPI1.

Philippe

rst
Posts: 453
Joined: Sat Apr 20, 2013 6:42 pm
Location: Germany

Re: RPI2 speed

Wed Feb 24, 2016 6:38 pm

phil95 wrote:@rst:
Thank's for your answer, I have tried
arm_control=0x1000
but no change in perf
Maybe this (undocumented) configuration setting is not supported any more. Nevertheless, cores 1-3 compete with core 0 for bus time after boot. If single-core performance is an issue, it is a good idea to "free" cores 1-3 from the initial loop in which they try to get a start address from local mailbox 3, and put them to sleep using the "wfi" instruction.
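Here is a minimal C sketch of that idea. The hardware-facing part is compiled only for ARMv7 targets. Several assumptions should be checked against the BCM2836 "QA7" ARM-local peripherals document: core n's mailbox 3 write-set register is at 0x4000008C + 0x10 * n, and the firmware loop jumps to whatever address appears there. park_loop, park_cores_1_to_3 and core_mbox3_set_addr are hypothetical names. park_loop is a naked function so that the woken cores need no stack, and the code assumes ARM (not Thumb) state, e.g. -marm.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed QA7 layout: mailbox 3 write-set register of core n. */
static uint32_t core_mbox3_set_addr(unsigned core)
{
    assert(core <= 3);
    return 0x4000008Cu + 0x10u * core;
}

#if defined(__arm__) && defined(__ARM_ARCH) && __ARM_ARCH >= 7
/* Stack-free parking loop for cores 1-3: sleep until an interrupt, forever. */
__attribute__((naked)) static void park_loop(void)
{
    __asm__ volatile("1: wfi\n\tb 1b");
}

/* Run on core 0: hand each waiting core the address of park_loop. */
static void park_cores_1_to_3(void)
{
    for (unsigned n = 1; n <= 3; n++)
        *(volatile uint32_t *)(uintptr_t)core_mbox3_set_addr(n) =
            (uint32_t)(uintptr_t)&park_loop;
    __asm__ volatile("dsb\n\tsev" ::: "memory");  /* also wakes a core in wfe */
}
#endif
```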

phil95
Posts: 141
Joined: Wed Sep 12, 2012 8:10 am
Location: Paris

Re: RPI2 speed

Mon Feb 29, 2016 3:32 pm

Thanks for your answer.
After a lot of tests, I have found:

- I'm not in SECURE MONITOR mode but in HYPERVISOR mode,
and I have not found a way to leave that mode and switch
to a more conventional mode.

- Copying the vectors from 0x8000 to 0x0000 is not sufficient;
you also need to set HVBAR to 0. At startup I find a
value of 0xd33fb2c0 in HVBAR; putting 0 in it after the copy is equivalent
to putting 0x8000 in it directly.
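Here is a minimal sketch of setting HVBAR from Hyp mode, with hypothetical helper names, for ARMv7 with the virtualization extensions. HVBAR is written with mcr p15, 4, <Rt>, c12, c0, 0. Its low five bits are ignored, so the table must be 32-byte aligned. Both 0x8000 and the 0xd33fb2c0 value found at boot satisfy that.

```c
#include <assert.h>
#include <stdint.h>

/* HVBAR ignores bits 4:0, so the Hyp vector table must be 32-byte aligned. */
static int hvbar_value_ok(uint32_t addr)
{
    return (addr & 0x1Fu) == 0;
}

#if defined(__arm__) && defined(__ARM_ARCH) && __ARM_ARCH >= 7
/* Point the Hyp-mode exception vectors at 'base' (run this in Hyp mode). */
static void write_hvbar(uint32_t base)
{
    assert(hvbar_value_ok(base));
    __asm__ volatile("mcr p15, 4, %0, c12, c0, 0\n\tisb" :: "r"(base) : "memory");
}
#endif
```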

- When I execute a SWI 0 instruction, I go into the SWI routine,
but always in HYP mode.

- When I try to access spsr_HYP in the routine:
- for a read, I get a trap and stay in HYP mode
- for a write, nothing is written

How is it possible to leave that mode?
Philippe

rst
Posts: 453
Joined: Sat Apr 20, 2013 6:42 pm
Location: Germany

Re: RPI2 speed

Mon Feb 29, 2016 6:21 pm

phil95 wrote:- I'm not in SECURE MONITOR mode but in HYPERVISOR mode
...
How it is possible to leave that mode ?
See: viewtopic.php?p=863006#p863006

dwelch67
Posts: 966
Joined: Sat May 26, 2012 5:32 pm

Re: RPI2 speed

Mon Feb 29, 2016 6:22 pm

If you figure it out we would love to know. I think the idea is that you don't leave that mode. ARM docs indicate you can go into and out of it, but when digging for how to get out, you only find how to get in...

I think some folks have been successful in using config.txt to not go into hyp mode in the first place (legacy boot and/or booting from 0x00000000), avoiding the code laid down by the gpu, which puts you in hyp mode and then branches to 0x8000.

David

dradford
Posts: 18
Joined: Mon Feb 15, 2016 3:33 pm

Re: RPI2 speed

Wed Mar 02, 2016 3:06 pm

Seems bonkers to be entering the kernel in a non-secure state, with such a complicated path back to secure SVC. I know you can reprogram r14_hyp and spsr_hyp then do an ERET to get out of HYP, but I'm not sure if you're going to end up back in a secure mode. If you don't, the only way I know would be to use SMC from a PL1 to enter MON and transition from there, but that would mean rewriting the vector table temporarily just to catch the SMC, and you can't reprogram secure/monitor VBAR from non-secure mode, so you have to just 'trust' that it's still set to 0. With multiple cores booting it's going to get messy.

It seems simpler for the kernel to switch to HYP on start if it wants to, and just enter in secure SVC like normal. It's not like you can't ask the processor if it supports HYP (via ID_PFR1).
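Checking for Hyp support as dradford suggests could look like the sketch below (hypothetical helper names). ID_PFR1 is read with mrc p15, 0, <Rt>, c0, c1, 1. Its Virtualization field is bits 15:12, where 1 means implemented. Its Security field is bits 7:4, where a nonzero value means the Security Extensions are present.

```c
#include <assert.h>
#include <stdint.h>

/* ID_PFR1.Virtualization (bits 15:12) == 1: Hyp mode is implemented. */
static int cpu_has_hyp(uint32_t id_pfr1)
{
    return ((id_pfr1 >> 12) & 0xFu) == 1u;
}

/* ID_PFR1.Security (bits 7:4) != 0: Security Extensions are implemented. */
static int cpu_has_security_ext(uint32_t id_pfr1)
{
    return ((id_pfr1 >> 4) & 0xFu) != 0u;
}

#if defined(__arm__) && defined(__ARM_ARCH) && __ARM_ARCH >= 7
static uint32_t read_id_pfr1(void)
{
    uint32_t v;
    __asm__ volatile("mrc p15, 0, %0, c0, c1, 1" : "=r"(v));
    return v;
}
#endif
```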

AlfredJingle
Posts: 69
Joined: Thu Mar 03, 2016 10:43 pm

Re: RPI2 speed

Sat Mar 05, 2016 10:29 am

I am just a noob trying to learn ARM-assembly so for what it is worth: this is the code I use for leaving hyp-mode on a pi2. It also works on a pi3 in 32bit mode.

		mrs r0, cpsr
		eor r0, r0, #0x1A			
		tst r0, #0x1F			@ test for HYP mode
		
		bic r0, r0, #0x1F		@ clear mode bits

		orr r0, r0, #0xC0		@ mask IRQ and FIQ bits
		orr r0, r0, #0x13		@ and set SVC mode
		
		bne 1f				@ branch if not HYP mode
		
		orr r0, r0, #0x100		@ also mask asynchronous aborts (A bit)
		msr spsr_cxsf, r0
		
		adr r1, 2f			@ PC-relative address of label 2,
						@ no literal pool needed
		msr ELR_hyp, r1
		eret
		
	1:	msr cpsr_c, r0
	2:	nop
		nop
		isb				@ now in SVC mode; security state unchanged
The advantage is that it only leaves hyp-mode if you actually entered in hyp-mode. It does however not change the security level. So if you enter in non-secure state, you leave in non-secure state. I haven't found a way yet to change to secure mode during boot from a non-secure state. You can only change the security-level from monitor-mode but to get in monitor-mode you need to set a vector to that mode, which can only be done from a secure PL1-level.
As I now use the config.txt-hack to enter in secure sys-mode I stopped searching.
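Here is a host-side C model of the PSR value the snippet above builds, handy for checking the bit arithmetic (helper names are hypothetical). 0x1A is Hyp mode, 0x13 is SVC mode, 0xC0 masks IRQ and FIQ, and 0x100 is the A bit, which masks asynchronous aborts. With GNU as, msr ELR_hyp and eret generally need the virtualization extension enabled, for example with .arch_extension virt or a Cortex-A7 -mcpu setting.

```c
#include <assert.h>
#include <stdint.h>

#define MODE_MASK 0x1Fu
#define MODE_SVC  0x13u
#define MODE_HYP  0x1Au
#define PSR_F     (1u << 6)   /* FIQ mask */
#define PSR_I     (1u << 7)   /* IRQ mask */
#define PSR_A     (1u << 8)   /* asynchronous abort mask */

/* What the eor/tst pair tests. */
static int in_hyp(uint32_t cpsr)
{
    return (cpsr & MODE_MASK) == MODE_HYP;
}

/* Value written to SPSR_hyp (Hyp path) or, via msr cpsr_c, to bits 7:0 of
   the CPSR (other path). In the second case bits above 7 are untouched,
   so they are simply carried over from the input here. */
static uint32_t target_psr(uint32_t cpsr)
{
    uint32_t psr = (cpsr & ~MODE_MASK) | PSR_I | PSR_F | MODE_SVC;
    if (in_hyp(cpsr))
        psr |= PSR_A;
    return psr;
}
```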

Once you are in sys-mode you can setup the page-translation table and activate the caches, branch prediction and MMU to get full speed. You need to do this for each core separately if you want to use more than one, but they can share one page-translation table. I have tested the running of multiple cores extensively and they do not slow each other down. But the raspberry gets a lot hotter if 4 cores are working at full speed!
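Here is a minimal sketch of the page-translation-table step only. It assumes an identity map built from 1 MB sections in the ARMv7 short-descriptor format. RAM below 0x3F000000 is treated as shareable, write-back cacheable Normal memory. Everything from 0x3F000000 up (the RPi2 peripherals and the 0x40000000 local peripherals) is treated as Device memory. Mapping the GPU's share of RAM as cacheable is a simplification. Names are hypothetical. The remaining steps are not shown: set domain 0 to client in DACR, set TTBCR to 0, load TTBR0 with the table address, invalidate the TLBs, then set SCTLR M, C, I and Z. On the Cortex-A7 the ACTLR SMP bit is also expected to be set before the caches and MMU are enabled.

```c
#include <assert.h>
#include <stdint.h>

#define SECTION     0x2u               /* bits 1:0 = 0b10: 1 MB section */
#define SEC_B       (1u << 2)
#define SEC_C       (1u << 3)
#define SEC_XN      (1u << 4)          /* execute never */
#define SEC_AP_RW   (3u << 10)         /* AP[1:0] = 0b11: full access */
#define SEC_TEX(x)  ((uint32_t)(x) << 12)
#define SEC_S       (1u << 16)         /* shareable */

/* Normal memory, inner/outer write-back write-allocate, shareable. */
#define SEC_NORMAL (SECTION | SEC_B | SEC_C | SEC_TEX(1) | SEC_AP_RW | SEC_S)
/* Shareable Device memory, not executable. */
#define SEC_DEVICE (SECTION | SEC_B | SEC_XN | SEC_AP_RW)

/* Assumption: everything from here up is peripherals on the RPi2. */
#define PERIPH_BASE 0x3F000000u

/* First-level table: 4096 entries, 16 KB aligned for TTBR0 (TTBCR.N = 0). */
static _Alignas(16384) uint32_t ttb[4096];

/* Identity-map all 4 GB with 1 MB sections, all in domain 0. */
static void build_flat_map(uint32_t *table)
{
    assert(table != 0);
    for (uint32_t i = 0; i < 4096; i++) {
        uint32_t base = i << 20;
        table[i] = base | (base >= PERIPH_BASE ? SEC_DEVICE : SEC_NORMAL);
    }
}
```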
going from a 6502 on an Oric-1 to an ARMv8 is quite a big step...

dwelch67
Posts: 966
Joined: Sat May 26, 2012 5:32 pm

Re: RPI2 speed

Sat Mar 05, 2016 2:48 pm

sounds like they are doing the same thing with aarch64: dropping the raspi3 into 32-bit mode, which we can't get back out of, rather than leaving it alone and having the operating system or bootloader do the switch if it feels the need.

AlfredJingle
Posts: 69
Joined: Thu Mar 03, 2016 10:43 pm

Re: RPI2 speed

Sat Mar 05, 2016 5:26 pm

Adding:

arm_control=0x200
to config.txt makes the pi3 start in 64-bit mode.

dwelch67
Posts: 966
Joined: Sat May 26, 2012 5:32 pm

Re: RPI2 speed

Sat Mar 05, 2016 10:22 pm

yep, not what I was talking about. the non-config.txt default is supposedly 32 bit instead of 64.

AlfredJingle
Posts: 69
Joined: Thu Mar 03, 2016 10:43 pm

Re: RPI2 speed

Sat Mar 05, 2016 11:06 pm

Is there a reason why you do not want to use the config.txt? (my apologies if this is a silly question)

dradford
Posts: 18
Joined: Mon Feb 15, 2016 3:33 pm

Re: RPI2 speed

Mon Mar 07, 2016 1:20 pm

Apart from anything else, if you have multiple images on a card that you keep switching between, it's a nuisance having to remember to change the magic settings that need to be in your config.txt to get each to work.

But you shouldn't really need to mess with config.txt to get a kernel to work. Really, the default settings should just do the minimum and let the kernel image do the rest. It *is* convenient for cores 1-3 to get 'parked' in a little loop at (say) 0xf0, and it would be nice if the bootloader could pass some information to the kernel to help it choose its startup code (eg. 0..2 for the 3 different core types), and maybe provide the timer freq to set.

Speaking of which, the wait loop for cores 1-3 really hammers the bus, and it doesn't need to. Something like the following should work with far less overhead (and without need to access peripherals) [completely untested!!]:

...
// core number in r0 (also used later)
0xe8: cmp r0, #0
0xec: beq 0x8000
0xf0: wfe
0xf4: dmb
0xf8: ldr pc, [pc, #-4]
0xfc: .int 0xf0
Then, to wake up the cores, just do:

// r1 = address to jump the cores to (r0 will contain the core number from earlier)
mov r0, #0xfc
str r1, [r0]
dmb
sev
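Here is the same wake-up step in C, as a sketch with a hypothetical function name. It uses DSB rather than DMB before SEV, following the correction at the end of the thread. It takes the release slot as a pointer so that it can be exercised off-target; on the Pi the slot would be (volatile uint32_t *)0xfc.

```c
#include <assert.h>
#include <stdint.h>

/* Publish the entry point in the release slot, then signal the parked cores. */
static void release_parked_cores(volatile uint32_t *slot, uint32_t entry)
{
    assert((entry & 3u) == 0);        /* ARM-state entry point, word aligned */
    *slot = entry;
#if defined(__arm__) && defined(__ARM_ARCH) && __ARM_ARCH >= 7
    /* DSB so the store has completed before the event is sent. */
    __asm__ volatile("dsb\n\tsev" ::: "memory");
#endif
}
```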
I believe the bootloader checks the kernel image for a signature that identifies whether it wants dtb instead of ATAG? Could it perhaps add a tag that selects 'linux' mode for the bootloader? Then Linux would get the environment it wants, and bare metal would get the environment it wants, and it would only mean a small change to the existing Linux stuff.

Also, it would be nice if the bootloader left 0x4..0x3f as zero, or if core 0 waited for cores 1..3 to park before jumping to 0x8000. A lot of simple startup code just writes eight ldr pc instructions to 0x00 and eight addresses to 0x20, which could crash the other cores if they were delayed and haven't parked yet.
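For reference, the vector layout described above can be sketched in C (hypothetical function name). Each ldr pc, [pc, #0x18] at address 4k reads the PC as 4k + 8, so it loads the word at 4k + 0x20, i.e. entry k of the address table. The sketch fills a caller-supplied buffer. Copying it to address 0 and coordinating with the other cores is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ldr pc, [pc, #0x18] (cond = AL). */
#define LDR_PC_PC_0X18 0xe59ff018u

/* Fill a 16-word image: 8 loads, then the 8 handler addresses they read. */
static void build_vector_table(uint32_t table[16], const uint32_t handlers[8])
{
    assert(table != NULL && handlers != NULL);
    for (size_t k = 0; k < 8; k++) {
        table[k] = LDR_PC_PC_0X18;
        table[8 + k] = handlers[k];
    }
}
```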

dradford
Posts: 18
Joined: Mon Feb 15, 2016 3:33 pm

Re: RPI2 speed

Mon Mar 07, 2016 3:08 pm

Whoops! Those DMBs need to be DSBs.
