mimi123 wrote:
Linux just uses the mailbox interface.
So for bare metal one has to determine the "system load" and the SoC temperature and switch the CPU clock as needed via the mailbox. Thanks for the info!
The SoC temperature check is done in start.elf's frequency-change code.
OK. Thanks!
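Here is a minimal sketch of the mailbox side of this. It assumes the firmware's mailbox property interface on channel 8 and builds one message that reads the SoC temperature and requests a new ARM clock rate. The tag IDs follow the Raspberry Pi firmware property-interface documentation. `build_clock_msg` is a hypothetical helper, and measuring the "system load" to choose `arm_hz` is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tag IDs from the firmware mailbox property-interface documentation. */
#define TAG_GET_TEMPERATURE 0x00030006u
#define TAG_SET_CLOCK_RATE  0x00038002u
#define TAG_END             0x00000000u
#define CLOCK_ID_ARM        0x3u
#define PROCESS_REQUEST     0x00000000u

/* Hypothetical helper: fill `buf` with one property message that reads the
 * SoC temperature (ID 0) and asks for a new ARM clock rate in Hz.
 * Returns the number of 32-bit words written. */
static size_t build_clock_msg(uint32_t *buf, uint32_t arm_hz)
{
    assert(buf != NULL);
    size_t i = 1;                 /* buf[0] holds the total size, set last */
    buf[i++] = PROCESS_REQUEST;

    buf[i++] = TAG_GET_TEMPERATURE;
    buf[i++] = 8;                 /* value buffer size in bytes */
    buf[i++] = 0;                 /* tag request code */
    buf[i++] = 0;                 /* temperature ID */
    buf[i++] = 0;                 /* response: thousandths of a degree C */

    buf[i++] = TAG_SET_CLOCK_RATE;
    buf[i++] = 12;                /* value buffer size in bytes */
    buf[i++] = 0;                 /* tag request code */
    buf[i++] = CLOCK_ID_ARM;
    buf[i++] = arm_hz;
    buf[i++] = 0;                 /* "skip setting turbo" = 0 */

    buf[i++] = TAG_END;
    buf[0] = (uint32_t)(i * sizeof(uint32_t));
    return i;
}
```

On the hardware, the buffer must be 16-byte aligned. Its address is written to the mailbox with the channel number (8) in the low four bits.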
krom wrote:
I have made my first NEON-optimized fractal demos (Raspberry Pi 2 only):
https://github.com/PeterLemon/Raspberry ... ON/Fractal
These are much faster than the original non-optimized VFP fractal demos on the Raspberry Pi 2 =D
I am going to try to get multi-core (SMP) code working now, to get a 4x speed-up in these same demos.
Also, I have still not tried rst's MMU setup code, but I'll update you guys when I do.

Get MMU support working. If you want, you can try running that code on the VC4 VPU (it has a 16-way SIMD unit and is dual-core at 250 MHz) for better performance on older Pis.
krom wrote:
Cheers mimi123, yep, I have made an LED blink using the VC4 by assembling my own bootcode.bin file, but I have never set up the frame buffer using the VC4 CPU...
Is it still possible to use the Mailbox Property Interface to set up the frame buffer using just VC4 code in a bootcode.bin?
Or is that whole interface only available from the ARM side, after booting from bootcode.bin & start.elf...
Do you know a way to set up the screen frame buffer using low-level GPU commands in VC4 mode? If so, please share the code, as I would love it =D

You can run VC4 code from Linux (github.com/freeblob/samples). You can see that it only uses the mailbox interface.
krom wrote:
Here is the full working source:
format binary as 'img'
PERIPHERAL_BASE = $3F000000 ; Raspberry Pi 2 Peripheral Base Address
GPBASE = $200000 ; $3F200000
GPFSEL1 = $4 ; $3F200004
GPSET1 = $20 ; $3F200020
GPCLR1 = $2C ; $3F20002C
org $8000
mov r0,PERIPHERAL_BASE
orr r0,GPBASE ; R0 = GPBASE
ldr r1,[r0,GPFSEL1] ; R1 = GPFSEL1
mov r2,7
and r1,r2,lsl 18 ; &= 7 << 18
mov r2,1
orr r1,r2,lsl 18 ; |= 1 << 18
str r1,[r0,GPFSEL1]
mov r2,r2,lsl 15 ; 1 << 15
Loop:
str r2,[r0,GPSET1]
mov r1,$100000
WaitA:
subs r1,1
bne WaitA
str r2,[r0,GPCLR1]
mov r1,$100000
WaitB:
subs r1,1
bne WaitB
b Loop

I think you have a bug in there. You are loading GPFSEL1 into R1, then you mask out all the bits that aren't for the LED and OR in one bit for the LED function selector. This a) resets the functions of all the other pins and b) does not reset the top two function bits for the LED. I noticed this because it breaks the UART. You have to negate your mask:
mov r0,#PERIPHERAL_BASE
orr r0,#GPBASE // R0 = GPBASE
ldr r1,[r0,#GPFSEL1] // R1 = GPFSEL1
mov r2,#~(7 << 18)
and r1,r2 // &= ~(7 << 18)
mov r2,#(1 << 18)
orr r1,r2 // |= 1 << 18
str r1,[r0,#GPFSEL1]
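Here is the same read-modify-write in C, as a sketch that works on a plain value rather than the memory-mapped register. The helper name is hypothetical. Each GPFSELn register holds ten pins with three bits each, so pin 16 lands in bits 18..20 of GPFSEL1, the field the assembly above touches. With GPIO 14/15 already set to ALT0 (4) for the UART, only the LED pin's field changes.

```c
#include <assert.h>
#include <stdint.h>

/* Return `reg` (the GPFSELn value that owns `pin`, n = pin / 10) with only
 * that pin's 3-bit function-select field replaced by `func`. */
static uint32_t gpfsel_update(uint32_t reg, unsigned pin, uint32_t func)
{
    assert(func <= 7u);
    unsigned shift = (pin % 10u) * 3u;
    reg &= ~(7u << shift);   /* clear only this pin's field: note the ~ */
    reg |= func << shift;    /* 1 = output, 4 = ALT0, ... */
    return reg;
}
```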
; Return CPU ID (0..3) Of The CPU Executed On
mrc p15,0,r0,c0,c0,5 ; R0 = Multiprocessor Affinity Register (MPIDR)
and r0,3 ; R0 = CPU ID (Bits 0..1)
; Return Value In CLUSTERID Configuration Pin
mrc p15,0,r0,c0,c0,5 ; R0 = Multiprocessor Affinity Register (MPIDR)
lsr r0,8 ; R0 = Cluster ID (Bits 8..11)
and r0,$F
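Here are the same two bit-field extractions in C, as a sketch. The function names are hypothetical, and the inline assembly only compiles for 32-bit ARM. The example values in the test are the MPIDR values that cores 1 to 3 print further down in this thread, which report cluster ID 0xF.

```c
#include <assert.h>
#include <stdint.h>

/* MPIDR bits 0..1: CPU ID within the cluster. */
static unsigned mpidr_cpu_id(uint32_t mpidr)     { return mpidr & 3u; }
/* MPIDR bits 8..11: value of the CLUSTERID configuration pins. */
static unsigned mpidr_cluster_id(uint32_t mpidr) { return (mpidr >> 8) & 0xFu; }

#if defined(__arm__)
/* Read the Multiprocessor Affinity Register (requires PL1, e.g. bare metal). */
static inline uint32_t read_mpidr(void)
{
    uint32_t v;
    __asm__ volatile ("mrc p15, 0, %0, c0, c0, 5" : "=r"(v));
    return v;
}
#endif
```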
krom wrote:
Hi guys, here is an update on my Raspberry Pi 2 work:
I have optimized all my fractal demos for the Raspberry Pi & Raspberry Pi 2, including the NEON demos:
https://github.com/PeterLemon/Raspberry ... FP/Fractal
https://github.com/PeterLemon/Raspberry ... ON/Fractal
The NEON demos are really fast now compared to the equivalent scalar VFP demos on the Raspberry Pi 2.
I am very happy with the performance on a single ARM Cortex-A7 core =D

Julien_Nantes wrote:
Hi Krom,
The VFP Mandelbrot fractal works very well, for instance. The NEON one just gives me a black screen.
And the simple "Hello World" only displays a white "H"...

Heh, I just tried the NEON Mandelbrot fractal again, and it was a black screen like you said. I then did a few more power-ups and it does work sometimes!
The "H" is a known bug in all my DMA demos running on the Raspberry Pi 2: only the first DMA transfer ever gets written, and all subsequent ones do not work.
rst wrote:
I think that's a real challenge.
Hi rst, cheers for all the links you provided, 'tis a great help =D
It is worth noting that Windows RT only uses Thumb-2 instructions, and you can't use ARM instructions at all (the CPU is invariably switched to Thumb-2 mode after each context switch). Try ARM, and your program crashes.

k2tom wrote:
For what it's worth... I just got my RPi2 ("6.28") Tuesday and finally got to play with it today (Thursday). Here's what I did:
(I've been working on "bare metal" projects (mostly of the roll-your-own OS flavor) on the RPi (A, A+ and B) for a while.)
I copied bootcode.bin and start.elf from the Raspbian download to the microSD card. My config.txt is literally a one-liner: "kernel=mykernel.img"
I have a define (BASE) in my rpi.h file. It was "#define BASE 0x20000000"; now it's conditionally defined as 0x20000000 (for the RPi) or 0x3F000000 (for the RPi2).
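Here is a sketch of what such a conditional define can look like. The `RPI2` macro name and the derived block addresses are illustrative assumptions that use the standard peripheral offsets: GPIO at +0x200000, as in krom's code above, and the PL011 UART at +0x201000. Build with `-DRPI2` to target the Pi 2.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical build switch: compile with -DRPI2 for the Raspberry Pi 2,
 * otherwise target the original Pi (A, A+, B). */
#if defined(RPI2)
#define BASE 0x3F000000u
#else
#define BASE 0x20000000u
#endif

/* Peripheral blocks keep the same offsets from BASE on both boards,
 * which is why only BASE has to change. */
#define GPIO_BASE  (BASE + 0x200000u)
#define UART0_BASE (BASE + 0x201000u)
```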
My bootloader (mykernel.img) sets up the interrupt controller (IC), GPIO (for the UART pins), UART (with TX and RX interrupts), TIMER (set to 1 ms, with interrupts) and SYSTIMER (with an interrupt for C1). I simply recompiled it (with the RPi2 definition), copied it to the microSD card, and it JUST WORKS (interrupts and all)!
The bootloader (among other things) supports XMODEM, so I'm able to simply download a compressed copy of my hobby OS (which has a lot more "stuff," but from a hardware and interrupts standpoint is only slightly more sophisticated than the bootloader), uncompress it, and jump to it. And it JUST WORKS!
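k2tom doesn't show his bootloader, so this is only a sketch of the receive-side check for the original checksum variant of XMODEM: SOH, block number, its complement, 128 data bytes, and an 8-bit arithmetic checksum. A real receiver also handles EOT, ACK/NAK and retransmissions, and may use the CRC-16 variant instead. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XM_SOH        0x01u
#define XM_DATA_LEN   128u
#define XM_PACKET_LEN (3u + XM_DATA_LEN + 1u)  /* header + data + checksum */

/* Validate one checksum-mode XMODEM packet. Returns 1 if the header,
 * the block number and the checksum are all consistent, else 0. */
static int xmodem_packet_ok(const uint8_t *p, uint8_t expected_blk)
{
    assert(p != NULL);
    if (p[0] != XM_SOH) return 0;
    if (p[1] != expected_blk) return 0;
    if ((uint8_t)(p[1] + p[2]) != 0xFFu) return 0;  /* p[2] == 255 - p[1] */

    uint8_t sum = 0;                                /* sum of data mod 256 */
    for (size_t i = 0; i < XM_DATA_LEN; i++)
        sum = (uint8_t)(sum + p[3 + i]);
    return sum == p[3 + XM_DATA_LEN];
}
```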
(Obviously, this is a very simple configuration, but still...) I don't have separate projects (makefiles) for the RPi vs. the RPi2. When I need to "switch horses," I just do a "make clean" and then a make with either RPi or RPi2 defined. The same microSD card works on all the RPis AND the RPi2; I just need to change the kernel= line in config.txt.
Regards,
Tom
p.s.: Having "progressed" through several ARM cores (ARM7TDMI, ARM926EJ-S, ARM1176JZF-S and Cortex-M3 (CM3)), I was really nervous that I was going to be in deep, deep trouble with the Cortex-A7 (CA7). I was expecting the worst. If you've ever worked with the CM3, you know that it DOESN'T support the ARM instruction set, only the Thumb-2 instruction set. And the CM3 integrates the interrupt controller (NVIC) and OS timer (SysTick) (and a lot more!) into the core "proper." So I was worried that the CA7 core would be even worse. To my CONSIDERABLE surprise (shock even), that's not the case at all. The regular ARM instruction set is supported (and unless I have a compelling reason to do otherwise, I usually compile with GNUARM's (and/or YAGARTO's) defaults, which in my case means the ARM7TDMI core, i.e. the ARMv4T architecture), and all the external-to-the-core peripherals (IC, TIMER, SYSTIMER, ...) are the same as they are on the RPi (except for the BASE).
krom wrote:
The NEON demos are really fast now compared to the equivalent scalar VFP demos on the Raspberry Pi 2.
In fact, your NEON Mandelbrot demo is so fast that the fractal picture is already there when my display comes up. It seems you need a new task for multi-core.
krom wrote:
There are 3 possible states that the Raspberry Pi 2's extra CPU cores might be in when booted into a bare-metal state from the official firmware bootcode.bin & start.elf files:
A) The extra CPU cores are all powered on and are all booting from the same start offset of code (0x8000).
B) The extra CPU cores are all powered on but are in a WFI (Wait For Interrupt) state, waiting to be woken up and to boot from a specified jump address.
C) The extra CPU cores are all powered off and need powering on before they can even start from a state like A or B.
There is a slightly different state they are in:
krom wrote:
I also think that to wake any CPU cores up, we need to use the ARM instruction SEV (Send EVent), which causes an event to be signalled to all processors in the multiprocessor system.
SEV is not needed in this case. Maybe it can be useful later.
krom wrote:
As a first test, I am not concerned about any scheduling at the moment; I just want the 4 cores to do some work and loop forever when they have finished their respective code blocks.
I did this too. I had cores 1-3 write the contents of their affinity registers to a fixed memory location and dumped them to the screen with core 0. They did it as expected.
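The SMP code later in this thread starts cores 1 to 3 by writing an entry address to $4000008C + ($10 * core). Here is a C sketch of that address calculation and the store. The helper names are hypothetical, and the comment on how the firmware picks the address up only restates what that code relies on. The store itself only makes sense on bare metal, so it is guarded for 32-bit ARM.

```c
#include <assert.h>
#include <stdint.h>

/* BCM2836 ARM-local "core N mailbox 3 set" registers, as used by the
 * SMP code in this thread: Boot Offset = 0x4000008C + 0x10 * core. */
#define CORE_MBOX3_SET_BASE 0x4000008Cu

static uint32_t core_mbox3_set_addr(unsigned core)
{
    assert(core >= 1u && core <= 3u);  /* core 0 is the one already running */
    return CORE_MBOX3_SET_BASE + 0x10u * core;
}

#if defined(__arm__)
/* Hand `entry` to a secondary core; the firmware's code on that core is
 * expected to pick the address up and jump to it. */
static void start_core(unsigned core, void (*entry)(void))
{
    volatile uint32_t *mbox =
        (volatile uint32_t *)(uintptr_t)core_mbox3_set_addr(core);
    *mbox = (uint32_t)(uintptr_t)entry;
}
#endif
```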
krom wrote:
Hi rst, cheers for all the links you provided, 'tis a great help =D
Thank you. I can give this back to you and all the others who gave information on the Raspberry Pi (1 and 2) here before and after. I think we have to work together to understand this great machine and get it running on bare metal.
/**********************************************************************
* MMU *
**********************************************************************/
namespace MMU {
#define CACHED_TLB
//#undef CACHED_TLB
static volatile __attribute__ ((aligned (0x4000))) uint32_t page_table[4096];
static volatile __attribute__ ((aligned (0x400))) uint32_t leaf_table[256];
struct page {
uint8_t data[4096];
};
extern "C" {
extern page _mem_start[];
extern page _mem_end[];
}
void init_page_table() {
uint32_t base;
// initialize page_table
// 1024MB - 16MB of kernel memory (some belongs to the VC)
for (base = 0; base < 1024 - 16; base++) {
// section descriptor (1 MB)
#ifdef CACHED_TLB
// outer and inner write back, write allocate, not shareable (fast
// but unsafe)
page_table[base] = base << 20 | 0x0140E;
// outer and inner write back, write allocate, shareable (fast but
// unsafe)
//page_table[base] = base << 20 | 0x1140E;
#else
// outer and inner write through, no write allocate, shareable
// (safe but slower)
page_table[base] = base << 20 | 0x1040A;
#endif
}
// unused up to 0x3F000000
for (; base < 1024 - 16; base++) {
page_table[base] = 0;
}
// 16 MB peripherals at 0x3F000000
for (; base < 1024; base++) {
// shared device, never execute
page_table[base] = base << 20 | 0x10416;
}
// 1 MB mailboxes
// shared device, never execute
page_table[base] = base << 20 | 0x10416;
++base;
// unused up to 0x7FFFFFFF
for (; base < 2048; base++) {
page_table[base] = 0;
}
// one second-level page table (leaf table) at 0x80000000
page_table[base++] = (intptr_t)leaf_table | 0x1;
// 2047MB unused (rest of address space)
for (; base < 4096; base++) {
page_table[base] = 0;
}
// initialize leaf_table
for (base = 0; base < 256; base++) {
leaf_table[base] = 0;
}
}
void init() {
// set SMP bit in ACTLR
uint32_t auxctrl;
asm volatile ("mrc p15, 0, %0, c1, c0, 1" : "=r" (auxctrl));
auxctrl |= 1 << 6;
asm volatile ("mcr p15, 0, %0, c1, c0, 1" :: "r" (auxctrl));
// setup domains (CP15 c3)
// Write Domain Access Control Register
// use access permissions from TLB entry
asm volatile ("mcr p15, 0, %0, c3, c0, 0" :: "r" (0x55555555));
// set domain 0 to client
asm volatile ("mcr p15, 0, %0, c3, c0, 0" :: "r" (1));
// always use TTBR0
asm volatile ("mcr p15, 0, %0, c2, c0, 2" :: "r" (0));
#ifdef CACHED_TLB
// set TTBR0 (page table walk inner and outer write-back,
// write-allocate, cacheable, shareable memory)
asm volatile ("mcr p15, 0, %0, c2, c0, 0"
:: "r" (0b1001010 | (unsigned) &page_table));
// set TTBR0 (page table walk inner and outer write-back,
// write-allocate, cacheable, non-shareable memory)
//asm volatile ("mcr p15, 0, %0, c2, c0, 0"
// :: "r" (0b1101010 | (unsigned) &page_table));
#else
// set TTBR0 (page table walk inner and outer non-cacheable,
// non-shareable memory)
asm volatile ("mcr p15, 0, %0, c2, c0, 0"
:: "r" (0 | (unsigned) &page_table));
#endif
asm volatile ("isb" ::: "memory");
/* SCTLR
* Bit 31: SBZ reserved
* Bit 30: TE Thumb Exception enable (0 - take in ARM state)
* Bit 29: AFE Access flag enable (1 - simplified model)
* Bit 28: TRE TEX remap enable (0 - no TEX remapping)
* Bit 27: NMFI Non-Maskable FIQ (read-only)
* Bit 26: 0 reserved
* Bit 25: EE Exception Endianness (0 - little-endian)
* Bit 24: VE Interrupt Vectors Enable (0 - use vector table)
* Bit 23: 1 reserved
* Bit 22: 1/U (alignment model)
* Bit 21: FI Fast interrupts (probably read-only)
* Bit 20: UWXN (Virtualization extension)
* Bit 19: WXN (Virtualization extension)
* Bit 18: 1 reserved
* Bit 17: HA Hardware access flag enable (0 - enable)
* Bit 16: 1 reserved
* Bit 15: 0 reserved
* Bit 14: RR Round Robin select (0 - normal replacement strategy)
* Bit 13: V Vectors bit (0 - remapped base address)
* Bit 12: I Instruction cache enable (1 - enable)
* Bit 11: Z Branch prediction enable (1 - enable)
* Bit 10: SW SWP/SWPB enable (maybe RAZ/WI)
* Bit 09: 0 reserved
* Bit 08: 0 reserved
* Bit 07: 0 endian support / RAZ/SBZP
* Bit 06: 1 reserved
* Bit 05: CP15BEN DMB/DSB/ISB enable (1 - enable)
* Bit 04: 1 reserved
* Bit 03: 1 reserved
* Bit 02: C Cache enable (1 - data and unified caches enabled)
* Bit 01: A Alignment check enable (1 - fault when unaligned)
* Bit 00: M MMU enable (1 - enable)
*/
// enable MMU, caches and branch prediction in SCTLR
uint32_t mode;
asm volatile ("mrc p15, 0, %0, c1, c0, 0" : "=r" (mode));
// mask: 0b0111 0011 0000 0010 0111 1000 0010 0111
// bits: 0b0010 0000 0000 0000 0001 1000 0010 0111
#ifdef CACHED_TLB
mode &= 0x73027827;
mode |= 0x20001827;
#else
// no caches
mode &= 0x73027827;
mode |= 0x20000023;
#endif
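asm volatile ("mcr p15, 0, %0, c1, c0, 0" :: "r" (mode) : "memory");
// ISB so that the following instructions execute with the new SCTLR settings
asm volatile ("isb" ::: "memory");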
// instruction cache makes delay way faster, slow panic down
#ifdef CACHED_TLB
panic_delay = 0x2000000;
#endif
}
} // namespace MMU
/**********************************************************************
* SMP *
**********************************************************************/
namespace SMP {
// Setup SMP (Boot Offset = $4000008C + ($10 * Core), Core = 1..3)
enum {
CORE_BASE = 0x4000008C,
Core1Boot = 0x10, // Core 1 Boot Offset
Core2Boot = 0x20, // Core 2 Boot Offset
Core3Boot = 0x30, // Core 3 Boot Offset
};
typedef void (*fn)(void);
#define CORE_REG(x) ((volatile fn *)(CORE_BASE + (x)))
void core_wakeup(void) {
puts("core is up\n");
MMU::init();
puts("core is virtual\n");
while(true) { }
}
void init() {
puts("starting core 1\n");
blink(panic_delay * 0x10);
*CORE_REG(Core1Boot) = core_wakeup;
blink(panic_delay * 0x10);
puts("started core 1\n");
blink(panic_delay * 0x10);
puts("starting core 2\n");
blink(panic_delay * 0x10);
*CORE_REG(Core2Boot) = core_wakeup;
blink(panic_delay * 0x10);
puts("started core 2\n");
blink(panic_delay * 0x10);
puts("starting core 3\n");
blink(panic_delay * 0x10);
*CORE_REG(Core3Boot) = core_wakeup;
blink(panic_delay * 0x10);
puts("started core 3\n");
blink(panic_delay * 0x10);
}
}
void kernel_main(uint32_t r0, uint32_t model_id, void *atags) {
UNUSED(r0);
UNUSED(model_id);
UNUSED(atags);
LED::init();
for(int i = 0; i < 3; ++i) {
blink(0x100000);
}
UART::init();
puts("\nHello\n");
delay(0x100000);
MMU::init_page_table();
MMU::init();
SMP::init();
puts("\ndone\n");
panic();
}
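The magic constants in init_page_table() above are ARMv7-A short-descriptor section entries. This C sketch rebuilds them from named fields to show which bits each one sets. The macro names are hypothetical. With SCTLR.AFE set, as in init(), AP[0] acts as the access flag, so AP = 01 means privileged read/write with the access flag set.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an ARMv7-A short-descriptor 1 MB section entry. */
#define SECT_TYPE   (1u << 1)              /* bits[1:0] = 0b10: section */
#define SECT_B      (1u << 2)              /* bufferable */
#define SECT_C      (1u << 3)              /* cacheable */
#define SECT_XN     (1u << 4)              /* execute never */
#define SECT_AP(x)  ((uint32_t)(x) << 10)  /* AP[1:0] */
#define SECT_TEX(x) ((uint32_t)(x) << 12)  /* TEX[2:0] */
#define SECT_S      (1u << 16)             /* shareable */

/* The attribute sets used by init_page_table(). */
#define ATTR_WBWA   (SECT_TEX(1) | SECT_AP(1) | SECT_C | SECT_B | SECT_TYPE)
#define ATTR_WT_SH  (SECT_S | SECT_AP(1) | SECT_C | SECT_TYPE)
#define ATTR_DEVICE (SECT_S | SECT_AP(1) | SECT_XN | SECT_B | SECT_TYPE)

/* Section descriptor mapping megabyte `mb` one-to-one with `attrs`. */
static uint32_t section(uint32_t mb, uint32_t attrs)
{
    assert(mb < 4096u);
    return (mb << 20) | attrs;
}
```

TEX=001/C=1/B=1 selects outer and inner write-back, write-allocate memory. TEX=000/C=1/B=0 selects write-through with no write-allocate. TEX=000/C=0/B=1 selects shareable device memory.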
Hello
starting core 1
core is up
core is virtual
started core 1
starting core 2
started core 2
starting core 3
core is up
core is virtual
started core 3
done
int id = get_mpidr() & 3;
while(true) { count[id]++; }
Hello
starting core 1
core is up: MPIDR = 0x80000F01
core is virtual
started core 1
starting core 2
core is up: MPIDR = 0x80000F02
core is virtual
started core 2
starting core 3
core is up: MPIDR = 0x80000F03
core is virtual
started core 3
counts = 0x64424C16 0x3DDC4110 0x18F1777A
counts = 0x650DC696 0x3EA7BD58 0x19BCF21D
counts = 0x65D94226 0x3F733725 0x1A886DD0
counts = 0x66A4BD74 0x403EB2B2 0x1B53E86D
counts = 0x67703B42 0x410A2B95 0x1C1F628E
counts = 0x683BB707 0x41D5A7F1 0x1CEADAF9
counts = 0x69073487 0x42A12288 0x1DB6555B
counts = 0x69D2B05C 0x436C9FAA 0x1E81CDBD
counts = 0x6A9E2A32 0x44381CA0 0x1F4D47D1
counts = 0x6B69A5DD 0x450399F1 0x2018C0CC
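As a quick check of the log, this sketch subtracts the first two "counts" samples, presumably cores 1 to 3. Each core gains about 13.3 million increments between the two prints, so all three count at essentially the same rate. The different absolute values presumably reflect when each core was started. Unsigned subtraction keeps the deltas correct even if a counter wraps.

```c
#include <assert.h>
#include <stdint.h>

/* The first two "counts = ..." lines from the log above. */
static const uint32_t sample0[3] = { 0x64424C16u, 0x3DDC4110u, 0x18F1777Au };
static const uint32_t sample1[3] = { 0x650DC696u, 0x3EA7BD58u, 0x19BCF21Du };

/* Increments per core between two samples (modulo 2^32). */
static void count_deltas(const uint32_t a[3], const uint32_t b[3], uint32_t out[3])
{
    for (int i = 0; i < 3; i++)
        out[i] = b[i] - a[i];
}
```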
Next step: framebuffer. Yippee!
DexOS wrote:
Great work krom, I am looking forward to my Raspberry Pi 2 arriving, so I can test your demos.
Hi DexOS, this is wonderful news, I am so glad you are getting a Raspberry Pi 2 =D
mrvn wrote:
Turns out it is really easy to start cores.
Yes, it's not difficult. I think it will become more challenging when it comes to interrupts and synchronizing the cores.
mrvn wrote:
Then I switch the MMU and caching on, first on core 0 and then on the other cores one after the other.
So the MMU is also running on multi-core. Well done!