LizardLad_1
Posts: 126
Joined: Sat Jan 13, 2018 12:29 am

Audio output on the Raspberry Pi 3B

Mon May 07, 2018 1:08 am

At the moment I have no idea how to achieve this (I seem to say that a lot), but how could I play specific frequencies out of the Raspberry Pi's 3.5mm audio jack? At the moment my kernel halts all processor cores that aren't core 0 and it has access to the GPIO, so how can I start core 1 and use it to play frequencies?

LdB
Posts: 856
Joined: Wed Dec 07, 2016 2:29 pm

Re: Audio output on the Raspberry Pi 3B

Mon May 07, 2018 3:48 am

Obviously, rather than halting the core, you give it code to run that produces the sound.

Not sure if you are in C or assembler, but your core 0 will be going to code you call the kernel.
So make a Kernel1 code block and send core 1 to it exactly as you did core 0.

The assumption here is that only core 1 will ever access the sound functions. If you want to share them between cores you will need a resource share lock such as a semaphore.

The sound thing isn't complicated, it's just a matter of keeping data going to the PWM.
Peter Lemon has both CPU and DMA versions in assembler
https://github.com/PeterLemon/Raspberry ... /Sound/PWM

The CPU version is the easiest to start with: once you have done all the PWM setup, playing is essentially a simple data move loop

Code: Select all

Loop:
  imm32 r1,SND_Sample ; R1 = Sound Sample
  imm32 r2,SND_SampleEOF ; R2 = End Of Sound Sample
  FIFO_Write:
    ldrb r3,[r1],1 ; Write 1 Byte To FIFO
    str r3,[r0,PWM_FIF1] ; FIFO Address
    FIFO_Wait:
      ldr r3,[r0,PWM_STA]
      tst r3,PWM_FULL1 ; Test Bit 1 FIFO Full
      bne FIFO_Wait
    cmp r1,r2 ; Check End Of Sound Sample
    bne FIFO_Write

  b Loop ; Play Sample Again
The harder part is if you want to read one of the standard sound formats and extract the raw samples from it.

Peter Lemon has his sound sample extracted to a binary, which gets included into the img binary file at the label SND_Sample. If you look at the bottom of his assembler file you will see this

Code: Select all

SND_Sample: ; 
  file 'Sample.bin'
  SND_SampleEOF: 
So he doesn't have a file system or any code that reads and decodes a sound file format :-)
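
For anyone working in C rather than assembler, the same polling loop might look roughly like the sketch below. It is only an illustration: struct pwm_regs and pwm_play_bytes are made-up names, and the struct stands in for the real PWM registers (STA at PWM base + 0x04, FIF1 at PWM base + 0x18) so the loop can be tried on a desktop machine. It assumes FULL1 is bit 0 of the status register and that all the PWM setup has already been done.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two PWM registers the loop touches.
 * On real hardware these would be volatile accesses at PWM base + 0x04
 * (STA) and PWM base + 0x18 (FIF1). */
struct pwm_regs {
    volatile uint32_t sta;   /* status register */
    volatile uint32_t fif1;  /* FIFO input */
};

#define PWM_FULL1 0x1u       /* assumed: bit 0 of STA means "FIFO full" */

/* Write 8-bit samples to the FIFO one at a time, waiting after each write
 * while the FIFO reports full. Returns the number of samples written. */
static size_t pwm_play_bytes(struct pwm_regs *pwm,
                             const uint8_t *start, const uint8_t *end)
{
    assert(start <= end);
    size_t written = 0;
    for (const uint8_t *p = start; p != end; p++) {
        pwm->fif1 = *p;                        /* str r3,[r0,PWM_FIF1] */
        while (pwm->sta & PWM_FULL1) {         /* FIFO_Wait loop */
            /* spin until the hardware drains the FIFO */
        }
        written++;
    }
    return written;
}
```

To repeat the sample forever, as the assembler does with "b Loop", the call would simply sit inside a while (1).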

LizardLad_1
Posts: 126
Joined: Sat Jan 13, 2018 12:29 am

Re: Audio output on the Raspberry Pi 3B

Mon May 07, 2018 4:19 am

I am using C so how do I send CPU core 1 to a C function?

LdB
Posts: 856
Joined: Wed Dec 07, 2016 2:29 pm

Re: Audio output on the Raspberry Pi 3B

Mon May 07, 2018 6:11 am

You are overthinking it ... You said "My kernel halts all processor cores that aren't core 0" and you are using C

That means you have a piece of code that does something like this

Code: Select all

mrc p15, 0, r0, c0, c0, 5			    ;@ Read core id on ARM7 & ARM8
ands r0, r0, #0x3				    ;@ Make core 2 bit bitmask in R0
beq  cpu0_exit                                      ;@ Core 0 jumps out to label cpu0_exit
DeadloopSpin:
wfe                                                 ;@ sleep the cores 1,2,3
b    DeadloopSpin;				    ;@ Just for safety deadloop spin
cpu0_exit:

... <snip lots of setup code for core 0 .. but ends with the jump to C >

bl kernel_main ;@ Finally core 0 will enter your C code with a branch .. name will be what you call it 

So it's trivial to change it to park only cores 2 & 3 and branch the other two cores.

Code: Select all

mrc p15, 0, r0, c0, c0, 5			    ;@ Read core id on ARM7 & ARM8
ands r0, r0, #0x3			            ;@ Make core 2 bit bitmask in R0
beq  cpu0_exit                                      ;@ Core 0 jumps out to label cpu0_exit
cmp r0, #01                                         ;@ Check for core1
beq  cpu1_exit                                      ;@ Core 1 jumps out to label cpu1_exit
DeadloopSpin:
wfe                                                 ;@ sleep only cores 2,3
b    DeadloopSpin;				    ;@ Just for safety deadloop spin
cpu0_exit:

... <snip lots of setup code for core 0 but ends with the jump to C>

bl kernel_main ;@ Finally core 0 will enter your C code with a branch .. name will be what you call it

cpu1_exit:

.... <snip lots of setup code for core 1 but ends with the jump to C>

bl sound_main ;@ Finally core 1 will enter your C sound code with a branch .. name will be what you call it
So the only requirement is that you have two C functions whose names match the branch targets in your kernel loader. In my case I used the names kernel_main and sound_main, but they can be whatever you want them to be. The loader simply splits where the processors branch based on the core ID.

Now generally you will get away with that, but there is a complication: the compiler's memory sections are shared by the two CPUs running different C code. To understand that we need to discuss the BSS section, which I will do in a new post so this one doesn't get long.
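
The split by core ID can also be expressed in C once each core has a stack. The sketch below is a desktop-testable illustration only: dispatch_core, core_entry and last_entered are hypothetical names, the MPIDR value is passed in as a plain number instead of being read with mrc/mrs, and the two entry functions return immediately, which a real kernel_main or sound_main would not do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*core_entry_fn)(void);

/* The low two bits of the MPIDR value give the core number (0..3),
 * the same mask the assembler applies with #0x3. */
static unsigned core_id_from_mpidr(uint64_t mpidr)
{
    return (unsigned)(mpidr & 0x3u);
}

static int last_entered = -1;             /* demonstration only */
static void kernel_main(void) { last_entered = 0; }
static void sound_main(void)  { last_entered = 1; }

/* One entry per core; NULL means "park this core". */
static const core_entry_fn core_entry[4] = {
    kernel_main, sound_main, NULL, NULL
};

/* Returns 1 if the core was sent to C code, 0 if it should be parked. */
static int dispatch_core(uint64_t mpidr)
{
    core_entry_fn fn = core_entry[core_id_from_mpidr(mpidr)];
    if (fn == NULL)
        return 0;
    fn();
    return 1;
}
```

The stack pointer still has to be set per core in assembler before any of this runs, so the table only replaces the chain of compare-and-branch instructions, not the start code itself.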

LdB
Posts: 856
Joined: Wed Dec 07, 2016 2:29 pm

Re: Audio output on the Raspberry Pi 3B

Mon May 07, 2018 7:18 am

So the C code will have a BSS section
https://en.wikipedia.org/wiki/.bss

Now the problem is that this section needs to be zeroed, but it is shared by your two cores running C code. We can't have every core clear it, because one core may change a value from zero (remember they start as zero but can change later) and a core running late would then wipe out the change. You also can't simply have one core clear it, because the other cores may already have started before the core assigned to clear the BSS gets to it.

What is required is for the cores to start in a synchronized manner: one of the cores clears the BSS, and only after that is done does it allow the other cores to start. This has absolutely nothing to do with multitasking or multiple cores as such; it is a requirement because you are running two cores on the same C code with shared memory sections. If you wanted to get around this you would compile two different C programs and then combine them in the final img file. That way the program for each core would have its own BSS at a different place. That is an option, but it comes with its own drawbacks.

I like having the single code with C synchronization; it's easier to maintain, especially if you have memory shares and locks between the two cores.
So that is what is happening here
https://github.com/LdB-ECM/Raspberry-Pi ... /Multicore

Essentially the cores have a two-stage start process: initially they are parked in a secondary spin reading a mailbox. Only core 0 actually starts; it clears the BSS, and if I was running shared memory sections it would set up the locks on those as well. Only when core 0 has done all the memory setup does it release cores 1, 2 & 3 to their C functions.

You need to be clear that this secondary spin situation has absolutely nothing to do with multitasking; it is a startup requirement for running multiple cores on the same C code block with shared memory sections. I have given you the alternative, which is to compile the code for each core in isolation and merge them in the img file. The BSS and memory sections for each core would thus be physically separated, and each core would take responsibility for clearing its own BSS etc. in that case.
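
To make the ordering concrete, here is a minimal sketch of the same idea using a flag instead of the mailbox parking that the linked code uses. Everything in it is an assumption for illustration: start_gate, primary_init and secondary_may_enter are invented names, and the BSS range is passed in as ordinary pointers so it can be tried on a desktop. One detail matters on real hardware: the gate itself must not live in the BSS, so it is given a non-zero initial value, which assumes the .data section is loaded as part of the image.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { GATE_CLOSED = 1, GATE_OPEN = 2 };

/* Non-zero initial value so the compiler places this in .data rather than
 * .bss; a gate living in .bss would hold garbage until core 0 cleared it. */
static atomic_uint start_gate = GATE_CLOSED;

/* Core 0 only: zero the BSS range, do any shared-memory setup, then open
 * the gate with release ordering so the other cores see the finished state. */
static void primary_init(unsigned char *bss_start, unsigned char *bss_end)
{
    memset(bss_start, 0, (size_t)(bss_end - bss_start));
    /* ... initialize shared locks and buffers here ... */
    atomic_store_explicit(&start_gate, GATE_OPEN, memory_order_release);
}

/* Cores 1,2,3: poll this (with a wfe between polls on the target) and only
 * enter the shared C code once it returns true. */
static bool secondary_may_enter(void)
{
    return atomic_load_explicit(&start_gate, memory_order_acquire) == GATE_OPEN;
}
```

Without a C library on the target, the memset would be replaced by a plain byte loop like the one in the Circle listing further down the thread.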

Generally I do pretty much what you are doing: I have a number of the cores running an O/S (written in C) and one or more cores running baremetal or a different O/S. However, because I want all the code compiled and merged at one time, you have to have this strange startup synchronization.

Later on you will no doubt want to have, for example, core 0 read and decode the sound and core 1 play it. When you go to exchange the data you will need clear locks, like semaphores, on which core can access the shared C data. Again, if you use the shared compilation model, the initial state of those locks needs to be set up before you release the other cores.
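
As one possible shape for that exchange, the sketch below uses a single-producer, single-consumer ring buffer built on C11 atomics rather than a semaphore, which is a swapped-in technique and not what the post describes. The names sample_ring, ring_push and ring_pop are hypothetical, and it assumes exactly one core pushes (the decoder) and exactly one core pops (the player). The zeroed initial state is exactly the kind of shared data that core 0 has to set up before core 1 is released.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8u               /* must be a power of two */

struct sample_ring {
    uint16_t slots[RING_SIZE];
    atomic_uint head;              /* only the producer core writes this */
    atomic_uint tail;              /* only the consumer core writes this */
};

/* Producer side (e.g. the decoding core). Returns false when the ring is full. */
static bool ring_push(struct sample_ring *r, uint16_t sample)
{
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SIZE)
        return false;
    r->slots[head % RING_SIZE] = sample;
    /* Publish the slot only after its contents are written. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side (e.g. the playing core). Returns false when the ring is empty. */
static bool ring_pop(struct sample_ring *r, uint16_t *out)
{
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return false;
    *out = r->slots[tail % RING_SIZE];
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

With more than one producer or consumer this no longer holds and a real lock is needed.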

You know the alternative, which is to compile two different programs, one for each core, and merge them only in the img. However, you are then exposed to the risk that the two programs are compiled differently or wrongly and end up incompatible, and you will need some memory management which lies inside the O/S or outside both. For example, if you look at getting GPIO on the Pi under Linux with the statement

Code: Select all

mmap(0, 1, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0x20200000);
You think you are playing around with the GPIO directly but you might want to read up what the O/S is actually doing :-)

Ultibo
Posts: 135
Joined: Wed Sep 30, 2015 10:29 am
Location: Australia

Re: Audio output on the Raspberry Pi 3B

Mon May 07, 2018 11:13 am

LizardLad_1 wrote:
Mon May 07, 2018 1:08 am
At the moment I have no idea how to achieve this (I seem to say that a lot), but how could I play specific frequencies out of the Raspberry Pi's 3.5mm audio jack? At the moment my kernel halts all processor cores that aren't core 0 and it has access to the GPIO, so how can I start core 1 and use it to play frequencies?
If you are using C (and can at least read C++) then the Circle project has a good example of sound output.

It also shows in a clear (and understandable ;)) way how to startup the secondary cores.
Ultibo.org | Make something amazing
https://ultibo.org

Threads, multi-core, OpenGL, Camera, FAT, NTFS, TCP/IP, USB and more in 3MB with 2 second boot!

LdB
Posts: 856
Joined: Wed Dec 07, 2016 2:29 pm

Re: Audio output on the Raspberry Pi 3B

Mon May 07, 2018 8:03 pm

As someone seems hell bent on causing confusion if you follow circle it does exactly the same as I do because it compiles using the shared C/C++ model (multiple cores are running around the one C/C++ code).. there is simply no choice in this mode.

So let's follow this startup on Circle to satisfy ourselves that this is indeed what happens and that it is the same!

Core 0 on circle comes thru start.S simply dropping the core out of hyp mode and setting the stack pointers .. see here from start label
https://github.com/rsta2/circle/blob/ma ... /startup.S
At the end of that core0 is taken to sysinit via the branch instruction

Sysinit sends each of the other cores (1,2,3) to a secondary start code, after first enabling FIQs and invalidating the L1 cache on core0.
https://github.com/rsta2/circle/blob/4b ... ysinit.cpp

So it ends here running the cores 1,2,3 thru into the secondary startup .. note nCore starts at 1

Code: Select all

for (unsigned nCore = 1; nCore < CORES; nCore++)
{
	write32 (ARM_LOCAL_MAILBOX3_SET0 + 0x10 * nCore, (u32) &_start_secondary);
}
That works because the boot stub parked the cores 1,2,3 reading the mailbox ... So core0 sends core1,2,3 over to a secondary start.

**** Now the important bit .. core0 continues and you will notice what it does next

Code: Select all

	// clear BSS
	extern unsigned char __bss_start;
	extern unsigned char _end;
	for (unsigned char *pBSS = &__bss_start; pBSS < &_end; pBSS++)
	{
		*pBSS = 0;
	}
So core0 clears the BSS .. no other core is allowed to touch BSS and core0 then continues on to create initial objects and enter the C code main.

Now let's track what happens to core1,2,3. They end up at _start_secondary, which is here
https://github.com/rsta2/circle/blob/ma ... /startup.S
Look at what it does: it simply sets the core1,2,3 stack pointers and then jumps to sysinit_secondary

sysinit_secondary is at the bottom of here
https://github.com/rsta2/circle/blob/4b ... ysinit.cpp

You can see what it does: it enables the FIQs, invalidates the L1 cache, prints a message and then enters a C function called main_secondary, which is here
https://github.com/rsta2/circle/blob/d2 ... ticore.cpp

It simply takes cores 1,2,3 into the multicore support object core0 setup

Code: Select all

void main_secondary (void){
	CMultiCoreSupport::EntrySecondary ();
}
All the shared memory between the cores exists in that object and it was setup to an initial state by core0 before core1,2,3 ever got there.

Now hopefully everyone ( :? ) can see the start sequence is EXACTLY THE SAME as is required by the shared C/C++ model.
In the shared model in C/C++ you are required to synchronize the startup of the cores and only one of the cores can clear the BSS and initialize the shared memory.

I am sorry I had to go thru all this again. I shouldn't have to, but it is the only way of dealing with the snide comment and the suggestion that I am doing anything different or special. You can code the start sequence in C or C++, it doesn't matter; the strict start order requirements remain the same, as DEMANDED BY THE COMPILER ITSELF, because the cores are sharing code and memory blocks.

Follow my code or follow Circle, it makes no difference: only one of the cores (we both selected core0) is allowed to initialize the BSS and any shared memory between the cores, because that is how it must work. So we end up with my original statement in the original post:
Only when core 0 has done all the memory setups does it release core 1,2 & 3 to their C functions.

SIDEBAR: I suspect there is actually the possibility of a bug in the Circle code. Cores 1,2,3 are sent to their startup code and only then are the BSS and objects initialized. If the startup for cores 1,2,3 was fast, they could try to go into the CMultiCoreSupport object before core0 had actually set it up. To be picky, the BSS should have been cleared and the CMultiCoreSupport object set up before cores 1,2,3 are sent to startup, so the core mailbox code should be a little further down from where it currently is in Circle. It probably works reliably at the moment because clearing the BSS and setting up the object is faster than the startup for core1,2,3. However, as your code complexity grows the BSS will get bigger and core0 will take longer to clear it, and at some point it may break, with core1,2,3 getting to the CMultiCoreSupport object first. Again this highlights the importance of understanding the start sequence you are compelled to follow in the shared C/C++ model. RST, if you are around, you might like to confirm this or explain where I am wrong and/or missing something.

rst
Posts: 311
Joined: Sat Apr 20, 2013 6:42 pm
Location: Germany

Re: Audio output on the Raspberry Pi 3B

Tue May 08, 2018 8:24 am

@LdB: I guess you have overlooked that #define and the following comment:

https://github.com/rsta2/circle/blob/ma ... t.cpp#L139

This for-loop is only used if ARM_ALLOW_MULTI_CORE is not (!) defined, to put the secondary cores to sleep. That was required with older firmware, which did not have a "wfe" in the spin loop of its ARM stub. _start_secondary is very short in this case:

https://github.com/rsta2/circle/blob/ma ... tup.S#L105

The real location to start the secondary cores, with ARM_ALLOW_MULTI_CORE defined, is here:

https://github.com/rsta2/circle/blob/ma ... re.cpp#L82

So, I think, there's no bug in Circle, at least not here. ;)

LizardLad_1
Posts: 126
Joined: Sat Jan 13, 2018 12:29 am

Re: Audio output on the Raspberry Pi 3B

Tue May 08, 2018 8:37 am

I am using AARCH64 so the mrc command is invalid. I'm not sure what to do to get around that as I tried

Code: Select all

        mrs     x1, mpidr_el1 //Read core id on Cortex-A53
        ands    x1, x1, #0x3  // Make core 2 bit bitmask in x1
        beq     1f            // Core 0 jumps out to label 1
        cmp     x1, #01       // Check for core 1
        beq     2f
However it doesn't boot and I can't see why. If any of you can enlighten me please do so.

LdB
Posts: 856
Joined: Wed Dec 07, 2016 2:29 pm

Re: Audio output on the Raspberry Pi 3B

Tue May 08, 2018 2:58 pm

Your code is correct .. so let me have a guess :-)

You are using the bootstub (AKA your code is at 0x80000) and the other 3 cores are still parked by the boot stub in the firmware .. did you remember to retrieve them. Without doing that you will only ever see core0 !!!!!

The code for the ARM8 bootstubs is here
https://github.com/raspberrypi/tools/bl ... armstub8.S

If you look, they have normal RAM assigned as mailbox read addresses for the secondary spin loop,
being 0xd8 (core0), 0xe0 (core1), 0xe8 (core2), 0xf0 (core3) as defined in this code

Code: Select all

.org 0xd8
.globl spin_cpu0
spin_cpu0:
	.quad 0
.org 0xe0
.globl spin_cpu1
spin_cpu1:
	.quad 0
.org 0xe8
.globl spin_cpu2
spin_cpu2:
	.quad 0
.org 0xf0
.globl spin_cpu3
spin_cpu3:

The secondary spin loop is here. The cores are peeled off by ID: cores 1,2,3 will be sleeping where marked, and only core0 enters the start of your code.

Code: Select all

in_el2:
	mrs x6, MPIDR_EL1
	and x6, x6, #0x3
	cbz x6, primary_cpu    // NOTE: core0 jumps over to the label primary_cpu and ends up at your start
	adr x5, spin_cpu0
secondary_spin:
	wfe                    // NOTE: cores 1,2,3 will be sleeping here, still in the bootstub code
	ldr x4, [x5, x6, lsl #3]
	cbz x4, secondary_spin
	mov x0, #0
	b boot_kernel
primary_cpu:
So okay, if you want to send core1 somewhere, you need to write an address to 0xe0 and then fire a SEV instruction to wake the core up

So if you want core1 to go thru ALL YOUR START CODE then simply use your start label, so it would be like this

Code: Select all

	mov	x1, #0xe0		// Core1 RAM mailbox address
	ldr	x2, =start		// Make the core enter start (AKA 0x80000) just like core0
	str	x2, [x1]		// Write the address to the mailbox
	sev				// Wake the cores up to check their mailbox

You can send core1 to a different start routine, it doesn't have to be the same as core0's. Just make sure to set up the stack etc., as the core is raw, just like core0 when it starts.

There is nothing special about the RAM address; they just chose it to be between the vector table and the old atags data, so somewhere unlikely to clash with anything.
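
If you would rather do the release from C, a rough equivalent is sketched below. The function names spin_slot_address and release_core are invented for this example, and the slot is passed in as a pointer so the logic can be checked on a desktop against a dummy variable. The barrier and sev are only compiled on AArch64. It also assumes the data cache is off or the write is otherwise visible to the parked core, which you would need to take care of yourself if caches are enabled.

```c
#include <assert.h>
#include <stdint.h>

#define SPIN_TABLE_BASE 0xd8u   /* spin_cpu0 in the boot stub listing above */

/* Mailbox slot address for a core: 0xd8, 0xe0, 0xe8, 0xf0 for cores 0..3. */
static uintptr_t spin_slot_address(unsigned core)
{
    return (uintptr_t)(SPIN_TABLE_BASE + 8u * core);
}

/* Write the entry address into a core's slot and wake the sleeping cores. */
static void release_core(volatile uint64_t *slot, uint64_t entry)
{
    *slot = entry;
#if defined(__aarch64__)
    /* Make the store visible before the event is sent, then wake the
     * cores waiting in wfe so they re-read their slots. */
    __asm__ volatile("dsb sy\n\tsev" ::: "memory");
#endif
}
```

On the target the call for core1 would be along the lines of release_core((volatile uint64_t *)spin_slot_address(1), 0x80000), which matches the four assembler instructions above.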

My sample has a 64bit version which is in the same directory as the 32bit version here is the startup loader
https://github.com/LdB-ECM/Raspberry-Pi ... tStart64.S
If you want to build it there is a batch file for windows you can convert easily to bash script if on linux
https://github.com/LdB-ECM/Raspberry-Pi ... Pi3-64.bat

The sequence my code does is

Core0 does all its basic setup code like stacks, irq's etc
Now look at the code in the block in the section marked as "Core0 will bring Core 1,2,3 to secondary spin"
Core0 then asks each of the other cores 1,2,3, one at a time, to come thru the same code by writing the start address and issuing the sev instruction
However, when those cores come thru I peel them off by core ID, make sure they have initialized, and then repark them back in the bootstub
That process occurs just before the BSS section is cleared, in the section labelled "Now park Core 1,2,3 into secondary spinloop on BCM2837"
Only core0 gets to go past that point, clear the BSS and then enter the normal main C code

So I can then simply send cores 1,2,3 anywhere I want; they are all set up and simply need to be told what C code to enter.
If you want to do C++ like Circle, you can get core0 to make its multitask object and after that tell cores 1,2,3 to enter the object; it would still work in 64bit. Alternatively stay in C and do whatever, we have done the BSS and the C code will be happy. The only thing to remember is that if you share memory between the cores, core0 must do any initializing of that memory before you let core1,2,3 into the code. In my code that would mean that when core0 enters its start C code it would initialize that memory, and then sometime after that I would use the mailbox to tell cores 1,2,3 to start. That is why I repark them: it gives me the flexibility to initialize things using core0. I use the same bootloader for a myriad of O/S and straight baremetal code.

As I said my most common setup is core0 running an RTOS and cores1,2,3 running a linux like O/S and both those codes are just standard C code which is dead easy using my bootloader.

StevoD
Posts: 20
Joined: Tue Aug 29, 2017 11:37 am

Re: Audio output on the Raspberry Pi 3B

Wed May 09, 2018 12:06 am

Sorry to the op, my question is off topic.
LdB wrote:
Tue May 08, 2018 2:58 pm
As I said my most common setup is core0 running an RTOS and cores1,2,3 running a linux like O/S and both those codes are just standard C code which is dead easy using my bootloader.
Do you have a demo image for that?
How do you make more than one o/s run at once?
How do you share hardware across many o/s at once?

LdB
Posts: 856
Joined: Wed Dec 07, 2016 2:29 pm

Re: Audio output on the Raspberry Pi 3B

Wed May 09, 2018 2:41 am

StevoD wrote:
Wed May 09, 2018 12:06 am
Do you have a demo image for that?
I don't, because the O/S on 1,2,3 is commercial, from Mentor Graphics (Jaluna/OSware), but I am happy to make a sample with a non-commercial O/S on 1,2,3 if you want. There are a pile of simple O/S examples around. What do you want, a pre-emptive task switcher?

I am pretty sure I can easily do it with Circle, if you use or are familiar with that.

From there you can progress up to virtualization and up and onward into never ending complexity.

Open-AMP is probably a good start to look at code and why we do this although there are better ways in some situations. Please don't take it as the only way to do this. Why I suggest it as a start point is it's reasonably simple to understand.
https://github.com/OpenAMP/open-amp/wiki/AMP-Intro
StevoD wrote:
Wed May 09, 2018 12:06 am
How do you make more than one o/s run at once?
Strange question .. the cores are independent the O/S don't need to know anything about each other unless they intend to share resources.
The Pi already has two O/S running in all situations .. think about it ... the answer is in the next bit if you don't get it.
StevoD wrote:
Wed May 09, 2018 12:06 am
How do you share hardware across many o/s at once?
Now that is a whole other question and loaded with what are you trying to do questions :-)

However basically 3 techniques
1.) Shared memory locks
2.) Semaphores in one of its various forms
3.) Messaging

In every baremetal example on the Pi where the ARM (one O/S) uses the GPU (another O/S), you used the messaging system between them.
You have also written into a framebuffer, which is an example of a shared memory lock between those two O/S's.
You asked the GPU to give you an area in its memory, and you both agreed what writing data into that area meant.
So every baremetal example writing to the screen is a default example of 1 & 3.
Remember literally everything on the Pi is shared between the ARM cores and the GPU :-)

Technically, if we have 2 O/S on the ARM cores we have 3 O/S in total, as the GPU is still there with its O/S.

I haven't tried this, but I was going to get around to it one day. I believe in Raspbian you can control the number of CPUs it uses by editing /boot/cmdline.txt and adding maxcpus=3. I believe core3 will then be left parked in the bootloader. You should then be able to write a Linux program that asks the GPU to allocate you a memory block (look at the GPU mailbox, which has its own memory allocator), places your baremetal code in the allocated block, unparks core3 to your baremetal code in the GPU memory block, and then exits back to Linux. If you ever looked at my GPU pipeline code, I lock all the model memory into GPU memory blocks. The GPU memory belongs to the GPU even when running Linux. So at that stage you have Linux running normally on cores 0,1,2 and baremetal code running on core3 safely inside GPU allocated memory. You can then use the core3 mailbox address to send messages between Linux and the baremetal program. Raspbian should never have a problem or clash, because the only resources used are owned not by Raspbian but by the GPU.

If you want a more detailed answer, I need more specifics to the question.

LizardLad_1
Posts: 126
Joined: Sat Jan 13, 2018 12:29 am

Re: Audio output on the Raspberry Pi 3B

Wed May 09, 2018 6:56 am

StevoD wrote: Sorry to the op, my question is off topic.
I don't mind it's an interesting topic anyway.

Apart from that I still have been unable to get it working. I tried to use LdB's solution to make core 1 execute all of the start code but I have still been unsuccessful so I am going to paste my whole start.s file here because there must be an issue somewhere else I reckon.

Code: Select all

.section ".text.boot"

.global _start

_start:
	// read cpu id, stop slave cores
	mov	x1, #0xe0
	ldr	x2, _start
	str	x2, [x1]
	sev
	mrs 	x1, mpidr_el1 // Read core id on AARCH64
	and 	x1, x1, #0x3  // Make core 2 bit bitmask in x1
	cbz  	x1, 2f	      // Core 0 jumps out to label 2
	cmp 	x1, #01       // Check for core1
	beq  	1f
	// cpu id != 1, stop
1:
	wfe
	b       1b
2:	// cpu id == 1

	// set stack before our code
	ldr     x1, =_start
	mov     sp, x1

	// clear bss
	ldr     x1, =__bss_start
	ldr     w2, =__bss_size
3:  
	cbz     w2, 4f
	str     xzr, [x1], #8
	sub     w2, w2, #1
	cbnz    w2, 3b

	// jump to C code, should not return
4:  
	bl      main
	// for failsafe, halt this core too
	b       1b
As you see my code is fairly simple so I'm not sure why I'm failing at this.
I think my issue is similar to this viewtopic.php?f=72&t=211729

LdB
Posts: 856
Joined: Wed Dec 07, 2016 2:29 pm

Re: Audio output on the Raspberry Pi 3B

Wed May 09, 2018 2:01 pm

Okay I sort of hacked your code into mine so I could show you it working
I would also suggest you get a github account it makes looking at code you have problems with a lot easier.

So first please load all the files from here on a SDCARD and confirm it works for you
https://github.com/LdB-ECM/Raspberry-Pi ... de/DiskImg
Core0 is spinning the cursor on screen, core1 is flashing the activity LED.

First let's deal with the big problems with your code above
1.) The core1 code you posted did nothing: even if core1 started, you just slept it again with wfe
2.) You don't set a stack pointer for core 1, even if I branch it to code

Now I had a couple of issues with your code when trying to merge it with mine. They aren't problems, they are just inconsistent with my code
1.) You didn't give me your linker file, so I don't have the same labels and couldn't use your BSS clearing code; I just substituted mine. It's not important, your code is probably fine and matches your linker file.
2.) I can't use the core start straight up like you did because of a resource sharing issue because of what I have the cores doing in demo.
Core0 comes into my main and initializes the screen getting a framebuffer and takes to max speed, both need the mailbox
Core1 flashes the activity LED on/off which needs the mailbox

Hopefully you see the issue with item 2 above: if I release core1 straight up like you did, it starts using the mailbox, and then when core0 tries to use the mailbox they crash and burn. The mailbox would need a resource share lock on it if I wanted to use your code as it was. So what I did was simply delay asking core1 to start until after core0 had finished with the mailbox; look at the main C code.
https://github.com/LdB-ECM/Raspberry-Pi ... rce/main.c

So I just poke core1 to start at 0x80000 when core0 is done with the mailbox.

Code: Select all

asm volatile("mov	x1, #0xe0\n"
		"mov	x2, #0x80000\n"
		"str	x2, [x1]\n"
		"sev"
		: : : "x1", "x2", "memory");	// tell the compiler which registers and memory the block touches
This illustrates the discussion of requiring resource locks on stuff that will be required to be shared.
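
For reference, a resource share lock around the mailbox could be as small as the spinlock sketched here. It is a generic C11 atomic_flag lock, not the code in the linked repository, and the names mailbox_lock, mailbox_try_lock, mailbox_lock_acquire and mailbox_unlock are made up for the example. One assumption to check on the target: as far as I know the exclusive load/store instructions that the compiler generates for this on the Pi 3 generally need the MMU and caches enabled to behave. Note also that the zero-initialized flag will typically land in the BSS, so it depends on the BSS having been cleared before any core takes the lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* One lock guarding every access to the mailbox. */
static atomic_flag mailbox_lock = ATOMIC_FLAG_INIT;

/* Try once; returns true if this core now owns the mailbox. */
static bool mailbox_try_lock(void)
{
    return !atomic_flag_test_and_set_explicit(&mailbox_lock,
                                              memory_order_acquire);
}

/* Spin until the mailbox is free, then own it. */
static void mailbox_lock_acquire(void)
{
    while (!mailbox_try_lock()) {
        /* busy-wait; on the target a wfe could go here */
    }
}

/* Give the mailbox back so another core can use it. */
static void mailbox_unlock(void)
{
    atomic_flag_clear_explicit(&mailbox_lock, memory_order_release);
}
```

Each core would then wrap its mailbox call between mailbox_lock_acquire() and mailbox_unlock(), and core1 could be released at any time instead of waiting for core0 to finish.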

Here is the hacked assembler code and below is the explanation
https://github.com/LdB-ECM/Raspberry-Pi ... tStart64.S

The first two assembler blocks are required by my code for the printf which needs the FPU online, it starts offline.
There is a small section between BSS clear and jumping to C which is all to do with my display it's not important.

Here is the modified core peel .. I have commented it here

Code: Select all

	mrs	x1, mpidr_el1		// Read core id on AARCH64
	and	x1, x1, #0x3		// Make core 2 bit bitmask in x1
	cbz	x1, 2f			// Core 0 jumps out to label 2
	cmp	x1, #1			// Check for core1
	beq	1f
	b	hang			// Anything other than core0,1 goes and hangs
// Core 1 branches here
1:
	ldr	x1, =(_start-0x10000)	// stack pointer 0x10000 below start which will be 0x70000
	mov	sp, x1			// Set core1 sp
	bl	core1_main		// branch (with link) to core1 c code
	b	hang			// hang if core1 ever returns

// core0 branches here
2:
	ldr	x1, =_start		// stack pointer to start at start which will be 0x80000
	mov	sp, x1			// set core0 sp

Core1 main is a simple flash the activity LED code .. it gets there from the branch instruction in the code above
The core is running at 450Mhz or whatever the slow default start up speed is on a Pi3.

Code: Select all

void core1_main (void) {
	while (1) {
		set_Activity_LED(true);
		for (int i = 0; i < 1000000; i++) { asm("nop"); }
		set_Activity_LED(false);
		for (int i = 0; i < 1000000; i++) { asm("nop"); }
	}
}
So there you have it: core0 spinning the cursor on screen, core1 flashing the activity LED.

I am pretty sure you should be able to fix your code in your files as you want now.

LizardLad_1
Posts: 126
Joined: Sat Jan 13, 2018 12:29 am

Re: Audio output on the Raspberry Pi 3B

Sun May 13, 2018 12:18 am

Thank you soooo much LdB

LizardLad_1
Posts: 126
Joined: Sat Jan 13, 2018 12:29 am

Re: Audio output on the Raspberry Pi 3B

Tue Jul 10, 2018 12:36 am

Hello again,

I have finally gotten around to trying to implement the audio in C. So far I have tried to translate all of the asm to C and, as you could probably guess, I haven't been able to make it work. I know that the issue isn't misalignment because it doesn't hang at any point; if it runs for long enough it finishes all the data copies. The issue is that there is no actual sound output. Instead of doing what PeterLemon did, I used ffmpeg to take an mp3 file and convert it to a raw unsigned 16 bit little endian audio file, which is the format he used. Then I used ld to make it a .o file by using the command

Code: Select all

aarch64-linux-gnu-ld -r -b binary -o audio.o audio.bin
Then I used extern unsigned char to get the symbols for the start and the end of the audio file. I then passed that to my function to play the audio. Yes I do use one modulo division in it but I don't know what to replace it with. Here is my code for the audio setup and play functions.

sound.c

Code: Select all

int init_audio_jack()
{
        *((volatile unsigned int *)(MMIO_BASE + 0x200000 + 0x10)) = (0x4 | 0x200000);

        *CM_PWMDIV = CM_PASS | 0x2000; //Set clock block 0

        *CM_PWMCTL = (CM_PASS | 0x10) | (0x01 + 0x05); //Set clock block 1

        *(PWM_BASE + 0x10) = 0x2C48;
        *(PWM_BASE + 0x20) = 0x2C48;

        *(PWM_BASE + 0x0) = 0x20 + 0x100 + 0x2000 + 0x1 + 0x40;

        lfb_print(0, 2, "Well there were no unaligned exeptions");

        return 0;
}

int play_16bit_unsigned_audio(char *start, char *end)
{
        if(end < start) return 1;
        lfb_print(0, 3, "End isn't less than start.");
        if((start - end) % 2 != 0) return 2;
        lfb_print(0, 4, "Is a multiple of two so it is 16bit");
        //FIFO write
        for(int i = 0; &(start[i]) != end; i++)
        {
                uint8_t sample_low = start[i];
                uint8_t sample_high = start[i++];

                uint16_t sample = sample_low | (sample_high << 8);

                sample >>= 2;
                *PWM_FIF1 = sample;
                //FIFO wait
                while(*PWM_STA != 0x1);
        }
        lfb_print(0, 5, "Completed Audio");
        return 0;
}
sound.h

Code: Select all

#ifndef SOUND_H
#define SOUND_H

#include "gpio.h"

#define CM_BASE         ((volatile unsigned int *)(MMIO_BASE+0x101000))
#define CM_PWMDIV       ((volatile unsigned int *)(CM_BASE+0x0A4))
#define CM_PASS         0x5A000000
#define CM_PWMCTL       ((volatile unsigned int *)(CM_BASE+0x0A0))
#define PWM_BASE        ((volatile unsigned int *)(MMIO_BASE+0x20C000))
#define PWM_RNG1        ((volatile unsigned int *)(PWM_BASE+0x10))
#define PWM_RNG2        ((volatile unsigned int *)(PWM_BASE+0x20))
#define PWM_FIF1        ((volatile unsigned int *)(PWM_BASE+0x18))
#define PWM_STA         ((volatile unsigned int *)(PWM_BASE+0x4))

extern volatile unsigned char _binary_src_audio_The_Amazons_bin_start, _binary_src_audio_The_Amazons_bin_end;

int init_audio_jack();
int play_16bit_unsigned_audio(char *start, char *end);

#endif
They are called like this.

main.c

Code: Select all

init_audio_jack();
play_16bit_unsigned_audio((char *)&_binary_src_audio_The_Amazons_bin_start, (char *)&_binary_src_audio_The_Amazons_bin_end);
The audio file is in my GitHub repo, as is all this code. If you compile it, use the .o file because it already has these labels. My GitHub repo is https://github.com/OllieLollie1/Raspi3-Kernel

EDIT 1:
I have decided to have a look at how Circle does it. Sorry I didn't think of doing this before.

LizardLad_1
Posts: 126
Joined: Sat Jan 13, 2018 12:29 am

Re: Audio output on the Raspberry Pi 3B

Tue Jul 17, 2018 2:28 am

Does anyone know what I have done wrong?

LizardLad_1
Posts: 126
Joined: Sat Jan 13, 2018 12:29 am

Re: Audio output on the Raspberry Pi 3B

Mon Jul 23, 2018 7:25 am

Progress report: I am fairly sure my issue is one with the hardware PWM. I haven't made much progress isolating the cause of the issue.

LizardLad_1
Posts: 126
Joined: Sat Jan 13, 2018 12:29 am

Re: Audio output on the Raspberry Pi 3B

Mon Sep 03, 2018 7:28 am

OK, an update in case anyone else wants this functionality. I have managed to get it working in asm by converting Peter Lemon's code from FASM to AArch64 asm. I have also attempted a C version, but it fails in both init and play. If anyone can spot the difference between these two files, please say something.

Code: Select all

void init_audio_jack_c()//ERROR IN HERE
{
	//Set phone jack to pwm output
	uint32_t *gpio_addr = (uint32_t *)(PERIPHERAL_BASE + GPIO_BASE);
	uint32_t *gpio_gpfsel4_addr = gpio_addr + GPIO_GPFSEL4;
	*gpio_gpfsel4_addr = GPIO_FSEL0_ALT0 | GPIO_FSEL5_ALT0;

	//Set clock
	uint32_t *clock_manager_addr = (uint32_t *)(((PERIPHERAL_BASE + CM_BASE) & 0x0000FFFF) | ((PERIPHERAL_BASE + CM_BASE) & 0xFFFF0000));
	*(clock_manager_addr + CM_PWMDIV) = (CM_PASSWORD | 0x2000);

	*(clock_manager_addr + CM_PWMCTL) = ((CM_PASSWORD | CM_ENAB) | (CM_SRC_OSCILLATOR + CM_SRC_PLLCPER));

	//Set PWM
	uint32_t *pwm_manager_addr = (uint32_t *)(((PERIPHERAL_BASE + PWM_BASE) & 0x0000FFFF) | ((PERIPHERAL_BASE + PWM_BASE) & 0xFFFF0000));
	*(pwm_manager_addr + PWM_RNG1) = 0x1624;
	*(pwm_manager_addr + PWM_RNG2) = 0x1624;

	*(pwm_manager_addr + PWM_CTL) = PWM_USEF2 + PWM_PWEN2 + PWM_USEF1 + PWM_PWEN1 + PWM_CLRF1;

	printf("[INFO] Audio Init Finished");
}


int32_t play_16bit_unsigned_audio(uint16_t *start, uint16_t *end)
{
	if(end < start) 
	{
		printf("[ERROR] End is less than start.");
		return 1;
	}
	if((start - end) % 2 == 0)
	{
		printf("[ERROR] Isn't a multiple of two so it isn't 16bit");
		return 2;
	}

	uint16_t *end_of_file = (uint16_t *)(uint64_t)(((uint32_t)(uintptr_t)end & 0x0000FFFF) | ((uint32_t)(uintptr_t)end & 0xFFFF0000));

	//FIFO write
	while(start != end_of_file)
	{
		uint16_t sample = start[0];
		sample >>= 3;
		*(uint32_t *)((((uint32_t)(PERIPHERAL_BASE + PWM_BASE) & 0x0000FFFF) | ((uint32_t)(PERIPHERAL_BASE + PWM_BASE) & 0xFFFF0000)) + PWM_FIF1) = sample;
		
		start++;
		sample = start[0];
		sample >>= 3;
		*(uint32_t *)((((uint32_t)(PERIPHERAL_BASE + PWM_BASE) & 0x0000FFFF) | ((uint32_t)(PERIPHERAL_BASE + PWM_BASE) & 0xFFFF0000)) + PWM_FIF1) = sample;
		
		//FIFO wait
		while(*(uint32_t *)((((uint32_t)(PERIPHERAL_BASE + PWM_BASE) & 0x0000FFFF) | ((uint32_t)(PERIPHERAL_BASE + PWM_BASE) & 0xFFFF0000)) + PWM_STA) != PWM_FULL1);
		start++;
	}
	printf("[INFO] Completed Audio");
	return 0;
}
AARCH64 asm:

Code: Select all

.section .text.init_audio_jack, "ax", %progbits
.balign	4
.globl init_audio_jack;
.type init_audio_jack, %function
init_audio_jack:
	mov w0,PERIPHERAL_BASE + GPIO_BASE
	mov w1,GPIO_FSEL0_ALT0
	orr w1,w1,GPIO_FSEL5_ALT0
	str w1,[x0,GPIO_GPFSEL4]

	// Set Clock
	mov w0, PERIPHERAL_BASE
	add w0, w0, CM_BASE
	and w0, w0, 0x0000FFFF

	mov w1, PERIPHERAL_BASE 
	add w1, w1, CM_BASE
	and w1, w1, 0xFFFF0000
	
	orr w0,w0,w1
	mov w1,CM_PASSWORD
	orr w1,w1,0x2000 // Bits 0..11 Fractional Part Of Divisor = 0, Bits 12..23 Integer Part Of Divisor = 2
	brk #0
	str w1,[x0,CM_PWMDIV]

	mov w1,CM_PASSWORD
	orr w1,w1,CM_ENAB
	orr w1,w1,CM_SRC_OSCILLATOR + CM_SRC_PLLCPER // Use 650MHz PLLC Clock
	str w1,[x0,CM_PWMCTL]

	// Set PWM
	mov w0, PERIPHERAL_BASE
	add w0, w0, PWM_BASE
	and w0, w0, 0x0000FFFF
	
	mov w1,PERIPHERAL_BASE
	add w1, w1, PWM_BASE
	and w1, w1, 0xFFFF0000
	
	orr w0,w0,w1
	mov w1,0x1624 // Range = 13bit 44100Hz Mono
	str w1,[x0,PWM_RNG1]
	str w1,[x0,PWM_RNG2]

	mov w1,PWM_USEF2 + PWM_PWEN2 + PWM_USEF1 + PWM_PWEN1 + PWM_CLRF1
	str w1,[x0,PWM_CTL]


.section .text.play_audio, "ax", %progbits
.balign	4
.globl play_audio;
.type play_audio, %function
play_audio:
	Loop:
		adr x1, _binary_src_audio_Interlude_bin_start // X1 = Sound Sample
		ldr w2, =_binary_src_audio_Interlude_bin_end
		and w2, w2, 0x0000FFFF // W2 = End Of Sound Sample
		ldr w3, =_binary_src_audio_Interlude_bin_end
		and w3, w3, 0xFFFF0000
		orr w2,w2,w3
		FIFO_Write:
			ldrh w3,[x1],2 // Write 2 Bytes To FIFO
			lsr w3,w3,3 // Convert 16bit To 13bit
			str w3,[x0,PWM_FIF1] // FIFO Address
			
			ldrh w3, [x1], 2
			lsr w3, w3, 3
			str w3, [x0, PWM_FIF1]
		FIFO_Wait:
			ldr w3,[x0,PWM_STA]
			tst w3,PWM_FULL1 // Test Bit 1 FIFO Full
			b.ne FIFO_Wait
		cmp w1,w2 // Check End Of Sound Sample
		b.ne FIFO_Write
	b Loop // Play Sample Again

LdB
Posts: 856
Joined: Wed Dec 07, 2016 2:29 pm

Re: Audio output on the Raspberry Pi 3B

Mon Sep 03, 2018 8:25 am

Code: Select all

*(uint32_t *)((((uint32_t)(PERIPHERAL_BASE + PWM_BASE) & 0x0000FFFF) | ((uint32_t)(PERIPHERAL_BASE + PWM_BASE) & 0xFFFF0000)) + PWM_FIF1) = sample;
This line looks all wrong. Off the top of my head, doesn't it need a shift left by 16? It sort of looks like you are trying to set the upper and lower halves separately.
You don't get smaller, faster code by writing single long lines versus lots of small, easy-to-read lines .. pull that out into easy-to-check steps :-)
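
A minimal sketch of what "easy-to-check steps" might look like. The base and offset values are assumptions for illustration (the usual Pi 3 peripheral base, plus the PWM offsets from the sound.h posted earlier), and the function names are made up. It also shows that OR-ing the low and high halves of the same value just gives the value back, so the masking in that line changes nothing:

```c
#include <stdint.h>

/* Assumed values for illustration only. */
#define PERIPHERAL_BASE 0x3F000000u
#define PWM_BASE        0x20C000u
#define PWM_FIF1        0x18u

/* The pattern from the post: (x & low half) | (x & high half) is simply x. */
static inline uint32_t recombine_halves(uint32_t x)
{
    return (x & 0x0000FFFFu) | (x & 0xFFFF0000u);
}

/* The same FIFO address, built one checkable step at a time. */
static inline uint32_t pwm_fifo_address(void)
{
    uint32_t pwm_base = PERIPHERAL_BASE + PWM_BASE; /* start of the PWM block */
    uint32_t fifo     = pwm_base + PWM_FIF1;        /* byte offset added to an integer */
    return fifo;
}
```

With these assumed values the FIFO address comes out as 0x3F20C018, and each intermediate value can be printed or inspected on its own.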

I am also not sure what it will do with alignment either ... you have no alignment cues for the optimizer anywhere. You are writing a uint16_t called sample through a 32-bit uint32_t* cast .. danger, Will Robinson. Everything about that line sends chills down my spine.

I would be over the top cautious ... live or die by the sword.
Your best friends should be

Code: Select all

volatile __attribute__((aligned(4))) uint32_t* ptr32
volatile __attribute__((aligned(2))) uint16_t* ptr16
or use PUT16, PUT32 like David does.

You need to know you are making aligned 32-bit writes to the registers.
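
As a rough sketch of the PUT32 idea in C (these helpers are hypothetical, not David's actual code): every register access goes through one function that takes an integer address and does a single aligned 32-bit volatile access. The last two functions illustrate a related trap. If names like CM_PWMDIV or PWM_RNG1 are byte offsets, as their use in the asm `str w1,[x0,CM_PWMDIV]` suggests, then adding them to a `uint32_t *` in C scales them by 4 and lands on a different register. That may be one difference between the C and asm versions above, though it is only a guess without seeing the header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers in the spirit of PUT32/GET32: one aligned 32-bit access each. */
static inline void put32(uintptr_t addr, uint32_t value)
{
    assert((addr & 3u) == 0);            /* refuse unaligned register writes */
    *(volatile uint32_t *)addr = value;
}

static inline uint32_t get32(uintptr_t addr)
{
    assert((addr & 3u) == 0);
    return *(volatile uint32_t *)addr;
}

/* Byte offset added to an integer address: base + 0x18 really is 0x18 bytes on. */
static inline uintptr_t reg_addr(uintptr_t base, uint32_t byte_offset)
{
    return base + byte_offset;
}

/* Same number added to a uint32_t pointer: the compiler multiplies it by 4. */
static inline uintptr_t scaled_addr(volatile uint32_t *base, uint32_t offset)
{
    return (uintptr_t)(base + offset);
}
```

For example, `put32(reg_addr(pwm_base, 0x18), sample)` writes 0x18 bytes past the base, whereas `scaled_addr(ptr, 0x18)` ends up 0x60 bytes past it.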

LizardLad_1
Posts: 126
Joined: Sat Jan 13, 2018 12:29 am

Re: Audio output on the Raspberry Pi 3B

Sat Sep 29, 2018 3:19 am

Sorry for resurrecting an old thread but I think I found another issue with my C translation. In PeterLemon's code he uses the instruction adr to get the label address relative to the PC. Is it possible to pass in the label address to the asm function as &_any_label and have it work as expected when calling from C? Here is what I tried but it is clearly buggy and doesn't work.

Code: Select all

//"================================================================================"
//              Audio Play Function -- AARCH64 Raspberry Pi 3
//              C Function: "void play_audio(char *audio_start, char *audio_end);"
//              Entry:  X0 will have the start address of the audio file
//                      X1 will have the end address of the audio file
//              Return: nothing
//"================================================================================"
.section .text.play_audio, "ax", %progbits
.balign 4
.globl play_audio;
.type play_audio, %function
play_audio:
        mov x1, x4 //Move the inputs to where they are required
        mov x0, x1 //Same here

        mov x0,PERIPHERAL_BASE + GPIO_BASE
        mov w0,(PERIPHERAL_BASE + PWM_BASE) & 0x0000FFFF
        mov w1,(PERIPHERAL_BASE + PWM_BASE) & 0xFFFF0000
        orr w0,w0,w1
        
        Loop:
                adr x5, .// Get PC value _binary_src_audio_Interlude_bin_start // X1 = Sound Sample
                sub x1, x1, x5
                //ldr w2, x4//=_binary_src_audio_Interlude_bin_end
                mov x2, x4
                and w2, w2, 0x0000FFFF // W2 = End Of Sound Sample
                //ldr w3, x4//=_binary_src_audio_Interlude_bin_end
                mov x3, x4
                and w3, w3, 0xFFFF0000
                orr w2,w2,w3
                FIFO_Write:
                        ldrh w3,[x1],2 // Write 2 Bytes To FIFO
                        lsr w3,w3,4 // Convert 16bit To 12bit
                        str w3,[x0,PWM_FIF1] // FIFO Address

                        ldrh w3, [x1], 2
                        lsr w3, w3, 4
                        str w3, [x0, PWM_FIF1]
                FIFO_Wait:
                        ldr w3,[x0,PWM_STA]
                        tst w3,PWM_FULL1 // Test Bit 1 FIFO Full
                        b.ne FIFO_Wait
                cmp w1,w2 // Check End Of Sound Sample
                b.ne FIFO_Write
        b Loop // Play Sample Again
EDIT 1:
If LdB sees this, would you mind checking out the prefetch abort thread? I would just like to ask you some questions about your code.

LdB
Posts: 856
Joined: Wed Dec 07, 2016 2:29 pm

Re: Audio output on the Raspberry Pi 3B

Sun Sep 30, 2018 3:51 pm

You keep treating W1 as if it's a different register from X1 .. you get that it isn't? It's just the bottom half, and writing W1 also zeroes the top half of X1.
Look at figure 4.1 and 4.2
http://infocenter.arm.com/help/index.js ... HDEEJ.html

So look at play_audio:

The aim of line 2 is to preserve the incoming value of x0 in x1 (note that, as written, mov puts the destination first, so this actually copies x1 into x0)

Code: Select all

mov x0, x1 //Same here
Then in line 5 you promptly trash the very register you just saved it in

Code: Select all

 mov w1,(PERIPHERAL_BASE + PWM_BASE) & 0xFFFF0000
Then later on you try to use x1 as if it still has the value that came in as x0 .. well, it doesn't.
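
The register behaviour can be modelled in a few lines of C. This is only an illustrative model with made-up names, not anything that runs on the PWM hardware: Xn is the full 64-bit register, Wn is a view of its low 32 bits, and a write through Wn replaces the low half and clears the upper half.

```c
#include <stdint.h>

/* Model of one AArch64 general-purpose register: Xn is all 64 bits, Wn the low 32. */
typedef struct { uint64_t x; } reg64;

/* mov xN, value: replaces all 64 bits. */
static inline void write_x(reg64 *r, uint64_t v) { r->x = v; }

/* mov wN, value: replaces the low 32 bits and zeroes the upper 32 bits. */
static inline void write_w(reg64 *r, uint32_t v) { r->x = (uint64_t)v; }

/* Reading wN just returns the low 32 bits of the same register. */
static inline uint32_t read_w(const reg64 *r) { return (uint32_t)r->x; }
```

So if x1 holds the sample pointer and the code then does `mov w1, ...`, the pointer is gone. Keeping the incoming arguments in registers that the rest of the routine never writes avoids the problem.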

LizardLad_1
Posts: 126
Joined: Sat Jan 13, 2018 12:29 am

Re: Audio output on the Raspberry Pi 3B

Mon Oct 01, 2018 12:00 am

Thanks, and yes, I knew that; I just hadn't realized what I had done. It appears I am still not getting the correct values in the registers. Can the opcode adr return a negative value, or does it always return the absolute value?

LdB
Posts: 856
Joined: Wed Dec 07, 2016 2:29 pm

Re: Audio output on the Raspberry Pi 3B

Mon Oct 01, 2018 2:52 am

It's an address and hence unsigned; there is no such thing as negative. Your display routine is the only thing that may call things negative.
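
A small sketch of both points, using made-up helper names. The offset encoded in an adr instruction is signed (the label can be before or after the instruction), but what lands in the register is the absolute address, PC plus that offset. Whether a register value looks "negative" depends only on how the bits are interpreted when they are printed.

```c
#include <stdint.h>

/* Model of ADR: signed PC-relative offset in, absolute address out. */
static inline uint64_t adr_result(uint64_t pc, int64_t offset)
{
    return pc + (uint64_t)offset;
}

/* The same 64 bits reinterpreted as a signed number, as a %d-style print would do. */
static inline int64_t as_signed(uint64_t bits)
{
    return (int64_t)bits;
}
```

Printing addresses in hex (or as unsigned) in the debug output sidesteps the confusion.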

Your C prototype controls what is in the registers and if it is like you have shown

Code: Select all

void play_audio(char *audio_start, char *audio_end);
x0 = audio_start
x1 = audio_end

However, you can't pass parameters via CoreExecute if you are still trying to play on core 1 .. they never get passed through the mailbox.
If you are trying to do that, set up a global.
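
A minimal sketch of the global approach, with hypothetical names throughout (`audio_job`, `audio_job_set`, `audio_core_entry`). Core 0 stores the sample range in a global before starting the other core, and the entry function handed to that core takes no arguments and reads the global instead. On real hardware a barrier or cache maintenance between the write and the core start is probably needed as well; that is left out here.

```c
#include <stdint.h>

/* Hypothetical parameter block shared between core 0 and the playing core. */
struct audio_job {
    const uint16_t *start;
    const uint16_t *end;
};

static volatile struct audio_job g_audio_job; /* global, so any core can read it */

/* Called on core 0 before the other core is started. */
static inline void audio_job_set(const uint16_t *start, const uint16_t *end)
{
    g_audio_job.start = start;
    g_audio_job.end   = end;
}

/* Entry point for the other core: no parameters, everything comes from the global. */
static inline uint32_t audio_core_entry(void)
{
    const uint16_t *p   = g_audio_job.start;
    const uint16_t *end = g_audio_job.end;
    uint32_t count = 0;
    while (p != end) {  /* a real version would push *p to the PWM FIFO here */
        p++;
        count++;
    }
    return count;       /* number of samples walked */
}
```

The idea is to call `audio_job_set(...)` first and then pass `audio_core_entry` to whatever starts the core.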

LizardLad_1
Posts: 126
Joined: Sat Jan 13, 2018 12:29 am

Re: Audio output on the Raspberry Pi 3B

Mon Oct 01, 2018 5:23 am

I'm trying to play through core 0 (the same core main() is executing on), but I am unable to get the correct addresses. From what I understand of the original code, adr was used to get the label address relative to the PC, and the end label had its absolute address loaded in and then masked. Is there something I am missing?
