MikeDB
Posts: 167
Joined: Sun Oct 12, 2014 8:27 am

RTOS on GPU ?

Sat Oct 26, 2019 9:14 am

Still finding things out about the Pi and I gather the GPU has direct access to all memory (and hence all peripherals) without an MMU, and boots the ARM cores at startup. I was thus wondering if anybody had tried creating an RTOS that runs on the GPU, thereby releasing all four ARM cores for other work ?

Or are there roadblocks to this that I haven't appreciated ? I realise this isn't possible on the Pi4 yet because of the lack of documentation.
Will the forthcoming MIDI-2 spec at last allow us to set the volume to 11 !!

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 26038
Joined: Sat Jul 30, 2011 7:41 pm

Re: RTOS on GPU ?

Sat Oct 26, 2019 9:37 am

The firmware that runs on the GPU already runs ThreadX, an RTOS. What you are suggesting is already how it works.
Principal Software Engineer at Raspberry Pi (Trading) Ltd.
Contrary to popular belief, humorous signatures are allowed. Here's an example...
I've been saying "Mucho" to my Spanish friend a lot more lately. It means a lot to him.

MikeDB
Posts: 167
Joined: Sun Oct 12, 2014 8:27 am

Re: RTOS on GPU ?

Sat Oct 26, 2019 9:58 am

Thanks. I'll look at that then


Edit : Ok I looked and find that is closed-source. So I'll rephrase - has anybody run another RTOS under ThreadX on the GPU to control the ARMs in real-time ?

fanoush
Posts: 508
Joined: Mon Feb 27, 2012 2:37 pm

Re: RTOS on GPU ?

Sat Oct 26, 2019 4:28 pm

Check https://github.com/christinaa/rpi-open-firmware - there is also a GCC port, so with enough motivation porting some RTOS is possible.

MikeDB
Posts: 167
Joined: Sun Oct 12, 2014 8:27 am

Re: RTOS on GPU ?

Sat Oct 26, 2019 9:47 pm

fanoush wrote:
Sat Oct 26, 2019 4:28 pm
Check https://github.com/christinaa/rpi-open-firmware - there is also a GCC port, so with enough motivation porting some RTOS is possible.
Thanks - very useful. Interesting to see the chain of comments on Github about the history of this project. You wonder why Broadcom have to be quite so secretive about stuff that is totally open in the Intel world, albeit at higher cost, and I can totally understand the author's comments on why she gave up. If the Libre RISC-V SoC (or an alternative) ever gets to production then I'm sure many people will switch to that, but in the meantime it is a frustrating hill to climb. The nearest thing at the moment is the STMicroelectronics MP1 dual core which I actually started development on until they made the decision not to support bare metal on it.

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 26038
Joined: Sat Jul 30, 2011 7:41 pm

Re: RTOS on GPU ?

Sun Oct 27, 2019 1:01 pm

MikeDB wrote:
Sat Oct 26, 2019 9:58 am
Thanks. I'll look at that then


Edit : Ok I looked and find that is closed-source. So I'll rephrase - has anybody run another RTOS under ThreadX on the GPU to control the ARMs in real-time ?
What are you trying to achieve? The firmware on the GPU has been hugely optimised over the last 10 years, by a large team of engineers, so is already pretty much as good as it gets.

MikeDB
Posts: 167
Joined: Sun Oct 12, 2014 8:27 am

Re: RTOS on GPU ?

Sun Oct 27, 2019 7:22 pm

jamesh wrote:
Sun Oct 27, 2019 1:01 pm
What are you trying to achieve? The firmware on the GPU has been hugely optimised over the last 10 years, by a large team of engineers, so is already pretty much as good as it gets.
I want it to do all the process scheduling, and reading/writing of most of the I/O, leaving the ARM cores to get on with the DSP work for my product. Maybe I'm missing something but as far as I can work out on headless systems, the GPU just sits there doing not much once it has booted the system, so one ARM core has to be dedicated to housekeeping which seems rather wasteful. For me, another use for it would be to run a database and webserver as currently I am having to add an ESP8266 to do this.

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 26038
Joined: Sat Jul 30, 2011 7:41 pm

Re: RTOS on GPU ?

Sun Oct 27, 2019 8:50 pm

MikeDB wrote:
Sun Oct 27, 2019 7:22 pm
I want it to do all the process scheduling, and reading/writing of most of the I/O, leaving the ARM cores to get on with the DSP work for my product. Maybe I'm missing something but as far as I can work out on headless systems, the GPU just sits there doing not much once it has booted the system, so one ARM core has to be dedicated to housekeeping which seems rather wasteful. For me, another use for it would be to run a database and webserver as currently I am having to add an ESP8266 to do this.
Housekeeping of what? Linux needs to do housekeeping for its own 'maintenance', and that cannot be passed off to the GPU, as that doesn't run Linux. There's an inherent problem with handing off stuff like IO onto the GPU, and that is getting buffers of data to and from the GPU - on the whole this means copying data, which hammers your SDRAM bandwidth and CPUs.

The GPU sits there running things like the display, CODECs etc. Linux has its own drivers that then talk to that firmware. With FKMS a lot of the work for the display stuff is indeed passed off to the GPU. In the future, we actually want the ARM to do all that, leaving the GPU with nothing to do, because it's actually better/faster to do it that way. (SDRAM bandwidth, CPU).

So whilst a nice idea, I don't think there is any benefit to it. I suggest you read up on how Linux scheduling works, and how efficient it is.

MikeDB
Posts: 167
Joined: Sun Oct 12, 2014 8:27 am

Re: RTOS on GPU ?

Sun Oct 27, 2019 9:57 pm

jamesh wrote:
Sun Oct 27, 2019 8:50 pm
Housekeeping of what? Linux needs to do housekeeping for its own 'maintenance', and that cannot be passed off to the GPU, as that doesn't run Linux. There's an inherent problem with handing off stuff like IO onto the GPU, and that is getting buffers of data to and from the GPU - on the whole this means copying data, which hammers your SDRAM bandwidth and CPUs.

The GPU sits there running things like the display, CODECs etc. Linux has its own drivers that then talk to that firmware. With FKMS a lot of the work for the display stuff is indeed passed off to the GPU. In the future, we actually want the ARM to do all that, leaving the GPU with nothing to do, because it's actually better/faster to do it that way. (SDRAM bandwidth, CPU).

So whilst a nice idea, I don't think there is any benefit to it. I suggest you read up on how Linux scheduling works, and how efficient it is.
Jamesh - in case you hadn't noticed, we are in the bare metal forum section, so why would I be interested in how Linux scheduling works, or how its drivers talk to the GPU ? And if you think that Linux scheduling is efficient, I think you should go and try to run some heavyweight audio DSP code under Linux. Using isolcpus and so on you can get rid of most of the clicks and pauses if you include latency-inducing ring buffers, but you can never achieve the full capability of the ARM cores for real-time processing, for example for something like an autotune function.

If you read the work done by Christinaa linked above, you'll see it's a very valid methodology, made awkward by a lack of information. However, what I was hoping for was that the ARMs and GPU are linked by some sort of DMA controller, so that whilst information would go through SDRAM and suffer some delay, it wouldn't slow down the ARMs.

The alternative would be to slave the ARM cores to an external processor running an RTOS, for example an STM32H7 series processor or the equivalent NXP Crossover MCU, but that is adding complexity and cost.
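
For anyone weighing up the ring-buffer trade-off mentioned above, the delay a full buffer adds is just the number of buffered frames divided by the sample rate. A minimal sketch follows; the function name, the period sizes and the 48 kHz rate are illustrative assumptions, not figures from this thread.

```python
def ring_buffer_latency_ms(frames_per_period: int, periods: int, sample_rate_hz: int) -> float:
    """Delay in milliseconds added by a ring buffer that is completely full."""
    total_frames = frames_per_period * periods
    return 1000.0 * total_frames / sample_rate_hz


# Hypothetical settings, chosen only to show how the delay scales.
for frames in (64, 256, 1024):
    latency = ring_buffer_latency_ms(frames, periods=2, sample_rate_hz=48_000)
    print(f"{frames:5d} frames x 2 periods: {latency:.2f} ms")
```

Larger periods give the scheduler more slack before an underrun, but every extra frame is paid for in delay, which is why this approach is a poor fit for effects that have to respond in real time.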

trejan
Posts: 1635
Joined: Tue Jul 02, 2019 2:28 pm

Re: RTOS on GPU ?

Sun Oct 27, 2019 10:23 pm

MikeDB wrote:
Sun Oct 27, 2019 7:22 pm
I want it to do all the process scheduling, and reading/writing of most of the I/O, leaving the ARM cores to get on with the DSP work for my product.
You might want to look at https://github.com/hoglet67/PiTubeDirect and talk to the author. They're using the GPU for I/O and doing the rest of the processing on the ARM core(s). It is still running ThreadX and the rest of the Pi firmware.

MikeDB
Posts: 167
Joined: Sun Oct 12, 2014 8:27 am

Re: RTOS on GPU ?

Sun Oct 27, 2019 10:29 pm

trejan wrote:
Sun Oct 27, 2019 10:23 pm
MikeDB wrote:
Sun Oct 27, 2019 7:22 pm
I want it to do all the process scheduling, and reading/writing of most of the I/O, leaving the ARM cores to get on with the DSP work for my product.
You might want to look at https://github.com/hoglet67/PiTubeDirect and talk to the author. They're using the GPU for I/O and doing the rest of the processing on the ARM core(s). It is still running ThreadX and the rest of the Pi firmware.
Thanks - I will.

LdB
Posts: 1524
Joined: Wed Dec 07, 2016 2:29 pm

Re: RTOS on GPU ?

Mon Oct 28, 2019 12:06 am

You would be better off looking at something like a Zynq SOC where everything is exposed.

MikeDB
Posts: 167
Joined: Sun Oct 12, 2014 8:27 am

Re: RTOS on GPU ?

Mon Oct 28, 2019 12:48 am

LdB wrote:
Mon Oct 28, 2019 12:06 am
You would be better off looking at something like a Zynq SOC where everything is exposed.
It's only 32 bit so can't do high performance audio DSP. The A72 is a great core as it's 64 bits with a better pipeline. A few A72s on an FPGA would be perfect but I don't think anybody is planning such a device.

Gavinmc42
Posts: 4383
Joined: Wed Aug 28, 2013 3:31 am

Re: RTOS on GPU ?

Mon Oct 28, 2019 2:53 am

A few A72s on an FPGA would be perfect but I don't think anybody is planning such a device.
Someone probably is.

David Banks' stuff is cool, had not seen that before.
A quick look and it looks like it is using the mailbox to the ThreadX RTOS?

One of Liz's first posts was saying one day they might expose the Pi's DSP?
Still not sure if this was the VPU's or QPU's or some unknown DSP.
Not even sure if the Pi's RTOS is running in one VPU leaving one spare.

With the Pi4/VC6 more is intended to be run on the Arm cores as they are now faster.
That leaves the VC6 VPUs sitting around not doing much?
I'm dancing on Rainbows.
Raspberries are not Apples or Oranges
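
On the mailbox question: the publicly documented way for ARM code to talk to the firmware is the mailbox property interface, where a request is a small buffer of 32-bit words (total size, request code, one or more tags, end tag). Below is a rough sketch that only packs such a buffer on a desktop machine. The helper name is made up for illustration, and whether PiTubeDirect actually uses this path is not confirmed in this thread.

```python
import struct

# Tag ID for "get firmware revision" as listed in the public mailbox property docs.
GET_FIRMWARE_REVISION = 0x00000001


def build_property_message(tag_id: int, value_buffer_bytes: int) -> bytes:
    """Pack a single-tag property request as little-endian 32-bit words.

    Hypothetical helper: it only builds the bytes, it does not touch hardware.
    """
    # The value buffer is padded up to a whole number of 32-bit words.
    value_words = (value_buffer_bytes + 3) // 4
    # Tag header: tag ID, value buffer size in bytes, request code (0 = request).
    tag = struct.pack("<3I", tag_id, value_words * 4, 0) + bytes(value_words * 4)
    body = tag + struct.pack("<I", 0)  # a zero word marks the end of the tag list
    total_bytes = 8 + len(body)        # 8 bytes of header: total size + request code
    return struct.pack("<2I", total_bytes, 0) + body


message = build_property_message(GET_FIRMWARE_REVISION, 4)
print(len(message), struct.unpack("<7I", message))
```

On real hardware the buffer also has to sit at a 16-byte-aligned address that the firmware can see, and its address is then written to the mailbox registers; none of that is shown here.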

LdB
Posts: 1524
Joined: Wed Dec 07, 2016 2:29 pm

Re: RTOS on GPU ?

Mon Oct 28, 2019 3:06 am

It's a quad A53 and the same ARM core licensed by Broadcom AFAIK
the only significant difference is it's a Mali GPU .

There is also a cheap dual core.

Because it's slices in the FPGA you can reconfigure it.

Here is the selection guide
https://www.xilinx.com/support/document ... -guide.pdf

It's not targeted at the SBC market like the Pi, it's splayed open as a design tool.

There is also an octal A72 design kicking around on TSMC fpga.

MikeDB
Posts: 167
Joined: Sun Oct 12, 2014 8:27 am

Re: RTOS on GPU ?

Mon Oct 28, 2019 10:10 am

LdB wrote:
Mon Oct 28, 2019 3:06 am
It's a quad A53 and the same ARM core licensed by Broadcom AFAIK
the only significant difference is it's a Mali GPU .

There is also a cheap dual core.

Because it's slices in the FPGA you can reconfigure it.

Here is the selection guide
https://www.xilinx.com/support/document ... -guide.pdf
$280 per IC. I think that blows the budget I'm afraid :-( I've got one Pi4 running Gentoo (or some other 64 bit OS - TBD) and six Pi4s running baremetal so would need about $2ks worth for a product that will sell for not much more.

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 26038
Joined: Sat Jul 30, 2011 7:41 pm

Re: RTOS on GPU ?

Mon Oct 28, 2019 10:11 am

MikeDB wrote:
Sun Oct 27, 2019 9:57 pm
Jamesh - in case you hadn't noticed, we are in the bare metal forum section, so why would I be interested in how Linux scheduling works, or how its drivers talk to the GPU ? And if you think that Linux scheduling is efficient, I think you should go and try to run some heavyweight audio DSP code under Linux. Using isolcpus and so on you can get rid of most of the clicks and pauses if you include latency-inducing ring buffers, but you can never achieve the full capability of the ARM cores for real-time processing, for example for something like an autotune function.

If you read the work done by Christinaa linked above, you'll see it's a very valid methodology, made awkward by a lack of information. However, what I was hoping for was that the ARMs and GPU are linked by some sort of DMA controller, so that whilst information would go through SDRAM and suffer some delay, it wouldn't slow down the ARMs.

The alternative would be to slave the ARM cores to an external processor running an RTOS, for example an STM32H7 series processor or the equivalent NXP Crossover MCU, but that is adding complexity and cost.
Good point, but I still think it's a waste of time. The GPU is very slow compared with the ARMs (600MHz vs 1500MHz). It does have the VPU and QUADS, but there again, the ARMs have NEON, which is faster. It also has a different instruction set, so you need two lots of everything.

It's a lot of work for little benefit. Of course, if someone else has already done the work for you, then use that!

What are you doing that requires all four ARM cores to be completely dedicated to specific tasks? There's a huge amount of processing power in the ARM cores+NEON, and I/O does not take much of it. Just overclocking the cores from 1500 to 2000 (if possible, smaller overclocks are available) would dwarf any savings from offloading IO to the GPU.
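
To put rough numbers on the clock comparison, here is a back-of-envelope sketch. It assumes throughput scales linearly with clock speed, which is a simplification (memory-bound code will gain less), and it uses only the clock figures quoted in this post.

```python
def extra_throughput_fraction(base_mhz: float, overclocked_mhz: float) -> float:
    """Fractional gain from a clock increase, assuming linear scaling with clock."""
    return overclocked_mhz / base_mhz - 1.0


ARM_CORES = 4
gain = extra_throughput_fraction(1500, 2000)

print(f"per-core gain from 1500 -> 2000 MHz: {gain:.1%}")
print(f"core-equivalents gained across {ARM_CORES} cores: {gain * ARM_CORES:.2f}")
print(f"VPU clock relative to ARM clock: {600 / 1500:.0%}")
```

Under that assumption the full overclock is worth about one and a third extra cores, while the VPU runs at 40% of the ARM clock before any difference in instruction set or vector width is considered.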

MikeDB
Posts: 167
Joined: Sun Oct 12, 2014 8:27 am

Re: RTOS on GPU ?

Mon Oct 28, 2019 10:17 am

Gavinmc42 wrote:
Mon Oct 28, 2019 2:53 am

David Banks' stuff is cool, had not seen that before.
A quick look and it looks like it is using the mailbox to the ThreadX RTOS?

One of Liz's first posts was saying one day they might expose the Pi's DSP?
Still not sure if this was the VPU's or QPU's or some unknown DSP.
Not even sure if the Pi's RTOS is running in one VPU leaving one spare.

With the Pi4/VC6 more is intended to be run on the Arm cores as they are now faster.
That leaves the VC6 VPUs sitting around not doing much?
That was my point. We found with MathCAD on Intel that it's best to run audio DSP on the Intel cores rather than try and use GPUs because of the word size so it will probably be the same with ARM/VC6. But it seems a waste to have the GPU silicon sitting there doing nothing as there is no display, hence my suggestion to mop up simple housekeeping tasks that a 64 bit processor is overkill for. I learnt almost 40 years ago you can never have too much audio DSP processing power.

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 26038
Joined: Sat Jul 30, 2011 7:41 pm

Re: RTOS on GPU ?

Mon Oct 28, 2019 10:35 am

Gavinmc42 wrote:
Mon Oct 28, 2019 2:53 am
One of Liz's first posts was saying one day they might expose the Pi's DSP?
Still not sure if this was the VPU's or QPU's or some unknown DSP.
Not even sure if the Pi's RTOS is running in one VPU leaving one spare.

With the Pi4/VC6 more is intended to be run on the Arm cores as they are now faster.
That leaves the VC6 VPUs sitting around not doing much?
No DSP - just the VPU and Quads. Both cores on the VPU run ThreadX and both are used. For example, the camera algorithms require a lot of grunt, as do the codecs.

We are hoping to move away from using the VPU for some tasks - it means it can be reduced in size, making the die cheaper, and also makes S/W development a lot easier only dealing with the ARM cores, and gets rid of the nasty interface and associated SDRAM and CPU bandwidth costs. Also, making the VPUs faster requires a LOT of engineering time (and therefore cost) at BRCM, whereas making the ARMs faster is done by ARM, so no engineering time/cost. Relatively speaking, the VPU does not get faster in the same timescales the ARMs do, and that is a problem.

Gavinmc42
Posts: 4383
Joined: Wed Aug 28, 2013 3:31 am

Re: RTOS on GPU ?

Mon Oct 28, 2019 11:37 am

We are hoping to move away from using the VPU for some tasks - it means it can be reduced in size, making the die cheaper, .....
Hmm, make the die smaller and add something fun like an EthosN77 to the Pi5?
With guys already overclocking some Pi4 Arm's to 2GHz I can see an interesting future.

Any VC improvements would be to get two 4K60 outputs?
Big ask to double the bandwidth?

Might have to read up on ThreadX, did not know it was running on both VPU's, but that makes sense now.

OK so Liz's initial DSP comment was a generic term for VPU/QPU.
Glad that one has been sorted out after all these years.
No hidden 24bit audio DSP.

LdB
Posts: 1524
Joined: Wed Dec 07, 2016 2:29 pm

Re: RTOS on GPU ?

Mon Oct 28, 2019 11:37 am

MikeDB wrote:
Mon Oct 28, 2019 10:10 am
$280 per IC. I think that blows the budget I'm afraid :-( I've got one Pi4 running Gentoo (or some other 64 bit OS - TBD) and six Pi4s running baremetal so would need about $2ks worth for a product that will sell for not much more
It's a dev system, not an end product .... you perfect your code and push silicon at the end :-)
You may like to look at what most of the competitors to the Pi do.
Anyhow, this has gone sideways, so I'll leave you to it.

@Jamesh
Relatively speaking the VPU does not get faster in the same timescales the ARM's do, and that is a problem.
You left out that part of that is also a commercial decision, because your boss wanted to stay on the VideoCore because he did not want to go out on a Mali and give competitors a leg up (I believe I am quoting him exactly). So there are commercial as well as legal reasons some parts are not open. I have no issue, but let's not pretend that all the problem is technical.

hippy
Posts: 7147
Joined: Fri Sep 09, 2011 10:34 pm
Location: UK

Re: RTOS on GPU ?

Mon Oct 28, 2019 11:48 am

jamesh wrote:
Mon Oct 28, 2019 10:35 am
Gavinmc42 wrote:
Mon Oct 28, 2019 2:53 am
That leaves the VC6 VPUs sitting around not doing much?
Both cores on the VPU run ThreadX and both are used.
Can you confirm that means it is no longer possible to load VPU code from userland and have that run ?

Is this a change just for VC6 / Pi 4B upwards, or does it affect earlier Pi boards using VC4 also ?
Last edited by hippy on Mon Oct 28, 2019 11:50 am, edited 1 time in total.

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 26038
Joined: Sat Jul 30, 2011 7:41 pm

Re: RTOS on GPU ?

Mon Oct 28, 2019 11:49 am

LdB wrote:
Mon Oct 28, 2019 11:37 am
MikeDB wrote:
Mon Oct 28, 2019 10:10 am
$280 per IC. I think that blows the budget I'm afraid :-( I've got one Pi4 running Gentoo (or some other 64 bit OS - TBD) and six Pi4s running baremetal so would need about $2ks worth for a product that will sell for not much more
It's a dev system, not an end product .... you perfect your code and push silicon at the end :-)
You may like to look at what most of the competitors to the Pi do.
Anyhow, this has gone sideways, so I'll leave you to it.

@Jamesh
Relatively speaking the VPU does not get faster in the same timescales the ARM's do, and that is a problem.
You left out that part of that is also a commercial decision, because your boss wanted to stay on the VideoCore because he did not want to go out on a Mali and give competitors a leg up (I am quoting him exactly). So there are commercial as well as legal reasons some parts are not open.
I have no issue, but let's not pretend that all the problem is technical.
It's NEVER all technical. There are always commercial decisions, because we are a business and want to stay in business! I hardly ever discuss business stuff on here in anything but a distant way, as that would be inappropriate. And job threatening!

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 26038
Joined: Sat Jul 30, 2011 7:41 pm

Re: RTOS on GPU ?

Mon Oct 28, 2019 11:51 am

hippy wrote:
Mon Oct 28, 2019 11:48 am
jamesh wrote:
Mon Oct 28, 2019 10:35 am
Gavinmc42 wrote:
Mon Oct 28, 2019 2:53 am
That leaves the VC6 VPUs sitting around not doing much?
Both cores on the VPU run ThreadX and both are used.
Can you confirm that means it is no longer possible to load VPU code from userland and have that run ?

Is this a change just for VC6 / Pi 4B upwards, or does it affect earlier Pi boards using VC4 also ?
ThreadX has always run on both cores, so there should be no change. But I've never tried it.

Gavinmc42
Posts: 4383
Joined: Wed Aug 28, 2013 3:31 am

Re: RTOS on GPU ?

Mon Oct 28, 2019 12:29 pm

ThreadX has always run on both cores, so there should be no change. But I've never tried it.
For some reason I had made an assumption start-x had the extra code for camera etc while the cut down version did not use the other VPU.
Still learning.

Hmm have not diffed the VC6 start-x against the VC4 start-x.
What, I don't even have a hex editor yet? Emerge time.
How long have the Pi 4s been out?

Return to “Bare metal, Assembly language”