ioannes
Posts: 6
Joined: Fri Apr 25, 2014 2:03 pm

VideoCore IV QPUs

Thu May 22, 2014 9:48 am

Hello, not sure if this is the correct category to post this, please move it accordingly if needed.
I've been going through the recently released specification document for the VideoCore IV GPU, trying to understand a little of its bare-metal logic. The reason is that I have to write about the Raspberry Pi's hardware for my thesis, and it would be nice if I could dive into such issues a little. This is the first time I am studying a GPU architecture.
I am having trouble understanding how the QPUs multiplex the data so that they can logically be considered 16-way SIMD processors. As far as I understand, they run the same instruction on 4 different vectors over 4 clock cycles. If so, what is the advantage of that?
Thanks in advance.

mimi123
Posts: 583
Joined: Thu Aug 22, 2013 3:32 pm

Re: VideoCore IV QPUs

Thu May 22, 2014 3:09 pm

The advantage is that there are no system interlocks for pipeline stalls.

teh_orph
Posts: 346
Joined: Mon Jan 30, 2012 2:09 pm
Location: London

Re: VideoCore IV QPUs

Thu May 22, 2014 3:16 pm

It logically looks like a 16-way SIMD processor because you can issue only one new instruction every four cycles. Even though the latency of each operation is four clock cycles, you can only start something new every four cycles anyway, so when you're scheduling your code it looks as if it has one-cycle throughput (though the clock speed in that model is 4x lower than what the hardware is really running at).
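To make the latency/throughput point concrete, here is a small C model of the timing described above. It is a sketch under the post's simplified assumptions (one instruction issued every four cycles, each of its four quads entering the ALU on successive cycles, a four-cycle operation latency); the constants and function names are hypothetical, and the real QPU pipeline and register-file rules are not modelled. It shows why, in this model, a dependent instruction never has to wait, which is the "no interlocks" advantage mentioned earlier.

```c
#include <assert.h>

/* Simplified timing model (assumptions, not the real QPU pipeline):
 *  - a new instruction is issued every 4 cycles,
 *  - its 4 quads enter the 4-wide ALU on 4 successive cycles,
 *  - each quad's result appears LATENCY cycles after it enters. */
enum { QUADS_PER_INSN = 4, LATENCY = 4 };

/* Cycle on which quad q of instruction i enters the ALU. */
static unsigned issue_cycle(unsigned i, unsigned q)
{
    assert(q < QUADS_PER_INSN);
    return QUADS_PER_INSN * i + q;
}

/* Cycle on which the result of quad q of instruction i becomes available. */
static unsigned ready_cycle(unsigned i, unsigned q)
{
    return issue_cycle(i, q) + LATENCY;
}

/* Cycles that quad q of instruction i+1 would have to wait if it depends
 * on the result of instruction i (0 means no stall, so no interlock needed). */
static unsigned stall_cycles(unsigned i, unsigned q)
{
    unsigned need = issue_cycle(i + 1, q);
    unsigned have = ready_cycle(i, q);
    return have > need ? have - need : 0;
}
```

With LATENCY equal to the four-cycle issue interval, `stall_cycles` is 0 for every instruction and quad. Raising LATENCY to 5 in this model makes it return 1, which is exactly the case where the hardware would need an interlock to stall the pipeline.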

ioannes
Posts: 6
Joined: Fri Apr 25, 2014 2:03 pm

Re: VideoCore IV QPUs

Mon May 26, 2014 10:34 am

Thanks, this is quite clear now. But what exactly do they mean when they say that data are being multiplexed?

teh_orph
Posts: 346
Joined: Mon Jan 30, 2012 2:09 pm
Location: London

Re: VideoCore IV QPUs

Tue May 27, 2014 6:49 am

I would guess you mean the bit about "4-way multiplexed over four successive cycles". I suppose it's this http://en.wikipedia.org/wiki/Time-division_multiplexing
There are only really four hardware units, but they pretend to be 16 by re-running the same instruction four times. The inputs and outputs change each cycle and are selected via a multiplexer (mux) to ensure the correct bank of four is selected on each cycle. http://en.wikipedia.org/wiki/Multiplexer

ioannes
Posts: 6
Joined: Fri Apr 25, 2014 2:03 pm

Re: VideoCore IV QPUs

Tue May 27, 2014 11:43 am

Yes, that is exactly the bit I am referring to.

The following quote is from the reference guide, on page 16:
Internally the QPU is a 4-way SIMD processor multiplexed to 16-ways by executing the same instruction for four clock cycles on four different 4-way vectors termed ‘quads’
As far as I understand, the "n-way" describes both the processor and the vectors it is able to process, so a 4-way processor processes 4-way vectors. Each QPU processes 4 different 4-way vectors, or quads, over 4 successive clock cycles, so it can virtually be considered a 16-way processor (processing 16-way vectors, i.e. 4 quads, at a time). But I take it that this applies to each QPU individually, not to a group of them.
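The "4 quads make one 16-way vector" arithmetic amounts to splitting an element index into a quad number (which of the four clock cycles handles it) and a lane number (which of the four physical lanes handles it), within a single QPU. The sketch below assumes consecutive elements are grouped into quads in order (elements 0-3 in quad 0, 4-7 in quad 1, and so on); the quoted passage does not spell out the ordering, so that layout and the helper names are illustrative assumptions.

```c
#include <assert.h>

enum { QUAD_WIDTH = 4, NUM_QUADS = 4, VEC_WIDTH = QUAD_WIDTH * NUM_QUADS };

/* Quad (and therefore clock cycle, 0..3) that handles element i of a
 * 16-way vector, under the assumed in-order layout. */
static int quad_of(int i)
{
    assert(i >= 0 && i < VEC_WIDTH);
    return i / QUAD_WIDTH;
}

/* Physical lane (0..3) that handles element i. */
static int lane_of(int i)
{
    assert(i >= 0 && i < VEC_WIDTH);
    return i % QUAD_WIDTH;
}

/* Inverse mapping: element index from (quad, lane). */
static int element_of(int quad, int lane)
{
    assert(quad >= 0 && quad < NUM_QUADS);
    assert(lane >= 0 && lane < QUAD_WIDTH);
    return quad * QUAD_WIDTH + lane;
}
```

Every one of the 16 elements maps to exactly one (quad, lane) pair, which is why four passes of a 4-wide unit cover the whole vector.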

At the beginning of page 17, it is also stated that:
The front end of each QPU pipeline receives instructions from a shared instruction cache (icache). As one icache unit serves four QPUs in four successive clock cycles the front end pipelines of each of these four QPUs will be at different phases relative to each other. After instruction fetch there is a ‘re-synchronisation’ pipeline stage which brings all of the QPUs into phase with each other.
The idea I get is that each QPU in the slice receives the same instruction from the cache on a different one of 4 successive clock cycles and works on a different 4-way vector. So by the time all 4 QPUs in a slice have been served an instruction, 4 clock cycles will have elapsed, enough for the first QPU to have run the instruction on 4 different quads. Then there is a re-synchronisation stage and the cycle starts over.
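The phase relationship in the quoted icache passage can be sketched as follows: one icache serves four QPUs on four successive cycles, so each QPU's fetch is one cycle later than the previous one's, and a re-synchronisation stage delays the earlier ones so that all four leave it together. The guide only says that such a stage exists; the cycle numbering, the round-robin order and the "delay of 3 - k cycles" rule below are simplifying assumptions, and the model only tracks when each QPU is served, not which instruction it fetches.

```c
#include <assert.h>

enum { QPUS_PER_ICACHE = 4 };

/* Cycle on which the shared icache serves QPU k (0..3), assuming a
 * round-robin pass that starts at cycle `start`, one QPU per cycle. */
static unsigned fetch_cycle(unsigned start, unsigned k)
{
    assert(k < QPUS_PER_ICACHE);
    return start + k;
}

/* Assumed extra delay the re-sync stage adds to QPU k so that it lines up
 * with the last QPU served in the same pass. */
static unsigned resync_delay(unsigned k)
{
    assert(k < QPUS_PER_ICACHE);
    return (QPUS_PER_ICACHE - 1) - k;
}

/* Cycle on which QPU k leaves the re-sync stage: the same for every k. */
static unsigned in_phase_cycle(unsigned start, unsigned k)
{
    return fetch_cycle(start, k) + resync_delay(k);
}
```

In this model one full icache pass takes four cycles, which lines up with each QPU issuing one new instruction every four cycles.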

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 23311
Joined: Sat Jul 30, 2011 7:41 pm

Re: VideoCore IV QPUs

Wed May 28, 2014 9:35 am

And people wonder why the Quads are so difficult to work with...
Principal Software Engineer at Raspberry Pi (Trading) Ltd.
Contrary to popular belief, humorous signatures are allowed. Here's an example...
"My grief counsellor just died, luckily, he was so good, I didn't care."
