colinh
Posts: 94
Joined: Tue Dec 03, 2013 11:59 pm
Location: Munich

ARM integer v. NEON/VFP float instruction timings

Tue Aug 22, 2017 4:35 pm

A. How do the instruction timings (number of clock cycles) compare for the following instructions:

add   r0, r1, r2
vadd.s32  q0, q1, q2         // vector integer
vadd.f32  q0, q1, q2         // vector floating point
vadd.s32  s0, s1, s2         // scalar integer (not a valid form: NEON integer ops take d/q registers)
vadd.f32  s0, s1, s2         // scalar floating point

mul   r0, r1, r2
vmul.s32  q0, q1, q2
vmul.f32  q0, q1, q2
vmul.s32  s0, s1, s2         // not a valid form: NEON integer ops take d/q registers
vmul.f32  s0, s1, s2

B. Are the clock rates the same for the ARM core and the NEON unit?

C. Multiply instructions take several cycles. Do they stall the pipeline, or does the next instruction execute in the next cycle, as long as it does not use the result register of the multiply?

ldm  r0, {r1-r8}
add  r9, r10, r11   // executes with the ldm instruction carrying on in the background

mul  r0, r1, r2
add  r3, r4, r5               // what about this?

vmul.f32  q0, q1, q2
vadd.f32  q3, q4, q5          // or this?

vmul.f32  q0, q1, q2
add  r3, r4, r5               // or even this?
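
Question C can also be probed empirically. Below is a minimal C sketch, not a cycle-accurate measurement: it times one chain of multiplies in which each multiply needs the previous result against four independent chains doing the same total number of multiplies. The function names (`mul_dependent`, `mul_independent`, `ns_per_multiply`) are made up for illustration, and the sketch assumes a POSIX `clock_gettime` and a compiler that leaves the loops as scalar `mul` instructions (worth checking in the disassembly, since an optimiser may vectorise the independent version).

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Nanoseconds from the POSIX monotonic clock. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* One chain: every multiply waits for the previous result. */
uint32_t mul_dependent(uint32_t seed, uint32_t k, uint32_t iters)
{
    uint32_t a = seed;
    for (uint32_t i = 0; i < iters; i++)
        a *= k;
    return a;
}

/* Four chains: the same number of multiplies, but neighbours are independent. */
uint32_t mul_independent(uint32_t seed, uint32_t k, uint32_t iters)
{
    assert(iters % 4 == 0);
    uint32_t a = seed, b = seed + 1, c = seed + 2, d = seed + 3;
    for (uint32_t i = 0; i < iters; i += 4) {
        a *= k;
        b *= k;
        c *= k;
        d *= k;
    }
    return a ^ b ^ c ^ d;   /* combine so no chain is dead code */
}

typedef uint32_t (*chain_fn)(uint32_t, uint32_t, uint32_t);

/* Average wall-clock nanoseconds per multiply for one of the loops above. */
double ns_per_multiply(chain_fn fn, uint32_t iters)
{
    assert(iters > 0);
    uint64_t t0 = now_ns();
    volatile uint32_t sink = fn(1u, 2654435761u, iters);  /* volatile keeps the call alive */
    uint64_t t1 = now_ns();
    (void)sink;
    return (double)(t1 - t0) / (double)iters;
}
```

Calling `ns_per_multiply` with a few hundred million iterations for each function and multiplying the result by the clock rate in GHz gives approximate cycles per multiply. If the independent version comes out clearly cheaper per multiply, the core is overlapping multiplies rather than stalling on each one. The same pattern could be repeated with `float` to compare against the VFP/NEON multiply.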

jahboater
Posts: 1682
Joined: Wed Feb 04, 2015 6:38 pm

Re: ARM integer v. NEON/VFP float instruction timings

Tue Aug 22, 2017 4:57 pm

NEON on ARMv8 is no longer a co-processor, which may make a difference.
And ARMv8 NEON is quad issue, which should make a big difference.
So I believe NEON is very fast, but I have no idea about actual timings.*
The Pi3 and the new Pi2 are ARMv8.

* This doesn't answer your questions, but the only benchmarking I did was moving and setting large amounts of memory using NEON instead of memcpy/memset, and NEON was around 8-10x faster.
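
For reference, a copy loop of the kind described might look roughly like the C sketch below. This is an assumption about the general approach, not the code that was actually benchmarked: the name `copy_q` is hypothetical, it uses the standard `arm_neon.h` intrinsics `vld1q_u8` and `vst1q_u8` to move 16 bytes per iteration, and it falls back to a plain byte loop when built without NEON.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Copy n bytes, 16 at a time through a q register where NEON is available. */
void copy_q(uint8_t *dst, const uint8_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    size_t i = 0;
#if defined(__ARM_NEON)
    for (; i + 16 <= n; i += 16)
        vst1q_u8(dst + i, vld1q_u8(src + i));   /* one 128-bit load, one 128-bit store */
#endif
    for (; i < n; i++)                          /* tail bytes, or everything without NEON */
        dst[i] = src[i];
}
```

To compare with `memcpy`, time both over a buffer much larger than the caches and repeat the run several times; the ratio will depend on alignment, buffer size, and which `memcpy` implementation the C library provides.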

Gavinmc42
Posts: 1379
Joined: Wed Aug 28, 2013 3:31 am

Re: ARM integer v. NEON/VFP float instruction timings

Thu Aug 24, 2017 4:24 am

I'm dancing on Rainbows.
Raspberries are not Apples or Oranges

LdB
Posts: 521
Joined: Wed Dec 07, 2016 2:29 pm

Re: ARM integer v. NEON/VFP float instruction timings

Thu Aug 24, 2017 6:09 am

There are a couple of bugs in, or changes to, the firmware that have occurred since Peters code; I don't know which.

I picked them up when I baremetalled the hello_fft sample on the Pi repo and it disagreed with some of the things Peters code does.
