I did a quick, simple benchmark and thought I'd share the results here. It's probably not 100% accurate, but it gives a rough idea. With the hardware FPU, double-precision operations are at least 7 times as fast as with soft-float emulation. These figures are for FPU operations that work on registers only, i.e. they require no memory access.
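
The benchmark source isn't included in the post. As a rough illustration only (not the author's program), here is a minimal sketch of how one might time register-only double-precision additions in C on a POSIX system. `bench_dadd` is a hypothetical name. Built for hard float, the loop body should become a VFP double add; built for soft float, the same source calls `__aeabi_dadd` instead.

```c
#define _POSIX_C_SOURCE 199309L /* needed for clock_gettime */
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Seconds elapsed between two CLOCK_MONOTONIC samples. */
static double elapsed_sec(struct timespec a, struct timespec b) {
    return (double)(b.tv_sec - a.tv_sec) + (double)(b.tv_nsec - a.tv_nsec) * 1e-9;
}

/* Hypothetical helper: times n dependent double additions whose operands
   stay in registers (no loads or stores inside the loop) and returns
   additions per second. Loop counter and branch are timed too, so this
   is a rough lower bound on the raw FPU add rate. */
double bench_dadd(long n, double *result) {
    assert(n > 0 && result != NULL);
    volatile double seed = 1.0, step = 1e-9; /* volatile defeats constant folding */
    double x = seed;
    const double y = step;
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < n; i++)
        x = x + y;            /* each add depends on the previous result */
    *result = x;              /* store before the second sample so the loop
                                 cannot be moved past it */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = elapsed_sec(t0, t1);
    return secs > 0.0 ? (double)n / secs : 0.0;
}
```

Compile with optimization (e.g. `gcc -O2`) and check the generated assembly with `-S`, since a compiler may still restructure the loop.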

```
soft float (official Debian 6 image):
operation (asm)          ops per sec     cycles per op
+     (bl __aeabi_dadd)  12 Mflops       59
*     (bl __aeabi_dmul)  11 Mflops       64
/     (bl __aeabi_ddiv)  1.2 Mflops      590
sqrt  (bl sqrt)          270 kflops      2500
exp   (bl exp)           240 kflops      3000
log   (bl log)           180 kflops      3900
sin   (bl sin)           270 kflops      2500
erf   (bl erf)           250 kflops      2800
pow   (bl pow)           88 kflops       7900

hard float (unofficial Raspbian image):
operation (asm)          ops per sec     cycles per op   speed-up
+     (faddd)            87/350 Mflops   8/2             7/30
*     (fmuld)            77/350 Mflops   9/2             7/32
/     (fdivd)            22 Mflops       32              18
sqrt  (fsqrtd)           15 Mflops       48              52
exp   (bl exp)           2.3 Mflops      310             10
log   (bl log)           1.3 Mflops      540             7
sin   (bl sin)           2.7 Mflops      260             10
erf   (bl erf)           3.8 Mflops      180             15
pow   (bl pow)           0.8 Mflops      830             10
```
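
The cycles-per-op and speed-up columns follow from the throughput column by division. The clock rate isn't stated; the sketch below assumes the original Raspberry Pi's stock 700 MHz, which closely reproduces most of the table's cycle counts (e.g. 700 MHz / 12 Mflops ≈ 58 cycles). `ASSUMED_CLOCK_HZ`, `cycles_per_op`, `speedup` and `print_derived` are illustrative names.

```c
#include <assert.h>
#include <stdio.h>

/* ASSUMPTION: the clock rate is not given in the post. 700 MHz is the
   stock clock of the original Raspberry Pi and fits the table's numbers. */
#define ASSUMED_CLOCK_HZ 700e6

/* cycles per op = clock rate / operations per second */
double cycles_per_op(double clock_hz, double ops_per_sec) {
    assert(ops_per_sec > 0.0);
    return clock_hz / ops_per_sec;
}

/* speed-up = hard-float throughput / soft-float throughput */
double speedup(double hard_ops_per_sec, double soft_ops_per_sec) {
    assert(soft_ops_per_sec > 0.0);
    return hard_ops_per_sec / soft_ops_per_sec;
}

/* Prints the derived columns for a few rows copied from the table. */
void print_derived(void) {
    struct row { const char *op; double soft, hard; };
    static const struct row rows[] = {
        {"/",    1.2e6, 22e6},
        {"sqrt", 270e3, 15e6},
        {"exp",  240e3, 2.3e6},
        {"pow",  88e3,  0.8e6},
    };
    for (size_t i = 0; i < sizeof rows / sizeof rows[0]; i++) {
        printf("%-5s soft %6.0f cyc  hard %4.0f cyc  speed-up %4.1fx\n",
               rows[i].op,
               cycles_per_op(ASSUMED_CLOCK_HZ, rows[i].soft),
               cycles_per_op(ASSUMED_CLOCK_HZ, rows[i].hard),
               speedup(rows[i].hard, rows[i].soft));
    }
}
```

Calling `print_derived()` from a `main` gives, for division, about 583 soft-float cycles against 32 hard-float cycles, an 18.3x speed-up, in line with the table.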