*Andrew Holme is well known to regular blog readers, as the creator of the awesome (and fearsomely clever) homemade GPS receiver. Over the last few months he’s been experimenting with writing general purpose code for the VideoCore IV graphics processing unit (GPU) in the BCM2835, the microchip at the heart of the Raspberry Pi, to create an accelerated fast Fourier transform library. Taking the Fourier transform of a function yields its frequency spectrum (i.e. the pure harmonic functions which can be added together to reconstruct the original function). In the following example, shamelessly lifted from Wikipedia, we have a function which oscillates roughly three times per second, and whose Fourier transform unsurprisingly has a peak around 3Hz.*

*Being able to perform lots of Fourier transforms quickly is useful for all sorts of audio and radio applications including, unsurprisingly, GPS. Ham radio enthusiasts will also find Andrew’s work very useful. In this guest post, Andrew talks about his Fourier transform library for the Pi.*

Last October, Eben attended the Radio Society of Great Britain (RSGB) Convention, where radio amateurs told him they wanted a speedy fast Fourier transform (FFT) library to do Software Defined Radio (SDR) projects on the Pi.

GPU_FFT is an FFT library for the Raspberry Pi which exploits the BCM2835 SoC V3D hardware to deliver **ten times the performance** that is possible on the 700 MHz ARM. Kernels are provided for all power-of-2 FFT lengths from 256 to 131,072 points inclusive.

GPU_FFT uses single-precision floating point for data and twiddle factors, so it does not compete on accuracy with double-precision libraries; however, the relative root-mean-square (rms) error for a 2048-point transform is less than one part per million, which is not bad.

The library runs on dedicated 3D hardware in the BCM2835 SoC, and communication between ARM and GPU adds 100µs of latency which is much longer than the shortest transform takes to compute! To overcome this, batches of transforms can be executed with a single call. Typical per-transform runtimes in microseconds are:

Points | batch=1 | batch=10 | batch=50 | FFTW | Speedup |

256 | 112 | 22 | 16 | 92 | 5.8x |

512 | 125 | 37 | 26 | 217 | 8.3x |

1024 | 136 | 54 | 45 | 482 | 10.7x |

2048 | 180 | 107 | 93 | 952 | 10.2x |

4096 | 298 | 256 | 240 | 3002 | 12.5x |

8192 | 689 | 624 | 608 | 5082 | 8.4x |

16384 | 1274 | 1167 | 1131 | 12005 | 10.6x |

32768 | 3397 | 3225 | 3196 | 31211 | 9.8x |

65536 | 6978 | 6703 | 6674 | 82769 | 12.4x |

131072 | 16734 | 16110 | 16171 | 183731 | 11.4x |

To get GPU_FFT enter the following at the command prompt:

`sudo rpi-update && sudo reboot`

To build and run the example program:

`cd /opt/vc/src/hello_pi/hello_fft`

make

sudo mknod char_dev c 100 0

sudo ./hello_fft.bin

API documentation can be found in the hello_fft folder.