gtoal
Posts: 113
Joined: Sun Nov 18, 2012 12:02 am

realtime audio FFT

Wed Mar 29, 2017 5:12 am

I've just discovered the 2014 post about GPU_FFT and the demo code in /opt/vc. Better late than never I guess :-)

Our beehive monitoring project could use this to analyse audio from the hive and log the data in real time, as we already do for the other sensors in the hive such as temperature and humidity.

Does anyone have a ready-to-use solution we could take and modify that feeds the microphone into the GPU FFT code in real time and assigns the various sounds to frequency buckets? We don't yet know the exact frequencies to expect, but we can make a ballpark guess - there will be a concentration in the 200-400 Hz area, but also occasional important sounds possibly up to 4 kHz.

We don't need a continuous readout like an SDR waterfall - it would be good enough to get a set of data describing all the peaks seen within a 1-second period (as text output, or in an array in a program that we could tweak).
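To show the kind of output I mean, here's a rough numpy sketch (pure Python, no GPU; a synthetic 300 Hz buzz stands in for the microphone, and the 8 kHz rate and peak count are just illustrative):

```python
import numpy as np

RATE = 8000          # assumed sample rate; 8 kHz covers sounds up to 4 kHz

def peaks_in_window(samples, rate=RATE, top=5):
    """Return the `top` strongest spectral peaks in one window of audio
    as (frequency_hz, magnitude) pairs, strongest first."""
    spectrum = np.abs(np.fft.rfft(samples * np.hanning(len(samples))))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)
    # a "peak" here is simply a bin larger than both of its neighbours
    is_peak = (spectrum[1:-1] > spectrum[:-2]) & (spectrum[1:-1] > spectrum[2:])
    idx = np.where(is_peak)[0] + 1
    idx = idx[np.argsort(spectrum[idx])[::-1][:top]]
    return [(freqs[i], spectrum[i]) for i in idx]

# demo: one second of a synthetic 300 Hz buzz instead of a live microphone
t = np.arange(RATE) / RATE
buzz = np.sin(2 * np.pi * 300 * t)
for f, m in peaks_in_window(buzz):
    print(f"{f:7.1f} Hz  {m:10.1f}")
```

A real version would fill `samples` from the capture device once per second instead of synthesising a tone.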

We're not the first team to look into this area. Here's a good site that discusses the issues: http://www.beehacker.com/wp/?page_id=103


YCN-
Posts: 246
Joined: Fri Jun 10, 2016 3:18 pm

Re: realtime audio FFT

Mon Apr 03, 2017 8:00 am

Hi,

I know this won't exactly answer your question, but you could take a look at SoX, a tool that does this kind of audio processing. I think you can find some piece of code out there that will do this for you. What do you mean by real time - do you really mean hard real time?
If you just mean continuously, that's not the same thing, and you should find simpler solutions.

YCN-

A_Taste_of_Pi
Posts: 9
Joined: Wed Feb 22, 2017 9:30 am

Re: realtime audio FFT

Sat Apr 08, 2017 10:46 am

gtoal wrote:I've just discovered the 2014 post about GPU_FFT and the demo code in /opt/vc. Better late than never I guess :-)

Our beehive monitoring project could usefully use this to analyse audio from the beehive and log the data in realtime as we do for other sensors in the hive such as temperature and humidity.

Does anyone have a ready to use solution we can take and modify that feeds the microphone into the GPU FFT code in real time and assigns the various sounds into frequency buckets? We don't yet know the exact frequencies to expect but we can take a ballpark guess - there will be a concentration in the 200-400hz area but also occasional important sounds possibly up to 4KHz.

We don't need a continuous readout like an SDR waterfall - it would be good enough to get a set of data describing all the peaks seen within a 1-second period (as text output or in an array of a program that we could tweak)
I'm just learning about the Pi, but I do know something about audio. You probably know more than what I'm about to say, but in case any of it is useful, here goes...

The FFT moves the audio from the time domain into the frequency domain. Just as the time domain is sampled, so is the frequency domain: in effect both are handed to you in buckets. The size of the FFT sets the width of the frequency buckets - few PCM samples per transform means wide buckets, many samples means narrow buckets. You don't have to stop there, though: take what is returned from the FFT and add two adjacent buckets together and you get one wider frequency bucket.

The FFT returns a real value and an imaginary value per bucket. Square each of them, add them, and take the square root of the result to get the actual volume of sound in that bucket regardless of the phase of the audio - which is probably what you want here. If you want to combine buckets, do this absolute-volume calculation first.
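In numpy terms that magnitude step and the bucket-combining look roughly like this (function names are made up):

```python
import numpy as np

def magnitudes(samples):
    """Magnitude per frequency bucket: sqrt(re^2 + im^2) of each FFT output."""
    spectrum = np.fft.rfft(samples)
    return np.hypot(spectrum.real, spectrum.imag)   # same as np.abs(spectrum)

def combine_pairs(mags):
    """Merge adjacent buckets into one twice-as-wide bucket.
    Take magnitudes FIRST, then sum, as described above."""
    n = len(mags) - len(mags) % 2                   # drop an odd last bucket
    return mags[:n].reshape(-1, 2).sum(axis=1)
```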

The FFT returns a linear spectrum - an equal number of Hz per bucket - so you can combine the returned buckets to create a log spectrum instead, with an equal portion of an octave per bucket. To do this, leave the first bucket on its own, then sum the next two to get the second, the next four to get the third, the next eight to get the fourth, and so on. That might be useful here, or might not - my guess is not.
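That 1, 2, 4, 8... regrouping is a few lines of numpy (again, just a sketch):

```python
import numpy as np

def octave_buckets(mags):
    """Regroup a linear spectrum into log buckets of 1, 2, 4, 8, ... bins each."""
    out, i, width = [], 0, 1
    while i < len(mags):
        out.append(mags[i:i + width].sum())  # one bucket spans `width` bins
        i += width
        width *= 2                           # next bucket is twice as wide
    return out
```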

You will want the sound levels in decibels, though. Take each value, divide it by a reference (such as the maximum possible level), take the log to base 10 of the result and multiply by 20. Don't worry if all your values are negative - it just means they are all below your reference. Now your analysis will work regardless of how close the queen is to the microphone.
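As code, with a small floor added so silent buckets don't blow up the log:

```python
import numpy as np

def to_db(mags, reference):
    """Convert magnitudes to decibels relative to `reference`.
    Values below the reference come out negative, which is fine."""
    mags = np.maximum(np.asarray(mags, dtype=float), 1e-12)  # avoid log(0)
    return 20.0 * np.log10(mags / reference)
```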

If the queen makes a loud noise at 400 Hz, the way to detect it is to look at the 400 Hz level relative to nearby frequencies. You want a bucket that is not too broad but wide enough to cover the variation between different queens; my guess is that a narrow bucket would be sufficient. You then want adjacent buckets for the frequencies on either side to complete the measurement: it is the value of the 400 Hz bucket relative to its neighbours that says you have a 400 Hz buzz.

You also need a threshold - let's say 6 dB. If you don't see at least a 6 dB difference, ignore the output: there is no event. If you get a difference of more than 6 dB, start looking for an event, and if it then lasts for, say, 10 seconds or longer, that is your trigger - the event that says the queen is buzzing (I am making up bee-speak here, so apologies). What you could also do is sample during the event and compute a value that is the number of dB of difference multiplied by the time the difference is present, giving a certainty reading - e.g. "72" meaning 6 dB for 12 seconds, or 7.2 dB for 10 seconds, counting as a similar certainty.
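A sketch of that threshold-plus-duration detector, assuming you already have one dB-difference reading per second (the 6 dB / 10 s numbers are the made-up ones from above):

```python
def detect_buzz(db_excess_per_second, threshold_db=6.0, min_seconds=10):
    """Scan once-per-second dB differences (target bucket minus neighbours)
    for a sustained event.  Returns the certainty score - summed excess dB,
    i.e. mean dB difference times duration in seconds - of the strongest
    qualifying run, or 0.0 if nothing lasted min_seconds at the threshold."""
    best, run = 0.0, []
    for db in list(db_excess_per_second) + [0.0]:   # sentinel flushes last run
        if db >= threshold_db:
            run.append(db)
        else:
            if len(run) >= min_seconds:
                best = max(best, sum(run))
            run = []
    return best
```

So 12 seconds at 6 dB and 10 seconds at 7.2 dB both come out as a certainty of 72.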

I hope all of that is some use and not just a load of nonsense.

User avatar
Gavinmc42
Posts: 4339
Joined: Wed Aug 28, 2013 3:31 am

Re: realtime audio FFT

Sat Apr 08, 2017 11:50 am

I recently found this.
viewtopic.php?f=37&t=179516&p=1144330&h ... e#p1144330

According to the RPF guys, using the Pi 3's NEON instructions should be quite fast.
And this Compute Library does have an FFT example.
ARM are working on porting it to the Pi?
You would have to use the pure NEON version, but it should be easier than the VC4 version.

Once you have a FFT then run it through the ML functions.
Just yesterday I was wondering what use the FFT example would be.
With the Worldwide decline of pollinating bees this is an important project.

These guys use LPC not FFT for birds
http://soundid.net/SoundID/SEQ_Recordings.html
Would LPC be better than FFT for insects too?
http://soundid.net/SoundID/Soft_LPC_Spectrogram.html

LPC waveforms also seem to have less noise, making machine learning detection easier?
http://www.soundid.net/SoundID/Papers/D ... 0Paper.pdf
Bird calls are more complex than insect sounds? Could a Pi 3 do it?
http://www.avisoft.com/soundanalysis.htm
Does bats too, but you need ultrasonic microphones.

LPC is also widely used for human speech codecs.
LPC is lossy which is good as that gets rid of noise.
If you can keep the audio signals to 8 bits, then you can run them through the 128-bit NEON unit in parallel, 16 samples at a time.
AI/ML neural networks are also fine with 8-bit data.
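For what it's worth, the LPC calculation itself is small - here's a rough numpy sketch of the standard autocorrelation method (Levinson-Durbin recursion, the one speech codecs use); the 300 Hz / 8 kHz demo numbers are just illustrative:

```python
import numpy as np

def lpc(x, order):
    """LPC coefficients via the autocorrelation method (Levinson-Durbin).
    Returns a with a[0] == 1; the model predicts
    x[t] ~= -(a[1]*x[t-1] + ... + a[order]*x[t-order])."""
    n = len(x)
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]                              # zeroth-order prediction error
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[1:i][::-1])
        k = -acc / err                      # reflection coefficient
        a[1:i] = a[1:i] + k * a[1:i][::-1]  # update earlier coefficients
        a[i] = k
        err *= 1.0 - k * k                  # remaining prediction error
    return a

# demo: a pure tone is modelled almost exactly by just two coefficients
w = 2 * np.pi * 300 / 8000                  # 300 Hz at an 8 kHz sample rate
tone = np.sin(w * np.arange(8000))
print(lpc(tone, 2))                         # close to [1, -2*cos(w), 1]
```

A real analyser would run this on short windowed frames (say 20-30 ms) and feed the coefficients, or the residual, into whatever detection or ML stage comes next.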

But this is all theory, going to need serious coding still.
This sort of stuff is why I am learning Aarch64/NEON on baremetal.
https://ultibo.org/forum/viewtopic.php? ... c42f#p2918
Still need to get PDM/i2s microphones working with DMA.
As a hardware guy I am pretty sure the Pi's can do this and much more, just a simple matter of software :lol:
I'm dancing on Rainbows.
Raspberries are not Apples or Oranges

gtoal
Posts: 113
Joined: Sun Nov 18, 2012 12:02 am

Re: realtime audio FFT

Sun Dec 22, 2019 5:02 am

I came across this old thread again today by accident and realised I never posted the results here. If anyone is interested, in the end we used 'cava' ( http://karlstav.github.io/cava/ ) with some modifications. You can see some of the graphical outputs that were generated in this github issues thread: https://github.com/karlstav/cava/issues/162

Graham
