Speech Recognition with Pocketsphinx
I'm currently working on a project that requires speech recognition. Though I haven't completed it yet, I thought some of you might be interested in the fairly detailed steps I took to install Pocketsphinx on the RPi. The steps start with a clean image of Debian wheezy and end with continuous speech testing. You can find it at https://sites.google.com/site/observing ... spberry-pi
- Posts: 30
- Joined: Mon Feb 27, 2012 12:18 pm
Thanks for this! Your instructions are clear and well written. They at least give an alternative to Julius (for which I've written tutorials at http://www.aonsquared.co.uk/raspi_voice_control) for speech recognition on the Raspberry Pi. Great possibilities!
- Posts: 21
- Joined: Sat Jan 28, 2012 6:40 pm
- Location: Bristol, UK
Actually, I used Julius for a Windows version of my project (a universal remote control operated by speech commands) and liked it a lot. However, I think Pocketsphinx might be better suited to small systems like the RPi, so I'm giving it a try. The big hurdle is getting the audio input clean enough for decent accuracy. The USB audio adapter I'm using does a great job under Windows, so I know it isn't the problem. I'm going to take another shot at using a Bluetooth headset to see if I can get better results.
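Since clean input is the hurdle, a quick way to sanity-check a recording before blaming the recognizer is to look at its peak and RMS level. A minimal Python sketch, assuming the audio is signed 16-bit little-endian mono PCM (for example the frames returned by the standard `wave` module); the function names are my own, not part of any Pocketsphinx API:

```python
import array
import sys


def pcm16_levels(raw):
    """Return (peak, rms) for signed 16-bit little-endian PCM bytes."""
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV/ALSA S16_LE data is little-endian
    if len(samples) == 0:
        return 0, 0.0
    peak = max(abs(s) for s in samples)
    rms = (sum(s * s for s in samples) / float(len(samples))) ** 0.5
    return peak, rms


def looks_clipped(raw, limit=32767):
    """True if any sample reaches (or exceeds) full scale."""
    peak, _ = pcm16_levels(raw)
    return peak >= limit
```

To use it, open a file with `wave.open(path, "rb")` and pass `w.readframes(w.getnframes())`. A peak pinned at full scale suggests the input gain is too high; a very low RMS while speaking suggests it is too low.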
- Posts: 30
- Joined: Mon Feb 27, 2012 12:18 pm
I'd like to suggest another alternative: the Google Speech API, which off-loads the recognition to Google. Here's a quick example:
arecord -D plughw:1,0 -f cd -t wav -d 3 -r 16000 | flac - -f --best --sample-rate 16000 -o out.flac; wget -O - -o /dev/null --post-file out.flac --header="Content-Type: audio/x-flac; rate=16000" http://www.google.com/speech-api/v1/recognize?lang=en | sed -e 's/[{}]/''/g'
This came from here.
There's a C library, libsprec, but I'm still struggling with its dependencies on the r-pi.
I'd like to hear from others about the accuracy of this approach.
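The `sed` step only strips braces, which leaves a semi-structured string. Parsing the reply as JSON is sturdier. A minimal Python sketch, assuming the endpoint returns a single JSON object with a `hypotheses` list of `utterance`/`confidence` entries (that shape is my assumption; the reply format isn't shown in this thread):

```python
import json


def best_utterance(response_text):
    """Return (utterance, confidence) for the first hypothesis, or None.

    Assumes a single JSON object of the (assumed) form
    {"status": 0, "id": "...", "hypotheses": [{"utterance": ..., "confidence": ...}]}
    """
    try:
        data = json.loads(response_text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    if not isinstance(data, dict):
        return None
    hypotheses = data.get("hypotheses") or []
    if not hypotheses:
        return None
    first = hypotheses[0]
    return first.get("utterance"), first.get("confidence")
```

Piping the `wget` output into a script that calls `best_utterance(sys.stdin.read())` would give just the top transcription, and the confidence value gives you something to threshold on.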
- Posts: 52
- Joined: Sat Jun 16, 2012 2:55 pm
ax206geek wrote:I like to suggest another alternative: Google Speech API and off-load the recognition to google, here's a quick example:
I cleaned it up a little and posted it here.
It's now good enough for executing simple voice commands like hostname and halt:
eval `gs_sp_to_txt.sh`
I'll order my r-pi to halt with it.
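`eval` runs whatever text comes back as a shell command, so a misrecognition (or an unexpected reply) gets executed. A safer sketch in Python maps recognized phrases onto a fixed allow-list; the `COMMANDS` table is hypothetical, and `sudo halt` assumes the script isn't already running as root:

```python
import subprocess

# Hypothetical allow-list: normalized phrase -> argv to run.
COMMANDS = {
    "hostname": ["hostname"],
    "halt": ["sudo", "halt"],  # assumes the user may run halt via sudo
}


def command_for(text):
    """Return the allowed argv for a recognized phrase, or None."""
    key = " ".join(text.lower().split())  # lower-case, collapse whitespace
    return COMMANDS.get(key)


def run_spoken(text, runner=subprocess.call):
    """Run the command for `text` only if it is on the allow-list."""
    argv = command_for(text)
    if argv is None:
        print("ignored: %r" % text)
        return None
    return runner(argv)
```

`run_spoken(recognized_text)` takes the place of the `eval`; anything not in the table is simply ignored.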
- Posts: 52
- Joined: Sat Jun 16, 2012 2:55 pm
Hmm, I got pocketsphinx compiled OK using the above instructions, but had ZERO success with word recognition.
I haven't got it to recognize a single word correctly, and the words it produces don't sound anything like what was said?!
It can tell the difference between silence and speech, but doesn't recognise anything.
Recording a .wav from the mic (a Webcam Pro 9000) produces a reasonable sound file, so I wonder if something else is wrong.
Can pocketsphinx run from a .wav file, and are there known-good test .wav files out there?
So I tried Google's online recognizer with the same setup, and it's nearly 100% accurate!
So my mic setup seems OK.
Of course Google only works with an internet connection, but it gets me up and running. Thanks, ax206geek!
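Before feeding a recording to Pocketsphinx, it's worth confirming the file's format matches what the decoder is configured for; a mismatched sample rate or a stereo file is one plausible cause of this kind of output. A small Python sketch using the standard `wave` module; the 16000 Hz default and the mono/16-bit expectation are my assumptions about a typical setup, so adjust them to your decoder's settings:

```python
import wave


def check_format(nchannels, sampwidth, framerate, rate=16000):
    """List mismatches against the assumed decoder input format."""
    problems = []
    if framerate != rate:
        problems.append("sample rate %d Hz, expected %d Hz" % (framerate, rate))
    if nchannels != 1:
        problems.append("%d channels, expected mono" % nchannels)
    if sampwidth != 2:
        problems.append("%d-byte samples, expected 16-bit" % sampwidth)
    return problems


def check_wav(path, rate=16000):
    """Open a WAV file and report any format mismatches (empty list = OK)."""
    w = wave.open(path, "rb")
    try:
        return check_format(w.getnchannels(), w.getsampwidth(),
                            w.getframerate(), rate)
    finally:
        w.close()
```

For instance, `arecord -f cd` records 2-channel audio, which this check would flag.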
Pi count: 4 - File & print server / Wifi Webcam server, XBMC and tinkerPi !
How are you actually getting the speech onto the Pi in the first place? A USB microphone or something?
My Raspberry Pi blog with all my latest projects and links to articles
http://raspberrypipod.blogspot.com. +++ Current project: PiPodTricorder - lots of sensors, lots of mini-displays, breadboarding, bit of programming.
- Posts: 209
- Joined: Mon Jun 25, 2012 10:41 am
I've tried a cheap (£3!) USB sound card and an old mic, which was just too noisy.
I got the best results with the Logitech Pro 9000 webcam, which shows up as a mic in Raspbian; not cheap, but I already had it for Skype.
Some webcams allow the mic to be used as a separate device in Linux, but not all.
@mikerr,
When I ran pocketsphinx_continuous using the setup I described, it did recognize various words in its dictionary despite the poor sound quality. If you give it a word not in its dictionary, it will always pick what it thinks is the closest match regardless of how far off it is.
There is an example of a pocketsphinx application that can process an audio file at this link: http://cmusphinx.sourceforge.net/wiki/t ... cketsphinx. I would expect good results from it, because I think the real problem with USB audio input is USB packet loss (see viewtopic.php?f=28&t=5249).
I wish I could use the solution that ax206geek proposed, but my application isn't intended to be used with an internet connection.
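If USB packet loss is the culprit, it may show up in a recording as gaps. One crude heuristic, assuming dropped packets appear as runs of exactly-zero samples (that depends on the driver and is my assumption, not something established in this thread), is to scan the samples for long zero runs:

```python
def zero_runs(samples, min_len=32):
    """Yield (start, length) for runs of exactly-zero samples of at least min_len.

    `samples` is a sequence of integers, e.g. an array('h') of 16-bit PCM.
    """
    start = None
    for i, s in enumerate(samples):
        if s == 0:
            if start is None:
                start = i  # a zero run begins here
        elif start is not None:
            if i - start >= min_len:
                yield start, i - start
            start = None
    # A run that lasts until the end of the buffer
    if start is not None and len(samples) - start >= min_len:
        yield start, len(samples) - start
```

At 16 kHz the default `min_len=32` is 2 ms; a live microphone normally produces low-level noise rather than exact zeros, so frequent hits would be suspicious.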
- Posts: 30
- Joined: Mon Feb 27, 2012 12:18 pm
I'm going to build pocketsphinx tonight and try to get some love out of it, wish me luck! I'll report back any success/failure... love this little computer
- Posts: 3
- Joined: Fri Jun 08, 2012 3:34 am
Good luck, Biscuit! Looking forward to your results!
- Posts: 209
- Joined: Mon Jun 25, 2012 10:41 am
Good news! I've successfully built pocketsphinx on the Raspberry Pi (it wasn't that bad) using the wheezy distro. I'm trying to figure out how to set up continuous audio input to try it out, and I'll post the specifics in a bit (it's Labor Day for me tomorrow, another 24 hours to geek out before my new job!)
- Posts: 3
- Joined: Fri Jun 08, 2012 3:34 am
Your instructions seem to require a USB audio card with both input and output. How would I set up the Pi to use the built-in audio output but a USB microphone as input?
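One common way to do this with ALSA is an `asym` default device that sends playback to one card and takes capture from another. A sketch of `~/.asoundrc`, assuming the built-in output is card 0 and the USB microphone is card 1 (the same `plughw:1,0` used earlier in the thread); check the actual numbers with `aplay -l` and `arecord -l`:

```
# ~/.asoundrc: play through card 0, record from card 1
pcm.!default {
    type asym
    playback.pcm "plughw:0,0"
    capture.pcm "plughw:1,0"
}
```

Alternatively, leave the default alone and name the capture device explicitly, as the `arecord -D plughw:1,0` command earlier in the thread does.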
- Posts: 13
- Joined: Tue Aug 28, 2012 10:28 am
Just a quick update. Since I last posted in this thread, I have seen greatly improved results by setting the sampling rate to 48000 Hz. This might be specific to the chipset of my audio adapter (C-Media), but note that the default sampling rate is 8000 Hz.
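If the card captures at 48000 Hz but the decoder is configured for 16 kHz (my assumption about the setup; the post doesn't say which rate the decoder used), the audio has to be downsampled by a factor of 3 first. A crude Python sketch:

```python
def decimate_by_3(samples):
    """Crudely downsample 48 kHz samples to 16 kHz.

    Each group of three samples is averaged (a boxcar low-pass filter) and
    reduced to one output sample; leftover samples at the end are dropped.
    """
    out = []
    for i in range(0, len(samples) - 2, 3):
        total = samples[i] + samples[i + 1] + samples[i + 2]
        out.append(int(round(total / 3.0)))
    return out
```

A boxcar average is a weak anti-aliasing filter; a proper resampler (for example letting ALSA's `plughw` device or `sox` do the conversion) should give cleaner audio.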
- Posts: 30
- Joined: Mon Feb 27, 2012 12:18 pm
Hi,
I don't know Python very well, but I managed to control the GPIO outputs to turn on LEDs.
Now I would like to control the GPIO outputs with voice commands, possibly in Italian, using Python.
A simple example would help.
Thanks
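A minimal sketch of the idea: map each recognized phrase to a pin and a state, and keep the actual pin write behind a function so the logic can be tested without hardware. The Italian phrases, the pin number (BCM 17) and the `ACTIONS` table are all hypothetical, and recognizing Italian would also need an Italian acoustic model and dictionary for the recognizer:

```python
# Hypothetical phrase table: normalized Italian command -> (BCM pin, on/off)
ACTIONS = {
    "accendi luce": (17, True),   # "turn on the light"
    "spegni luce": (17, False),   # "turn off the light"
}


def handle_phrase(text, set_pin):
    """Apply the action for a recognized phrase; return True if one matched."""
    action = ACTIONS.get(" ".join(text.lower().split()))
    if action is None:
        return False
    pin, state = action
    set_pin(pin, state)
    return True


if __name__ == "__main__":
    # Stand-in for real GPIO output so the sketch runs anywhere.
    def print_pin(pin, state):
        print("GPIO %d -> %s" % (pin, "HIGH" if state else "LOW"))

    handle_phrase("Accendi luce", print_pin)
```

On the Pi, `set_pin` could wrap RPi.GPIO: after `GPIO.setmode(GPIO.BCM)` and `GPIO.setup(17, GPIO.OUT)`, pass `lambda pin, state: GPIO.output(pin, state)`.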
- Posts: 1
- Joined: Fri Nov 02, 2012 5:09 am
mikerr wrote:So I tried using google's online recognizer with the same setup - and it's near 100% accuracy !
So my mic setup seems ok.
What is the link for the Google online recognizer? Do you mean on an Android device?
Thanks
- Posts: 27
- Joined: Mon Aug 20, 2012 1:35 am
Just a tip: if you install the Python development headers (python-dev and/or python2.7-dev) before building sphinxbase and pocketsphinx, the Python API module will be installed by default. Then you can simply do:
import pocketsphinx as ps

wavfile = 'test.wav'  # hypothetical path to a 16-bit mono recording

speechRec = ps.Decoder()
with open(wavfile, 'rb') as wavFile:
    wavFile.seek(44)  # skip the canonical 44-byte WAV header; raw PCM follows
    speechRec.decode_raw(wavFile)
result = speechRec.get_hyp()
print(result[0])  # first element of the result is the recognized text
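The `seek(44)` works only when the file has the canonical 44-byte header (the `RIFF` header, a 16-byte `fmt ` chunk, then `data`). Some tools add extra chunks such as `LIST`, which shifts the audio. A Python sketch that walks the RIFF chunks to find where the PCM data actually starts (my own helper, not part of the pocketsphinx module):

```python
import struct


def pcm_data_offset(raw):
    """Return (offset, size) of the 'data' chunk in RIFF/WAVE bytes."""
    if raw[:4] != b"RIFF" or raw[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12  # first chunk follows "RIFF" <size> "WAVE"
    while pos + 8 <= len(raw):
        chunk_id, size = struct.unpack("<4sI", raw[pos:pos + 8])
        if chunk_id == b"data":
            return pos + 8, size
        pos += 8 + size + (size & 1)  # chunk bodies are padded to even length
    raise ValueError("no data chunk found")
```

Read the file (or at least its first few kilobytes) into `raw`, call `offset, size = pcm_data_offset(raw)`, and use `wavFile.seek(offset)` instead of the fixed 44.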
- Posts: 2
- Joined: Sat Dec 15, 2012 8:38 am
snowhite wrote:mikerr wrote:So I tried using google's online recognizer with the same setup - and it's near 100% accuracy !
So my mic setup seems ok.
What is the link for Google online recognizer? Do you mean on an Android device?
See ax206geek's posts above.
Try the GStreamer API: http://cmusphinx.sourceforge.net/wiki/gstreamer
I'm using it successfully on the PandaBoard, so it should work on the RPi as well.
Prebuilt GStreamer 1.0 + gst-omx packages for Raspbian are available here: http://www.raspberrypi.org/phpBB3/viewtopic.php?p=293634#p293634
- Posts: 72
- Joined: Tue Oct 30, 2012 6:17 pm
- Location: Hamburg, Germany
I'm currently working on my own home automation project and want to include some voice recognition so I can control the lights and TV.
All of this is handled by a Java tool that runs on the Pi. My Arduino is connected to the Pi as well, to do the actual turning on and off of the lights.
But I find the speech recognition on the Pi rather slow and fairly inaccurate (English isn't my first language), so the idea of offloading the recognition to Google interests me. It also gives much better results.
But I'm sure Google won't be happy if I send them a voice sample every 5 seconds.
So I was thinking about using pocketsphinx anyway, but only to capture one keyword ("computer"). My tool could then switch from pocketsphinx to Google recognition once pocketsphinx recognizes the keyword. This brings me back to my original problem: I want only one keyword to be recognized, "computer". I'm certainly not a language expert, but I feel that the sheer size of the dictionary might be the limiting factor. I've tried to make my own dictionary, but the software required to do so times out during download.
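The switching logic itself is small. A sketch in Python, where `local_hypotheses` stands for the stream of pocketsphinx results and `send_to_cloud` for a hypothetical function that records the next command and posts it to the online recognizer; audio only leaves the Pi after the wake word:

```python
def run_wake_loop(local_hypotheses, send_to_cloud, wake_word="computer"):
    """Call send_to_cloud() once for each local hypothesis containing the wake word.

    local_hypotheses: iterable of strings (or None when nothing was decoded).
    Returns the list of results from the cloud recognizer.
    """
    wake_word = wake_word.lower()
    results = []
    for hyp in local_hypotheses:
        # Whole-word match, so "computers" does not trigger it
        if hyp and wake_word in hyp.lower().split():
            results.append(send_to_cloud())
    return results
```

Because a small vocabulary makes the local recognizer pick its closest match for almost anything (as noted earlier in the thread), expect some false wake-ups; requiring the word twice or checking a confidence score are possible refinements.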
Can I change things in the existing model? Or do you have another solution for capturing just keywords?
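One thing that can be done without the online tool is to cut the existing pronunciation dictionary down to the words you need and point pocketsphinx at it with its `-dict` option. A Python sketch, assuming a CMU-style dictionary with one `word PHONES...` entry per line and alternates written as `word(2)`; the language model (`-lm`) or grammar (`-jsgf`) would also need to be restricted to the same words:

```python
import re


def filter_dict(lines, keep):
    """Return the entries of a CMU-style dictionary whose word is in `keep`.

    Alternate pronunciations such as "computer(2)" are kept as well.
    """
    wanted = set(w.lower() for w in keep)
    out = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        word = re.sub(r"\(\d+\)$", "", parts[0]).lower()
        if word in wanted:
            out.append(line.rstrip("\n"))
    return out
```

For example, `filter_dict(open("full.dic"), ["computer"])`, where `full.dic` is a placeholder for the dictionary your install uses. With a single-word vocabulary almost any sound will tend to come out as "computer", so some form of rejection is still needed.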
- Posts: 5
- Joined: Mon Jan 30, 2012 4:46 pm
Could Voice recognition in a Java script help you?
Have a look here:
http://www.aonsquared.co.uk/raspi_voice_control
http://www.aonsquared.co.uk/node/30
Voice recognition in JavaScript isn't the solution for me. I've found some Java code to handle speech recognition, but the library isn't from Oracle; they let third-party developers provide the library.
Before I go in that direction, I just wanted to make sure there isn't already a working binary I could talk to, since I don't know where the Java lib is going to take me.
All this work, just to be able to say "computer" to my Pi
- Posts: 5
- Joined: Mon Jan 30, 2012 4:46 pm
tommekevda wrote:but i'm sure google won't be happy that i'm sending them a voice sample every 5 seconds.
Could it be you mean Apple? Have you checked out SiriProxy?
I've had the same thought about Apple's Siri. One day, Apple might block other users and cater only to iOS users. But so far they haven't done so. They probably don't care...
I haven't checked SiriProxy yet because I didn't know it existed.
But I meant Google:
arecord -D plughw:1,0 -f cd -t wav -d 3 -r 16000 | flac - -f --best --sample-rate 16000 -o out.flac; wget -O - -o /dev/null --post-file out.flac --header="Content-Type: audio/x-flac; rate=16000" http://www.google.com/speech-api/v1/recognize?lang=en | sed -e 's/[{}]/''/g'
- Posts: 5
- Joined: Mon Jan 30, 2012 4:46 pm
tommekevda wrote:I havent checked siriproxy yet because i didnt know that existed.
Funny. I didn't know about the Google Voice API. Even Google doesn't seem to know about it.
Python and GV?
http://code.google.com/p/pygooglevoice/