Raspberry Pi - Speech To Text


by ande765a » Fri Dec 28, 2012 2:40 pm
I just wanted to share a script I made for transcribing voice (recorded from a USB mic) to text using Google's speech API.
Here's the code:
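Code:
#!/bin/bash
# Record from ALSA device plughw:1,0 and convert the audio to 16 kHz FLAC.
arecord -D "plughw:1,0" -q -f cd -t wav | ffmpeg -y -i - -ar 16000 -acodec flac file.flac
# Post the FLAC file to the speech API and cut the recognized text out of the JSON reply.
wget -q -U "Mozilla/5.0" --post-file file.flac --header "Content-Type: audio/x-flac; rate=16000" -O - "http://www.google.com/speech-api/v1/recognize?lang=en-us&client=chromium" | cut -d\" -f12
# Remove the temporary recording.
rm file.flac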

You need to have ffmpeg installed. On Raspbian, type:
Code:
sudo apt-get install ffmpeg
Posts: 4
Joined: Fri Dec 28, 2012 2:27 pm
by netw1z » Sat Dec 29, 2012 11:20 pm
Awesome work dude! What are you working on making?
Posts: 7
Joined: Wed Dec 26, 2012 5:26 am
by ande765a » Sun Dec 30, 2012 8:26 am
I've actually already hooked it up to espeak and Wolfram|Alpha, so I have an almost-working Siri thingy :)
For example, you can say "How big is the moon?" and it will tell you the answer...
by netw1z » Sun Dec 30, 2012 3:25 pm
That's awesome! I'm working along similar lines. I also installed googlecl to read and post things through Google's command-line interface for some of their services, which is a bit easier than writing a scraper from scratch for a format that could change.

Are you using a USB mic, a jack mic, or a Bluetooth mic? I'm trying to get Bluetooth going for voice and audio playback.
by ande765a » Sun Dec 30, 2012 4:10 pm
I'm using a USB webcam's microphone... But I'm pretty sure it will work with a USB mic too, if it's supported.
by pipuppy » Fri Jan 04, 2013 12:33 pm
Hi ande765a,

Could you tell us what model of webcam you are using, please?

Regards,

pipuppy
Posts: 50
Joined: Fri Aug 24, 2012 12:51 pm
by elvisimprsntr » Fri Jan 04, 2013 1:07 pm
I have SiriProxy running on the RPI which I use to control my home automation system using Siri from my mobile phone.
viewtopic.php?f=37&t=27529

I believe there is an equivalent method which uses Google speech to text.

Enjoy!

Elvis
http://www.youtube.com/user/TheElvisImprsntr
Posts: 131
Joined: Sat Dec 29, 2012 11:36 pm
by stevewardell » Thu Jan 24, 2013 1:39 pm
I have been experimenting with getting USB audio and speech to text working, and this is great. I swapped out ffmpeg for avconv (because of the deprecation warning from ffmpeg) and it works very well.
Steve
@stevewardell
stevewardell.wordpress.com
Posts: 16
Joined: Thu Jun 07, 2012 9:15 pm
by ande765a » Thu Jan 24, 2013 5:23 pm
I'm using an old Logitech Quickcam Pro 4000...
The audio works, but I can't get the video working :(
pipuppy wrote: Hi ande765a,

Could you tell us what model of webcam you are using, please?

Regards,

pipuppy
by davejavu1969 » Wed May 07, 2014 6:26 pm
This is not necessarily Pi specific - but will affect anyone (including me!) who has been using this set of instructions to perform speech to text functions on the Pi.

It appears that the v1 version of the Google Speech API used in this has been deprecated.

I had a beautifully running app on the Pi - until last night. There is a v2 version of the api but I have been unable to get it to work with the Python script I have running on the Pi.

My original POST request was....

wget -q -U "Mozilla/5.0" --post-file file.flac --header "Content-Type: audio/x-flac; rate=44100" -O - "http://www.google.com/speech-api/v1/recognize?lang=en-us&client=chromium" | cut -d\" -f12 >stt.txt

Which worked every time with a word perfect response saved to stt.txt - until last night. Grrr.

Have now updated this to....

wget -q -U "Mozilla/5.0" --post-file file.flac --header "Content-Type: audio/x-flac; rate=44100" -O - "http://www.google.com/speech-api/v2/recognize?lang=en-us&client=chromium&key=AIzaSyCnl6MRydhw_5fLXIdASxkLJzcJh5iX0M4" | cut -d\" -f12 >stt.txt

Details of v2 api and details of the 'key' element, I found here... https://github.com/gillesdemey/google-s ... /README.md

I can now connect to the server and get a response. I am writing in Python, and the original v1 API call returned a perfect response every time, which was saved in stt.txt and then passed to another part of the app.

I now get a 200 OK response from Google and receive a file of roughly 400 bytes (which seems tiny to me) back from the v2 API, and this is written to stdout.

But the response that now gets saved to stt.txt is either the single word 'final' or, twice so far, the word 'transcript', and I have no idea why or what this means.

In one attempt with identical settings, I got a portion of the sentence that I had spoken. Every other test has produced the result above.

If anyone has got further than me, all help is gratefully received. I had a fully working app running on the Pi until last night. :cry:

Many thanks,

D.
Posts: 13
Joined: Mon May 07, 2012 6:25 pm
by davejavu1969 » Wed May 07, 2014 6:43 pm
Update....

A bit more research has turned up the fact that the JSON returned by Google is in a different format. I suspect this is the root cause of the issue. How to convert this new format into a text file is, sadly, beyond me!
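
One way to get plain text out of the new format without depending on where the text sits in the line is to search for the "transcript" key itself. This is only a rough sketch: the sample response below is an assumption about what the v2 reply looks like (an empty result line followed by the real one), not a captured reply, and extract_transcript is a hypothetical helper name. It also assumes the recognized text contains no escaped double quotes.

```shell
#!/bin/bash
# Assumed shape of a v2 reply: an empty result line, then the real result.
sample='{"result":[]}
{"result":[{"alternative":[{"transcript":"how big is the moon","confidence":0.93},{"transcript":"how big is the mood"}],"final":true}],"result_index":0}'

# Print the first "transcript" value found on stdin.
# grep -o emits each "transcript":"..." match on its own line,
# head keeps the top-ranked one, and cut takes the text between the quotes.
extract_transcript() {
    grep -o '"transcript":"[^"]*"' | head -n 1 | cut -d'"' -f4
}

printf '%s\n' "$sample" | extract_transcript
# prints: how big is the moon
```

In the scripts from this thread, the same function could replace the cut at the end of the wget pipeline, with its output redirected to stt.txt as before.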
by drcaptain » Tue Nov 11, 2014 6:09 pm
The problem I'm having seems to be a unique error. At this point, for my project, I only want to be able to get to the point of transcribing my speech to text. For the next phase of the project I will run a script that counts the words of a conversation for meetings that we have. (Should be a pretty cool way of quantifying our meetings and doing some interesting comparisons based on who is in the room for different meetings.) For now, though, the problem is this: when I run the script below, I do not get an error. It only says "Processing..." Then, without pressing Control C to stop recording, it immediately jumps to "You said: pi@drcaptain ~$"

Code:
#1/bin/bash
echo "Recording..."
arecord -D "plughw:0,0" -q -f cd -t wav | ffmpeg - loglevel panic -y -i - -ar 16000 -acodec flac file.flac > /dev/null 2>&1

echo "Processing..."
wget -q -U "Mozilla5.0" --post-file file.flac --header "Content-Type: audio/x-flac; rate=16000" -O - "http://www.google.com/speech-api/vs/recognize?lang=en-us&client=chromium&key=<MY KEY> | cut -d\" -f12 >stt.txt

echo -n "You said: "
cat stt.txt

rm file.flac > /dev/null 2>$1
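
One thing that may be worth checking here: the script silences both arecord (-q) and ffmpeg (> /dev/null 2>&1), so if either one exits straight away, nothing is printed and the script simply carries on to "Processing...". As posted, the script also contains a few characters that look like typos ("#1/bin/bash", "- loglevel" with a space, "Mozilla5.0", "vs" in the URL, a missing closing quote after the key, and "2>$1"), though it is not clear whether they are in the real file or only in the post. A minimal sketch of a sanity check before the upload, assuming the recording is written to file.flac as above; the recording_ok helper is hypothetical:

```shell
#!/bin/bash
# Succeed only if the given file exists and is not empty.
recording_ok() {
    [ -s "$1" ]
}

# Check the recording before spending an API call on it.
if recording_ok file.flac; then
    echo "file.flac looks usable, uploading..."
else
    echo "file.flac is missing or empty: the recording step failed" >&2
fi
```

If the check fails, removing -q and the /dev/null redirects for one run should show the real error, and running arecord -l lists the capture devices so the card and device numbers passed to -D can be confirmed.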


I've also tried running different variations of code from the various tutorials available for Google Speech API.
I've searched all over and haven't seen anybody posting about the error that I'm experiencing. Anybody have any idea what's happening here and how I can get my speech to text?

Thanks!
Posts: 12
Joined: Tue Nov 11, 2014 6:12 am
by gtucker19 » Mon Nov 17, 2014 12:50 pm
The Google API is returning the transcribed text, but in a different format. I'm also looking for an easy way to extract the data so I can translate it into another language. The only differences between my code and yours posted above are that I have v2 where you have vs, and at the end I have just >file.txt without the "cut" command. That is the part that was not working with the newer format.
Posts: 5
Joined: Mon Nov 17, 2014 12:47 pm
by drcaptain » Wed Nov 19, 2014 1:32 am
Thanks for the suggestion, gtucker. Unfortunately, the error still occurs:

Code:
You said: pi@X ~ $ ./speech2text.sh
Recording...
Processing...
You said: pi@X ~ $


The program allows no time to capture a recording. It shows "Recording" and "Processing" virtually simultaneously. Any ideas what's up with that!?
by gtucker19 » Wed Nov 19, 2014 2:36 am
I actually got everything working today. I changed the -f12 to -f8 and everything worked fine. I ran numerous tests today and the transcriptions were perfect.
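
Counting the quote-delimited fields shows why -f8 works where -f12 returned 'final' or 'transcript'. The two sample lines below are assumptions about the shape of a v2 result line (one with a single alternative, one with two), not captured replies, and field is a hypothetical helper:

```shell
#!/bin/bash
# Assumed v2 result lines: one alternative, and two alternatives.
single='{"result":[{"alternative":[{"transcript":"hello world","confidence":0.97}],"final":true}],"result_index":0}'
multi='{"result":[{"alternative":[{"transcript":"hello world","confidence":0.97},{"transcript":"hello word"}],"final":true}],"result_index":0}'

# Print field number $1 of line $2, splitting on double quotes like cut -d\" does.
field() {
    printf '%s\n' "$2" | cut -d'"' -f"$1"
}

echo "one alternative, field 12:  $(field 12 "$single")"   # final
echo "two alternatives, field 12: $(field 12 "$multi")"    # transcript
echo "either case, field 8:       $(field 8 "$single")"    # hello world
```

Under that assumption, field 8 is the first transcript in both cases, while field 12 lands on whichever key name happens to follow it. Counting fields is still fragile, since any change in the order of the keys shifts the position again.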
by drcaptain » Wed Nov 19, 2014 6:03 am
gtucker - thanks again! And that's awesome to hear that v2 of the api is working for *somebody*!

However, for me the same error persists.
Here is my wget line:

Code:
wget -q -U "Mozilla/5.0" --post file file.flac --header "Content-Type: audio/x-flac; rate=16000" -O - "http://www.google.com/speech-api/v2/recognize?lang=en-us&client=chromium&key=MY KEY" | cut -d\" -f8 >stt.txt


What else can I try?
by r3d4 » Wed Nov 19, 2014 9:40 am
davejavu1969 wrote: Update....

A bit more research has turned up the fact that the JSON being returned by google is in a different format. I suspect this is the root cause of the issue. How this new format can be converted into a text file is however sadly beyond me!


Don't know if this is still relevant, but...
For JSON processing, take a look at a tool called "jq". It is like sed for JSON data.
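
A minimal sketch of what that could look like, assuming jq is installed and assuming the v2 reply has the shape of the sample below (an empty result line followed by the real one); transcript_with_jq is a hypothetical helper name:

```shell
#!/bin/bash
# Assumed shape of a v2 reply, not a captured one.
sample='{"result":[]}
{"result":[{"alternative":[{"transcript":"hello world","confidence":0.97}],"final":true}],"result_index":0}'

# Print the top-ranked transcript of every result read from stdin.
# jq handles each JSON document in the stream separately, and the
# empty result array on the first line produces no output.
transcript_with_jq() {
    jq -r '.result[].alternative[0].transcript'
}

if command -v jq >/dev/null 2>&1; then
    printf '%s\n' "$sample" | transcript_with_jq
else
    echo "jq is not installed" >&2
fi
```

Because jq addresses the value by key rather than by position, it keeps working if the order of the keys in the reply changes.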
Posts: 768
Joined: Sat Jul 30, 2011 8:21 am
Location: ./
by gtucker19 » Wed Nov 19, 2014 12:32 pm
This is the code that is working on my pi.

echo "Recording your Speech (Ctrl+C to Transcribe)"

arecord -D plughw:0,0 -q -f cd -t wav -r 16000 | flac - -f --best --sample-rate 16000 -s -o file.flac
echo "Converting Speech to Text..."
wget -q -U "Mozilla/5.0" --post-file file.flac --header "Content-Type: audio/x-flac; rate=16000" -O - "http://www.google.com/speech-api/v2/recognize?lang=en-us&client=chromium&key=YOUR_API_KEY" | cut -d\" -f8 >stt.txt
echo "extract recognized text"
cat stt.txt
echo "You Said:"
value=`cat stt.txt`
echo "$value"

You may want to test your API key by browsing to the "http:..." address in your browser. You should get the following message: "400. That’s an error. Your client has issued a malformed or illegal request. Content-Type should be of the form: audio/xxx; rate=yyy That’s all we know." I was initially having issues with my API because it wasn't initialized correctly.

Good Luck
by drcaptain » Wed Nov 19, 2014 5:49 pm
I don't get the 400 error. Instead, I get a 403 error:

"403. That’s an error.
Your client does not have permission to get URL /speech-api/v2/recognize?lang=en-us&client=chromium&key=AIzaSyCZqgSwomikHVUKSiwdtfoYSgMN7hq9q7g from this server. Invalid key. That’s all we know."

I've regenerated my key. To no avail.
by gtucker19 » Wed Nov 19, 2014 6:40 pm
That is what I was getting this weekend, before I realized that what looked like a lowercase l in the key was actually a capital I (eye). That is why I had you look for the 400 error, which means your key is good. Also, make sure you have the Speech API enabled. Another way to check that the API is working is to watch your usage count: it increases every time you get a 400 error, even though there is no transcription.
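
The key check described above can also be scripted. This is a rough sketch, assuming the endpoint still answers the way this thread reports (400 for a valid key sent without audio, 403 for a rejected key); explain_status and key_status are hypothetical helper names, and YOUR_API_KEY is a placeholder:

```shell
#!/bin/bash
# Map the HTTP status of a request sent without audio to what it means here.
explain_status() {
    case "$1" in
        400) echo "key accepted (the request is malformed only because no audio was sent)" ;;
        403) echo "key rejected by the server" ;;
        *)   echo "unexpected status: $1" ;;
    esac
}

# Fetch the status code with wget; -S prints the response headers on stderr.
# Not called below because it needs network access and a real key.
key_status() {
    wget -S -O /dev/null "http://www.google.com/speech-api/v2/recognize?lang=en-us&client=chromium&key=$1" 2>&1 |
        awk '$1 ~ /^HTTP\// { code = $2 } END { print code }'
}

explain_status 400
explain_status 403
# With a real key: explain_status "$(key_status YOUR_API_KEY)"
```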
by drcaptain » Wed Nov 19, 2014 7:31 pm
So it is a capital I(eye) and not a lower case L. I've confirmed that because I can see in my Google dev console that I now have a 7 count for calls to the API. So good eye on that. I hadn't thought of changing that. (Doh!)

Even so, I'm still getting a 403 error. Even though my console is telling me that the API usage is being impacted.
Here's what my console tells me:

Response Code    Count    %
Success (2xx)    7        100%

Requests / sec
Success (2xx): 0.0167

Is there something in the RPi code you think I could debug to slow down the rate at which it goes from "Recording" to "Processing"?

This is nuts! The api and code is working for you. What am I missing here that I can't get it working on my end? Oi!
Really appreciate your hand in all this.
by gtucker19 » Wed Nov 19, 2014 10:50 pm
Glad you were able to get your code working with my API.

Good luck with future projects!!
Last edited by gtucker19 on Wed Nov 26, 2014 8:53 pm, edited 1 time in total.
by drcaptain » Wed Nov 26, 2014 2:37 am
Holy crap! Gtucker! I think I finally got it to work using your API key.

Right now it's only converting short bursts of speech (~3-5 seconds). And I'm having trouble with my API key (i.e. it's not working). I'm going to try to reset it. But had to post this real quick to share my excitement.

Thanks so much for your help through this!!! Feels awesome!
by drcaptain » Wed Nov 26, 2014 3:16 am
Okay, so I'm not sure what's going on with my API keys. But they're not working.

AIzaSyCqS-vSYEuFZ65_bN1ucB3cNB4322XDWLY

It also turns out that the Google Speech API will only transcribe 15-second clips. (Drats!)

I'd love to know why my API keys aren't working. If anybody has any ideas, let me know. I really need to be able to record for longer than 15 seconds, so I may have to hop over to figuring out how to use the Sphinx open source speech tools. But this may do for now for my proof of concept. So I'll keep working...
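
One possible workaround for the length limit is to cut a longer recording into short chunks and send each chunk separately. This is a sketch only, assuming the limit of roughly 15 seconds reported above and using 10-second chunks to stay under it; chunk_offsets and cut_chunk are hypothetical helpers, and words that straddle a chunk boundary may be lost:

```shell
#!/bin/bash
# Print the start offset (in seconds) of each chunk needed to cover a recording.
# $1 = total length in seconds, $2 = chunk length in seconds.
chunk_offsets() {
    seq 0 "$2" "$(($1 - 1))"
}

# Cut one chunk out of a longer recording and convert it to 16 kHz FLAC.
# $1 = input file, $2 = start offset, $3 = chunk length.
# Not called below because it needs ffmpeg and a real recording.
cut_chunk() {
    ffmpeg -y -loglevel error -ss "$2" -t "$3" -i "$1" -ar 16000 -acodec flac "chunk_$2.flac"
}

# A 40-second recording in 10-second chunks starts at these offsets:
chunk_offsets 40 10
# prints 0, 10, 20 and 30, one per line
```

Each chunk_N.flac file could then go through the same wget call as file.flac, with the transcripts appended to stt.txt in order.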
by abhi68 » Wed Jul 15, 2015 5:01 pm
I am using the same code but I am getting a blank reply. Please help me.
I have my API key and have entered the correct commands.
Posts: 2
Joined: Wed Jul 15, 2015 4:58 pm