Posts: 2
Joined: Thu Apr 30, 2020 9:50 pm

[Need Help] Voice Recognition & Audio Playback

Thu Apr 30, 2020 10:20 pm

So, I have an old(ish?) Pi 3B+ that I found while I was moving, and I want to do something dumb with it. I want to stuff the Pi, a battery pack (with power pass-through), a PS3 Eye camera, and a small USB-powered speaker inside a Trick or Treat Studios Good Guy Doll (Chucky).

Why a PS3 camera? Because the mic on it has amazing range and they go for like $7.

I want to use offline Voice Recognition software to pick up 4 basic commands ("Hi", "Ugly doll", etc etc...) and tie the phrase recognition to the playing of sound files so you get something like:

Me: "Hi, I'm Dave"
RasPI: "Hi, I'm Chucky, and I'm your friend till the end. Hidey-ho. Ha. Ha. Ha."

Honestly, I partially want to do this to freak out my wife, who already hates the doll (but loves the movies?) and then put it on my shelf at work. I work in marketing in the horror industry, so it makes sense.

I've been banging my head against the wall just trying to get a few of the offline Voice Recognition software packages to work. I think I've reflashed my Micro SD card about 60 times with 4 different Raspbian Lite builds trying to follow old tutorials from 2014 to 2018, but I keep failing at something.

I'm going to take a break from it tonight, but if anyone has any advice, or knows of things that can help me out, that would be highly appreciated.

Posts: 249
Joined: Wed Jun 20, 2012 2:51 pm
Location: Southampton, England

Re: [Need Help] Voice Recognition & Audio Playback

Thu May 07, 2020 8:57 am

I had a look into this area, without much luck, but then recently came back and had some success. Have you looked at DeepSpeech?
There are some out-of-date instructions at ... respeaker/, but you now need to change every 0.6.1 to 0.7.0. The --lm and --trie options have also been combined into a single --scorer option. More information is at, and a useful example of how to take input from a microphone is at ... _streaming. The biggest problem I have found with that script is catching the start of the vocalisation, as it waits to hear something before it starts the analysis.
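One common way around the clipped start is to keep a short rolling "pre-roll" buffer of audio and prepend it once speech is detected. The sketch below is only an illustration of that idea, not what the streaming example itself does: the names capture_utterance and is_loud, the frame counts, and the loudness threshold are all made up for this post, and the frames are assumed to be chunks of 16-bit mono PCM in the machine's native byte order (little-endian on a Pi).

```python
from array import array
from collections import deque


def is_loud(frame_bytes, threshold=500):
    """Crude speech test: RMS level of 16-bit PCM samples (threshold is a guess)."""
    samples = array("h", frame_bytes)  # native byte order assumed
    if not samples:
        return False
    rms = (sum(s * s for s in samples) / len(samples)) ** 0.5
    return rms >= threshold


def capture_utterance(frames, is_speech=is_loud, preroll_frames=10, hangover_frames=15):
    """Return the frames of the first utterance, including audio from just before it.

    frames          -- any iterable of audio frames (e.g. read from the microphone)
    preroll_frames  -- how many frames before the trigger to keep
    hangover_frames -- how many quiet frames in a row end the utterance
    """
    preroll = deque(maxlen=preroll_frames)  # old frames fall off automatically
    utterance = []
    triggered = False
    quiet_run = 0

    for frame in frames:
        if not triggered:
            preroll.append(frame)
            if is_speech(frame):
                # Speech started: keep the buffered lead-in so the first
                # syllable is not cut off.
                triggered = True
                utterance.extend(preroll)
                preroll.clear()
        else:
            utterance.append(frame)
            if is_speech(frame):
                quiet_run = 0
            else:
                quiet_run += 1
                if quiet_run >= hangover_frames:
                    break
    return utterance
```

The collected frames can then be joined and handed to the recogniser in one go. The pre-roll length is a trade-off: too short and the first syllable is still lost, too long and you feed the recogniser extra background noise.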
I also use a PlayStation Eye as a USB microphone.
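For the original goal of tying a recognised phrase to a sound file, something like the following could sit after the recogniser. It is a rough sketch under a few assumptions: the transcript arrives as lower-case text without punctuation, the .wav paths are hypothetical placeholders, and playback goes through the aplay command from alsa-utils.

```python
import subprocess

# Hypothetical phrase -> sound file table; replace with your own clips.
RESPONSES = {
    "hi": "sounds/hidey_ho.wav",
    "ugly doll": "sounds/ugly_doll.wav",
}


def pick_response(transcript, responses=RESPONSES):
    """Return the sound file for the first command phrase found, or None."""
    # Normalise spacing and pad with spaces so only whole words match
    # ("this" must not trigger "hi").
    padded = " " + " ".join(transcript.lower().split()) + " "
    # Try longer phrases first so a multi-word command wins over a short one.
    for phrase in sorted(responses, key=len, reverse=True):
        if f" {phrase} " in padded:
            return responses[phrase]
    return None


def play(path):
    """Play a WAV file through ALSA; assumes aplay is installed."""
    subprocess.run(["aplay", "-q", path], check=False)
```

Usage would be along the lines of: take the text the recogniser returns, call pick_response on it, and call play on the result if it is not None. With only four commands, plain whole-word matching like this is probably enough, and it tolerates extra words around the command ("hi i'm dave" still matches "hi").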

