An assistant for Google Assistant.
A year ago, I wrote a note about my Raspberry Pi project to add extra functionality to my “Google Home” (GH).
This note is an update and shows my drift towards AI.
At that time, I used a GH and the “If This Then That” (IFTTT) service to route selected requests to a Google spreadsheet. The spreadsheet then compiled a text response from data stored within it, i.e. acting as a simple database.
My RPi then polled the response cell in the spreadsheet via a Google API, then spoke it using Google TTS through mplayer and a USB loudspeaker.
Accessing Google voice recognition this way provides exceptional quality: it copes with high background noise and is free for unlimited use. Over the last year I have modified this strategy to make it more flexible. I felt that the RPi wasn’t earning its keep!
I still use GH + IFTTT, but now route the request as an HTTP call to a simple Python Flask web-server app running continuously on the RPi. The RPi then compiles its own response and speaks wirelessly through the GH. The RPi is thus hidden away!
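The Flask end of this can be very small. A minimal sketch of such an endpoint follows; the “/assistant” route and “phrase” parameter name are illustrative assumptions, not necessarily the ones I actually use:

```python
# Minimal Flask webhook sketch: IFTTT routes the recognised phrase here
# and the ANN loop picks it up later. Route and parameter names are
# illustrative examples only.
from flask import Flask, request

app = Flask(__name__)
latest = {"text": ""}  # shared state that the ANN loop polls

@app.route("/assistant")
def assistant():
    # IFTTT's webhook delivers the recognised phrase as a query parameter
    latest["text"] = request.args.get("phrase", "")
    return "OK"

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

IFTTT's Webhooks service can then be pointed at the DNS name described below, with the recognised text substituted into the URL.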
I don’t have a fixed public IP, so I use a free DNS service that I update with my own Python “DNS updater” app running on the same RPi. It polls a free “what’s my IP” service and, given experience to date, tries alternatives if the lookup service fails.
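The fallback logic can be sketched like this; the lookup URLs are examples only, and the real app’s service list may differ:

```python
# Sketch of the "what's my IP" lookup with fallback. The service URLs
# are illustrative examples, not necessarily the ones my app uses.
import urllib.request

SERVICES = [
    "https://api.ipify.org",
    "https://icanhazip.com",
]

def public_ip(fetch=None):
    """Return the first IP a lookup service reports, or None if all fail."""
    if fetch is None:
        fetch = lambda url: urllib.request.urlopen(url, timeout=5).read().decode().strip()
    for url in SERVICES:
        try:
            return fetch(url)
        except Exception:
            continue  # service down or timed out: try the next one
    return None
```

The result is compared with the last known address, and the DNS record is only updated when it changes.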
The RPi also runs my Artificial Neural Network (ANN) app, written in Python with “numpy”. My ANN app converts the request text, letter by letter, into a 600-bit binary input. The three-layer ANN recognises the phrase and, if present, any name contained in the phrase, then passes them to an appropriate Python function such as my “date of birth” function.
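This note doesn’t record the exact encoding, but as one plausible sketch: pad or truncate the request to 75 characters at 8 bits per character, giving the 600 inputs. The scheme below is a guess at such an encoder, not my actual one.

```python
# Hypothetical letter-by-letter encoder: 75 chars x 8 bits = 600 inputs.
# The character limit and bits-per-character split are guesses.
import numpy as np

MAX_LEN = 75

def encode(text):
    """Turn a request string into a 600-element 0/1 numpy vector."""
    bits = []
    for ch in text.lower()[:MAX_LEN].ljust(MAX_LEN):
        # 8-bit binary representation of each character code
        bits.extend(int(b) for b in format(ord(ch) & 0xFF, "08b"))
    return np.array(bits, dtype=np.float64)
```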
I trained the ANN to recognise alternative forms of the same request, so for example “how old is jim”, “when was jim born” or “when is jim’s birthday” all call my “date of birth” function. I also trained it to accept alternative names, so for example both “jim” and “james smith” are recognised as the same person.
It selects “functions” with 100% accuracy and “names” with 99.3% accuracy.
There are currently 11 core functions and 90 names.
I manage the RPi from my laptop running VNC and the Caja file manager on Linux Mint MATE.
By connecting to the RPi using Caja, I can edit and run its Python files directly on the laptop. This allows fast offline training of the ANN without having to copy files between machines. Training involves feeding the ANN with all variations of all phrases and names, hundreds of thousands of times. Fortunately this only takes 10–20 minutes and only needs repeating if I add or edit functions or names.
I have bundled a host of my own Python functions in a module that gets imported into the ANN app.
The ANN polls for the latest request in a loop. This has the advantage that the module remains loaded and hence doesn’t cause delays when the server receives a new request.
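The loop itself is straightforward. A sketch, with the request source, dispatcher and stop condition injected so that none of the real plumbing is implied:

```python
# Sketch of the polling loop: everything stays imported between requests,
# so dispatching is immediate. The injected callables are illustrative.
import time

def serve(get_request, dispatch, idle=0.0, running=lambda: True):
    """Poll for the newest request and dispatch each new one once."""
    last = None
    while running():
        req = get_request()
        if req and req != last:
            dispatch(req)  # e.g. encode, run the ANN, call the function
            last = req
        time.sleep(idle)
```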
The function module includes gTTS, Pico and Festival TTS functions, although I prefer the higher-quality online gTTS.
I now use stream2chromecast to send the response audio to a selected GH speaker, since each GH contains its own Chromecast function.
This has the advantage that I can make a request and get the response at any one of my three GHs. Switching to a different GH is done via a vocal request.
Oddly, I can no longer direct audio to any of the stand-alone Chromecast devices that feed my home audio systems. This would have been useful for loud announcements!
Whilst I can use Google Assistant on my phone to make requests away from home, stream2chromecast can’t cast to the phone. Instead, I use my own function to update a web page on the Flask server, which includes an HTML5 sound player containing the spoken response, giving me full away-from-home access.
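The page only needs a single HTML5 audio element pointing at the latest response file. A sketch, with an assumed route and file name:

```python
# Sketch of the "latest response" page. The route and the response.mp3
# path are assumptions; the TTS function would overwrite the file each
# time a new response is generated.
from flask import Flask

app = Flask(__name__)

@app.route("/latest")
def latest():
    return ('<html><body><audio controls autoplay '
            'src="/static/response.mp3"></audio></body></html>')
```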
A fourth Python app on the same RPi supplements my home security. It is intentionally independent of my main system, has its own sensors connected via GPIO, and runs, like the other apps, as an independent continuous loop. It gives interesting advice to an intruder via my three GHs, as well as emailing me about their presence! I control it by voice via the GH-IFTTT-RPi strategy.
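The decision step of such a loop can be sketched with the sensor read, announcement and email all injected, so no GPIO is needed to follow the logic. The wording and structure here are illustrative, not my actual code:

```python
# Sketch of one pass of the intruder loop: announce and email once per
# trigger, then re-arm when movement stops. All callables are injected
# stand-ins for the GPIO read, chromecast announcement and email send.
def patrol_step(motion_detected, announce, send_email, state):
    """React once to the current sensor reading."""
    if motion_detected and not state.get("alerted"):
        announce("You are being recorded. Please leave.")
        send_email("Intruder detected")
        state["alerted"] = True
    elif not motion_detected:
        state["alerted"] = False  # re-arm for the next event
```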
I have 3 future developments underway.
Whilst only a proof of concept at the moment, I have demonstrated another high-quality offline TTS Python function. I used gTTS to pre-record the unchanging parts of my responses, plus the numbers 0–100, month names, days of the week and so on. The function stitches the requisite responses together using sox. Whilst it works well, there will be times when responses need to contain new words. To keep it offline, these will have to be generated by Festival or Pico, so may sound odd. I haven’t developed that bit yet.
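Stitching with sox amounts to listing the clip files in order followed by the output file. A sketch, assuming clips named after each word in a “clips” directory (directory layout and file names are my illustration):

```python
# Sketch of the clip stitcher: build a sox command that concatenates
# pre-recorded fragments into one response file. Paths are assumptions.
import subprocess

CLIP_DIR = "clips"

def sox_command(words, out="response.wav"):
    """Compose the sox command line for a sequence of clip names."""
    return ["sox"] + [f"{CLIP_DIR}/{w}.wav" for w in words] + [out]

def stitch(words, out="response.wav"):
    """Run sox to concatenate the clips (requires sox to be installed)."""
    subprocess.run(sox_command(words, out), check=True)
```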
I have tried the Zamia offline voice recognition published in MagPi issue 72, on both my laptop and on an RPi 3B with a SanDisk Extreme SD card containing the latest Raspbian Stretch.
It happily converts a request on both but, sadly, on the RPi the “while” loop in the demo for continuous recognition fails to loop! Advice welcome!
Accuracy is moderate and will probably best support a menu of simple commands rather than plain English phrases, particularly avoiding names. That said, “how old is bob” converted reliably when spoken with clear gaps between words. It will suffice to integrate into my project as a backup input if I can overcome the loop failure.
Finally, whilst the GH-IFTTT-RPi strategy is cumbersome, I have shown as a proof of concept that simple interaction with the RPi is tolerable. I arranged for the RPi to provide instructions for preparing a “Jalfrezi curry”, with my verbal prompting when I want the next step or want the RPi to repeat the previous step. Given this minor success, my next objective is to advance the ANN to participate in slightly more complex conversations. This will use multiple ANNs to handle different areas of the conversation.
My projects are strictly a hobby, so I’m happy to provide further details.