
Seeing Wand uses Microsoft AI to describe things

Point this magic wand at an item and it’ll speak its name.
Phil King lifts the curtain to see how it’s done.

Inspired by a blind cousin who would ‘look’ around his environment by way of touch, Robert Zakon has built a Seeing Wand that can speak the name of whatever it’s pointed at.

Housed in a makeshift PVC tube, a Pi Zero is connected to a Camera Module that takes a photo when a push-button is pressed. The image is sent to Microsoft’s Cognitive Services Computer Vision API to get a description, which is then spoken – using the open-source eSpeak speech synthesizer – through a Speaker pHAT.
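The photo-to-speech chain described above can be sketched in a few lines of Python. This is a minimal illustration rather than Robert's actual code: the region in the endpoint URL, the API version, the placeholder key, and the function names are all assumptions, and the image bytes are expected to come from whatever captures the photo on the Pi.

```python
import json
import subprocess
import urllib.request

# Assumed values: the region, API version, and key are placeholders,
# not details taken from the article.
ENDPOINT = "https://westus.api.cognitive.microsoft.com/vision/v3.2/describe"
API_KEY = "YOUR_SUBSCRIPTION_KEY"


def build_describe_request(image_bytes, endpoint=ENDPOINT, key=API_KEY):
    """Build a POST request carrying raw image bytes to the describe endpoint."""
    return urllib.request.Request(
        endpoint,
        data=image_bytes,
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/octet-stream",
        },
        method="POST",
    )


def best_caption(response):
    """Return the highest-confidence caption text, or a fallback phrase."""
    captions = response.get("description", {}).get("captions", [])
    if not captions:
        return "I am not sure what that is"
    return max(captions, key=lambda c: c.get("confidence", 0.0))["text"]


def describe(image_bytes):
    """Send the image to the service and return a caption (needs network access)."""
    request = build_describe_request(image_bytes)
    with urllib.request.urlopen(request, timeout=10) as reply:
        return best_caption(json.load(reply))


def speak(text):
    """Speak the text with the eSpeak command-line tool (must be installed)."""
    subprocess.run(["espeak", text], check=False)
```

On the wand itself, a button press would capture a JPEG and then call something like speak(describe(jpeg_bytes)).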

This article first appeared in The MagPi 71.

A Pi Camera Module is used to take photos of items, while speech is output through a Speaker pHAT

What does the Seeing Wand do?

“I was looking for a way to teach my kids about innovation through integration and had been wanting to test out both the Pi and emerging cognitive computing services,” explains Robert. “They were a bit sceptical at first, but warmed up to it and thought the end result was pretty awesome (their words). My eldest helped with assembly, and both aided in testing.”

Seeing Wand: Microsoft’s Cognitive Services Computer Vision API

The wand is Robert’s debut Raspberry Pi project, and it came together over the course of a few weekends.

Asked why he chose Microsoft Cognitive Services over other image-recognition APIs, Robert responds: “Microsoft did a nice job with the API and it was fairly straightforward to integrate with. There was no particular reason for choosing it other than it appeared to be robust enough and free to use for our project.”

The results surprised him in terms of accuracy and level of detail: “People, pets, and large objects seem to be the sweet spot.”

Even when the wand gets it wrong, the results can be amusing. “My kids had a lot of fun whenever something was misidentified, such as pointing at a toy robot on a table and having it identified as ‘a small child on a chair’. Another example was pointing at our garage with a sloping roof and being informed there was ‘a skateboarder coming down a hill’ – still not sure what it thought the skateboarder was. My favourite, though, had to be when we pointed it at clouds and heard what sounded like ‘Superman flying across a blue sky’.”

As per its original inspiration, however, the Seeing Wand could be of serious use to partially sighted people. “Although there are smartphone apps that do the same thing, this could be a less expensive and more human-friendly device.”


Point the wand at an item, press the button, and its description is spoken

Fine-tuning the Seeing Wand

Robert admits that the prototype wand is a little rough around the edges. “We have talked about making improvements both to the hardware and software. On the hardware side, we would solder all wires and buttons, and use a smaller battery in order to make it truly palm-sized and thinner so it could fit as the holding end of a white (blind) cane. For the software, we’d like to integrate the text recognition and possibly language translation services so signs and printed material could be read, and the face recognition service so people could be identified. Also, as the cognitive services are not yet perfect, it would be interesting to ‘poll’ multiple services and determine which identification is best through our own cognitive meta-service.”
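One way the ‘meta-service’ idea could work is to collect a description and confidence score from each service and let agreement between services count for more than a single confident outlier. The sketch below is purely hypothetical: the article does not say how Robert would combine results, and the tuple layout, the function name, and the pooling rule are assumptions made for illustration.

```python
from collections import defaultdict


def pick_best(candidates):
    """Choose one description from (service, text, confidence) tuples.

    Identical descriptions (ignoring case and surrounding spaces) pool their
    confidence, so two services that agree can outvote one confident outlier.
    Returns None when there are no candidates.
    """
    totals = defaultdict(float)
    for _service, text, confidence in candidates:
        totals[text.strip().lower()] += confidence
    if not totals:
        return None
    return max(totals, key=totals.get)
```

Exact string matching is a crude measure of agreement; a fuller version might compare keywords or tags instead.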

Step-01: Wiring the electronics

Components include a Pi Zero W, Camera Module, and Speaker pHAT. Wiring is currently via a mini breadboard. The device is powered by a 2200 mAh power cube.

Step-02: PVC housing

The electronics are crammed into a PVC tube. The camera fits into a closet-rod-supporting end cap and is held in place by rigid insulation, with its lens up against the cap’s screw hole.

Step-03: Two buttons

The breadboard holds two push-buttons: one to take a photo of the item you want to identify, and the other – wired to the GPIO 03 and GND pins – to turn the Pi Zero W on and off.
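As a rough sketch of how the two buttons might be handled in software, the example below uses the gpiozero library. The photo button’s pin number is a guess (the article only names GPIO 03 for the power button), and the two-second hold before shutdown is an assumption intended to avoid accidental power-offs.

```python
import subprocess

PHOTO_PIN = 17    # hypothetical: the article does not say which pin is used
POWER_PIN = 3     # GPIO 03, as described in the article
HOLD_SECONDS = 2  # assumed: hold to shut down, to avoid accidental presses

SHUTDOWN_COMMAND = ["sudo", "shutdown", "-h", "now"]


def shutdown(run=subprocess.run):
    """Halt the Pi cleanly; `run` is injectable so this can be tried off-device."""
    return run(SHUTDOWN_COMMAND, check=False)


def main(on_photo):
    """Attach both buttons to their actions and wait for presses (runs on the Pi)."""
    # Imported here so the module can be loaded on a machine without GPIO.
    from gpiozero import Button
    from signal import pause

    photo_button = Button(PHOTO_PIN)
    power_button = Button(POWER_PIN, hold_time=HOLD_SECONDS)
    photo_button.when_pressed = on_photo
    power_button.when_held = lambda: shutdown()
    pause()
```

GPIO 03 is a convenient choice for the power button because briefly connecting it to GND also wakes a halted Pi, so one button can serve for both off and on.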