PranavLal
Posts: 124
Joined: Fri Jun 28, 2013 4:49 pm

Re: Sight for the Blind for <100$

Tue Jan 05, 2016 2:39 pm

Hi all,
Teradeep is indeed significantly better than Jetpac. It identified the curtains in my study correctly. It did call a cupboard "shelves", though it said "cabinet" later, and it called my window with a screen a "fence".

It recognized painted walls and recognized my face and gender.

However, things got really interesting when I pointed it at my main computer table, which has a glass top. The first thing it told me was that the image was blurry. I got the camera closer and pointed it at the open model 1 unit. It called it a bug. Given the wires on my table, I am not surprised.

I will be able to really test it once I get my new pair of glasses.

Pranav

mikey11
Posts: 354
Joined: Tue Jun 25, 2013 6:18 am
Location: canada
Contact: Website

Re: Sight for the Blind for <100$

Tue Jan 05, 2016 3:53 pm

Unfortunately I don't get to play yet, because my 16 GB card is just a little too small for that image.

It turns out each SD card manufacturer makes them in slightly different sizes.

As a result, I looked into procedures that can prevent this problem from happening. There are ways to resize the filesystem to be slightly less than 16 GB to ensure compatibility, and I will be doing that to each of my image files in the future.

I'm hoping I can pick up a 32 GB card today so I can check it out. Then I want to try out that library to see if it executes any faster.

I am liking the reports coming in though. It does indeed sound like the teradeep network is superior at the current time.

I did get an email back from Pete Warden, who made Jetpac and is now with Google, indicating that they are working to eventually get TensorFlow running on the Pi. TensorFlow is Google's open source solution for deep learning applications, and I am looking forward to seeing how that shapes up. I just have my fingers crossed that it will all work offline.

mr_indoj
Posts: 42
Joined: Wed Jul 01, 2015 9:28 am

Re: Sight for the Blind for <100$

Tue Jan 05, 2016 4:13 pm

I have begun tests with the thnets lib; so far I get it to compile and it seems to load the network. Note, however, that you have to use the original teradeep files from Dropbox (model.net and stat.t7). The ones shipped with the image had to be modified to work on the Pi with Torch.
I'm currently unable to get the testprogram to load the input image; it segfaults, so I'll have to see what I can find out.

mr_indoj
Posts: 42
Joined: Wed Jul 01, 2015 9:28 am

Re: Sight for the Blind for <100$

Tue Jan 05, 2016 8:08 pm

Got it working; apparently it didn't like my image. testprogram takes about 20 seconds to run, and the result is displayed as a series of float numbers, so it doesn't make much sense right now until better understood.
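My guess is the floats are one score per category, so once we know which labels file matches the network they should be easy to rank. A minimal Python sketch, assuming the scores come in on stdin and that a matching one-label-per-line file exists (categories.txt is just a made-up name):

Code: Select all

import sys

def top_matches(scores, labels, n=5):
    """Pair each score with its label and return the n highest."""
    return sorted(zip(scores, labels), reverse=True)[:n]

# labels file name is an assumption; one category name per line
with open("categories.txt") as f:
    labels = [line.strip() for line in f]

# pipe the testprogram output in, e.g.: ./testprogram img.jpg | python rank.py
scores = [float(x) for x in sys.stdin.read().split()]

for score, label in top_matches(scores, labels):
    print("%.3f  %s" % (score, label))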

mikey11
Posts: 354
Joined: Tue Jun 25, 2013 6:18 am
Location: canada
Contact: Website

Re: Sight for the Blind for <100$

Tue Jan 05, 2016 8:27 pm

Ouch! 20 seconds is a long time to wait!

mikey11
Posts: 354
Joined: Tue Jun 25, 2013 6:18 am
Location: canada
Contact: Website

Re: Sight for the Blind for <100$

Tue Jan 05, 2016 9:20 pm

I found another 16gb card and was able to flash the image with teradeep.

I can't wait to demonstrate that to other people. The everyday objects it finds are much better than with the other networks. I also find the interval is about 5 seconds, which is just a hair above what I was getting from Jetpac.

I am planning on adding a customizable feature: taking the rangefinder output and reading it aloud after image classification. This will let the wearer know how close they are to nearby objects. I've found the vibration to be a less effective method than I had hoped, and I want to take the distance read over audio for a spin.

By making it a persistent configurable setting, people can easily leave it off.
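For the setting itself, something like Python's stdlib configparser would do; the file path and option names below are placeholders, not what will actually be on the image:

Code: Select all

# Minimal sketch of a persistent on/off setting. The config path and
# option names are placeholders for whatever the image ends up using.
import configparser
import os

CONF = os.path.expanduser("~/.sight_config")

def read_speak_distance(default=True):
    cfg = configparser.ConfigParser()
    cfg.read(CONF)
    return cfg.getboolean("rangefinder", "speak_distance", fallback=default)

def write_speak_distance(value):
    cfg = configparser.ConfigParser()
    cfg.read(CONF)
    if not cfg.has_section("rangefinder"):
        cfg.add_section("rangefinder")
    cfg.set("rangefinder", "speak_distance", "yes" if value else "no")
    with open(CONF, "w") as f:
        cfg.write(f)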

mikey11
Posts: 354
Joined: Tue Jun 25, 2013 6:18 am
Location: canada
Contact: Website

Re: Sight for the Blind for <100$

Tue Jan 05, 2016 9:54 pm

Also, mr_indoj:

Thanks for keeping the code between teradeep.py and jetpac.py similar. It makes changing and maintaining them easier.

PranavLal
Posts: 124
Joined: Fri Jun 28, 2013 4:49 pm

Re: Sight for the Blind for <100$

Wed Jan 06, 2016 1:29 am

Hi Mikey11,

Can you make the rangefinder audio speak along with the soundscapes of raspivoice?

If you need to mix audio to get this to work, would a program like JACK help?

Pranav

mikey11
Posts: 354
Joined: Tue Jun 25, 2013 6:18 am
Location: canada
Contact: Website

Re: Sight for the Blind for <100$

Wed Jan 06, 2016 1:56 am

Pranav,

I am currently syncing a new image which has the following changes:

1. Added distance readout to jetpac and teradeep. It reads like this: "x.x meters", with values ranging between 0.3 and 5 meters. This is read immediately after the objects.
2. Added a configurable setting for making the distance audible/not audible.
3. Redid the rangefinder vibration. In the past I used pulse width modulation because it ran as a subprocess from my main menu. As I have moved the rangefinder vibration to its own process, I no longer need PWM, and I have changed it to make a series of pulses based on distance, with a more insistent signal at close range (a rough sketch follows after this list). I find it works much better now than it did in the past, and it will be less computationally expensive.
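To give an idea of the pulse scheme in item 3, here is a rough sketch; the pin number and timings are illustrative, not the actual values in my code:

Code: Select all

# Rough sketch: a fixed-length buzz, then a pause that grows with
# distance, so close obstacles feel more insistent. Pin and timings
# are illustrative only.
import time
import RPi.GPIO as GPIO

MOTOR_PIN = 18  # placeholder pin

GPIO.setmode(GPIO.BCM)
GPIO.setup(MOTOR_PIN, GPIO.OUT)

def pulse_for_distance(meters):
    GPIO.output(MOTOR_PIN, GPIO.HIGH)
    time.sleep(0.05)  # fixed-length buzz
    GPIO.output(MOTOR_PIN, GPIO.LOW)
    # map the 0.3-5 m range onto roughly 0.1-1.5 s of silence
    clamped = min(max(meters, 0.3), 5.0)
    time.sleep(0.1 + 1.4 * (clamped - 0.3) / 4.7)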

As for adding it to raspivoice, I don't know how to do that; the C++ code is mostly beyond me. I don't know how to use JACK yet either. I shall research JACK, and then look for a place in the raspivoice code where I might insert that. I would also have to add a config reader to raspivoice to read the audible/not audible flag for the distance readout.

For someone who is versed in C it should be easy to do. All you need to do is open the serial port at 9600 baud and then read the values, which come in as a repeated string with the letter "R" separating the readings in millimeters.

A few conversions to a number, stripping some decimals, turn it into the x.x meters format that gets read out.
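In Python with pyserial the whole thing is only a few lines; the port name is a guess for the Pi's UART, so adjust for your wiring:

Code: Select all

# Sketch of the parsing described above. Readings arrive as e.g.
# "R1234R1187R...", values in millimeters.
import serial

port = serial.Serial("/dev/ttyAMA0", 9600, timeout=1)  # port name assumed

buf = ""
while True:
    buf += port.read(16).decode("ascii", errors="ignore")
    parts = buf.split("R")
    buf = parts.pop()  # keep any incomplete trailing reading
    for reading in parts:
        if reading.isdigit():
            print("%.1f meters" % (int(reading) / 1000.0))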

If I get the gumption to tackle that I will take a look. My next four or five days are pretty busy with a lot of travel and caring for two kids on my own for some of it.

mr_indoj
Posts: 42
Joined: Wed Jul 01, 2015 9:28 am

Re: Sight for the Blind for <100$

Wed Jan 06, 2016 8:05 am

I can take a look at that when I get the new image. I'm planning to take a look at raspivoice anyway, to see if I can do something about the image processing from the camera.

mikey11
Posts: 354
Joined: Tue Jun 25, 2013 6:18 am
Location: canada
Contact: Website

Re: Sight for the Blind for <100$

Wed Jan 06, 2016 12:24 pm

Of course I'm back to having syncing issues...

This time Microsoft's OneDrive is not working, but luckily I tried Google Drive again and now it's working. Fickle world.

I will post the (hopefully synced) link within the next 12 hours.

seeingwithsound
Posts: 165
Joined: Sun Aug 28, 2011 6:07 am
Contact: Website

Re: Sight for the Blind for <100$

Wed Jan 06, 2016 12:42 pm

A blind user of raspivoice on a vanilla Raspberry Pi device reported to me today that with the latest image he got stuck at the startup message, apparently for lack of the rotary knob hardware that the After-Sight devices have. So he was unable to switch among apps and turn on raspivoice, for instance, or switch to teradeep. Maybe it would be good to include a keyboard equivalent of the rotary knob in the next device image, to provide an alternative input means for vanilla Raspberry Pi devices?

Thanks,

Peter

mr_indoj
Posts: 42
Joined: Wed Jul 01, 2015 9:28 am

Re: Sight for the Blind for <100$

Wed Jan 06, 2016 2:53 pm

Sound from multiple processes is not a problem anymore, it seems.
I just did a test:
I modified raspivoice so that it looks for the existence of an empty raspi_frame file. If that file exists, raspivoice reads an image from opencv.jpg, plays the soundscape, removes the raspi_frame file, and then waits for it to appear again before doing anything more.

At the same time, the Python process that spawned raspivoice pulls images from the camera, but only writes one to the file when raspi_frame disappears. At that moment I also tried to output text via espeak; the result is that I get speech from the Python process that is synced with the soundscape playback.
Hacky, yes, but it seems to work, and it helps to move the camera out of raspivoice; we also don't need raspivoice to read multiple frames before it can play.
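The Python side of the handshake looks roughly like this (a simplified sketch of what I am running, not the exact code):

Code: Select all

# Grab frames continuously, but only hand one over when raspivoice
# has consumed the previous frame (i.e. the flag file is gone).
import os
import time
import cv2

FLAG = "raspi_frame"
IMAGE = "opencv.jpg"

cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()  # keep pulling so the frame stays fresh
    if not ret:
        continue
    if not os.path.exists(FLAG):  # raspivoice took the last frame
        cv2.imwrite(IMAGE, frame)
        open(FLAG, "w").close()  # empty file acts as the "ready" flag
    time.sleep(0.01)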

PranavLal
Posts: 124
Joined: Fri Jun 28, 2013 4:49 pm

Re: Sight for the Blind for <100$

Wed Jan 06, 2016 5:38 pm

Hi mr_indoj and all,

What I would really like to see is for teradeep etc. to sound the object at its position. So, if an object is in the left of the camera view, you hear it in your left ear.

From what you say, raspivoice can already play both audio streams, so we can have the view both spoken and sonified. I see a very natural workflow coming here.
1. You use the sound schema to look at a scene.
2. If you do not understand what you are looking at, or want confirmation, you look at it for a longer time, which gives the object recognition time to catch up. This is exactly what sighted people do when they need to study something.

As a first step, it would be nice to have The vOICe and the object recognition sonify and speak the same frame. Let's do this first before we get to my position idea.

Pranav

mr_indoj
Posts: 42
Joined: Wed Jul 01, 2015 9:28 am

Re: Sight for the Blind for <100$

Wed Jan 06, 2016 6:26 pm

Hi Pranav!
The question is what should happen while we are waiting for the object detection. Should we play the same soundscape until the detection is finished and can be spoken?

seeingwithsound
Posts: 165
Joined: Sun Aug 28, 2011 6:07 am
Contact: Website

Re: Sight for the Blind for <100$

Wed Jan 06, 2016 7:11 pm

Hi Pranav and mr_indoj,

Indicating the position of a recognized object in an image through spatial sound would be great, but I think the current crop of neural networks only returns a word or phrase plus a confidence score, with no positional information at all, not even left or right. So for now this cannot be implemented.

Secondly, as long as evaluating a neural network takes longer than a soundscape, such as 5 seconds versus 1 second for a default soundscape, it is probably best to run both processes asynchronously. The soundscapes should definitely not be kept constant until the neural network catches up with its recognition result. I think the user can manage the lag and roughly remember what showed in the soundscape view a few seconds ago, and match that to the spoken recognition outcome. Indeed, as Pranav indicates, if the user has a specific interest in an item he/she will hold the camera steady for a few seconds with the object centered in the view, based on the soundscapes. The user may also move closer to the object of interest (or zoom in) until it fills most of the camera view, thus aiding the neural network by making it more salient and by dropping most other objects from the camera view.

Peter


Seeing with Sound - The vOICe
http://www.seeingwithsound.com

mr_indoj
Posts: 42
Joined: Wed Jul 01, 2015 9:28 am

Re: Sight for the Blind for <100$

Wed Jan 06, 2016 11:34 pm

The concept seems to work; soundscapes become a bit laggy when teradeep runs, but it basically works.

mikey11
Posts: 354
Joined: Tue Jun 25, 2013 6:18 am
Location: canada
Contact: Website

Re: Sight for the Blind for <100$

Wed Jan 06, 2016 11:48 pm

Mr Indoj:

I am quite keen to test this out!
Perhaps it would be best if you post an image when you think you have the initial bugs worked out? Then I can just redo my bits on that foundation in a few days.

Essentially it sounds like:

1. You are using image files, plus an empty file as a flag to indicate when an image file is ready to be used.
This means we no longer need a video 'T', which would be processor intensive anyway, and considering what we are running, we don't have much to spare.

2. You have sounds queued up so that the soundscapes play as often as they can, but teradeep will pause the soundscapes after one finishes to say its bit, and then the soundscapes resume?

Even though you say there is a bit of lag, this may be acceptable. Users could always still choose to run raspivoice or teradeep on its own anyhow if it is too much of a bother.

Away from this forum, I have another coder making incremental improvements to the raspivoice code through optimizations. He has indicated to me that he has a few that have improved execution by about 8%. It's not much, but there are still unexplored avenues for overall optimization. Perhaps the lagginess will be manageable at some point.

I also wanted to chime in on Peter's observations about the neural nets. Although I saw videos with positional information, I did not see a way to access that information in either Jetpac or teradeep. It's possible that this capability was kept closed source on purpose, as a way for those innovators to generate revenue or sell their companies to the bigger fish. I'm sure one day we will have it for free.

PranavLal
Posts: 124
Joined: Fri Jun 28, 2013 4:49 pm

Re: Sight for the Blind for <100$

Thu Jan 07, 2016 12:02 am

Hi all,

Laggy soundscapes will not do. They have to change with the scene. When walking, I keep the vOICe at double speed to ensure that the view stays synchronized with my environment.

Pranav

seeingwithsound
Posts: 165
Joined: Sun Aug 28, 2011 6:07 am
Contact: Website

Re: Sight for the Blind for <100$

Thu Jan 07, 2016 7:30 am

mikey11 wrote:teradeep will pause the soundscapes after one finishes to say its bit and then soundscapes resume?
No, the soundscapes should never be interrupted in normal operation without user interaction. It would be annoying and distracting to have the steady flow of soundscapes, representing the complete and correct camera view composition, interrupted by the still questionable identifications from the neural network. Just mix the soundscapes and the speech by running the two processes independently (hence asynchronously), and only provide a setting to control their relative volume. The vOICe for Android also mixes soundscapes and speech to speak compass directions and GPS-based street locations, where the latter two are event-driven and hence asynchronous processes.
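Since espeak is already used for the spoken output, its -a option (amplitude, 0 to 200, default 100) may be all that the relative volume setting needs; a minimal sketch, assuming the processes can already play simultaneously through the sound system as mr_indoj's test suggests:

Code: Select all

# Speak asynchronously at a reduced amplitude; the soundscapes keep
# playing and the mixing is left to the sound system.
import subprocess

def say(text, volume=80):  # volume is espeak's -a amplitude, 0-200
    subprocess.Popen(["espeak", "-a", str(volume), text])

say("laptop computer")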
mikey11 wrote:I also wanted to chime in on Peters observations about the neural nets. Although I saw videos with positional information, I did not see a way to access that information in either jetpac or teradeep. It's possible that this capability was kept closed source on purpose as a way for those innovators to generate revenue/sell their companies to the bigger fish. I'm sure one day we will have it for free.
I've seen such videos too, but it was (at least) a two-stage process: one set of algorithms segments the image into predicted object locations with bounding boxes, and the content of each bounding box is then analyzed by a neural network for object identification. In this way one can obtain positional information from outside the neural network, but I have my doubts about the reliability of a segmentation that need not match the set of physical objects, and it would again add CPU load. Interesting to explore in a later stage of development.

Peter


Seeing with Sound - The vOICe
http://www.seeingwithsound.com

seeingwithsound
Posts: 165
Joined: Sun Aug 28, 2011 6:07 am
Contact: Website

Re: Sight for the Blind for <100$

Thu Jan 07, 2016 7:50 am

mikey11 wrote:You are using image files, and an empty file as a flag to indicate when an image file is ready to be used.
This means we no longer need a video 'T' which would be processor intensive anyways, and considering what we are running, we don't have much to spare.
I do not know much about the Raspberry Pi and its Linux flavor, but on Windows one can define a memory space that is shared among independently running programs. I have used this in the past to run multiple camera programs, each connected to a different camera, with one "master" program collecting video frames from all running camera programs, for instance for stereo vision processing in a two-camera setup. Something similar should be possible for multiple programs independently processing video frames from a single camera program. In-memory sharing of video frames should give negligible CPU overhead for the video 'T' and avoids the file I/O hack. (The file I/O approach may or may not work well, depending on the caching properties of the OS, which determine whether frames are really passed via files or more quickly through "files" still cached in RAM.)
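On Linux, as I understand it, a cheap approximation is that files under /dev/shm live in RAM (tmpfs), so the existing handshake could avoid real disk I/O just by moving its files there. For a true shared region, something like the following writer-side sketch should work; the names and frame size are illustrative, and the reader would still need a flag or semaphore to know when a frame is complete:

Code: Select all

# Hedged sketch of a RAM-backed shared frame buffer on Linux.
import mmap
import os

FRAME_BYTES = 320 * 240 * 3         # fixed raw frame size, assumed known
PATH = "/dev/shm/frame_buffer"      # illustrative name

fd = os.open(PATH, os.O_CREAT | os.O_RDWR)
os.ftruncate(fd, FRAME_BYTES)
shared = mmap.mmap(fd, FRAME_BYTES)

def publish(frame_bytes):
    # copy one raw frame into the shared region; no file I/O involved
    shared.seek(0)
    shared.write(frame_bytes)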

Peter


Seeing with Sound - The vOICe
http://www.seeingwithsound.com

mr_indoj
Posts: 42
Joined: Wed Jul 01, 2015 9:28 am

Re: Sight for the Blind for <100$

Thu Jan 07, 2016 9:22 am

First of all, when I say that soundscapes are laggy, I mean that we get a pause between them, but the data seems OK. It should not be old data; I don't queue things up.

We need the video thread so that we have fresh frames from the camera. At least when using OpenCV, it seems you have to pull frames continually to have fresh content (see the sketch below).
I see the current solution as a hacky one: the video thread creates images on request for the different processes. The problem, of course, is that the different parts come from different places. If raspivoice were a library with callable Python bindings, for example, the image data could be passed as objects instead; I think that's the way forward.
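The usual workaround for the stale-frame issue is a small thread that keeps reading so the driver's buffer never goes stale, while consumers just take the latest frame. A sketch, not the exact code on the image:

Code: Select all

import threading
import cv2

class FreshestFrame:
    """Continuously pull frames; latest() always returns a fresh one."""
    def __init__(self, device=0):
        self.cap = cv2.VideoCapture(device)
        self.frame = None
        self.lock = threading.Lock()
        t = threading.Thread(target=self._pump)
        t.daemon = True
        t.start()

    def _pump(self):
        while True:
            ret, frame = self.cap.read()
            if ret:
                with self.lock:
                    self.frame = frame

    def latest(self):
        with self.lock:
            return self.frame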


I don't pause the soundscapes when teradeep speaks; it just reads on top of them. A volume setting could of course be implemented and applied to the speech sound data.
I will continue to hack on this so I can create a first image.

PranavLal
Posts: 124
Joined: Fri Jun 28, 2013 4:49 pm

Re: Sight for the Blind for <100$

Thu Jan 07, 2016 4:12 pm

mr_indoj,
<snip First of all, when I say that soundscapes are laggy, I mean that we get a pause between them, but the data seems OK. It should not be old data; I don't queue things up.
PL] Thanks for the clarification. Does the pause occur while the soundscape is sounding?

Your solution of simultaneous audio output is good and will work for now. Independently adjustable volumes will help, but we will need a good way of setting them.

The first problem we need to solve is getting the menu to talk while raspivoice is running.

mikey11
Posts: 354
Joined: Tue Jun 25, 2013 6:18 am
Location: canada
Contact: Website

Re: Sight for the Blind for <100$

Thu Jan 07, 2016 6:52 pm

The first problem we need to solve is getting the menu to talk while raspivoice is running.
I hope you just mean the menu for raspivoice, and not the main menu that launches the programs.

If you want to hear what happens when you make the main menu stop being quiet, it's actually quite easy: go into the Python code in menu.py and comment out every line that contains the bequiet variable.

You will find that although you can now operate the main menu, the menu within raspivoice is also active. When you rotate to the right, you will hear both options talking over each other.

This unfortunately leaves the job with three possibilities:

1. Move all menu commands from raspivoice to the main menu, and relaunch raspivoice with each change by passing command line options. (I don't see this as too bad an idea, as it also lets the other options get added to the config file if we so choose.) raspivoice launches quickly, so it doesn't introduce performance degradation.

2. Move everything into C++. I'm against this option because then I can't really participate. I do recognize that it gives performance advantages, but I don't feel the menuing system in Python is overly taxing, and I find it very easy to understand.

3. This may sound weird, but I think it is actually a really good idea, and it uses more Python so I can help. Remove all rotary encoder handling from the raspivoice C++ code and migrate those functions to menu.py. Once programs have been launched, the menu level can change to an 'operating menu' holding the functions currently in raspivoice, plus options to toggle teradeep, toggle raspivoice, toggle the distance readout, toggle vibration, or kill all running programs and return to the main menu.

I strongly favour option 3.

I also want to let Peter know: I've heard your complaint about keyboard input being required, and I will put something in my next revision. I had been planning on arrow keys and the space bar, but I just had a thought: a number pad. This would be convenient because there are lots of cheap USB number pads (a quick eBay search found some for $7). It could act as a replacement for the rotary encoder for people without the physical hardware I have been providing. This also opens up the possibility that more people will use and contribute to the software, or fork it for their own specialized needs.
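In Python, the mapping could be as simple as this; the event names are placeholders for whatever menu.py actually expects:

Code: Select all

# Hedged sketch of the keyboard fallback: map keys onto the same
# events the rotary knob produces. curses.wrapper enables keypad
# mode, so arrow/keypad keys arrive as single key codes.
import curses

def run(stdscr, on_event):
    keymap = {
        curses.KEY_RIGHT: "rotate_right",  # or keypad 6
        curses.KEY_LEFT:  "rotate_left",   # or keypad 4
        ord("\n"):        "press",         # Enter as the knob push
    }
    while True:
        key = stdscr.getch()
        if key in keymap:
            on_event(keymap[key])

# curses.wrapper(run, handle_event)  # handle_event supplied by menu.py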

Consider the following: the software stack as it is, any UVC camera (I didn't get the $2 ones, sadly), an RPi 2 ($35), a USB keypad ($7), an RPi case ($10), and a 10,000 mAh external battery ($40). This lets people make compatible devices out of mass-produced parts for minimal cost (<$100), as I had always intended.

Obviously this adds even more wires to the mix and removes the rangefinder. I already get complaints about the wires, but that price is absolutely amazing.

I don't think I can do it for a few days, as I have my hands full with travel and kids, but maybe by the time that is done, mr_indoj will have a new image up, and I can add that in along with the spoken distance option. I also cleaned up a few other things where espeak was talking over itself on startup, and added a few contextual speech cues that were missing (i.e., when you launch raspivoice or teradeep, it now says so rather than just launching silently). That way the lag on teradeep startup is less worrisome.

So you can expect that fix in the next few weeks.

If you want to help out your friend in the meantime, you can provide system image files where you have changed the autolaunch settings. I would offer, but my record for getting cloud storage to work lately has been abysmal. I'm almost at the point of paying for Dropbox or something.

mr_indoj
Posts: 42
Joined: Wed Jul 01, 2015 9:28 am

Re: Sight for the Blind for <100$

Thu Jan 07, 2016 7:40 pm

I also favour option 3.
