ASSIST: Final Report, Hearing Device

ASSIST: Hearing Device Group

Professor: Chenming Hu

Members: Martín, Brian, Chi long, Caroline, Amit, Georgina, and William

The Hearing Device Group is one of the groups in ASSIST. Our goal is to assist people who are deaf and help them communicate better with the people around them. At the beginning of this semester, we visited a couple of centers for disabled people and talked to people who have experience working with deaf people. Afterwards, we came up with our first project, whose main purpose is to help deaf people communicate with others using speech recognition software. For this project we chose one of the best speech recognition programs, "Dragon Naturally Speaking." However, this software is speaker dependent: each user has to create his or her own voice file in order to use it. A deaf person who wanted to use the software would therefore need a voice file for every person they know, and could not use it to communicate with people they don't know. As a result, deaf people are not using this software as a communication tool. Our job is to improve this software. We are going to make it speaker independent using the "Unified Pitch Shifting Method." This report explains why we chose this method, what we tested and found using the pitch-shifting feature of a karaoke machine, and what we plan to do in the coming semesters.

Testing with the Karaoke Machine

Karaoke is a very popular form of entertainment; people use it to sing their favorite songs. We used a karaoke machine because it has a pitch-shifting feature, which we wanted to use together with the Dragon Naturally Speaking program. The pitch-shifting feature lets singers change the pitch of their voice, with settings from -8 to +8. Shifting into the negative range lowers the pitch of the output voice, which means a lower output frequency. Shifting into the positive range raises the pitch, which means a higher output frequency. We tested the pitch-shifting feature to see whether the Dragon program accepts the shifted voice as input. The testing procedure is given below. After the testing, we did some accuracy calculations. From the testing, we found that the Dragon program is pitch sensitive. As a result, we came up with another approach for our project, which we call the "Unified Pitch Shifting Method."
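
To make the relation between a shift setting and the output frequency concrete, here is a minimal sketch. It assumes that each step of the karaoke machine's -8 to +8 range is one equal-tempered semitone; the report does not state the unit, so the step size, the function names, and the 120 Hz example pitch are all illustrative assumptions.

```python
def shift_ratio(steps, steps_per_octave=12):
    """Frequency multiplier for a pitch-shift setting.

    ASSUMPTION: one step is one equal-tempered semitone (12 per octave).
    The karaoke machine's real step size is not given in the report.
    """
    return 2.0 ** (steps / steps_per_octave)


def shifted_frequency(freq_hz, steps):
    """Frequency in Hz after shifting by the given number of steps."""
    return freq_hz * shift_ratio(steps)


if __name__ == "__main__":
    # 120 Hz is an illustrative speaking pitch, not a measured value.
    for steps in (-8, -2, -1, 0, 1, 8):
        print(steps, round(shifted_frequency(120.0, steps), 1))
```

Under this assumption, negative settings give a multiplier below 1 (lower frequency) and positive settings give a multiplier above 1 (higher frequency), which matches the behavior described above.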

Procedure

After installing Dragon on our laptop, we created normal male and female voice files as control voice files.
We connected the microphone to the karaoke machine, and then connected the karaoke machine to the laptop to create pitch-shifted voice files.
We set the microphone sensitivity level to 6 on a scale of 1 to 10, where 10 is the most sensitive.
We set the computer's volume level to 4 on a scale of 1 to 5, where 5 is the loudest.
We set the computer's volume control to low bass and low treble and kept the same settings throughout the experiments.
We tried to keep the background noise level low by closing windows and working in isolated corners.
Users trained for at least 30 minutes, reading an average of 120 paragraphs, to create the voice files.
We created -2 and -1 pitch-shifted voice files for both male and female speakers.
When a female speaker used a male speaker's shifted and normal voice files, the results were not accurate, and vice versa.
However, when a different male speaker used a male speaker's shifted and normal voice files, the results were acceptable, and vice versa.
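
The report does not say how recognition accuracy was calculated. One common way, sketched here as an assumption rather than as the method the team used, is to compare the text that was read aloud with the text the recognizer produced and count word-level errors (substitutions, insertions, and deletions). The function names and example sentences are hypothetical.

```python
def word_errors(reference, hypothesis):
    """Minimum number of word substitutions, insertions, and deletions
    needed to turn the hypothesis into the reference (Levenshtein distance)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # word missing from hypothesis
                            curr[j - 1] + 1,      # extra word in hypothesis
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(reference, hypothesis):
    """Fraction of reference words recognized correctly, floored at 0."""
    n = len(reference.split())
    if n == 0:
        raise ValueError("reference must contain at least one word")
    return max(0.0, 1.0 - word_errors(reference, hypothesis) / n)


if __name__ == "__main__":
    spoken = "the quick brown fox"
    recognized = "the quick brown box"
    print(word_accuracy(spoken, recognized))
```

Running the same reference paragraph through each voice file (control, -1, -2) and comparing the resulting accuracies would give numbers that can be compared across speakers.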

Unified Pitch Shifting Method

Because Dragon Naturally Speaking is user-dependent software, each person has to open his or her own training file before using it. In other words, the training file contains that person's own set of voice samples, which gives the software a reference for translating spoken words into text. One of the themes of our research is to try to make Dragon Naturally Speaking user independent, meaning that one training file would be good enough for any user, male or female. We therefore decided to investigate a technique called "pitch shifting."

The human voice is a kind of wave made up of different frequencies and amplitudes. The waveform of a human voice looks like the following:

We got the idea from the karaoke machine, which provides a pitch-shifting feature. With the karaoke machine, a male voice can easily be transformed into a female voice, and a female voice into a male voice.

So, what is "Unified Pitch Shifting"? It means converting inputs of different frequencies into an output of the same frequency, as in the following waveforms:

The human voice waveform carries the same information, except that the frequency is unified. By applying the unified pitch-shifting technique to the Dragon Naturally Speaking program, we believe that Dragon Naturally Speaking can achieve higher accuracy when different people share the same voice file. This would also mean that Dragon Naturally Speaking becomes speaker independent.
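
As a rough illustration of the idea, the sketch below estimates a speaker's pitch and computes the factor by which it would have to be shifted to reach one common target pitch. Everything here is an assumption made for illustration: the autocorrelation pitch estimator, the 150 Hz target, the 60 to 400 Hz search range, and the pure sine tones standing in for voices. Real speech is far more complex than a sine wave, and an actual pitch shifter must also change the pitch without changing the speed of the speech, which this sketch does not attempt.

```python
import math


def estimate_pitch_hz(samples, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency from the autocorrelation peak.

    Searches lags that correspond to pitches between fmin and fmax.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def unifying_ratio(samples, sample_rate, target_hz=150.0):
    """Factor to multiply the speaker's pitch by to reach target_hz.

    ASSUMPTION: target_hz is a hypothetical common pitch, not a value
    taken from the report.
    """
    return target_hz / estimate_pitch_hz(samples, sample_rate)


def tone(freq_hz, sample_rate=8000, seconds=0.1):
    """Pure sine tone used as a stand-in for a voice in this sketch."""
    n = int(sample_rate * seconds)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]


if __name__ == "__main__":
    for name, freq in (("lower voice", 100.0), ("higher voice", 200.0)):
        ratio = unifying_ratio(tone(freq), 8000)
        print(name, "shift factor:", round(ratio, 2))
```

A lower voice gets a factor above 1 and a higher voice gets a factor below 1, so both end up at the same target pitch before reaching the recognizer.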

This semester, we used Dragon Naturally Speaking, which is a speaker-dependent system. This type of software generally provides high recognition rates, but only for one user.

In the future, we would like to make speaker-independent systems better. The first thing we are trying is voice normalization of the speech input. One device that claims to be able to do this is the Ultra-Harmonizer 4000, which has a retail price of about $4000, a little too expensive for testing purposes.

Perhaps we will find a standalone DSP chip that is already designed to "flatten" the wavelengths of the user's voice. We will continue looking for one and contacting companies.

Other possibilities might lie in software. There are already many algorithms that help increase the robustness of speech recognition software, among them HMM, PLP, and RASTA (developed by Prof. Nelson Morgan here at ICSI, which is at Cal). One step we will take is to contact Prof. Morgan at ICSI and ask him about his steps toward developing a robust system.

Envisioned for the Coming Semester

We have envisioned endless possibilities for the hearing device. The device's final function, structure, and capabilities are limited only by the device team's scope of knowledge, time, and resources. Our goal is to create a device that would aid the hearing impaired by acting as an answering service: answering the telephone and transcribing real-time speech into a typed message. Where the fun starts is in deciding the "how's" of accomplishing such a goal. We have many ideas that next semester's team could vary, deviate from, or attempt outright.

The first consideration for the device is how it would recognize a phone call. There are two ways to think about this. One is the easy way: using the same techniques as an everyday answering machine, connect the device to a telephone and to the corresponding jack in the wall. Picture, if you will, a box-like object connected to the phone and the wall. The other, perhaps more technologically preferred, method is to leave everything up to a computer. Nowadays, with the help of a modem, a computer is able to recognize incoming calls. In this situation, instead of a physical box-like device, we would need to create software that takes over after noting an incoming call.
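
For the computer-based approach, a minimal sketch of the "take over after noting an incoming call" step might look like the following. It assumes a Hayes-compatible modem, which reports each ring by printing the text RING and answers the line when sent the ATA command. The function name and the number of rings to wait are hypothetical, and the modem output is passed in as plain text lines so that the sketch does not depend on any particular serial-port library.

```python
def commands_for(modem_lines, rings_before_answer=2):
    """Yield the commands to send in response to lines read from a modem.

    ASSUMPTIONS: a Hayes-compatible modem that prints "RING" once per
    ring and answers on "ATA". rings_before_answer is a hypothetical
    setting, not something specified in the report.
    Simplification: if the caller hangs up before the call is answered,
    the ring count is not reset.
    """
    rings = 0
    for line in modem_lines:
        text = line.strip().upper()
        if text == "RING":
            rings += 1
            if rings == rings_before_answer:
                yield "ATA"  # pick up; the transcription software takes over
        elif text.startswith("NO CARRIER"):
            rings = 0        # call ended; wait for the next one


if __name__ == "__main__":
    sample_output = ["RING\r\n", "\r\n", "RING\r\n", "NO CARRIER\r\n"]
    print(list(commands_for(sample_output)))
```

In a real program the lines would come from the serial port the modem is attached to, and each yielded command would be written back to that port.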

The second consideration is what to do after an incoming call is recognized. This is where the Dragon software comes in. As is, the Dragon software is designed to be user dependent. This year's hearing device team is trying to add an additional device to make the software function as if it were user independent. If this is successful, then by this stage an incoming phone call can be recognized and real-time speech transcribed into typed text.

The next and last stage, and perhaps the one that calls for the most creativity, is deciding on the user interface that allows the hearing impaired to access the phone messages that have been transcribed from real-time speech into typed text. For those familiar with the "Eudora" software program, which allows users to check, send, and store e-mail, perhaps an analogous program could be made to access transcribed phone messages. Or, if the hearing device turns out to be a physical box, the messages could spew out as from an old stock market ticker tape machine, or be stored digitally as in a large beeper.

Whatever the final design might be, the device is again limited by the project team's time, scope of knowledge, and resources. So perhaps the best option is a simple box-like machine that uses the existing abilities of an answering machine to answer phone calls, adds another device that pipes the voices of the incoming messages to the Dragon software, and stores the messages using the already available Dragon file storage program. All that would then need to be done for this project is a continuation of what this semester's team was trying to accomplish: to add a device that makes the Dragon software user independent. Along with some hookups and wit, this simple version of the project can be done.
