Yasuhiro Oikawa of Waseda University in Tokyo pointed a high-speed camera at the throat of a volunteer with one task in mind: to capture the volunteer's voice without using a microphone.
Yes, you read that correctly. Oikawa and his team announced at the International Congress on Acoustics on June 3 that they used cameras to take thousands of images per second and record the motions of a person’s neck and voice box as they spoke. A computer program then turned the recorded vibrations into sound waves.
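The researchers' software was not described in detail, but the basic idea of treating per-frame vibration measurements as an audio signal can be sketched in a few lines. The snippet below is a hypothetical illustration, not the team's actual code: it assumes one displacement sample has already been extracted per video frame at 10,000 fps (here a synthetic 200 Hz tone stands in for the real measurement), so the camera's frame rate doubles as the audio sample rate.

```python
# Hypothetical sketch: turn a per-frame vibration signal into a WAV file.
# Assumes one displacement sample per frame at 10,000 fps, so the frame
# rate doubles as the audio sample rate.
import math
import struct
import wave

FRAME_RATE = 10_000  # camera frames per second -> audio sample rate

# Stand-in for the signal extracted from the throat images:
# a synthetic, slowly decaying 200 Hz vibration, one second long.
displacement = [
    math.exp(-5 * t / FRAME_RATE) * math.sin(2 * math.pi * 200 * t / FRAME_RATE)
    for t in range(FRAME_RATE)
]

# Remove the DC offset and scale to the 16-bit PCM range.
mean = sum(displacement) / len(displacement)
centered = [x - mean for x in displacement]
peak = max(abs(x) for x in centered) or 1.0
pcm = [int(32767 * x / peak) for x in centered]

# Write the samples out as a mono, 16-bit WAV file.
with wave.open("reconstructed.wav", "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(FRAME_RATE)
    wav.writeframes(struct.pack("<" + "h" * len(pcm), *pcm))
```

In a real pipeline the displacement signal would come from tracking skin motion across frames (for example, by optical flow or intensity changes in a patch on the neck), and the result would likely need filtering before it was intelligible.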
Why did they do this, you ask? Some lip-reading software programs are sophisticated enough to recognize different languages, but the end result doesn't usually involve much more than a transcript, according to a ScienceNews article. Microphones, meanwhile, often pick up too much background noise. Looking for a new way to capture vocal tones, Oikawa and his colleagues came up with this idea.
The article explains that the researchers pointed the camera at the throats of two volunteers and had them say the Japanese word tawara, which means straw bale or bag. The team recorded them at 10,000 frames per second and, at the same time, captured the volunteers' words with a standard microphone and a vibrometer for comparison. The vibrations interpreted from the camera data were similar to those recorded by the microphone and vibrometer, Oikawa said in the article.
After running the images through a computer program, the team reconstructed the volunteers' voices well enough to hear and understand them saying tawara. Mechanical engineer Weikang Jiang of Shanghai Jiao Tong University in China noted that Oikawa did not play audio of the reconstructed voices, but instead showed comparison plots of the sound waves and vibrations.
Like Jiang, I am curious to hear what the reconstructed audio sounds like.