This article by Barbara Fox was published in U.S. 1 Newspaper on July 15, 1998. All rights reserved.
Unsmudging Background Noise
Next year Siemens will offer a Web browser that reads World Wide Web pages aloud, and a small Trenton-based company, Productivity Works, has a similar product on the market right now, but neither firm has announced a Web browser with voice controls. Still, you can bet their engineers are working on it.
So are engineers in lots of other R&D labs. When U.S. 1 started calling around to ask about voice-control technology, the first place we called -- Sarnoff Corporation -- said "Yes, we're doing it."
Sarnoff is partnering with two major forces in the speech recognition industry to develop voice commands to activate appliances: your microwave, your refrigerator, your VCR, and maybe even your car (http://www.sarnoff.com).
"Our customers want to talk to PCs to input data with the voice. You could update a database with your voice or ask for your bank balance," says Bill Porter, director of business development in the speech area. He claims that another way to use voice commands with an automated system is with ordering fast food at a self-serve terminal. (And you thought it was tough ordering your Whopper from a live person!).
If your voice can successfully order a burger and fries, it can certainly activate a Web browser. But none of these applications will work well if background noise "smudges" the voice command. Sarnoff's "Clearvox" speech processing algorithms can "clean up" background noise on current voice command systems.
"Our forte is pre-processing. We are developing the underlying software, the enabling technology -- the algorithms -- to support a voice activated browser," says Porter, who is known in the company as a "dealmaker." The son of a postal employee and a domestic worker, he was a physics major at Lamar University, Class of 1967, and has a graduate degree from Texas Southern. "We take the noise out of the speech so that enough of what you say gets into the computer to command."
The engineers are wrestling with difficult problems to get the maximum amount of speech energy into the software synthesizer, to develop the enabling technology so that it works every time. The "old way" is spectral subtraction. The new way is with "hands-free speech processing algorithms." They offer three sets of algorithms for these scenarios:
Imagine a blurred line an inch high. That's a picture of the sound in a "smudged" speech recording. Now imagine the picture of a Rorschach "ink-blot" test. It has skinny parts and fat parts. That shows the sound after the Sarnoff algorithms have cleaned it up.
The difference? It's the difference between what you hear on a very bad speaker phone and what you get on a very good hand-phone connection.
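For readers curious about the "old way" mentioned above, spectral subtraction, here is a minimal textbook sketch in Python. It is not Sarnoff's Clearvox software, which the article does not describe. The function names, the 32-sample frame, and the test tones are all assumptions made for illustration. The idea: measure how strong the noise is at each frequency, subtract that strength from the noisy signal at the same frequency, and keep the noisy signal's phase.

```python
import cmath
import math

def dft(frame):
    """Naive discrete Fourier transform (fine for short illustrative frames)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse of dft(); returns the real part of each sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]

def spectral_subtract(noisy_frame, noise_frame):
    """Subtract the noise magnitude from each frequency bin of the noisy frame.

    noise_frame is a hypothetical noise-only recording of the same length,
    for example one captured while nobody is speaking.
    """
    noisy_spectrum = dft(noisy_frame)
    noise_magnitude = [abs(c) for c in dft(noise_frame)]
    cleaned = []
    for c, nm in zip(noisy_spectrum, noise_magnitude):
        magnitude = max(abs(c) - nm, 0.0)   # never let a bin go negative
        cleaned.append(cmath.rect(magnitude, cmath.phase(c)))  # keep noisy phase
    return idft(cleaned)

# Toy demonstration: a pure tone stands in for speech, a steady hum for noise.
n = 32
tone = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
hum = [0.5 * math.sin(2 * math.pi * 10 * t / n) for t in range(n)]
noisy = [a + b for a, b in zip(tone, hum)]

cleaned = spectral_subtract(noisy, hum)
worst_error = max(abs(c - s) for c, s in zip(cleaned, tone))
print(f"largest difference from the clean tone: {worst_error:.2e}")
```

In this toy case the hum is perfectly steady and known exactly, so the subtraction recovers the tone almost perfectly. Real background noise changes from moment to moment and can only be estimated, which is one reason the technique leaves audible artifacts in practice.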
But don't look to Sarnoff for the voice-controlled Web browser. "We are developing the underlying software, not the final product," cautions Porter. "In other words, Sarnoff makes the tires to go on the car, but it doesn't sell cars."
-- Barbara Fox
This page is published by PrincetonInfo.com -- the web site for U.S. 1 Newspaper in Princeton, New Jersey.