This article by Barbara Fox was published in U.S. 1 Newspaper on July 15, 1998. All rights reserved.

Unsmudging Background Noise

Next year Siemens will offer a Web browser that reads World Wide Web pages aloud, and a small Trenton-based company, Productivity Works, has a similar product on the market right now, but neither firm has announced a Web browser with voice controls. Still, you can bet their engineers are working on it.

So are engineers in lots of other R&D labs. When U.S. 1 started calling around to ask about voice-control technology, the first place we called -- Sarnoff Corporation -- said "Yes, we're doing it."

Sarnoff is partnering with two major forces in the speech recognition industry to develop voice commands to activate appliances: your microwave, your refrigerator, your VCR, and maybe even your car.

"Our customers want to talk to PCs to input data with the voice. You could update a database with your voice or ask for your bank balance," says Bill Porter, director of business development in the speech area. Another use for voice commands with an automated system, he says, is ordering fast food at a self-serve terminal. (And you thought it was tough ordering your Whopper from a live person!)

If your voice can successfully order a burger and fries, it can certainly activate a Web browser. But none of these applications will work well if background noise "smudges" the voice command. Sarnoff's "Clearvox" speech processing algorithms can "clean up" background noise on current voice command systems.

"Our forte is pre-processing. We are developing the underlying software, the enabling technology -- the algorithms -- to support a voice activated browser," says Porter, who is known in the company as a "dealmaker." The son of a postal employee and a domestic worker, he was a physics major at Lamar University, Class of 1967, and has a graduate degree from Texas Southern. "We take the noise out of the speech so that enough of what you say gets into the computer to command."

The engineers are wrestling with two difficult problems: getting the maximum amount of speech energy into the software synthesizer, and developing enabling technology that works every time. The "old way" is spectral subtraction. The new way is with "hands-free speech processing algorithms." Sarnoff offers three sets of algorithms, one for each of these scenarios:

Single-Microphone Speech Enhancement: both speech and noise are picked up by a single microphone. Clearvox must "adaptively estimate" the noise and use the estimate to enhance speech content. It would work for hands-free voice control in any area, dictation and data entry, mobile telephony, speech compression, children's toys, and hearing aids.

Constrained-Microphone Speech Enhancement: a speech signal is corrupted by environmental noise and recorded by the speaker's microphone. Meanwhile a second "reference microphone" positioned close to the noise source gives a good estimate of the noise signal. The software provides real-time noise cancellation for voice control and echo suppression. Applications might be command and control of PCs, voice control on an assembly line, and voice-activated drive-up ATMs.

Unconstrained-Microphone Speech Enhancement: several microphones placed at different locations record different mixtures, but the signals can be unmixed to separate speech from noise. "The algorithm is considered `blind,'" says Porter, "because the only prior knowledge that is exploited is that the source signals are mutually uncorrelated." It is particularly useful for such applications as teleconferencing, voice control for home appliances, and voice control in automobiles, where the noises are constantly changing.
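For readers who want to see the reference-microphone idea in action, here is a minimal sketch. It does not show Sarnoff's Clearvox algorithms, which the company has not published here. It swaps in a textbook technique, a normalized least-mean-squares (NLMS) adaptive filter, and every name and number in it is an assumption made for illustration: the function `nlms_cancel`, the sine wave standing in for speech, the random hiss standing in for noise, and the three-tap `path` that carries the noise to the speaker's microphone.

```python
import math
import random


def nlms_cancel(primary, reference, taps=4, mu=0.02, eps=1e-8):
    """Subtract an adaptive estimate of the noise from the primary signal.

    Textbook NLMS filter, used here only as a stand-in technique.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    cleaned = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        noise_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - noise_estimate  # the error doubles as the cleaned sample
        norm = eps + sum(h * h for h in history)
        weights = [w + mu * error * h / norm for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned


def mean_square(values):
    return sum(v * v for v in values) / len(values)


random.seed(0)
n = 8000
speech = [math.sin(2 * math.pi * 0.01 * i) for i in range(n)]  # stand-in for a voice
noise = [random.uniform(-1.0, 1.0) for _ in range(n)]  # what the reference mic hears
path = [0.6, 0.3, -0.1]  # hypothetical acoustic path from noise source to speaker's mic
leak = [
    sum(path[k] * noise[i - k] for k in range(len(path)) if i - k >= 0)
    for i in range(n)
]
primary = [s + v for s, v in zip(speech, leak)]  # speaker's mic: speech plus noise
cleaned = nlms_cancel(primary, noise)

# Compare the leftover noise before and after, once the filter has settled.
half = n // 2
noise_before = mean_square([p - s for p, s in zip(primary[half:], speech[half:])])
noise_after = mean_square([c - s for c, s in zip(cleaned[half:], speech[half:])])
print(f"residual noise power: {noise_before:.4f} -> {noise_after:.4f}")
```

The filter never sees the clean speech. It only learns how the noise at the reference microphone shows up in the speaker's microphone, then subtracts that estimate. The speech stays because it is uncorrelated with the reference signal.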

Imagine a blurred line an inch high. That's a picture of the sound in a "smudged" speech recording. Now imagine the picture of a Rorschach "ink-blot" test. It has skinny parts and fat parts. That shows the sound after the Sarnoff algorithms have cleaned it up.

The difference? It's the difference between what you hear on a very bad speaker phone and what you get on a very good hand-phone connection.

But don't look to Sarnoff for the voice-controlled web browser. "We are developing the underlying software, not the final product," cautions Porter. "In other words, Sarnoff makes the tires to go on the car, but it doesn't sell cars."

-- Barbara Fox

Sarnoff Corporation, CN 5300, Princeton 08543-5300. James E. Carnes, president & CEO. 609-734-2000; fax, 609-734-2040. Home page:

