These articles by Barbara Fox were prepared for the March 27, 2002 edition of U.S. 1 Newspaper. All rights reserved.

All Talk, Part II

Voxware

At its peak, Voxware employed 92 people on College Drive and was burning $4 million a year. Now it has six employees on Franklin Corner Road plus 32 in Cambridge, Massachusetts.

Voxware started out using its special speech algorithms in multimedia software but soon moved to the IP telephony market. It more than quadrupled in size in 1996 when it did a deal with Netscape, and it raised $20 million by going public.

This early success was based on expectations of how the Internet telephone industry would develop. In 1997 Voxware’s CEO, Michael Goldstein, predicted that the technology used for phones on the Internet would be in place by 2000 or 2002, that chat servers would replace expensive phone links, and that Voxware’s algorithms would make voice transmission more efficient.

Voxware began to reconsider its focus when Microsoft began giving away Internet telephony software and Voxware’s potential clients withdrew from the market. In November, 1997, Goldstein was replaced by his COO, Bathsheba J. Malsheen. In 1999 Malsheen sold the speech coding business to Lucent for $5.1 million in cash and bought an industrial speech recognition products business, which makes user-friendly computer interfaces for warehouse, customer service, and fulfillment environments.

For sales to be effective, the speech recognition products need to be packaged with the software that the warehouse is already using, so Voxware’s clients are the software companies. It has bought a voice-based hardware system for industrial clients and made an alliance with an Oracle-based warehouse management system.

"Industrial speech in the warehouse area has reached a base where it looks like it is ready to start growing fairly rapidly," says Bill Meisel, editor of the Speech Recognition Update newsletter. But Voxware is in a chicken-and-egg situation. "The market hasn’t developed as rapidly as they have liked, and they need more cash to do an adequate sales job, but they need more sales to get cash."

"Voxware has only two or three real competitors (SyVox in Boulder and VoCollect in Pittsburgh), and the space is very large. It got slowed down by the Internet bust," says Meisel. "What motivates the need for more efficiency in the warehouse is Internet ordering: the individual small orders, where you need real efficiency going through the warehouse, picking it, and packing it. But it does require integration with warehouse management software, so Voxware is making deals to bundle it to make a complete solution."

— Barbara Fox

Voxware Inc. (VOXW), 168 Franklin Corner Road, Suite 3, Lawrenceville 08648. Bathsheba J. Malsheen PhD, president and CEO. 609-514-4100; fax, 609-514-4101. Home page: www.voxware.com

Sarnoff’s Work

Working in the areas of both speech recognition and speech verification, Sarnoff has done research for television manufacturers on how to voice-control a TV or VCR from a distance in a noisy environment (U.S. 1, July 15, 1998). Other applications might include speaking into a Palm Pilot on a crowded city street or controlling non-essential automotive functions, such as the air conditioner or the radio.

Craig Fancourt is one of four Sarnoff researchers in Peter Burt’s Vision Technologies Group who work on speech technology on a project-by-project basis. Though many companies are working with speech applications on a basic level, Sarnoff currently focuses on two harder conditions: when the speaker’s position is not known, and when outside noise is going to interfere.

"The theme is the same, to allow the recognition to occur up to four meters from the microphones, so that the speaker does not need to use a headset," says Fancourt. A native of Bucks County, Fancourt graduated from the University of South Florida in 1987 and has a PhD from the University of Florida.

One of Sarnoff’s methods is to work with four or eight microphones. For instance, a microphone could be embedded in each corner of a Palm Pilot or a television set. "If you can get the signals to add up in phase, you get a tremendous boost in signal-to-noise ratio," he says. Current technology requires the speaker’s position to be determined in advance, so Sarnoff is using "beamforming" algorithms to locate the speaker.
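To make "adding up in phase" concrete, here is a minimal Python sketch of textbook delay-and-sum beamforming. It is not Sarnoff's algorithm. It assumes a straight line of evenly spaced microphones, whole-sample delays, a sine tone standing in for the voice, and independent random noise at each microphone; the function names and the brute-force "try every direction, keep the loudest" search used to locate the speaker are hypothetical illustrations.

```python
import math
import random

# Illustrative assumptions only: evenly spaced microphones in a line,
# whole-sample delays, a sine tone as the "voice", independent noise per mic.


def delay_and_sum(channels, delays):
    """Shift each microphone channel by its delay (in samples) and average.

    If the voice reaches microphone m `delays[m]` samples late, reading that
    channel `delays[m]` samples ahead lines the voice up in phase on every
    channel; the independent noise does not line up, so averaging shrinks it.
    """
    n = len(channels[0]) - max(delays)  # drop samples that cannot be aligned
    return [
        sum(ch[t + d] for ch, d in zip(channels, delays)) / len(channels)
        for t in range(n)
    ]


def locate_speaker(channels, max_step):
    """Hypothetical brute-force search: try every per-microphone delay step
    and keep the one whose beam output carries the most power."""
    best_step, best_power = 0, -1.0
    for step in range(max_step + 1):
        out = delay_and_sum(channels, [m * step for m in range(len(channels))])
        power = sum(x * x for x in out) / len(out)
        if power > best_power:
            best_step, best_power = step, power
    return best_step


def snr_db(clean, noisy):
    """Signal-to-noise ratio of `noisy` relative to the known clean signal, in dB."""
    noise_power = sum((y - s) ** 2 for s, y in zip(clean, noisy))
    return 10 * math.log10(sum(s * s for s in clean) / noise_power)


def simulate(num_mics=8, true_step=3, n=4000, seed=0):
    """The voice reaches each successive microphone `true_step` samples later;
    every microphone also picks up its own unit-variance Gaussian noise."""
    rng = random.Random(seed)

    def voice(t):
        return math.sin(2 * math.pi * 0.05 * t)

    channels = [
        [voice(t - m * true_step) + rng.gauss(0.0, 1.0) for t in range(n)]
        for m in range(num_mics)
    ]
    return channels, [voice(t) for t in range(n)]


channels, clean = simulate()
step = locate_speaker(channels, max_step=10)
beam = delay_and_sum(channels, [m * step for m in range(len(channels))])
single_mic_snr = snr_db(clean, channels[0])
beam_snr = snr_db(clean[: len(beam)], beam)
print(f"estimated delay step: {step} samples per microphone")
print(f"one microphone: {single_mic_snr:.1f} dB, beam of {len(channels)}: {beam_snr:.1f} dB")
```

With eight microphones whose noise is independent, averaging the aligned channels should raise the signal-to-noise ratio by roughly 10 × log10(8), or about 9 dB. Real rooms, with echoes and noise that is partly shared between microphones, typically yield less.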

"Another approach we have used is to exploit the fact that one person’s voice is independent from a noise source," says Fancourt. "I can try to rearrange or filter the outputs of the microphones until they have that same property of independence. Then miraculously the voice and the noise show up on different channels."

"We work with the signal before it gets to the recognizer," says Fancourt, "to make the signals more robust. Where we have excelled at Sarnoff is to create a hybrid of the two techniques, called Geometric Source Separation."
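The independence idea Fancourt describes is the principle behind what is usually called independent component analysis (ICA). The Python sketch below shows a textbook version for the simplest case, two microphones picking up one voice and one noise source: it whitens the two channels, then searches for the rotation whose outputs are least Gaussian, scoring each candidate with kurtosis. It is an illustration under stated assumptions (instantaneous mixing with no delays or echoes, a sine tone standing in for speech, made-up mixing weights, hypothetical function names), not Sarnoff's method or its Geometric Source Separation hybrid.

```python
import math
import random

# Illustrative assumptions only: two mics, instantaneous mixing (no delays or
# echoes), a sine tone as the "voice", spiky (Laplace-like) noise.


def mean(xs):
    return sum(xs) / len(xs)


def whiten(x1, x2):
    """Remove each channel's mean, rotate so the two channels are
    uncorrelated, then scale each to unit variance."""
    m1, m2 = mean(x1), mean(x2)
    x1 = [v - m1 for v in x1]
    x2 = [v - m2 for v in x2]
    a = mean([v * v for v in x1])
    b = mean([p * q for p, q in zip(x1, x2)])
    c = mean([v * v for v in x2])
    phi = 0.5 * math.atan2(2 * b, a - c)  # rotation that diagonalizes the covariance
    cp, sp = math.cos(phi), math.sin(phi)
    u1 = [cp * p + sp * q for p, q in zip(x1, x2)]
    u2 = [-sp * p + cp * q for p, q in zip(x1, x2)]
    s1 = math.sqrt(mean([v * v for v in u1]))
    s2 = math.sqrt(mean([v * v for v in u2]))
    return [v / s1 for v in u1], [v / s2 for v in u2]


def rotate(z1, z2, theta):
    c, s = math.cos(theta), math.sin(theta)
    y1 = [c * p + s * q for p, q in zip(z1, z2)]
    y2 = [-s * p + c * q for p, q in zip(z1, z2)]
    return y1, y2


def excess_kurtosis(y):
    """Fourth-moment measure of non-Gaussianity (about 0 for Gaussian noise);
    assumes y already has zero mean and unit variance."""
    return mean([v ** 4 for v in y]) - 3.0


def separate(x1, x2, steps=90):
    """Brute-force two-channel ICA: after whitening, try rotation angles and
    keep the output pair that is least Gaussian, which for independent,
    non-Gaussian sources is the separated pair (up to order and sign)."""
    z1, z2 = whiten(x1, x2)
    best_score, best = -1.0, (z1, z2)
    for k in range(steps):
        y1, y2 = rotate(z1, z2, (math.pi / 2) * k / steps)
        score = excess_kurtosis(y1) ** 2 + excess_kurtosis(y2) ** 2
        if score > best_score:
            best_score, best = score, (y1, y2)
    return best


def correlation(p, q):
    mp, mq = mean(p), mean(q)
    cov = mean([(a - mp) * (b - mq) for a, b in zip(p, q)])
    var_p = mean([(a - mp) ** 2 for a in p])
    var_q = mean([(b - mq) ** 2 for b in q])
    return cov / math.sqrt(var_p * var_q)


rng = random.Random(1)
n = 2000
voice = [math.sin(2 * math.pi * 0.01 * t) for t in range(n)]  # stand-in for speech
noise = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n)]  # spiky noise
mic1 = [1.0 * v + 0.6 * e for v, e in zip(voice, noise)]  # nearer the talker
mic2 = [0.5 * v + 1.0 * e for v, e in zip(voice, noise)]  # nearer the noise
out1, out2 = separate(mic1, mic2)
for name, src in (("voice", voice), ("noise", noise)):
    print(name, [round(abs(correlation(src, y)), 3) for y in (out1, out2)])
```

Each printed list shows how closely one true source matches each output: a value near 1 in one position and near 0 in the other means that source landed on its own channel. Real rooms mix sound with delays and echoes, which is why practical systems often separate frequency band by frequency band rather than with a single rotation.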

Fancourt’s personal dream application is home automation: from anywhere in the room, you could control any light or appliance.

This sounds ideal for the disabled, but is it a worthy goal to enable someone to be a couch potato, too lazy even to cross the room to turn out the light?

That would depend on how complicated the controls are. "In the case of the VCR, voice control would be a major advantage, because VCRs are complicated and have many menus to traverse. With an oven, it is not that complicated now, but who knows what ovens will be like five years from now," says Fancourt. "And in the car, the idea is to keep your eyes on the road, and if you can control the radio and the air conditioner with your voice you will be a better driver."

And can the technology be perfected? What if an oven turns on by accident when the dog barks, and it burns down the house? Fancourt agrees that people have a very low tolerance for false positives, such as a TV channel that changes spontaneously or an oven that a barking dog turns on. But he believes that, at least in the home appliance arena, perfection is possible. Says Fancourt: "I think it will happen."

Sarnoff Corporation, 201 Washington Road, CN 5300, Princeton 08543-5300. James E. Carnes, president & CEO. 609-734-2000; fax, 609-734-2040. Home page: www.sarnoff.com

Research Think Tank: Siemens

Four years ago Stuart Goose was among the scientists at Siemens Corporate Research (SCR) on College Road who were working on ways to enable a car driver to hear E-mail and websites by using a driver information system with a synthesized voice. Goose, a 1993 graduate of the University of Southampton (in the United Kingdom), was trying to make accessing and processing information faster and more intuitive (U.S. 1, July 15, 1998).

The next step is next-generation mobile applications. Goose thinks SCR has the first solution of its kind that allows a commercial wireless handheld device, such as a Palm Pilot, to offer a 3-D listening experience.

With ordinary text-to-speech, the nuances of microphone placement are lost. In an actual broadcast, by contrast, a reporter and an interviewee really do sound as if they are standing in different spaces. With SCR’s system, although the source of the audio broadcast is still only a single page of text, listeners hear the text "in 3-D," as if different locations are involved.

"We have created 3-D audio extensions to Synchronized Multimedia Integration Language (SMIL)," says Goose. "Our server-based framework can receive a request, process the SMIL file, create multiple speech and audio objects simultaneously, and spatialize them in 3-D space. The framework then multiplexes the audio objects into a single stereo audio stream and transmits it wirelessly to a mobile device."

This solution does not require the mobile device, or a desktop streaming media player, to have any proprietary software.
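SCR's 3-D audio extensions to SMIL are not described here in enough detail to reproduce. As a rough illustration of the last two steps Goose mentions (placing several speech objects at different positions, then combining them into one stereo stream), here is a minimal Python sketch. It assumes simple constant-power left/right panning, a much cruder stand-in for true 3-D spatialization, which usually relies on head-related transfer functions; the sample rate, azimuths, tones, and function names are all hypothetical.

```python
import io
import math
import struct
import wave

RATE = 8000  # samples per second (assumed for illustration)


def pan_gains(azimuth_deg):
    """Constant-power stereo gains (left, right) for a source placed between
    -90 degrees (hard left) and +90 degrees (hard right)."""
    a = max(-90.0, min(90.0, azimuth_deg))
    angle = (a + 90.0) / 180.0 * (math.pi / 2)  # 0 .. pi/2
    return math.cos(angle), math.sin(angle)


def mix_to_stereo(objects):
    """Place each mono audio object (samples, azimuth_deg) at its azimuth and
    sum everything into one left channel and one right channel."""
    length = max(len(samples) for samples, _ in objects)
    left, right = [0.0] * length, [0.0] * length
    for samples, azimuth in objects:
        g_left, g_right = pan_gains(azimuth)
        for i, x in enumerate(samples):
            left[i] += g_left * x
            right[i] += g_right * x
    return left, right


def to_wav_bytes(left, right, rate=RATE):
    """Interleave the channels as 16-bit PCM frames (L, R, L, R, ...) and wrap
    them in a standard WAV container that ordinary players can open."""
    peak = max(1e-9, max(abs(v) for v in left + right))
    scale = 32767 / peak if peak > 1.0 else 32767  # avoid clipping
    frames = []
    for l, r in zip(left, right):
        frames.extend((int(l * scale), int(r * scale)))
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(2)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(struct.pack("<%dh" % len(frames), *frames))
    return buf.getvalue()


def tone(freq_hz, seconds, rate=RATE):
    """A plain sine tone standing in for a synthesized voice."""
    return [0.5 * math.sin(2 * math.pi * freq_hz * t / rate)
            for t in range(int(seconds * rate))]


reporter = tone(220, 0.5)     # hypothetical "reporter" voice, placed to the left
interviewee = tone(330, 0.5)  # hypothetical "interviewee" voice, placed to the right
left, right = mix_to_stereo([(reporter, -60), (interviewee, 60)])
stream = to_wav_bytes(left, right)
```

Because the result is an ordinary two-channel WAV stream, a standard player can play it, in the spirit of the article's point that the listener needs no proprietary software.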

Siemens Corporate Research Inc. (SI), 755 College Road, Princeton Forrestal Center, Princeton 08540. Norbert Gaus, president and CEO. 609-734-6500; fax, 609-734-6565. Home page: www.scr.siemens.com

IsSound: Speech Gone Bad

IsSound Corporation, formerly known as Productivity Works, started out trying to help people with visual disabilities use the Internet. Software devised by co-founders Ray Ingram and Mark Hakkinen could "read" web pages aloud.

Five years ago the company was primarily a consulting organization doing projects for government and nonprofits, providing access for the disabled and the blind, says Elaine Verna, president of IsSound. "We were trying to take those products commercial. We put all our energies into creating a product." IsSound aimed to do interface design, software design, engineering, and Internet-related development, especially in universal access for those with disabilities.

Now the company is out of business, partly because a contract with Microsoft was not renewed and partly because the investment environment changed. Last year, when dot-coms lost their sheen, IsSound’s angel investors, based in New York City, asked for a shorter product-to-market cycle than the company could deliver. "What became clear was that the time frame to get the product created, delivered, and earning revenue was going to take a longer time than anyone expected," says Verna.

Verna, a founder of Princeton Softech, joined the company in April, 2000. This was after the company took in money, changed its name from Productivity Works to IsSound, and moved out of founder Ray Ingram’s house. Verna says that Mark Hakkinen, the other co-founder, is working on disability access with a company that is contracting to the Japanese government.

Ingram stepped down as president in July or August but is still a major shareholder. He says that Productivity Works and IsSound were ahead of their time. "Two years later we are seeing some of those products coming into the marketplace," he says. Competing screen reader software packages have now become Web-oriented, Ingram points out, so his special Web software for the disabled has been subsumed by existing programs that have become "web intelligent."

"We were just still a little too early," says Verna. "We needed some distribution mechanism. Also the funding climate changed dramatically between April, 2000 and the first seed (investment) round. A year later it was a very different marketplace, and entrepreneurial companies that did not have product and did not have revenue were suffering."

"IsSound is winding down in an orderly way," says Verna, who now works at Edison Venture Fund on Lenox Drive. "We have met all our obligations."

IsSound Corporation (Productivity Works), Box 66068, Lawrenceville 08648. Elaine Verna, president. 609-773-0007. Home page: www.issound.com

Ficomp: Silent Again

Ficomp had developed voice recognition solutions that enabled people to interact with computers using conversational speech, not just in a nice quiet office but in noisy environments such as a brokerage trading floor.

"We are out of that business. We couldn’t make it profitable," says Gary Richman, owner of the 25-year-old Ficomp Systems Inc. in East Brunswick. "There may be folks making money in the speech technology business, but I don’t know who they are."

Richman had been an electrical engineering major at Brooklyn Polytechnic Institute, Class of 1965, and worked in engineering design and communications before moving to sales and marketing. He founded his business in 1977. In the 1980s he invested heavily in speech recognition technology, first for Wall Street traders, then for warehouse systems. He lost money and went back to his original business.

With 24 employees, Ficomp Systems now focuses on system integration for banks, brokerages, and insurance companies, handling the flow of data and voice by setting up servers and workstations, tying into communication networks, providing training, and writing application programs.

"Speech technology would have been a profitable market but a very difficult market," says Richman now. "In the telephony business and in the switch business, voice recognition seems to be working well, but in the financial industry, the revenue stream could not support the development."

Still, Ficomp earned considerable attention for trying to make speech recognition work on Wall Street. The New York Stock Exchange rejected the system, saying that 97 percent accuracy was not enough, but Bear Stearns did buy the product for half its traders. "We used it for four or five years in the late 1980s, early 1990s," says a New York-based Bear Stearns employee who did not wish to be named. "It worked if you were to follow the instructions, but there was a learning curve. People had a problem trying to adjust their voice. It probably came out before its time."

To be 100 percent accurate, the traders would sometimes have had to repeat their words, and most were not willing to do that. "With that psychology, it was going to fail," says Richman. "There was no way the technology was ever going to be 100 percent."

— Barbara Fox

Ficomp Systems Inc., 3 Barkley Court, East Brunswick 08816. Gary Richman, president. 732-967-1234; fax, 732-238-4952. Home page: www.ficompsystems.com






This page is published by PrincetonInfo.com, the web site for U.S. 1 Newspaper in Princeton, New Jersey.
