Corrections or additions?
These articles by Barbara Fox were prepared for the March 27, 2002
edition of U.S. 1 Newspaper. All rights reserved.
All Talk, Part II
At its peak, Voxware employed 92 people on College Drive
$4 million a year. Now it has six employees on Franklin Corner Road
plus 32 in Cambridge, Massachusetts.
Voxware started out using its special speech algorithms in multimedia
software but soon moved to the IP telephony market. It more than
in size in 1996 when it did a deal with Netscape, and it raised $20
million by going public.
This early success was based on expectations of how the Internet
industry would develop. In 1997 Voxware’s CEO, Michael Goldstein,
predicted that the technology used for phones on the Internet would
be in place by 2000 or 2002. Chat servers would replace expensive
phone links, and that Voxware’s algorithms would make the voice
Voxware began to reconsider its focus when Microsoft began giving
away Internet telephones and Voxware’s potential clients withdrew
from the market. In November 1997, Goldstein was replaced by his COO,
Bathsheba J. Malsheen. In 1999 Malsheen sold the speech coding
business to Lucent for $5.1 million in cash and bought an industrial
speech recognition products business: user-friendly computer
interfaces for warehouse, service, and fulfillment environments.
For sales to be effective, the speech recognition products need to
be packaged with the software that the warehouse is already using,
so Voxware’s clients are the software companies. It has bought a
hardware system for industrial clients and made an alliance with an
Oracle-based warehouse management system.
"Industrial speech in the warehouse area has reached a base where
it looks like it is ready to start growing fairly rapidly," says Bill
Meisel, editor of Speech Recognition Update newsletter. But Voxware is
in a chicken or egg situation. "The market hasn’t
developed as rapidly as they would have liked and they need more cash to
do an adequate sales job, but they need more sales to get cash."
"Voxware has only two or three real competitors — SyVox in
Boulder and VoCollect in Pittsburgh — and the space is very large.
It got slowed down by the Internet bust," says Meisel. "What [drives]
the need for more efficiency in the warehouse is Internet ordering
— the individual small orders, where you need real efficiency
going through the warehouse, picking it, and packing it. But it does
require integration with warehouse management software, so Voxware
is making deals to bundle it to make a complete solution."
— Barbara Fox
Suite 3, Lawrenceville 08648. Bathsheba J. Malsheen PhD, president
and CEO. 609-514-4100; fax, 609-514-4101. Home page:
Working in the areas of both speech recognition and
Sarnoff has done research for television manufacturers on how to control
the TV or VCR from a distance in a noisy environment (U.S. 1, July
15, 1998). Other applications might be speaking into a Palm Pilot
on a crowded city street, or controlling non-essential automotive
functions, such as the air conditioner or the radio.
Craig Fancourt is one of four Sarnoff researchers, working on speech
technology on a project by project basis, who belong to Peter Burt’s
Vision Technologies Group. Though many companies are working with
speech applications on a basic level, Sarnoff currently focuses on
two variables: when the speaker’s position is not known, and when
outside noise is going to interfere.
"The theme is the same, to allow the recognition to occur up to
four meters from the microphones, so that the speaker does not need
to use a headset," says Fancourt. A native of Bucks County, he
graduated from the University of South Florida in 1987 and has a PhD
from the University of Florida.
One of Sarnoff’s methods is to work with four or eight microphones.
For instance, a microphone could be embedded in each corner of a Palm
Pilot or a television set. "If you can get the signals to add
up in phase, you get a tremendous boost in signal-to-noise ratio,"
he says. Current technology requires the speaker’s position to be
determined in advance. So, with a process called
Sarnoff is using algorithms to locate the speaker.
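The in-phase addition Fancourt describes can be illustrated with a toy delay-and-sum calculation. The sketch below is not Sarnoff's code. It assumes the arrival delay at each microphone is already known (the very limitation the article mentions), and the test tone, noise level, and delay values are invented for illustration.

```python
import math
import random

def delay_and_sum(channels, delays):
    """Shift each channel back by its delay (in samples) and average them."""
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    return [sum(ch[i + d] for ch, d in zip(channels, delays)) / len(channels)
            for i in range(n)]

def snr_db(clean, noisy):
    """Signal-to-noise ratio in decibels, measured against a clean reference."""
    signal_power = sum(s * s for s in clean)
    noise_power = sum((x - s) ** 2 for s, x in zip(clean, noisy))
    return 10 * math.log10(signal_power / noise_power)

random.seed(0)
length = 4000
voice = [math.sin(2 * math.pi * 0.01 * i) for i in range(length)]
delays = [0, 3, 5, 8]  # hypothetical arrival delays, one per microphone

# Each microphone hears the voice late by its own delay, plus its own noise.
channels = []
for d in delays:
    delayed = [0.0] * d + voice[:length - d]
    channels.append([s + random.gauss(0, 1.0) for s in delayed])

aligned = delay_and_sum(channels, delays)
reference = voice[:len(aligned)]
single_mic_snr = snr_db(reference, channels[0][:len(aligned)])
array_snr = snr_db(reference, aligned)
print(round(single_mic_snr, 1), round(array_snr, 1))
```

Because the voice lines up across the shifted channels while each microphone's noise does not, averaging four channels cuts the noise power to roughly a quarter, a gain of about 6 decibels in this toy setup.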
"Another approach we have used is to exploit the fact that one
person’s voice is independent from a noise source," says Fancourt.
"I can try to rearrange or filter the outputs of the microphones
until they have that same property of independence. Then miraculously
the voice and the noise show up on different channels."
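The independence idea can be shown with a two-microphone toy example. This is a minimal sketch of a generic independent-component-style search, not the method Sarnoff uses and not Geometric Source Separation. The sine wave standing in for a voice, the mixing weights, and all function names are assumptions made for illustration.

```python
import math
import random

def mean(xs):
    return sum(xs) / len(xs)

def whiten(x1, x2):
    """Center two channels, decorrelate them, and scale each to unit variance."""
    m1, m2 = mean(x1), mean(x2)
    x1 = [v - m1 for v in x1]
    x2 = [v - m2 for v in x2]
    a = mean([u * u for u in x1])
    c = mean([v * v for v in x2])
    b = mean([u * v for u, v in zip(x1, x2)])
    phi = 0.5 * math.atan2(2 * b, a - c)  # angle that diagonalizes the covariance
    cs, sn = math.cos(phi), math.sin(phi)
    p1 = [cs * u + sn * v for u, v in zip(x1, x2)]
    p2 = [-sn * u + cs * v for u, v in zip(x1, x2)]
    s1 = math.sqrt(mean([u * u for u in p1]))
    s2 = math.sqrt(mean([v * v for v in p2]))
    return [u / s1 for u in p1], [v / s2 for v in p2]

def kurtosis(xs):
    """Excess kurtosis of a zero-mean, unit-variance sequence."""
    return mean([v ** 4 for v in xs]) - 3.0

def separate(x1, x2, steps=180):
    """Rotate the whitened channels until the outputs look least Gaussian."""
    z1, z2 = whiten(x1, x2)
    best = None
    for k in range(steps):
        theta = (math.pi / 2) * k / steps
        cs, sn = math.cos(theta), math.sin(theta)
        y1 = [cs * u + sn * v for u, v in zip(z1, z2)]
        y2 = [-sn * u + cs * v for u, v in zip(z1, z2)]
        score = abs(kurtosis(y1)) + abs(kurtosis(y2))
        if best is None or score > best[0]:
            best = (score, y1, y2)
    return best[1], best[2]

def correlation(a, b):
    """Absolute Pearson correlation between two sequences."""
    ma, mb = mean(a), mean(b)
    num = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    den = math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))
    return abs(num / den)

random.seed(1)
n = 2000
voice = [math.sin(2 * math.pi * 0.013 * i) for i in range(n)]  # stand-in for speech
noise = [random.uniform(-1, 1) for _ in range(n)]
mic1 = [0.8 * v + 0.6 * w for v, w in zip(voice, noise)]  # hypothetical mixing
mic2 = [0.3 * v + 0.9 * w for v, w in zip(voice, noise)]
out1, out2 = separate(mic1, mic2)
print(round(max(correlation(out1, voice), correlation(out2, voice)), 3))
```

The search never sees the original voice or noise. It only adjusts the combination of the two microphone signals until the outputs are as statistically independent as this simple measure can make them, at which point each output should track one source.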
"We work with the signal before it gets to the recognizer,"
says Fancourt, "to make the signals more robust. Where we have
excelled at Sarnoff is to create a hybrid of the two techniques,
Geometric Source Separation."
Fancourt’s personal dream application is in home automation —
so that from anywhere in the room you could control any light or
This sounds ideal for the disabled, but is it a worthy goal to enable
someone to be a couch potato, too lazy even to cross the room to turn
out the light?
That would depend on how complicated the controls are. "In the
case of the VCR, voice control would be a major advantage, because
VCRs are complicated and have many menus to traverse. With an oven,
it is not that complicated now but who knows what ovens will be like
five years from now," says Fancourt. "And in the car, the idea
is to keep your eyes on the road, and if you can control the radio
and the air conditioner with your voice you will be a better driver."
And can the technology be perfected? What if an oven turns on by
accident when the dog barks, and it burns down the house? Fancourt agrees
that people have a very low tolerance for false positives, as when
a TV channel gets changed spontaneously, or when a barking dog turns
on the oven. But he believes that — at least in the home appliance
arena — perfection is possible. Says Fancourt: "I think it
Princeton 08543-5300. James E. Carnes, president & CEO. 609-734-2000;
fax, 609-734-2040. Home page: www.sarnoff.com
Four years ago Stuart Goose was among the scientists at Siemens
Research (SCR) on College Road who were working on ways to enable
a car driver to hear E-mail and websites by using a driver information
system with a synthesized voice. Goose, a 1993 graduate of the University
of Southampton (in the United Kingdom), was trying to make accessing
and processing information faster and more intuitive (U.S. 1, July
The next step is next generation mobile applications. Goose thinks
SCR has the first solution of its kind to allow a commercial wireless
handheld device, such as a Palm Pilot, to offer a 3-D listening
With the ordinary text-to-speech broadcast, the nuances of microphone
placement are lost, in contrast to an actual broadcast, where a
and an interviewee really sound as if they are standing in different
spaces. Although the source of the audio broadcast is still only a
single page of text, listeners will hear the text "in 3-D,"
as if different locations are involved.
"We have created 3-D audio extensions to Synchronized Multimedia
Integration Language (SMIL)," says Goose. "Our server-based
framework can receive a request, process the SMIL file, create
speech and audio objects simultaneously, and spatialize them in 3-D
space. The framework then multiplexes the audio objects into a single
stereo audio stream and transmits it wirelessly to a mobile device."
This solution does not require the mobile device — or a desktop
streaming media player — to have any proprietary software.
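The step of placing several audio objects in space and folding them into one stereo stream can be sketched in a few lines. This is only an illustration under simplifying assumptions: it uses plain constant-power left/right panning, which is far cruder than true 3-D spatialization, it does not parse SMIL, and the function names and speaker positions are hypothetical rather than taken from the Siemens framework.

```python
import math

def pan_gains(azimuth_deg):
    """Constant-power gains for a source between -90 (far left) and +90 (far right)."""
    angle = (azimuth_deg + 90.0) / 180.0 * (math.pi / 2)
    return math.cos(angle), math.sin(angle)

def mix_to_stereo(objects):
    """Mix (samples, azimuth_deg) audio objects into one list of (left, right) frames."""
    length = max(len(samples) for samples, _ in objects)
    frames = [[0.0, 0.0] for _ in range(length)]
    for samples, azimuth in objects:
        left, right = pan_gains(azimuth)
        for i, s in enumerate(samples):
            frames[i][0] += left * s
            frames[i][1] += right * s
    return [tuple(f) for f in frames]

# Two made-up mono "voices", placed to the listener's left and right.
interviewer = [math.sin(2 * math.pi * 0.02 * i) for i in range(200)]
interviewee = [math.sin(2 * math.pi * 0.03 * i) for i in range(200)]
stereo = mix_to_stereo([(interviewer, -60.0), (interviewee, 60.0)])
print(len(stereo), stereo[1])
```

Because the result is an ordinary stereo stream, any standard player can reproduce it, which matches the article's point that the receiving device needs no special software.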
Road, Princeton Forrestal Center, Princeton 08540. Norbert Gaus,
and CEO. 609-734-6500; fax, 609-734-6565. Home page:
IsSound Corporation, formerly known as Productivity Works, started
out trying to help those with visual disabilities to use the Internet.
Software devised by co-founders Ray Ingram and Mark Hakkinen could
"read" web pages aloud.
Five years ago the company was primarily a consulting organization
doing projects for government and nonprofits, providing access to
the disabled and the blind, says Elaine Verna, president of IsSound.
"We were trying to take those products commercial. We put all
our energies into creating a product." IsSound aimed to do
design, software design, engineering and Internet-related development,
especially in universal access for those with disabilities.
Now the company is out of business, partly due to a contract with
Microsoft that was not renewed, and partly to a changed investment
environment. Last year when dot-coms lost their sheen, IsSound’s angel
investors, based in New York City, asked for a shorter
cycle than the company could deliver. "What became clear was that
the time frame to get the product created, delivered, and earning
revenue was going to take a longer time than anyone expected,"
Verna, a founder of Princeton Softech, joined the company in April,
2000. This was after the company took in money, changed its name from
Productivity Works to IsSound, and moved from
founder Ray Ingram’s house. Verna says that Mark Hakkinen, the other
co-founder, is working on disability access with a company that is
contracting to the Japanese government.
Ingram stepped down from being president in July or August but is
still a major shareholder. He says that Productivity Works and IsSound
were ahead of their time. "Two years later we are seeing some
of those products coming into the marketplace," he says. Competing
screen reader software packages have now become Web oriented, Ingram
points out. So his special Web software for the disabled has become
subsumed by existing programs that have become "web
"We were just still a little too early," says Verna. "We
needed some distribution mechanism. Also the funding climate changed
dramatically between April, 2000 and the first seed (investment)
A year later it was a very different marketplace, and entrepreneurial
companies that did not have product and did not have revenue were
"IsSound is winding down in an orderly way," says Verna, who
works now at Edison Venture Fund on Lenox Drive. "We have met
all our obligations."
Lawrenceville 08648. Elaine Verna, president. 609-773-0007. Home
Ficomp had developed voice recognition solutions that enable people
to interact with computers using conversational speech — not just
in a nice quiet office but in noisy environments such as a brokerage
"We are out of that business. We couldn’t make it profitable,"
says Gary Richman, owner of the 25-year-old Ficomp Systems Inc. in
East Brunswick. "There may be folks making money in the speech
technology business, but I don’t know who they are."
Richman had been an electrical engineering major at Brooklyn Polytechnic
Institute, Class of 1965, and worked in engineering design and
before moving to sales and marketing. He founded his business in 1977.
In the 1980s he invested heavily in speech recognition technology,
first for Wall Street traders, then for warehouse systems. He lost
money and went back to his original business.
With 24 employees, Ficomp Systems now focuses on system integration
for banks, brokerages, and insurance companies — setting up
and workstations, tying into communication networks, providing
and doing application programs — to deal with the flow of data
"Speech technology would have been a profitable market but a very
difficult market," says Richman now. "In the telephony
and in the switch business, voice recognition seems to be working
well, but in the financial industry, the revenue stream could not
support the development."
Still, Ficomp earned lots of notoriety for trying to make speech
work on Wall Street. The New York Stock Exchange rejected the system,
saying that 97 percent accuracy was not enough, but Bear Stearns did
buy the product for half its traders. "We used it for four or
five years in the late 1980s, early 1990s," says a New York-based
Bear Stearns employee who did not wish to be named. "It worked
if you were to follow the instructions, but there was a learning curve.
People had a problem trying to adjust their voice. It probably came
out before its time."
To be 100 percent accurate, the traders would have had to repeat their
words sometimes, and most were not willing to do that. "With that
psychology, it was going to fail," says Richman. "There was
no way the technology was ever going to be 100 percent."
— Barbara Fox
08816. Gary Richman, president. 732-967-1234; fax, 732-238-4952.
This page is published by PrincetonInfo.com
— the web site for U.S. 1 Newspaper in Princeton, New Jersey.