Your library is not far from another generational shakeup. Ever since computers and the web transformed library card catalogs into fireplace tinder, all of us have adapted to a new way of ferreting out the information we seek.
Yet even today’s sophisticated keywording and thread linkage depend on fixed human categories, which the searcher must know to find his answers. “The newest step in information searching is to get away from the human labels completely and link searches to content alone,” says #b#Andrea LaPaugh#/b#, professor of computer science at Princeton University.
Can the 14.5 billion web searches per month sift through the more than 1 trillion websites, documents, and music selections more efficiently and naturally? Plans are definitely afoot, and algorithms for all media are in the offing. To show how our search engines currently work, and where we might be headed, the Princeton Chapter of the ACM/IEEE has invited LaPaugh to speak on “The Foundations and Future of Information Search.” This free event will be held on Thursday, April 15, at 7:30 p.m. at the Princeton University Computer Science Building. A pre-meeting dinner with the speaker will be held at 6 p.m. at Ruby Tuesday’s Restaurant on Route 1. Registration for the dinner is requested. Visit www.princetonacm.acm.org.
Princeton students who come to LaPaugh’s “Information Retrieval, Discovery, and Delivery” class find a professor who has not only analyzed the methods of information retrieval but also had a hand in their making. Raised in Middletown by an English teacher mother and an office worker father, she is not sure where her technological bent came from. She earned her bachelor’s in physics from Cornell University in 1974. She then turned to the Massachusetts Institute of Technology, earning a master’s and a Ph.D. in the burgeoning fields of electrical engineering and computer science.
After a year teaching at Brown University, she made Princeton University her permanent academic home. Here LaPaugh researches the algorithms and designs for information systems, blending the electronic and the neural. She has also helped develop algorithms for VLSI circuitry — a multiple-electrode sensor that can be implanted into the human cerebral cortex to monitor EEG and visual responses.
In addition to her scores of published papers, LaPaugh serves on the editorial board of the international computing science journal, “Algorithmica.”
Certainly, the quest to organize and easily retrieve knowledge is as old as humanity itself. It has been half-jestingly noted that God created man in his own image because that was all he had to work with — and we humans create our computers in our image for the same reason.
But LaPaugh disagrees. “I think those who are designing the search and ‘thought’ patterns for computers know exactly when they are following the human path and when they work in a wholly different direction,” she says. Whatever the method, the goal remains to bring a broader number of resources to bear on a selected topic.
#b#Words and links#/b#. “Many new types of encyclopedias will appear, ready-made with a mesh of associative trails running through them, waiting to be dropped into the memex and there amplified.” This proto-hyperlink prediction was written in 1945 by engineer and scientist Vannevar Bush. Having recently overseen the multidisciplinary Manhattan Project in the creation of the world’s first atomic bomb, Bush was among the first to envision information as a river that must flow across all sections, unhemmed by artificial categories.
More than half a century later, computers have fulfilled Bush’s plan beyond even his dreams. In 1998 Google’s PageRank took the standard keyword search method a giant stride forward. Rather than simply matching specific words, Google weighs their frequency and placement within a document, as well as how often similar searches lead to that document. Before any query arrives, web documents have already been ranked according to a word-appearance list.
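The idea of weighing a word's frequency and placement against a list built ahead of time can be sketched in a few lines of Python. This is a toy illustration, not Google's actual method: the three-page corpus, the `TITLE_WEIGHT` bonus, and the function names `build_index` and `search` are all assumptions invented for the example.

```python
from collections import defaultdict

# Hypothetical three-page corpus: doc id -> (title, body). Invented for illustration.
DOCS = {
    "rivers": ("Water from the river", "the town draws water from the river every day"),
    "catalog": ("Library card catalog", "the old catalog listed every book by subject"),
    "pipes": ("Pipeline costs", "a pipeline can carry river water to a distant town"),
}

TITLE_WEIGHT = 3.0  # assumed bonus for a word placed in the title
BODY_WEIGHT = 1.0   # assumed weight for a word in the body text


def build_index(docs):
    """Build the word-appearance list once, before any query arrives."""
    index = defaultdict(dict)  # word -> {doc id: weighted count}
    for doc_id, (title, body) in docs.items():
        for weight, text in ((TITLE_WEIGHT, title), (BODY_WEIGHT, body)):
            for word in text.lower().split():
                index[word][doc_id] = index[word].get(doc_id, 0.0) + weight
    return index


def search(index, query):
    """Rank documents by the summed weights of the query words they contain."""
    scores = defaultdict(float)
    for word in query.lower().split():
        for doc_id, weight in index.get(word, {}).items():
            scores[doc_id] += weight
    # Highest score first; ties broken alphabetically by doc id.
    return sorted(scores, key=lambda d: (-scores[d], d))


INDEX = build_index(DOCS)
print(search(INDEX, "river water"))
```

Because the index is built once up front, answering a query only requires looking up each query word rather than rereading every document.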
But the real Google edge came from PageRank’s link-to-link exploring capabilities. Web crawlers move across websites, noting and following all associated links. “Instead of searching word-by-word,” says LaPaugh, “the system is driving swiftly through entire networks of linked sites. Also it can judge the relevancy of such links by using such factors as site popularity.” The entire bibliography stands by, each volume ready to flip open to the correct page on command.
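Judging a page by the links that point to it can be illustrated with the simplified, textbook form of the PageRank calculation. The sketch below is not Google's production system: the four-page link graph, the damping value of 0.85, and the fixed iteration count are assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank over a dict of page -> outgoing links."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal rank everywhere
    for _ in range(iterations):
        # Every page keeps a small baseline share regardless of its links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: three of the four pages link to "c".
LINKS = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
RANKS = pagerank(LINKS)
for page, score in sorted(RANKS.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph, page "c" ends up on top because it collects links from three other pages, while "d", which nothing links to, lands at the bottom.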
#b#Beyond Google#/b#. PageRank and its subsequent improvements move searches ever further from the cast-iron knowledge categories of the old library and closer to a content-based expansion. It’s more human. One thought leads to another. Pondering our global need for water may take the thinker into the disciplines of demographics, transportation, physics, geology, meteorology, and who knows where. Questions arise that lie beyond the realm of existing databases and websites.
What is the cheapest way to bring pure water from this river to that town? The facts exist, but the format must be assembled anew with each query. “Our next step is to have an engine which searches through several tables, extracts the relevant data, and presents it in the table as sought by the researcher,” LaPaugh says. The savings in time and the deepened focus of knowledge brought to the problem would be immense.
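To make the water question concrete, here is a rough sketch of what assembling one answer table from several source tables might look like. It is only an illustration of the idea, not the engine LaPaugh describes: the three tables, every figure in them, and the function name `cheapest_supply` are invented for the example.

```python
# Three hypothetical source tables. All names and figures are invented.
RIVERS = [
    {"river": "North Fork", "purity": "high"},
    {"river": "Mill Creek", "purity": "low"},
]
ROUTES = [
    {"river": "North Fork", "town": "Springfield", "km": 40},
    {"river": "Mill Creek", "town": "Springfield", "km": 12},
    {"river": "North Fork", "town": "Lakeside", "km": 15},
]
COSTS = [
    {"method": "pipeline", "usd_per_km": 1000},
    {"method": "canal", "usd_per_km": 700},
]


def cheapest_supply(town, rivers, routes, costs):
    """Combine the three tables into one answer table, cheapest option first."""
    pure = {r["river"] for r in rivers if r["purity"] == "high"}
    rows = [
        {
            "river": route["river"],
            "method": cost["method"],
            "cost_usd": route["km"] * cost["usd_per_km"],
        }
        for route in routes
        if route["town"] == town and route["river"] in pure
        for cost in costs
    ]
    return sorted(rows, key=lambda row: row["cost_usd"])


ANSWER = cheapest_supply("Springfield", RIVERS, ROUTES, COSTS)
for row in ANSWER:
    print(row)
```

No single source table holds the answer. It appears only once purity, distance, and cost have been pulled together for the one town in question.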
#b#Sights and sounds#/b#. Technically, a human does not see a tree. He sees colored patches that his mind then interprets as a tree, using well-established neural and brain pathways. “This is something that is so trivial for humans, but so enormously complex for a computer,” says LaPaugh. Using certain visual cues, designers are trying to have one digital image link to and call up others within the parameters the searcher wants. But getting your landscape photo to present only other shots of that mountain, not forests with the same species of tree, is no mean trick.
Likewise, musicologists and search-algorithm experts are struggling to define music’s quantifiable basics in such ways as will allow searchers to reach beyond composer, performer, and genre. Future searchers need not know the best flute sonatas, only that they want them in all genres from the 19th century.
It may seem a bit far-fetched, but one has only to crack a print encyclopedia and then browse around the World Wide Web to realize that in our quest to organize our knowledge, anything is possible.