Figuring out what genes do is essential to scientific and medical progress, but an important first step in using data from genomes, human and otherwise, is making the information available to scientists. Publicly available biological data abounds, but because it is often unwieldy, Princeton University scientists in the Laboratory for Bioinformatics and Functional Genomics have been developing better ways to access and use it in experiments.
Olga Troyanskaya, a Princeton professor in both computer science and the Lewis-Sigler Institute for Integrative Genomics, says that she and her colleagues have developed methods to consolidate this data while determining its reliability and biological accuracy. They are also making the data accessible to biologists through a Google-type web interface that quickly integrates large amounts of data while assessing the reliability of each data source.
Troyanskaya will be part of a Lunch ‘n Learn panel on “Women in Research Computing,” a free panel held on Wednesday, March 25, at noon at the Frist Campus Center. The panel will be moderated by Betty Leydon, vice president for information technology and chief information officer. Two additional panelists will be Emily Carter, professor of mechanical and aerospace engineering and applied and computational mathematics, and Jennifer Rexford, professor of computer science. For information, call 609-258-6033.
To give a sense of how her team’s system works, Troyanskaya makes an analogy to public health scientists who are trying to predict a flu epidemic using different types of available data. “If, say, a lot more people are coming to all 10 hospitals in an area with coughs, fever, and there is a highly unusual rise in people buying diarrhea medicine, then based on these three sources, you will say there is a flu epidemic,” says Troyanskaya.
But, she adds, you would trust the sources differently, depending on the question you are asking. If the question is specifically about stomach problems, then the value of the data on diarrhea medicine would be much greater than if you are asking about the flu, where there is more secondary evidence.
The database works similarly, but is used for more esoteric genomic questions rather than flu epidemics. For example, a scientist could assess whether a specific protein is or is not related to mitochondria, the energy center of the cell, or whether it does or does not participate in breast cancer. Furthermore, says Troyanskaya, you can learn how much to trust different sources of data, and the system allows its users to upweight or downweight certain data sets, depending on the question.
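The idea of trusting the same sources differently for different questions can be illustrated with a small Python sketch. It is not the Princeton system's actual method: the source names, signal strengths, and weights below are hypothetical, and a plain weighted average stands in for whatever statistical model the real system uses.

```python
from typing import Dict


def combine_evidence(signals: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return the weighted average of per-source signal strengths (each in [0, 1])."""
    total_weight = sum(weights[name] for name in signals)
    if total_weight == 0:
        raise ValueError("at least one source needs a nonzero weight")
    return sum(signals[name] * weights[name] for name in signals) / total_weight


# Hypothetical signal strengths, loosely following the flu analogy.
signals = {
    "hospital_coughs": 0.9,
    "hospital_fevers": 0.8,
    "diarrhea_medicine_sales": 0.3,
}

# Hypothetical weights: the same sources are trusted differently per question.
flu_weights = {
    "hospital_coughs": 1.0,
    "hospital_fevers": 1.0,
    "diarrhea_medicine_sales": 0.2,
}
stomach_weights = {
    "hospital_coughs": 0.1,
    "hospital_fevers": 0.3,
    "diarrhea_medicine_sales": 1.0,
}

flu_score = combine_evidence(signals, flu_weights)
stomach_score = combine_evidence(signals, stomach_weights)
print(f"flu question: {flu_score:.2f}, stomach question: {stomach_score:.2f}")
```

With these made-up numbers, the weak medicine-sales signal barely affects the flu question but pulls the stomach question's score down sharply, because that question weights it most heavily.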
But the size of the database also gives it power. “The system can take millions of data points,” says Troyanskaya. “If you take one source there is not an answer, but over the entire collection, you can answer it reliably.”
Troyanskaya uses the genomic data in the university database to make accurate computer predictions of what different genes do. She then determines which of the predictions were correct and feeds this data back into the system so that it “learns” to make better predictions on the next iteration.
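The predict, test, and feed-back cycle can be sketched as a short loop in Python. This is a toy illustration under stated assumptions, not the group's actual algorithm: the protein names, the `links` association map, and the `verify` function (standing in for a laboratory experiment) are all hypothetical, and candidates are ranked simply by how many already-known proteins they are linked to.

```python
from typing import Callable, Dict, Set


def iterate_predictions(
    known: Set[str],
    links: Dict[str, Set[str]],
    verify: Callable[[str], bool],
    rounds: int = 3,
    per_round: int = 2,
) -> Set[str]:
    """Repeatedly predict candidates, test them, and feed confirmed ones back."""
    known = set(known)        # work on a copy so the caller's set is untouched
    tested: Set[str] = set()  # candidates that have already been tested
    for _ in range(rounds):
        # Score each untested candidate by its number of links to known proteins.
        scores = {}
        for protein, partners in links.items():
            if protein in known or protein in tested:
                continue
            score = len(partners & known)
            if score > 0:
                scores[protein] = score
        if not scores:
            break  # nothing left to predict
        # Test the top-scoring candidates (ties broken alphabetically).
        batch = sorted(scores, key=lambda p: (-scores[p], p))[:per_round]
        for protein in batch:
            tested.add(protein)
            if verify(protein):
                known.add(protein)  # confirmed results inform the next round
    return known


# Hypothetical toy data: K1 and K2 are the starting set of known proteins.
links = {
    "A": {"K1", "K2"},
    "B": {"K1"},
    "C": {"A"},
    "D": {"B"},
}
truly_involved = {"A", "C"}  # stands in for experimental outcomes

result = iterate_predictions({"K1", "K2"}, links, lambda p: p in truly_involved)
print(sorted(result))
```

In this toy run, protein C becomes a candidate only after A is confirmed in the first round, which is the sense in which each iteration benefits from the results of the previous one.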
Troyanskaya has been looking specifically at mitochondria, a specialized sub-unit of a cell that is linked to a number of diseases, including some types of cancer as well as autism. As a model system she is using baker’s yeast, which shares common evolutionary ancestors with human beings and so has fundamentally similar biology and genes. “We are likely to be able to use the findings of yeast in the study of human biology,” says Troyanskaya.
“Based on genomic-scale data,” she says, “we can make predictions of new proteins to work in this process.” So she took all of the proteins in the database known to participate in building and maintaining mitochondria, used them to predict other proteins involved, and ran experiments that confirmed 60 percent of the predictions. The confirmed results were fed back into the system, which made additional predictions. At the end of the process the researchers were able to go from 100 proteins in the original set to an additional 120 proteins already known to be linked to mitochondria, as well as 100 more not previously linked to mitochondria.
Troyanskaya grew up in Moscow, Russia, where her mother was an electrical engineer and her father a civil engineer. She came to the United States as an exchange student to spend her last year of high school in the Northern Virginia suburbs of Washington, D.C., and her parents eventually emigrated to Israel.
Troyanskaya decided to go to college in the United States and applied frantically, she says, knowing that her Russian parents could not afford to pay her tuition. She ended up at the University of Richmond on a full scholarship, graduating with a double major in computer science and biology — what she calls her “split personality” — in 1999. She moved on to Stanford University for graduate school in biomedical informatics, earning her doctorate in 2003 and then coming to Princeton as an assistant professor.
Mentors have not been difficult for Troyanskaya to find, and she considers herself lucky. Early male mentors were her Ph.D. advisers Russ Altman and David Botstein, who have both mentored many highly successful women. She also cites Kai Li, professor of computer science at Princeton, as well as Maria Klawe, who has mentored many women in computer science. Formerly the dean of engineering at Princeton, Klawe is now the president of Harvey Mudd College.
Troyanskaya herself has three female Ph.D. students, one in computer science and two in molecular biology, and she has mentored several undergraduate women formally or informally. Last year former Princeton undergraduate Rachel Sealfon, who worked with Troyanskaya’s group both summers and for her junior paper and senior thesis, won the female Computing Research Award for North America and is now at the Massachusetts Institute of Technology working on a Ph.D. in electrical engineering and computational biology.
Troyanskaya is excited about the potential for research at the nexus of biology and computing. “Computations can make a huge difference,” she says, as biologists are looking for genes that interact closely with genes like BRCA2, which is responsible for some early-onset breast cancer. And of course Troyanskaya also has personal reasons for loving her work in biomedical informatics. “I never cured my split personality,” she says.