March 2005--Needles in haystacks are sometimes easier to find than biologically significant signals on a DNA chip. Does that red spot on the microarray indicating an enhanced level of gene expression point to the culprit in a disease, or is it mere noise? The only way to tell for sure is to analyze chip after chip, comparing DNA from individuals with similar conditions, creating algorithms and running the data through a computer to tease out associations. That's the job—one of them—of the computational biologist, an increasingly important member of many research teams.
Computational biologists provide the mathematical and statistical expertise necessary to decode data generated by high-throughput experiments.
“We've gone from studying one gene at a time to looking at things on a comprehensive scale in microarray experiments where you see everything that's going on,” says Joel Bader, a computational biologist in the Department of Biomedical Engineering. “Without some sort of mathematical, computational, statistical framework, it's really difficult to make sense of the information in a useful way.”
Bader points out, however, that computational biology is more than just a collaborative step in data-wrangling; it can also be predictive—and that aspect may one day take over.
“In the long term, the goal is for computational methods to replace some of the experimentation,” he says. “But the new techniques are already reliably predicting how some biological systems are going to behave.”
Bader notes that computational modeling is already being used, for example, to make inferences about gene function and the structure of proteins.
“It used to be that if you wanted to know protein structure, you'd try to crystallize the protein, then measure structure experimentally. It was very expensive,” he explains. “Now you look for a protein whose structure has already been measured, a close relative of your protein, and you infer the structure of the new protein computationally.”
The same is true of genes, he says. In the past, revealing a gene's function meant first spending a lot of time and energy characterizing the gene.
“Now,” he says, “if it's a new one that nobody has seen before, you do a database search, find genes coding similar protein sequences and use that knowledge to infer function.”
Still, we haven't yet reached the point in biology—and probably never will—when computation will replace experimentation, according to Dave Cutler, a statistical geneticist at the McKusick-Nathans Institute. It takes a great deal of time in the laboratory to acquire the data necessary to make strong inferences, he says, adding: “But computational biology itself is changing the way we approach questions. The better prediction becomes, the more it opens up the range of experiments.”
As an example, Cutler points to a $3 million National Institute of Mental Health-funded study of genetic factors in autism in which Hopkins researchers will use genome-searching technology to identify any genetic role in the disease. As the computational biologist, Cutler plans an analysis of data from thousands of DNA chips scanning 500,000 single nucleotide polymorphisms (SNPs) in approximately 1,000 autistic children and their parents, to reveal an association with the disease.
This targeted sweep of the genome may turn up a few strong autism-implicated genes and their neighbors after Cutler processes the data.
“Most of it is an algorithm thing,” he says, noting that better algorithms lead to smarter data analysis.
He and colleagues also have access to faster computers—some 50 of them.
“Without those,” he says, “nothing would happen.”
Results of the high-throughput sweep of the genome and subsequent data analysis should generate another round of experiments on the flagged genes.
“If what I do works out perfectly in this study, I'll be able to tell you that certain spots on the genome are likely associated with autism. That still gives you no idea how or why autism develops,” Cutler says. “But it does suggest a whole boatload of new experiments.”
Whether predictive or analytic, the computational steps support biology's increasing emphasis on understanding systems and networks.
“If, for example, you had wanted to look at DNA-protein network interactions a few years ago, it would have been a kind of philosophic exercise,” says Bader. “Now we can really do it, but these data sets have gotten so large that you have to figure out which statistical tools make sense of the data, and plan experiments properly.”