The next phase of personalized medicine will require intensive computational science, says Feilim Mac Gabhann.
August 2012--In a well-air-conditioned room on the Johns Hopkins Homewood campus, a massive computer system operates 24 hours a day, seven days a week. Step inside this 1,000-square-foot room and you’ll be surrounded by rack upon rack of processors, with green lights shining. Yards of computer cables resembling licorice dangle all around you, and exhaust fans whir continuously.
In technical terms, you will have entered the guts of an IBM iDataPlex cluster computer containing 250 compute nodes, 2,000 individual processor cores, and one petabyte (quadrillion bytes) of storage. In plain language: an extremely powerful computer. And it may represent the future of medicine.
Photo: Feilim Mac Gabhann. Credit: David Hopkins.
According to Feilim Mac Gabhann, massive computing systems like this one are needed to realize the dream of personalized medicine, in which doctors customize treatments based on an individual patient’s genome or the specific molecular characteristics of the patient’s disease, such as the genetic profile of a cancer patient’s tumor.
To achieve this end, Mac Gabhann, an assistant professor in the Department of Biomedical Engineering and at the Institute for Computational Medicine, is working at the interface of computer science and biological experimentation, using mathematical models to search large datasets for answers that the human brain cannot discern alone.
“The era of genomics and proteomics has created this huge avalanche of data,” says Mac Gabhann. Technologies such as rapid DNA sequencing and the microarray, or “gene chip,” have enabled scientists to gather vast stores of information about genes and proteins. Now the challenge is to find meaning in the torrent of data. “Ultimately, biology is so complex that we can’t hope to keep all the information in our brains, and so we use these external brains, the computers, to do our processing and analysis,” he says.
Personalized medicine is not new. Doctors began using the term and applying its principles two decades ago. In treating breast cancer patients, for instance, oncologists determine whether a patient’s tumor contains receptors for estrogen or progesterone, hormones that stimulate tumor growth. Tumors that do can be treated with drugs (such as tamoxifen) that prevent the hormones from accessing tumor cells. In addition, some breast cancers overproduce a protein called HER2. Oncologists can now treat patients with these cancers using Herceptin, an antibody that attaches to HER2 and stops or slows the cancer’s growth.
But personalized medicine is still limited in scope, says Mac Gabhann. In breast cancer, for example, 14 to 20 percent of patients are “triple negative”—their tumors lack receptors for estrogen or progesterone, and do not overproduce HER2. These patients cannot benefit from hormonal therapy or Herceptin, and they tend to have a poorer prognosis than other breast cancer patients.
Personalized medicine, says Mac Gabhann, is in its infancy. “There are some highly effective therapies that work for large groups of patients,” he says. “But it will take more effort to pair therapies with the remaining subsets of people who will benefit from them.”
Mac Gabhann’s approach is to start with large datasets collected in previously published medical studies. In particular, Mac Gabhann focuses on datasets that contain information about gene expression levels in large numbers of individual patients. He then enters this information into a set of computer models run on the cluster computer.
The models analyze the gene expression data for each patient and highlight genes that are expressed at higher rates in patients than in healthy individuals. The models then search for patterns in the gene expression data among all patients and use these patterns to separate patients into different subgroups. To take a simple example, one subgroup might contain patients with this pattern: gene “A” is expressed at twice the normal rate and gene “B” at four times the normal rate.
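The subgrouping idea can be sketched in a few lines of Python. This is only an illustration, not Mac Gabhann’s model: the patient names, expression values, healthy baseline levels, and the twofold cutoff are all invented, and real analyses involve thousands of genes and far more sophisticated statistics. The sketch expresses each gene as a multiple of its healthy level and then groups patients who share the same set of elevated genes.

```python
# Illustrative only: invented expression values, not data from the study.
HEALTHY_BASELINE = {"A": 10.0, "B": 5.0, "C": 8.0}  # hypothetical normal levels

PATIENTS = {
    "patient_1": {"A": 20.0, "B": 20.0, "C": 8.0},
    "patient_2": {"A": 21.0, "B": 19.0, "C": 7.5},
    "patient_3": {"A": 10.5, "B": 5.0, "C": 24.0},
    "patient_4": {"A": 9.0, "B": 5.5, "C": 8.0},
}

THRESHOLD = 2.0  # assumed cutoff: twice the normal level or more counts as elevated


def fold_changes(expression, baseline):
    """Return each gene's expression as a multiple of the healthy level."""
    return {gene: expression[gene] / baseline[gene] for gene in baseline}


def elevated_genes(expression, baseline, threshold=THRESHOLD):
    """Return the set of genes expressed at or above the threshold multiple."""
    changes = fold_changes(expression, baseline)
    return frozenset(gene for gene, change in changes.items() if change >= threshold)


def group_patients(patients, baseline):
    """Group patients that share the same set of elevated genes."""
    groups = {}
    for name, expression in patients.items():
        signature = elevated_genes(expression, baseline)
        groups.setdefault(signature, []).append(name)
    return groups


if __name__ == "__main__":
    for signature, members in group_patients(PATIENTS, HEALTHY_BASELINE).items():
        label = ", ".join(sorted(signature)) or "none elevated"
        print(f"{label}: {members}")
```

In this toy data, the first two patients land in one subgroup because genes “A” and “B” are both elevated, matching the pattern in the example above, while the other two patients fall into separate groups.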
Because genes code for proteins, gene expression can be used to infer the levels of different types of proteins found in a cell, explains Mac Gabhann. Proteins are the targets of most drug therapies. So in a next step, Mac Gabhann can apply his models to select drug regimens that will target specific proteins found in a patient subgroup. The models can also be used to determine the optimum dosage, timing of drug administration and even mode of administration (orally versus intravenously, for example).
Mac Gabhann and graduate student Joe Bender recently applied this approach to analyze collated sets of gene expression data from more than 2,600 breast cancer patients. The scientists were especially interested in those patients with triple-negative breast cancer, those whose tumors do not have any of the three markers that respond to current therapies. “One thing the model results suggest,” says Mac Gabhann, “is that there are novel markers in breast cancer—proteins or pathways not identified before as being involved.” Any of these might serve as a potential drug target.
The scientists discovered that about 60 percent of the patients with triple-negative breast tumors fell into a special subgroup having a gene expression pattern that is associated with high levels of angiogenesis, a process involving the growth of new blood vessels. Tumors exploit angiogenesis to fuel their growth, sending out signals that induce existing blood vessels to sprout new vessels that become the tumor’s blood supply.
Tumors with this molecular signature may be especially vulnerable to anti-angiogenesis drugs, says Mac Gabhann. Researchers have had little success using such drugs to treat patients with breast cancer. One reason could be that only a small fraction of breast tumors bear the molecular signature associated with a response to an angiogenesis inhibitor, suggests Mac Gabhann. Those tumors may be so few that the positive results are drowned out in large studies.
To find out whether that hypothesis is correct, researchers would need to treat women in the select subgroup with angiogenesis inhibitors.
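A little arithmetic shows how a real benefit in a small subgroup can be drowned out. The sketch below is hypothetical: the two response rates are invented purely for illustration, and the subgroup size is an assumption obtained by combining the article’s figures (about 60 percent of triple-negative patients, who are 14 to 20 percent of all breast cancer patients).

```python
def overall_response_rate(subgroup_fraction, subgroup_rate, other_rate):
    """Response rate seen when the subgroup is pooled with everyone else."""
    return subgroup_fraction * subgroup_rate + (1 - subgroup_fraction) * other_rate


# Assumed subgroup size: 60 percent of the 14 to 20 percent of patients
# who are triple negative (figures quoted in the article).
low = 0.14 * 0.60
high = 0.20 * 0.60

# Hypothetical response rates, chosen only to show the effect.
SUBGROUP_RATE = 0.50  # invented: half of subgroup patients respond
OTHER_RATE = 0.05     # invented: few other patients respond

for fraction in (low, high):
    pooled = overall_response_rate(fraction, SUBGROUP_RATE, OTHER_RATE)
    print(f"subgroup = {fraction:.1%} of patients, pooled response = {pooled:.1%}")
```

With these invented rates, the pooled response comes out at roughly 9 to 10 percent, only modestly above the 5 percent assumed for everyone else, even though half of the subgroup responds. A trial that enrolled only the subgroup would see the larger effect directly.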
In fact, any of the conclusions reached “in silico” (with the computer models) will need to be verified “in vivo” (in real patients), says Mac Gabhann. “It will be at least five to 10 years before these results could be applied in the clinic,” he estimates. When that day arrives, he says, a doctor won’t need to go in search of a cluster computer to customize a patient’s treatment. Research like Mac Gabhann’s, he predicts, will be translated into programs and apps that can run on a laptop or tablet computer. No room-sized computer required.