Fourteen years later, the Johns Hopkins Center for Inherited Disease Research leads the way in deciphering disease genotypes
March 2010 - David Valle recalls that all the NIH bigwigs attended that first meeting, auspicious as it turned out to be, to mark the establishment of the Center for Inherited Disease Research (CIDR, pronounced cider) at the Johns Hopkins Bayview campus. Representatives came from all the various institutes, including, of course, Francis Collins, then director of the National Human Genome Research Institute and now head of the NIH.
“It was memorable because we heard lots of noise outside and then noticed a huge crowd of people gathering out in front of the Triad Building in which we were in a conference room on the fourth floor,” says Valle, director of the Institute for Genetic Medicine. “Next, there was a knock on the door and a Baltimore City fireman telling us that the first three floors had been evacuated because of a suspicious package delivered to an office on the second floor. The fire department had judged from the size of the package that should it be a bomb, and should it go off, the damage to our floor would be minimal. So he said we could either continue the meeting or evacuate. It was our call.
“We elected to continue the meeting. We probably wouldn’t have come to that conclusion nowadays, but back in 1996, that’s what we did.
“When the meeting was over, we all rode to the lobby on a glass elevator through which we could see a little robot poking at the package.”
Nothing blew up, that day.
But along came the Human Genome Project.
And then, the explosion.
When its initial contract was signed, CIDR set out doing a few hundred thousand genotypes a year and ambitiously pledged that, someday, it would produce a million genotypes a year. (“Doing” a genotype involves examining an individual’s DNA at a site where variation is known to commonly occur.)
“When I joined CIDR six years ago, the goal then was to do 11 million genotypes a year, which was just unheard of,” says Lee Watkins, director of bioinformatics. “It was a tremendous amount of data, which we knew was only going to keep going up, though we couldn’t imagine how.”
In 2009, CIDR generated 37 billion genotypes. “That’s with a b,” Valle asserts, adding: “CIDR is second to none in terms of quality or quantity; nobody beats us in genotyping.”
With the Illumina GAIIx sequencer, CIDR employees (L to R) Kurt Hetrick, M.S., research specialist for technology evaluation; Hua Ling, Ph.D., statistical geneticist; and Brian Craig, M.S., senior laboratory informatics research analyst.
The human genome consists of about 25,000 genes and roughly 3 billion base pairs. Yours is 99.9 percent identical to Valle’s, and his to Collins’, and so forth. It’s the places where our genomes vary that interest geneticists, because they contain the genetic variants that influence people’s risk of common diseases and disorders such as cancer, heart disease, stroke, depression and asthma, as well as their response to drugs.
Those sites where the DNA sequences of many individuals vary by a single base are called single nucleotide polymorphisms, or SNPs — pronounced snips. (For example, some people may have an A at a particular site on a chromosome where others have a G. Each form is called an allele.)
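To make the idea of alleles concrete, here is a minimal Python sketch, assuming each person’s genotype at one SNP is written as a two-letter string, one letter for each of the two copies of the chromosome they carry. The genotypes are hypothetical, and the function name `allele_frequencies` is illustrative rather than part of any CIDR software.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return the frequency of each allele observed at a single SNP.

    Each genotype is a two-character string such as "AG": a person carries
    two copies of each (non-sex) chromosome, so each contributes two alleles.
    """
    counts = Counter()
    for genotype in genotypes:
        counts.update(genotype)  # counts each letter (allele) separately
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


# Hypothetical genotypes for five people at a SNP where both A and G occur
sample = ["AA", "AG", "GG", "AG", "AA"]
print(allele_frequencies(sample))
```

Here the five people carry ten alleles in all: six copies of A and four of G.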
The term “genotyping” refers to uncovering an individual’s variation at any number of the 10 million SNPs well defined by the HapMap project, which analyzed genomes from populations around the world. Until fairly recently, the only human variants mapped and identified were those for “simple” Mendelian diseases and disorders: conditions that could be pegged to a single gene.
Since 1996, CIDR has provided high-quality genotyping services and statistical genetics consultation to gene hunters: researchers who are working to discover genes that contribute to common diseases by ferreting out variants in the genome.
When CIDR was established at Johns Hopkins in 1996, too few genetic markers had been identified for scientists to track down genes associated with common diseases, and there was no cost-effective way to genotype large numbers of markers. But the recent explosion in human genetic data makes it possible to untangle the interplay of genes in disease by surveying sites of variation throughout the entire genome, and statistical methods for human gene mapping are being developed to take full advantage of the many terabytes of data CIDR is now churning out.
“The typical researcher who comes to CIDR has been studying a disease for a long time and has a cohort of a few hundred to many thousand individuals from whom they’ve collected DNA samples,” Watkins explains. “More and more they’re very large projects that involve international teams — the Alzheimer’s consortium, for example — where people have combined their collected samples of DNA. Over the last few years we’ve learned that it takes these large, large sample sizes to figure out the complexity — to get enough power, so to speak — to find genes that contribute to common diseases.”
Gone are the days when genetic data was measured in megabytes or even gigabytes. State-of-the-art studies at CIDR are measured in terabytes, each terabyte being a thousand gigabytes.
CIDR now produces approximately 150 terabytes of data annually.
“It’s a metric by which I judge the growth of genetics,” Valle says. “People are looking for genetic solutions to all kinds of problems.”
On any given day, packages arrive at CIDR from New Zealand and Finland, Michigan and Maryland, all shipped by scientists who suspect that their contents — DNA samples packed in dry ice — may hold vital clues about the genetic origins of everything from cancer and schizophrenia to suicide risk and diabetes.
CIDR's Ayoola Odeniyide, B.S., research technologist, with the Tecan
Type 2 diabetes (T2D) is a huge and growing public health problem, affecting more than 23 million American adults and children and costing $174 billion annually. Dubbed “a geneticist’s nightmare” because of its complexity, type 2 diabetes was the focus of CIDR’s first foray into a groundbreaking technique known as a genome-wide association study (GWAS). Not long ago, GWAS was science fiction. Now the beauty and utility of this technique, which looks across the genome in a powerful and unbiased way, is science fact.
“We were CIDR’s pilot project in terms of GWAS,” recalls principal investigator Michael Boehnke, the Richard G. Cornell Distinguished Professor of Biostatistics at the University of Michigan, referring to a recent study by an international consortium that involved samples from 2,500 individuals and generated 800 million genotypes by examining 317,000 SNPs. “We had already known CIDR coming into this, having done previous studies together, all connected with diabetes.
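The 800 million figure is simple multiplication: every sample is assayed at every SNP, so the number of genotypes is the number of samples times the number of SNPs. A small Python sketch of that arithmetic (the function name is illustrative):

```python
def genotype_count(n_samples, n_snps):
    """Each sample is genotyped once at every SNP, so the total is a product."""
    return n_samples * n_snps


total = genotype_count(2_500, 317_000)
print(f"{total:,}")  # 792,500,000, which rounds to roughly 800 million
```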
“I think CIDR recognized we’d be good people to work with and could give them useful feedback, and we knew how excellent and responsive they were. So it was an easy choice to do the genotyping for the GWAS study there. We always regard them as collaborators, not just a service organization, because the work they do is so central to what we’re studying; they have a lot to offer intellectually beyond the genotyping.”
When Francis Collins first floated the idea of establishing a facility such as CIDR, Boehnke admits he was dubious about whether people would be sufficiently committed to large-scale, high-throughput, intensive science if it didn’t involve their own projects. “But you assume Francis is wrong at your peril,” he says, adding: “The beauty of working with the folks at CIDR [which employs just over 50 people] is that they treat the project as if they were its principal investigator.”
Although much effort had gone into finding locations in the genome associated with risk of T2D, only three had emerged as of March 2007. Now, GWAS consortia have identified a total of 38 T2D loci (areas of the genome that play a role in disease risk), as well as many more loci responsible for variability in diabetes-related traits such as body mass index, glucose levels and lipid levels. Once chromosomal regions of interest are identified, even more focused studies, such as genetic fine mapping and resequencing, can be used to localize the relevant genetic variants.
“Identifying the relevant genes and variants will help to reveal the complex basis of T2D, assist in disease classification, help identify novel drug and behavioral therapies, improve targeting of therapies, and may support more accurate prediction of T2D risk,” Boehnke says.
CIDR's Lindsay Cole, B.S., research technologist, with the Biomek FX robot.
Currently, CIDR’s biggest studies examine a million carefully selected SNPs in every sample; these sites lie in areas of the genome likely to reveal something important about the genes located there. Comprehensive as that is, it’s worth noting that the whole genome, at roughly 3 billion base pairs, is 3,000 times bigger than the million sites covered by CIDR’s biggest studies, says Kimberly Doheny, assistant director.
“It’s not as hypothesis-free as whole genome sequencing, but neither is it limited to a list of genes you think might be involved in a disease and only looking at them. It’s looking at a million of the most polymorphic sites that have been identified largely by the HapMap project.”
Most of CIDR’s work involves large case-control studies, in which two data sets are generated, one from people with a certain disease or condition and one from people without it, in order to compare how often a particular DNA change at a given position occurs in each group.
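One common way to make that comparison at a single SNP is an allelic chi-square test on a 2×2 table of allele counts (cases vs. controls, allele 1 vs. allele 2). The sketch below is a minimal Python illustration under that assumption; the article does not say which statistical tests CIDR or its collaborators use, and the counts and the function name `allelic_chi_square` are hypothetical.

```python
import math


def allelic_chi_square(case_counts, control_counts):
    """Allelic association test for one SNP in a case-control study.

    case_counts and control_counts are (allele 1 count, allele 2 count),
    tallied over every chromosome in each group. Assumes both alleles are
    observed at least once, so no expected count is zero. Returns the
    Pearson chi-square statistic (1 degree of freedom) and its p-value.
    """
    table = [list(case_counts), list(control_counts)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected if allele frequency were the same in both groups
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical: 1,000 cases and 1,000 controls contribute 2,000 alleles per group
chi2, p = allelic_chi_square((1_200, 800), (1_100, 900))
print(f"chi-square = {chi2:.2f}, p = {p:.2g}")
```

In a genome-wide study this test would be repeated at every SNP, which is why GWAS results are usually judged against a far stricter significance threshold (commonly 5 × 10^-8) than a single test would need.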
The DNA samples that come in from investigators are the raw material on which CIDR works its high-throughput magic, scrutinizing known SNPs and finding subtle differences in the genomes of individuals with and without disease. The product that goes out is data — lots and lots of data — sent to investigators on encrypted hard drives.
Investigating the entire genome would be less arbitrary and more fruitful, but that requires sequencing, which, at $50,000 per individual, is still prohibitively expensive. By comparison, CIDR charges around $400 per sample for a complete data set covering a million SNPs. Jointly supported by 14 NIH institutes, CIDR provides its services free of charge to investigators who have been approved for access through competitive peer review.
CIDR has one sequencer and has just purchased another as it gears up to take part in next-generation sequencing, which promises to be more powerful, accurate and efficient and is destined to be the next big thing.
“Illumina (the company that CIDR uses for its genotyping and sequencing technology) just came out with a new machine that’s about four times better than the current one,” Valle marvels. “The pace of this is just so amazing. It’s hard to keep a grip on it.”
--by Maryalice Yakutchik