Lee Watkins Jr., director of bioinformatics at the Institute of Genetic Medicine Center for Inherited Disease Research (CIDR),
on being where the action is:
There’s a graph on the wall outside your office showing the growth of data generated by CIDR in the last 10 years: The line that denotes growth literally goes straight up. As CIDR’s data man, will you comment on that?
WATKINS: It’s a little scary. We couldn’t possibly have imagined going from 22 million genotypes a year in 2005 to 37 billion this year. But we did just that, and successfully. It’s just as overwhelmingly mind-blowing to think that we’ll be generating some astronomically larger amount of data in the future, but we will. It will happen.
In terms of data-storage capacity, you talked in megabytes not all that long ago. Then it was gigabytes, and now it’s terabytes. Can you put these numbers in real-world terms?
WATKINS: We can easily generate one terabyte per day and routinely will. As of March 2010, CIDR’s active data archive – archival data on disk – is well over 400TB, plus untold amounts stored on tape, and we’re about to acquire nearly another 300TB in archival storage.
You can think of one terabyte as 50,000 trees made into paper and printed, or all the X-ray films in an average hospital. Ten terabytes equals the entire printed collection of the U.S. Library of Congress. And the National Archives of Britain holds over 900 years of written material amounting to about 60TB of data.
When you came to CIDR in December 2003 as the director of bioinformatics, what did you know about this emerging field?
WATKINS: I’m a biologist by training, with a master’s in evolutionary and population biology. I worked in a lab literally in the stone ages of genomic science, at the University of Texas in Austin where I was doing ecological genetics of a plant.
My career got sidetracked in the late 1980s, and I went from lab work to computer work, when I took a job as a statistical consultant at the UT Austin computation center. My first role at Hopkins was in research and academic computing: I was helping people with their research needs in terms of technology. Back then not many people had that sort of expertise. Then, much more recently, I worked in the digital library program, just at the time when the library was going from being a collector of books to collector/creator of digital content. I learned about CIDR in 2003 when the leader of a master’s degree capstone project I sponsored, who happened to work at CIDR, mentioned the center was looking for a director of bioinformatics. I said “Oh, what’s that?”
Semantics aside, you clearly weren’t in the least intimidated by the thought of working at the junctures of lab and information technology as well as population and medical genetics.
WATKINS: I like working at the intersection of different fields – it’s where the action is.
When I interviewed with CIDR, I felt like a Gemini astronaut who had been in suspended animation and suddenly walked onto the Starship Enterprise. And that was CIDR’s “old” technology. Then, the standard was genotyping 400 markers. Now we do a million, routinely.
Everything I had learned in grad school and during the 15 years in my information technology career here at Hopkins, I could put to use here at CIDR. Right when I stepped in here, everything was changing. It has worked out that I’ve been able to help CIDR grow successfully, and in return, CIDR’s given me the best job I’ve ever had.
Can you describe one highlight?
WATKINS: Being recognized as a co-author of a paper in the journal Science, with the likes of Kim Doheny, who works here, and Mike Boehnke and Francis Collins. It was our first GWAS study pilot project; CIDR contributed directly to the success of that, and they recognized us as co-authors. People who work on the information technology side of things often labor in the background and don’t get the same level of recognition as others. But these days, the bioinformatics piece is truly integral and indispensable.