A group of Johns Hopkins University scientists has collaborated with more than 100 researchers around the world to assemble and analyze the first complete sequence of a human genome, two decades after the Human Genome Project produced the first draft.
The work is part of the Telomere to Telomere (T2T) consortium, led by researchers at the National Human Genome Research Institute (NHGRI); University of California, Santa Cruz; and University of Washington, Seattle.
Johns Hopkins contributed key research to the effort to decipher our DNA—which has remained a mystery despite the initial progress 20 years ago. The revelations are expected to open new lines of molecular and genetic exploration while providing scientists with a clearer picture of how DNA affects the risks of diseases, and how genes are expressed and regulated.
A package of six papers reporting the achievement appears in today's issue of Science, along with companion papers in several other journals.
"Opening up these new parts of the genome, we think there will be genetic variation contributing to many different traits and disease risk," said Rajiv McCoy, an assistant professor in the university's Department of Biology in the Krieger School of Arts and Sciences whose research focuses on human genetics and evolution. "There's an aspect of this that's like, we don't know yet what we don't know."
McCoy and 12 Johns Hopkins researchers worked on different aspects of the international initiative, contributing to the main genome assembly project and to several companion works analyzing what can be learned about patterns of genetic and epigenetic variation from person to person through the newly sequenced sections of the genome.
Winston Timp, associate professor of biomedical engineering in the Whiting School of Engineering, and his graduate student, Ariel Gershman, worked on a part of the project that focused on how the completed genome will enhance understanding of gene regulation and expression, the process of turning genes "on" and "off."
Johns Hopkins researchers, led by PhD students Samantha Zarate, Stephanie Yan, and Melanie Kirsche, along with postdoctoral researcher Sergey Aganezov, specifically helped demonstrate how having a single complete genome improves the ability of scientists to understand variations in the genomes of individuals from different populations. By analyzing data from more than 3,200 people from around the world, they revealed more than a million genetic variants that were not previously known. To do so, the Hopkins team used the NHGRI Analysis, Visualization, and Informatics Labspace (AnVIL), a cloud-based platform co-led by Bloomberg Distinguished Professor Michael Schatz, who was also an author of the T2T papers.
The study found that because the previous model, known as the reference genome, was a composite of multiple individuals' genomes essentially "stitched together," it created artificial "seams" where the model switches from the genome of one person to another. The new, complete version eliminates those seams and is more representative of what an individual's actual genome looks like.
Using the new human genome model, the Johns Hopkins contributors also quantified how frequently different versions of the same gene occur in diverse human populations. That serves as an evolutionary record of both random fluctuations and potential natural selection affecting certain parts of the genome.
Coordinating their research during the COVID-19 pandemic through the messaging platform Slack, scientists from 30 different institutions added or corrected more than 200 million DNA base pairs, increasing the total number in the human genome to 3.05 billion. A base pair is two chemical bases bonded to one another to form a "rung" of the DNA ladder. Through the process, they also discovered more than 100 new genes able to produce proteins.
According to Schatz, the sequencing has made accessible a segment of the genome about the same size as one of the larger human chromosomes.
"We've effectively added an entirely new human chromosome to our knowledge," he said. "There's a lot to be gained and learned from it. There's this whole new opportunity for discovery."
At the same time, he said, because errors in the previous sequencing were identified and corrected, scientists now have a more precise view of "clinically relevant genes," a potential boon to personalized medicine.
Of particular interest to the researchers was an enigmatic component of the genome known as centromeres. They are dense bundles of DNA that hold chromosomes together and play a key role in cell division. Previously, however, they had been considered unmappable because they contain thousands of stretches of DNA sequences that repeat over and over.
Timp explained how this work was empowered by long-read sequencing, comparing the task to assembling a jigsaw puzzle. Previously these regions were unresolved because they were so repetitive that all of the pieces appeared to be a single color and shape. "It's like all you have are pieces that look like blue sky. They're identical. So, how do you put that together? It becomes almost an impossible problem," he said.
But more sophisticated sequencing technology now enables scientists to make better sense of the once-inscrutable region using long reads. "It's like the puzzle pieces are now really big, like a toddler puzzle," Timp said. "And we discovered there are some objects in the pieces, say some grass or the sun. It's not just blue sky."
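Timp's puzzle analogy can be illustrated with a small sketch. The example below is not the consortium's method. It is a toy model, assuming a made-up 10-letter repeat unit, two hypothetical single-letter variants, and arbitrary read lengths of 8 and 150 letters. It counts how many reads of each length appear only once in the sequence and can therefore be placed without ambiguity.

```python
from collections import Counter


def unique_read_fraction(genome: str, read_len: int) -> float:
    """Return the fraction of all reads of length read_len that occur exactly once."""
    reads = [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]
    counts = Counter(reads)
    return sum(1 for read in reads if counts[read] == 1) / len(reads)


# Toy "centromere" (hypothetical data): one 10-letter unit repeated 30 times,
# with a single-letter change in two of the copies.
unit = "ACGTTGCAAG"
copies = [unit] * 30
copies[7] = "ACGTTGCTAG"   # hypothetical variant in copy 7
copies[19] = "ACGATGCAAG"  # hypothetical variant in copy 19
genome = "".join(copies)

short_fraction = unique_read_fraction(genome, 8)    # "blue sky" pieces
long_fraction = unique_read_fraction(genome, 150)   # "toddler puzzle" pieces

print(f"short reads placed unambiguously: {short_fraction:.1%}")
print(f"long reads placed unambiguously:  {long_fraction:.1%}")
```

In this toy sequence only the few short reads that happen to overlap a variant are unique, while every long read spans at least one variant and so has a single possible position. Real centromeres are far longer and messier, so the numbers here only illustrate the idea.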
Being able to track changes over time in these newly accessible genome regions will allow researchers to make more rigorous comparisons from one generation to the next, of people of different origins, or from species to species.
"Finally, from tip to tip, telomere to telomere, we have an assembly of the genome we can look at," Timp said.
One immediate challenge McCoy identified is that clinical labs will need to transition from the previous genome mapping to the new complete version. That is no small undertaking, because it requires them to adjust the information they have about the links between genes and diseases.
"There are all sorts of databases and resources that have been built around the previous version, and it can be hard to get people to shift over," he said. "So one goal of our work now is to encourage these important resources to move over to the new mapping to really empower the community."
For Schatz, who switched careers from cybersecurity to genomics in 2002 after being inspired by the original Human Genome Project, the comprehensive assembly of the human genome, and his being able to contribute to it, is particularly gratifying.
"I always believed this could be done," he said. "But I don't think anyone really knew when it could be done and what it would really take. I thought it was going to take many more years. It really was a surprise to me how quickly we could get through it."