COVID-19 Story Tip: Johns Hopkins Helps Lead Creation of National COVID-19 Database for ‘Big Data’ Studies


Big data refers to extremely large amounts of information that are hard to analyze using traditional techniques because of the volume, the wide variety of types and sources, or the disparate means by which the information was collected. Using special evaluative and investigative methods known as big data analytics, researchers can mine huge, seemingly unrelated datasets to uncover hidden patterns, correlations and insights.

Now, a nationwide collaboration of clinicians, informaticians and other biomedical researchers at 60 institutions, co-led by Johns Hopkins Medicine’s Christopher Chute, Dr.P.H., M.D., M.P.H., has begun collecting and harmonizing hundreds of thousands of medical records from COVID-19 patients to extract data for a new, centralized and secure database that will feed big data studies of the disease.

The National COVID Cohort Collaborative (N3C) was officially announced June 15 by its funding agency, the National Institutes of Health’s National Center for Advancing Translational Sciences (NCATS).

It is hoped that by midsummer of this year, the NCATS N3C Data Enclave, as the highly secure repository is known, will contain clinical, laboratory and diagnostic information from the electronic health records (EHRs) of at least 1 million patients from across the United States — with some 300,000 of them testing positive for SARS-CoV-2, the virus that causes COVID-19. The data will be aggregated into a standard format so credentialed researchers and health care providers will have easy, rapid and free access to this valuable resource.

The goals of the N3C, according to NCATS, are to “(1) create a robust data pipeline to harmonize EHR data into a common data model; (2) make it fast and easy for the clinical and research community to access a wealth of COVID-19 clinical data, and use it to research COVID-19 and identify effective interventions as the pandemic continues to evolve; (3) establish a resource for the next five years to understand the long-term health impact of COVID-19; and (4) create a state-of-the-art analytics platform to enable novel analyses that will serve to address COVID-19, as well as demonstrate that this collaborative analytics approach could be invaluable for addressing other diseases in the future.”

Along with serving as the N3C’s co-lead, Chute, the Bloomberg Distinguished Professor of Health Informatics at the Johns Hopkins University School of Medicine and a faculty member at the university’s schools of public health and nursing, coordinates a Johns Hopkins team developing software and a “transformation pipeline” to harmonize data from the participating N3C institutions into a common format. Additionally, Chute and his colleagues work on key N3C aspects such as data governance and analytics, and on protecting the rights and privacy of the patients from whom data are collected.

Chute is available for interviews about the N3C initiative and Johns Hopkins Medicine’s key role in the effort.