The Big Data of Genomics
Beyond question, we are in the midst of an explosion of “Big Data” in many facets of human endeavor. In fact, data-storage leader IBM asserts that roughly 2.5 quintillion bytes of data are generated every day and that 90% of the world’s data was created in the last two years.
An outpouring of articles in scientific journals and major newspapers has highlighted the promising potential of big data in medicine, including a special section in the current issue of Nature. Genomics has become a major source of the growth of such big data, particularly as the cost of sequencing genomes has plummeted. The raw sequence data for just one person’s whole genome can take up as much as 100 GB, and already hundreds of thousands of individual genomes have been sequenced. With more than 2,500 high-throughput sequencing instruments currently used in 55 countries across the globe, more genomes are added every day. The aggregate amount of genomic data is growing explosively, and next-generation sequencing (NGS) data are estimated to have doubled in volume annually since 2007.
The accumulation of genomic data is a worldwide phenomenon. Impressive population-wide sequencing efforts are leading the way, from 100,000 genomes in England, Saudi Arabia, and Iceland to 350,000 in Qatar to a million in both China and the U.S.
And earlier this month, the CEO of the Cleveland Clinic predicted that soon children will routinely have their whole genomes sequenced at birth, implying a near future in which tens of millions of new genomes are sequenced annually.
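To give a rough sense of scale, the figures above can be turned into back-of-the-envelope arithmetic. The sketch below assumes the upper-end figure of 100 GB of raw data per genome and decimal units (1 PB = 1,000,000 GB). The function names and the end year of 2015 are illustrative assumptions, not figures from any sequencing program.

```python
# Back-of-the-envelope sizing; all helper names are hypothetical.
GB_PER_GENOME = 100  # upper-end raw data per whole genome, as cited in the text


def raw_storage_petabytes(genomes: int, gb_per_genome: float = GB_PER_GENOME) -> float:
    """Raw storage in decimal petabytes (1 PB = 1,000,000 GB)."""
    return genomes * gb_per_genome / 1_000_000


def growth_factor(start_year: int, end_year: int) -> int:
    """Multiple by which volume grows if it doubles once per year."""
    return 2 ** (end_year - start_year)


if __name__ == "__main__":
    # 100,000 and 1,000,000 match the population projects mentioned above;
    # 10,000,000 stands in for "tens of millions" of newborn genomes a year.
    for n in (100_000, 1_000_000, 10_000_000):
        print(f"{n:>10,} genomes ~ {raw_storage_petabytes(n):,.0f} PB of raw data")
    # The end year is an assumption chosen only to illustrate the doubling.
    print(f"Doubling yearly from 2007 to 2015: x{growth_factor(2007, 2015)}")
```

Under these assumptions, a 100,000-genome project implies about 10 PB of raw data, and ten million genomes a year implies about 1,000 PB, or roughly one exabyte, annually.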
Turning Data into Resources
But sequencing genomes is not enough, and the creation of genomic big data is just the beginning.
Thanks to the analysis of big data in genomics and associated informatics, we are seeing meaningful progress in cancer care and the diagnosis of rare diseases, as I have discussed in earlier posts. We clearly have a tremendous opportunity to use the big data of genomics to continue to drive a revolution in healthcare.
Yet there is a broad consensus that a ‘data bottleneck’ is hampering collaboration and discovery. Not all researchers and physicians confronting the current onslaught of genomic big data can readily determine how to use genetic information to prevent or treat disease. To succeed, researchers and physicians clearly need resources that:
- Draw together useful data from disparate sources;
- Facilitate analysis and collaboration; and
- Improve clinical practice.
The power of genomic analysis needs to expand outward from major research centers and hospitals to the myriad clinics and community hospitals where many patients receive care. To have the greatest impact on the broadest population, clinicians throughout the world’s health systems need access to the big data generated by DNA sequencing, even—or perhaps especially—if they are not affiliated with research institutions. They also need to be able to make sense of the data they have access to.
Answers in the Cloud
Sequencing provides the raw data to uncover the genetic variants that contribute to disease. But the datasets are too big to transfer repeatedly—and too big even for smaller hospitals, labs, or clinics to store onsite. Key medical advancements require not only big data, but also tools and resources to generate, interpret, and share analysis of millions of genomes.
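A quick calculation suggests why repeated transfers become impractical. The sketch below assumes 100 GB per genome, as cited earlier, and a hypothetical sustained link speed of 100 Mbit/s. It ignores protocol overhead and compression, so real-world times would differ.

```python
def transfer_hours(size_gb: float, link_mbps: float) -> float:
    """Idealized hours needed to move size_gb gigabytes over a link_mbps link."""
    bits = size_gb * 8e9                 # decimal gigabytes to bits
    seconds = bits / (link_mbps * 1e6)   # megabits per second to bits per second
    return seconds / 3600


if __name__ == "__main__":
    LINK_MBPS = 100  # hypothetical sustained throughput, not a measured value
    one_genome = transfer_hours(100, LINK_MBPS)
    cohort = transfer_hours(100 * 1_000, LINK_MBPS)  # a 1,000-genome cohort
    print(f"One genome: about {one_genome:.1f} hours")
    print(f"1,000 genomes: about {cohort / 24:.0f} days")
```

Even under these idealized conditions, a single genome ties up the link for a couple of hours and a modest cohort for months, which is one reason to bring the analysis to the data rather than the reverse.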
Cloud-based platforms such as WuXi NextCODE’s Exchange are essential to addressing the fundamental big data challenge of genomics. Collaboration in the cloud works to dismantle existing “data silos”: genomic information hosted only on local servers and analyzed on idiosyncratic, closed platforms. The NextCODE Exchange, in contrast, is a browser-based hub that affords secure, seamless collaboration with colleagues around the world. Moreover, users get access to NextCODE’s tools for making the critical links between variation in the genome and disease and other phenotypes, backed by harmonized links to the most important public reference data.
And cloud-based computing is inherently scalable: resources for data storage and analysis expand as needed, allowing researchers and physicians to leverage massive datasets to improve patient care in the clinic. The big data of genomics will continue to expand, and our approaches to analyzing genomic data need to continue to evolve to meet the growing demands of clinicians and researchers.
At WuXi NextCODE, we have built upon our heritage of conducting the largest analysis of genomic data (deCODE’s path-breaking Icelandic analysis) by assembling an ever-growing database of human genomes. We are committed to driving the movement of sequence data into patient diagnosis and care through user-friendly, leading-edge analysis and informatics. I am confident that data analysis and collaboration in the cloud will revolutionize healthcare, and exceptionally proud that WuXi NextCODE’s Exchange is at the forefront of this exciting advancement.