Genomics: Big Data Leading to Big Opportunities

The Big Data of Genomics

The big data of genomics will continue to expand, and our approaches to analyzing genomic data need to continue to evolve to meet the growing demands of clinicians and researchers. Cloud-based platforms such as WuXi NextCODE’s Exchange are essential to address the fundamental big data challenge of genomics.

Beyond question, we are in the midst of an explosion of “Big Data” in many facets of human endeavor. In fact, data-storage leader IBM asserts that roughly 2.5 quintillion bytes of data are generated every day and that 90% of the world’s data was created in the last two years.

An outpouring of articles in scientific journals and major newspapers has highlighted the promising potential of big data in medicine, including a special section in the current issue of Nature. Genomics has become a major source of the growth of such big data, particularly as the cost of sequencing genomes has plummeted. The raw sequence data for just one person’s whole genome can take up as much as 100 GB, and hundreds of thousands of individual genomes have already been sequenced. With more than 2,500 high-throughput sequencing instruments currently used in 55 countries across the globe, more genomes are added every day. The aggregate amount of genomic data is growing explosively, and next-generation sequencing (NGS) data are estimated to have doubled in volume annually since 2007.
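To put these figures in perspective, here is a rough back-of-envelope sketch in Python. It uses only the numbers quoted above (up to 100 GB of raw data per genome, annual doubling); the function names and the eight-year horizon are illustrative assumptions, not figures from any WuXi NextCODE system.

```python
# Back-of-envelope sizing using the figures quoted in the text.
# Function names and the 8-year horizon are illustrative assumptions.

GB_PER_GENOME = 100  # upper-end raw data size per whole genome (from the text)


def total_storage_pb(num_genomes, gb_per_genome=GB_PER_GENOME):
    """Raw storage in petabytes, using decimal units (1 PB = 1,000,000 GB)."""
    return num_genomes * gb_per_genome / 1_000_000


def growth_factor(years):
    """Volume multiplier after `years` of annual doubling."""
    return 2 ** years


# A 100,000-genome project and a million-genome project:
print(total_storage_pb(100_000))    # 10.0 petabytes
print(total_storage_pb(1_000_000))  # 100.0 petabytes

# Eight years of annual doubling (an assumed horizon):
print(growth_factor(8))             # 256
```

Even at this coarse level, a single national-scale project lands in the tens to hundreds of petabytes of raw data, which is why storage and transfer dominate the discussion that follows.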

The accumulation of genomic data is a worldwide phenomenon. Impressive population-wide sequencing efforts are leading the way, from 100,000 genomes in England, Saudi Arabia, and Iceland to 350,000 in Qatar to a million in both China and the U.S.

And earlier this month, the CEO of the Cleveland Clinic predicted that soon children will routinely have their whole genomes sequenced at birth, implying a near future in which tens of millions of new genomes are sequenced annually.

Turning Data into Resources

But sequencing genomes is not enough, and the creation of genomic big data is just the beginning.

Thanks to the analysis of big data in genomics and associated informatics, we are seeing meaningful progress in cancer care and the diagnosis of rare diseases, as I have discussed here and here. We clearly have a tremendous opportunity to use the big data of genomics to continue to drive a revolution in healthcare.

Yet there is a broad consensus that a ‘data bottleneck’ is hampering collaboration and discovery. Not all researchers and physicians confronting the current onslaught of genomic big data can readily determine how to use genetic information to prevent or treat disease. To succeed, researchers and physicians clearly need resources that:

  • Draw together useful data from disparate sources;
  • Facilitate analysis and collaboration; and
  • Improve clinical practice.

The power of genomic analysis needs to expand outward from major research centers and hospitals to the myriad clinics and community hospitals where many patients receive care. To have the greatest impact on the broadest population, clinicians throughout the world’s health systems need access to the big data generated by DNA sequencing, even—or perhaps especially—if they are not affiliated with research institutions. They also need to be able to make sense of the data they have access to.

Answers in the Cloud

Sequencing provides the raw data to uncover the genetic variants that contribute to disease. But the datasets are too big to transfer repeatedly—and too big even for smaller hospitals, labs, or clinics to store onsite. Key medical advancements require not only big data, but also tools and resources to generate, interpret, and share analysis of millions of genomes.
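A quick calculation illustrates why repeated transfers are impractical. The sketch below estimates how long it would take to move raw genome data over a network link. The 100 GB figure comes from the discussion above; the link speeds are assumed values chosen purely for illustration, and real-world throughput is usually lower than the nominal rate.

```python
# Rough transfer-time estimate. Link speeds are assumed, illustrative values.

def transfer_hours(size_gb, link_mbit_per_s):
    """Hours needed to move `size_gb` (decimal GB) at a sustained link rate."""
    bits = size_gb * 8e9                      # 1 GB = 8e9 bits
    seconds = bits / (link_mbit_per_s * 1e6)  # 1 Mbit/s = 1e6 bits/s
    return seconds / 3600


# One 100 GB genome over an assumed 100 Mbit/s connection:
one_genome = transfer_hours(100, 100)
print(f"{one_genome:.1f} hours")              # about 2.2 hours

# A hypothetical 1,000-genome cohort over the same link:
cohort_days = transfer_hours(100 * 1000, 100) / 24
print(f"{cohort_days:.0f} days")              # roughly three months
```

Under these assumptions a modest cohort takes months to copy once, which is the practical argument for bringing the analysis to the data rather than the data to each site.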

Cloud-based platforms such as WuXi NextCODE’s Exchange are essential to address the fundamental big data challenge of genomics. Collaboration in the cloud works to dismantle existing “data silos”: genomic information hosted only on local servers and analyzed on idiosyncratic, closed platforms. The NextCODE Exchange, in contrast, is a browser-based hub that affords secure, seamless collaboration with colleagues around the world. Moreover, users get access to NextCODE’s tools for making the critical links between variation in the genome and disease and other phenotypes, backed by harmonized links to the most important public reference data.

And cloud-based computing is inherently scalable: resources for data storage and analysis expand as needed, allowing researchers and physicians to leverage massive datasets to improve patient care in the clinic. The big data of genomics will continue to expand, and our approaches to analyzing genomic data need to continue to evolve to meet the growing demands of clinicians and researchers.

At WuXi NextCODE, we have built upon our heritage of conducting the largest analysis of genomic data (deCODE’s path-breaking Icelandic analysis) by assembling an ever-growing database of human genomes. We are committed to driving the movement of sequence data into patient diagnosis and care through user-friendly, leading-edge analysis and informatics. I am confident that data analysis and collaboration in the cloud will revolutionize healthcare, and exceptionally proud that WuXi NextCODE’s Exchange is at the forefront of this exciting advancement.

Advancing Autism Research By Sharing Genomic Data Online: The Simons Simplex Collection

The NextCODE Exchange is hosting the Simons Simplex Collection (SSC), a global resource for research on autism spectrum disorders comprising genomic data from nearly 2,800 families.

Autism research is underway around the world to better understand the genetic basis of the disease, which is difficult to diagnose and has limited treatment options. With vast amounts of data being generated, the answers to this challenging disease may lie in consolidating these global data.

The newly launched NextCODE Exchange (read the release here) may be a critical solution in changing how autism is diagnosed and treated. The Exchange is hosting the Simons Simplex Collection (SSC), a global resource for research on autism spectrum disorders comprising genomic data from nearly 2,800 families.

With the Exchange, the SSC will be accessible to the world’s autism researchers to harmonize the growing body of relevant genomic data. By enabling the rapid analysis of massive amounts of sequencing data followed by instant collaboration and validation of findings, the availability of the SSC and other hosted data will accelerate the pace of discovery in this field.

This simple concept is likely to help usher in a new era of genomic medicine, offering global access to data that can answer questions about some of today’s most challenging diseases.

Learn more about the NextCODE Exchange and the Simons Simplex Collection here.

Maintaining Momentum Post-ASHG: Maximizing the Value of Large Genomic Databases

The newly launched NextCODE Exchange provides a browser-based hub for multi-center sharing and collaboration on collective data from massive whole-genome databases like the Haplotype Reference Consortium (HRC).

The American Society of Human Genetics (ASHG) meeting convened this week in San Diego, bringing together genetics experts from around the world to discuss programs with great potential to advance genomic-based medicine in the years to come.

To maintain the momentum generated this week, we need to find ways to integrate these important ideas, insights and programs, and to maximize the use of the massive databases that have been launched to support research on cancer, rare diseases and other pressing health topics.

One of the databases unveiled during the meeting was the Haplotype Reference Consortium, which aims to become the world’s most comprehensive database of genetic variations. Large databases like the HRC, along with several others already underway, can be tremendously helpful to researchers seeking answers to some of the most challenging diseases. But there remains a significant bottleneck: these large, cumbersome databases cannot easily be shared and manipulated, limiting their utility for broad, multi-center genomic research.

The solution lies in the newly launched NextCODE Exchange (see release here). This browser-based hub allows for the sharing and harmonizing of massive whole-genome databases like the HRC to accelerate research. The integrated architecture allows users to visually confirm and validate findings in raw sequences, collaborating and sharing with others around the world who may have complementary research underway.

The momentum generated during ASHG will be multiplied by sharing and learning from the world’s collective genomic data on the NextCODE Exchange. Learn more here.

Imagine the Potential: The World’s First Online Hub for Global Genomic Data Access

The NextCODE Exchange, a new browser-based hub, allows for real-time sharing of whole genome collections in a simple, consistent format.

The field of genomic medicine is rapidly advancing as the research community becomes more comfortable manipulating genomic data with the goal of discovering insights about disease causes and risks. Yet each database is hosted by a separate organization, organized in its own way, and far too cumbersome to share easily with others who may be working on similar research.

This weekend a new tool launched to make such sharing possible. The NextCODE Exchange (see release here), a new browser-based hub, allows for real-time sharing of whole genome collections in a simple, consistent format.

The availability of this Exchange is a critical advance in extending the utility of genomic data by allowing organizations around the world to access and harmonize large complementary datasets, potentially multiplying the size of their study datasets to gain more reliable insights than ever before.

Already, numerous organizations are participating in the NextCODE Exchange to add and share their genomic data, including clinicians and researchers affiliated with Boston Children’s Hospital, University College Dublin, Queensland Institute of Medical Research (Australia), and Saitama Medical University (Japan).

As new institutions look to the Exchange to share genomic data, this hub holds significant potential to help advance progress in genomic-based medicine.

Learn more about the NextCODE Exchange here.