Genomics: Big Data Leading to Big Opportunities

The Big Data of Genomics

The big data of genomics will continue to expand, and our approaches to analyzing genomic data need to continue to evolve to meet the growing demands of clinicians and researchers. Cloud-based platforms such as WuXi NextCODE’s Exchange are essential to address the fundamental big data challenge of genomics.

Beyond question, we are in the midst of an explosion of “Big Data” in many facets of human endeavors. In fact, data-storage leader IBM asserts that roughly 2.5 quintillion bytes of data are generated every day and 90% of the world’s data was created in the last two years.

An outpouring of articles in scientific journals and major newspapers has highlighted the promising potential of big data in medicine, including a special section in the current issue of Nature. Genomics has become a major source of the growth of such big data, particularly as the cost of sequencing genomes has plummeted. The raw sequence data for just one person’s whole genome can take up as much as 100 GB, and already hundreds of thousands of individual genomes have been sequenced. With more than 2,500 high-throughput sequencing instruments currently used in 55 countries across the globe, more genomes are added every day. The aggregate amount of genomic data is growing explosively, and next-generation sequencing (NGS) data are estimated to have doubled in volume annually since 2007.
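To put those figures in perspective, here is a rough back-of-envelope sketch in Python. It uses only the 100 GB upper bound and the annual doubling quoted above; the cohort sizes and the seven-year window are illustrative assumptions, not reported totals.

```python
# Back-of-envelope estimate of raw genomic data volume.
# GB_PER_GENOME is the upper bound quoted in the text; the cohort sizes
# and the number of years below are illustrative assumptions.

GB_PER_GENOME = 100


def raw_storage_petabytes(num_genomes, gb_per_genome=GB_PER_GENOME):
    """Raw sequence storage in petabytes (decimal units: 1 PB = 1,000,000 GB)."""
    return num_genomes * gb_per_genome / 1_000_000


def growth_factor(years):
    """How many times larger the data volume is after `years` of annual doubling."""
    return 2 ** years


if __name__ == "__main__":
    print(raw_storage_petabytes(100_000))    # 10.0 PB for a 100,000-genome cohort
    print(raw_storage_petabytes(1_000_000))  # 100.0 PB for a million-genome cohort
    print(growth_factor(7))                  # 128-fold growth after seven doublings
```

Even at the scale of a single national program, the raw data alone runs to tens of petabytes before any analysis output is stored.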

The accumulation of genomic data is a worldwide phenomenon.  Impressive population-wide sequencing efforts are leading the way, from 100,000 genomes in England, Saudi Arabia, and Iceland to 350,000 in Qatar to a million in both China and the U.S.

And earlier this month, the CEO of the Cleveland Clinic predicted that soon children will routinely have their whole genomes sequenced at birth, implying a near future in which tens of millions of new genomes are sequenced annually.

Turning Data into Resources

But sequencing genomes is not enough, and the creation of genomic big data is just the beginning.

Thanks to the analysis of big data in genomics and associated informatics, we are seeing meaningful progress in cancer care and the diagnosis of rare diseases, as I have discussed here and here. We clearly have a tremendous opportunity to use the big data of genomics to continue to drive a revolution in healthcare.

Yet there is a broad consensus that a ‘data bottleneck’ is hampering collaboration and discovery. Not all researchers and physicians confronting the current onslaught of genomic big data can readily determine how to use genetic information to prevent or treat disease. To succeed, researchers and physicians clearly need resources that:

  • Draw together useful data from disparate sources;
  • Facilitate analysis and collaboration; and
  • Improve clinical practice.

The power of genomic analysis needs to expand outward from major research centers and hospitals to the myriad clinics and community hospitals where many patients receive care. To have the greatest impact on the broadest population, clinicians throughout the world’s health systems need access to the big data generated by DNA sequencing, even—or perhaps especially—if they are not affiliated with research institutions. They also need to be able to make sense of the data they have access to.

Answers in the Cloud

Sequencing provides the raw data to uncover the genetic variants that contribute to disease. But the datasets are too big to transfer repeatedly—and too big even for smaller hospitals, labs, or clinics to store onsite. Key medical advancements require not only big data, but also tools and resources to generate, interpret, and share analysis of millions of genomes.
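A quick sketch shows why repeated transfers are impractical. The 100 GB per genome is the upper bound cited earlier in this post; the 100 Mbps link speed and the 1,000-genome cohort are purely hypothetical assumptions chosen for illustration, and the calculation ignores protocol overhead.

```python
# Rough transfer-time estimate for raw genome data.
# Assumptions (not from the article): a sustained 100 Mbps link, no protocol
# overhead, and a hypothetical 1,000-genome cohort.


def transfer_hours(size_gb, link_mbps):
    """Hours needed to move `size_gb` gigabytes over a link of `link_mbps` megabits/s."""
    bits = size_gb * 8e9              # decimal gigabytes to bits
    seconds = bits / (link_mbps * 1e6)
    return seconds / 3600


if __name__ == "__main__":
    one_genome = transfer_hours(100, 100)
    cohort = transfer_hours(100 * 1_000, 100)
    print(f"One genome: {one_genome:.1f} hours")                 # about 2.2 hours
    print(f"1,000 genomes: {cohort / 24:.1f} days")              # about 92.6 days
```

Under these assumptions, moving a modest cohort even once would tie up a link for months, which is why it makes more sense to bring the analysis to the data than the other way around.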

Cloud-based platforms such as WuXi NextCODE’s Exchange are essential to address the fundamental big data challenge of genomics. Collaboration in the cloud works to dismantle existing “data silos”: genomic information hosted only on local servers and analyzed on idiosyncratic, closed platforms. The NextCODE Exchange, in contrast, is a browser-based hub that affords secure, seamless collaboration with colleagues around the world. Moreover, users get access to NextCODE’s tools for making the critical links between variation in the genome and disease and other phenotypes, backed by harmonized links to the most important public reference data.

And cloud-based computing is inherently scalable: resources for data storage and analysis expand as needed, allowing researchers and physicians to leverage massive datasets to improve patient care in the clinic. The big data of genomics will continue to expand, and our approaches to analyzing genomic data need to continue to evolve to meet the growing demands of clinicians and researchers.

At WuXi NextCODE, we have built upon our heritage of conducting the largest analysis of genomic data (deCODE’s path-breaking Icelandic analysis) by assembling an ever-growing database of human genomes. We are committed to driving the movement of sequence data into patient diagnosis and care through user-friendly, leading-edge analysis and informatics. I am confident that data analysis and collaboration in the cloud will revolutionize healthcare, and exceptionally proud that WuXi NextCODE’s Exchange is at the forefront of this exciting advancement.


Bringing Together Core Technologies Unlocks Genomic Data to Improve Healthcare

Within the “3-legged stool” of genomics-enabling technologies, lower-cost genome sequencing has reached a point of strong commercial viability, and the remaining two legs, genomic analysis tools and database storage, are rapidly evolving to support the use of genomic information in medical care.

The adoption of genome sequencing technology is rapidly expanding as medical centers around the world embrace its utility in informing healthcare decisions—an emerging reality of personalized medicine.

There are three important areas of technology that are driving the use of genomic data in healthcare:  genome sequencing, genomic analysis tools, and database storage.

The first of these, genome sequencing, has advanced to the point that it is more widely accessible, with the cost of sequencing a genome now around $1,000 or less. At this lower cost, genome sequencing has reached a critical milestone that enables its use as a mass-market product for medical care.

The second and third core genomic technologies—genomic analysis tools and database storage—are in the midst of evolution. Their progress and integration are critical for the next stage of adoption of genomic data into health care.

The rapidly evolving legs of the “3-legged stool” of genomics technology are genomic analysis tools and database storage.

  • Genomic Analysis Tools: Since the human genome was first sequenced more than a decade ago, an increasingly robust body of research has showcased the links between mutations identified in the genome and disease risk. Informatics tools have been developed by medical centers and genomics companies to apply to whole-genome samples. Increasingly, these genome analysis tools will need to adapt to the steady pace of new genomic linkages to disease and to operate at a level approaching “big data.”
  • Database Storage for Human Genomes: There are a growing number of robust databases of human genomes, including data for healthy people or those with certain diseases. When properly analyzed, these databases offer the potential to provide the medical community with a reference library against which to compare genetic data. Large-scale, high-quality databases are an essential element to cross-reference a patient genome to guide more informed medical decisions.

Recently, two leading genomics companies—WuXi and NextCODE Health—have combined their technology capabilities in these two areas. WuXi has industry-leading capabilities to analyze, store, and manage the vast amounts of genomic data. NextCODE Health brings a leading-edge system for sequence-based clinical diagnostic applications and genome analysis.

The combination of WuXi’s foundational genomic database storage and management and NextCODE’s sophisticated genome analysis tools will integrate the key components that are most rapidly evolving to apply genomics to medical care.

Initiatives like these advance the state-of-the-art in genomic analysis and database storage, bringing us to the heart of helping the world to fully harness personalized medicine and providing tools directly to doctors to provide better diagnostics and treatments to patients.

The progress to date has been amazing. Yet the opportunities ahead to improve the speed, accuracy, and accessibility of genomic information, and with it human health, are even more extraordinary.

Early Adopters of Sequencing in the Clinic

Leaders in the medical community are actively enhancing their facilities with DNA sequencers and supercomputers—steps toward the routine sequencing of patient genomes that will inform the full spectrum of care decisions.

It is increasingly evident that sequencing and analyzing genomic information can contribute to more informed healthcare decisions, and major research institutions and medical centers around the world seem to agree.

Leaders in the medical community are actively enhancing their facilities with DNA sequencers and supercomputers, recognizing the efficiencies of having this advanced technology at their disposal for innovative research programs. And as they look to the future, they are taking steps toward the routine sequencing of patient genomes that will inform the full spectrum of care decisions, from defining risk, to diagnosing disease, to defining the ideal course of treatment for the best possible outcome.

Just a few examples of the major advances in the use of sequencing technologies that have been announced recently…

From medical centers:

  • Mount Sinai Medical Center in New York initiated a program in which 24,000 patients participate in a biobank that includes their DNA sequence and supports research over their lifetimes. The program, called BioMe™, is among the largest in the United States.
  • Memorial Sloan-Kettering Cancer Center researchers are active in a range of collaborations that seek to understand the molecular changes that characterize cancer, the largest of which is The Cancer Genome Atlas (TCGA), a project jointly funded by the NCI and the National Human Genome Research Institute. MSK currently houses one of TCGA’s Genome Data Analysis Centers.
  • Phoenix Children’s Hospital launched a new molecular and personalized medicine research institute that will “bring genomics research to the forefront of pediatrics.” The infrastructure will include a range of capabilities, such as a biospecimen repository, DNA sequencing and analysis, and a CLIA lab for genomic profiling.

And research institutions:

  • The Wellcome Trust Sanger Institute is dramatically upgrading its storage and data management capacity.  The Institute already operates 30 DNA sequencers, each of which generates roughly a terabyte of data every day. New upgrades will double their capacity and improve data management and organization software.
  • Harvard Medical School’s Center for Biomedical Informatics conducts informatics research with a strong emphasis on translational science informed by innovative computational strategies; the research staff use mathematical modeling to predict when genetic information could lead to more effective treatment.
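The Sanger Institute figures above give a sense of the data volumes a single research center must manage. The sketch below simply multiplies them out; it assumes all 30 sequencers run every day of the year at the quoted rate of roughly one terabyte per day, which is a simplification.

```python
# Aggregate sequencer output, using the Sanger Institute figures quoted above.
# Assumption: every instrument runs 365 days a year at the quoted daily rate.


def daily_output_tb(sequencers, tb_per_sequencer_per_day=1.0):
    """Total terabytes generated per day across all sequencers."""
    return sequencers * tb_per_sequencer_per_day


def yearly_output_pb(daily_tb, days=365):
    """Total petabytes generated per year (decimal units: 1 PB = 1,000 TB)."""
    return daily_tb * days / 1000


if __name__ == "__main__":
    daily = daily_output_tb(30)
    print(daily)                    # 30.0 TB per day
    print(yearly_output_pb(daily))  # 10.95 PB per year
```

At roughly 11 petabytes a year from one institute, it is easy to see why storage and data management upgrades go hand in hand with new sequencing capacity.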

By members of industry:

  • Google is jumping into the genomics industry with the launch of “Calico,” a new company that will focus on genomic sequencing and advanced analytics to identify solutions for some of the most challenging diseases today.
  • “N-of-One” is a company offering personalized cancer treatment strategies as a new employee benefit tool for innovative, health-minded employers. Through the service, the company provides interpretation of molecular profiling to employees, their family members fighting cancer and their physicians to help inform treatment decisions.

And even the U.S. government:

  • The National Institutes of Health is one of the greatest proponents of genomic sequencing for research purposes. In fact, a recently initiated program is funding research teams to examine whether sequencing newborn genomes or exomes may provide useful information beyond what is currently captured in newborn screening programs.
  • Further, in the fight against infectious diseases and “super-bugs,” the National Institute of Allergy and Infectious Diseases established the Genomic Sequencing Centers for Infectious Diseases (GSCID) to sequence priority pathogens, microorganisms responsible for emerging and re-emerging infectious diseases, and related organisms.

With such a broad array of innovative research underway within the halls of the world’s leading institutions, there is no doubt sequencing is on the verge of delivering exciting breakthroughs in medicine. In fact, we’re seeing evidence of this with NextCODE, which has engaged with several “early adopter” organizations around the globe.  Check it out here.