Genomics: Big Data Leading to Big Opportunities

The Big Data of Genomics

The big data of genomics will continue to expand, and our approaches to analyzing genomic data need to continue to evolve to meet the growing demands of clinicians and researchers. Cloud-based platforms such as WuXi NextCODE’s Exchange are essential to address the fundamental big data challenge of genomics.

Beyond question, we are in the midst of an explosion of “Big Data” in many facets of human endeavor. In fact, data-storage leader IBM asserts that roughly 2.5 quintillion bytes of data are generated every day and that 90% of the world’s data was created in the last two years.

An outpouring of articles in scientific journals and major newspapers has highlighted the promising potential of big data in medicine, including a special section in the current issue of Nature. Genomics has become a major source of the growth of such big data, particularly as the cost of sequencing genomes has plummeted. The raw sequence data for just one person’s whole genome can take up as much as 100 GB, and already hundreds of thousands of individual genomes have been sequenced. With more than 2,500 high-throughput sequencing instruments currently used in 55 countries across the globe, more genomes are added every day. The aggregate amount of genomic data is growing explosively, and next-generation sequencing (NGS) data are estimated to have doubled in volume annually since 2007.
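To give a rough sense of scale, these figures can be turned into back-of-the-envelope arithmetic. The sketch below assumes the upper bound of 100 GB of raw data per genome and takes the "doubled annually since 2007" estimate literally. The function names, the genome counts, and the end year of 2015 are illustrative assumptions, not figures from any specific dataset.

```python
GB_PER_GENOME = 100  # upper-bound raw data per whole genome, as quoted in the text


def raw_storage_petabytes(num_genomes, gb_per_genome=GB_PER_GENOME):
    """Raw storage in petabytes, using decimal units (1 PB = 1,000,000 GB)."""
    return num_genomes * gb_per_genome / 1_000_000


def growth_factor(start_year, end_year):
    """How many times larger the data volume is if it doubles every year."""
    return 2 ** (end_year - start_year)


if __name__ == "__main__":
    # Genome counts are illustrative, chosen to match the project sizes mentioned.
    for n in (100_000, 1_000_000):
        print(f"{n:>9,} genomes: about {raw_storage_petabytes(n):,.0f} PB of raw data")
    # The end year is an assumption used only for illustration.
    print(f"Annual doubling from 2007 to 2015: {growth_factor(2007, 2015)}x")
```

Under these assumptions, a 100,000-genome project implies on the order of 10 PB of raw data, and eight consecutive annual doublings multiply the total volume 256-fold.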

The accumulation of genomic data is a worldwide phenomenon.  Impressive population-wide sequencing efforts are leading the way, from 100,000 genomes in England, Saudi Arabia, and Iceland to 350,000 in Qatar to a million in both China and the U.S.

And earlier this month, the CEO of the Cleveland Clinic predicted that soon children will routinely have their whole genomes sequenced at birth, implying a near future in which tens of millions of new genomes are sequenced annually.

Turning Data into Resources

But sequencing genomes is not enough, and the creation of genomic big data is just the beginning.

Thanks to the analysis of big data in genomics and associated informatics, we are seeing meaningful progress in cancer care and the diagnosis of rare diseases, as I have discussed here and here. We clearly have a tremendous opportunity to use the big data of genomics to continue to drive a revolution in healthcare.

Yet there is a broad consensus that a ‘data bottleneck’ is hampering collaboration and discovery. Not all researchers and physicians confronting the current onslaught of genomic big data can readily determine how to use genetic information to prevent or treat disease. To succeed, researchers and physicians clearly need resources that:

  • Draw together useful data from disparate sources;
  • Facilitate analysis and collaboration; and
  • Improve clinical practice.

The power of genomic analysis needs to expand outward from major research centers and hospitals to the myriad clinics and community hospitals where many patients receive care. To have the greatest impact on the broadest population, clinicians throughout the world’s health systems need access to the big data generated by DNA sequencing, even—or perhaps especially—if they are not affiliated with research institutions. They also need to be able to make sense of the data they have access to.

Answers in the Cloud

Sequencing provides the raw data to uncover the genetic variants that contribute to disease. But the datasets are too big to transfer repeatedly—and too big even for smaller hospitals, labs, or clinics to store onsite. Key medical advancements require not only big data, but also tools and resources to generate, interpret, and share analysis of millions of genomes.
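A quick calculation shows why repeated transfers become impractical. The sketch below estimates idealized transfer time for raw genome data, assuming roughly 100 GB per genome (the figure quoted earlier in this post) and ignoring protocol overhead. The link speeds and the batch size of 1,000 genomes are hypothetical values chosen for illustration.

```python
def transfer_hours(size_gb, link_mbps):
    """Idealized hours to move size_gb gigabytes over a link_mbps megabit/s link."""
    size_megabits = size_gb * 1000 * 8  # decimal gigabytes -> megabits
    return size_megabits / link_mbps / 3600


if __name__ == "__main__":
    # Link speeds (Mbps) and batch sizes are assumptions for illustration only.
    for mbps in (100, 1000):
        one = transfer_hours(100, mbps)
        batch = transfer_hours(100 * 1000, mbps)
        print(f"{mbps:>5} Mbps: one genome ~{one:.1f} h, 1,000 genomes ~{batch / 24:.0f} days")
```

Even on a dedicated link, moving a modest cohort takes days to months under these assumptions, which is the practical argument for bringing the analysis to the data rather than the data to the analysis.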

Cloud-based platforms such as WuXi NextCODE’s Exchange are essential to address the fundamental big data challenge of genomics. Collaboration in the cloud works to dismantle existing “data silos,” meaning genomic information hosted only on local servers and analyzed on idiosyncratic, closed platforms. The NextCODE Exchange, in contrast, is a browser-based hub that affords secure, seamless collaboration with colleagues around the world. Moreover, users get access to NextCODE’s tools for making the critical links between variation in the genome and disease and other phenotypes, backed by harmonized links to the most important public reference data.

And cloud-based computing is inherently scalable: resources for data storage and analysis expand as needed, allowing researchers and physicians to leverage massive datasets to improve patient care in the clinic. The big data of genomics will continue to expand, and our approaches to analyzing genomic data need to continue to evolve to meet the growing demands of clinicians and researchers.

At WuXi NextCODE, we have built upon our heritage of conducting the largest analysis of genomic data (deCODE’s path-breaking Icelandic analysis) by assembling an ever-growing database of human genomes. We are committed to driving the movement of sequence data into patient diagnosis and care through user-friendly, leading-edge analysis and informatics. I am confident that data analysis and collaboration in the cloud will revolutionize healthcare, and exceptionally proud that WuXi NextCODE’s Exchange is at the forefront of this exciting advancement.


2015: An Inflection Point for Genomics Adoption Around the Globe

2015 is shaping up to be a significant year in the advancement and adoption of genome sequencing and personalized medicine around the globe.

The year 2015 is shaping up to be an inflection point in the advancement and adoption of genome sequencing and personalized medicine.  While private initiatives are often the centerpiece of media coverage, leading governments clearly have advanced a number of important initiatives this year.  Indeed, many governments around the globe are actively promoting widespread utilization of genomics, supporting academic research, establishing industry guidelines, and raising public awareness.

Governments Serving as Catalysts for Genomics Progress

The efforts of officials worldwide to engage with and support the private sector’s tremendous potential have helped to make 2015 a significant year for expanding the use of genomics in clinical care.  A few highlights of 2015 include:

— In the U.S., President Obama made precision health one of the centerpieces of his State of the Union address in January. Obama’s administration kicked this effort off by requesting a $215M investment in a Precision Medicine Initiative with the following key attributes:

  • The cornerstone of Obama’s proposal is the plan to collect and analyze genomic data from a million or more volunteers;
  • The initiative further supports genomics through expanded research into the genetic mutations that drive cancer;
  • Additional funding is earmarked to maintain databases and develop industry standards.

— Germany and the U.K. expanded eligibility for government-funded genetic testing for breast cancer patients.

— Israel announced its intent to establish a government-sponsored genetic database.

— Through the National Institutes of Health and the National Cancer Institute, the U.S. federal government proposed dozens of new funding opportunities to support research in genetic sequencing and analysis.

— Japan launched an Initiative on Rare and Undiagnosed Diseases to provide genomic analysis and expert consultation for up to 1,000 individuals with childhood onset of undiagnosed conditions.

— Through Genomics England (which I described in further detail here), the U.K. Department of Health tapped WuXi NextCODE and others to begin interpretation in its groundbreaking 100,000 Genomes Project.

In news today, the trend toward globalization of genomics continues, as private sector leaders aligned to meet the needs of the forward-looking government health initiatives of Qatar:

— WuXi NextCODE and the Sidra Medical and Research Center partner to power population genomics and precision medicine in Qatar. Our partnership will:

  • Facilitate clinical diagnostics;
  • Accelerate research; and
  • Support the Qatar Genome Project.

As I have discussed in an earlier post, large-scale population studies are an essential step in harnessing the power of genomics to improve health worldwide. Building on WuXi NextCODE’s foundational heritage as part of deCODE Genetics’ landmark analysis of Icelanders, we have always developed the tools to help translate sequence data into precision medicine on a large scale. In our work with Genomics England, our collaboration with Fudan Children’s Hospital to diagnose rare diseases in China, and now our partnership with Sidra, the team at WuXi NextCODE is leading the effort to realize the potential of genomics on a truly global scale. The increasing interest in supporting those efforts shown by leading governments across the globe is helping to drive the successful use and application of genomics worldwide.

Population-Scale Research Efforts Enabled by Progress in Sequencing

Significant insights gained from population-scale genomic studies, based on the knowledge of genetic variation and disease causation, will help to enable a new reality of personalized medicine and treatment.

The ability to sequence whole genomes quickly and economically is driving interest in population-scale sequencing efforts that can reveal meaningful insights on a much more systematic basis than previous approaches. A range of large initiatives announced recently are prime examples of the trend in population sequencing, including industry programs by Regeneron and Human Longevity, and the 100,000 Genomes Project by Genomics England.

Perhaps better than any other effort since the founding of deCODE in Iceland, the establishment of a high-throughput Genomics Center at Sidra Medical and Research Center in Qatar embodies the movement toward these types of population studies. The eventual goal of the project is to sequence the entire Qatari population of some 300,000 people. But from the beginning, the Sidra facility will help advance genetic mapping projects, including the creation of an Arab consensus genome to obtain a better understanding of genetic variants that influence health across Arab populations and, indeed, beyond. In addition to these efforts, the center will focus on uncovering the causes of rare genetic diseases.

The significant insights that can be gained from population-scale studies, based on the knowledge of genetic variation and disease causation, will help to enable a new reality of personalized medicine and treatment. And this is where efficient, powerful and industrial-scale analysis will become critical. NextCODE’s analytics and interpretation systems have already been tested at such scale, as they are based on the world’s first and largest population genomics effort, that of deCODE. Our systems will be useful tools to efficiently deliver insights based on the vast amount of data that will be generated by these major population-based efforts to improve the state of global healthcare.

A Standard Database Architecture Will Build a Stronger Foundation for Genome Discoveries

The general adoption of the Genomically-Ordered Relational database (GOR) as a data standard for storing genomic data may greatly accelerate the spread of sequencing and its effectiveness as a tool for advancing medicine.

It is widely accepted that the ability to share the analysis and insights from DNA sequencing will be a key driver of discovery and innovation. But one current limitation to extending this knowledge is that sequencing and analysis platforms, as well as samples, are often proprietary to and stored at different institutions. Perhaps more important, the structures and formats in which genomic data has customarily been stored (the relational databases developed by the likes of IBM and Oracle) make it unwieldy to analyze as the amount of data grows, and very difficult to share. The upshot is that institutions cannot easily share and consolidate information to generate more robust analyses and clinically relevant insights. This presents a serious hurdle to discovery both in rare disorders, where samples need to be gathered in order to generate adequate analytical power, and in complex ones, where truly massive studies can tease apart different facets of disease and reveal their causes.

Over the past decade, a novel and comprehensive database model has been developed to solve this bottleneck, offering a flexible and fast means to overcome these problems. It is called the Genomically-Ordered Relational database, or GOR, and was designed to manage and query the detailed genomic data amassed by deCODE genetics in Iceland, the world’s first and still by far the largest and most comprehensive population-based genomic database.

The thinking behind the GOR is as simple as it is revolutionary. Genomic data is a sort of big data but one with an important difference: It is divided up in distinct packets—the chromosomes—and then arranged within each chromosome in linear fashion. The GOR makes use of this by storing and querying sequence data according to its unique position in the genome, rather than as huge files as long as the sequence. This radically reduces the data burden of querying even large numbers of whole genomes, at the same time making it possible to store and visualize instantly the raw sequence underlying an analysis.

In practice, the GOR thereby enables researchers to home in on specific variants without first having to call up entire patient genomes, and it separates raw data from annotations so that queries touch only the most relevant search components. It’s these types of functions and features that can be consistently applied across data storage systems to allow for more multi-institutional, collaborative research and consistency in outcomes worldwide.
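The idea of storing and querying data by genomic position can be illustrated with a toy example. The sketch below is not the GOR itself, and nothing in it reflects the actual GOR implementation or API. It is a minimal, in-memory illustration of the general principle: keep rows sorted by position within each chromosome so that a range query becomes a binary search rather than a scan of whole genomes. The class name, method names, and sample rows are all hypothetical.

```python
import bisect
from collections import defaultdict


class PositionOrderedStore:
    """Toy store that keeps rows sorted by position within each chromosome."""

    def __init__(self):
        self._positions = defaultdict(list)  # chromosome -> sorted positions
        self._rows = defaultdict(list)       # chromosome -> rows, same order

    def add(self, chrom, pos, row):
        """Insert a row while preserving positional order on its chromosome."""
        i = bisect.bisect_right(self._positions[chrom], pos)
        self._positions[chrom].insert(i, pos)
        self._rows[chrom].insert(i, row)

    def query(self, chrom, start, end):
        """Return rows on chrom with start <= position <= end."""
        positions = self._positions.get(chrom, [])
        lo = bisect.bisect_left(positions, start)
        hi = bisect.bisect_right(positions, end)
        return self._rows.get(chrom, [])[lo:hi]


if __name__ == "__main__":
    store = PositionOrderedStore()
    # Hypothetical variant rows; sample names and positions are made up.
    store.add("chr1", 3000, {"sample": "S2", "alt": "G"})
    store.add("chr1", 1500, {"sample": "S1", "alt": "T"})
    store.add("chr2", 1500, {"sample": "S1", "alt": "C"})
    print(store.query("chr1", 1000, 2000))
```

Because each chromosome's rows are ordered, the lookup cost grows with the logarithm of the number of stored positions plus the size of the result, rather than with the total amount of sequence stored. A production system would also need on-disk layout, compression, and joins against annotation tables, none of which this sketch attempts.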

Leaders in the genomic research community are now beginning to create coalitions and working groups to underpin and coordinate the adoption of standards for sharing genomic data. As these groups create flexible and efficient policy frameworks, the GOR is tested and ready to support the fundamental data requirements of global data sharing and the acceleration of discoveries in genome-based medicine. The general adoption of the GOR as a data standard for storing genomic data may greatly accelerate the spread of sequencing and its effectiveness as a tool for advancing medicine around the world.

Genomics-Based Medicine Coming Into View

NextCODE Health has quickly gained recognition for its unique capabilities to address unmet needs in the genomics space through a massive genomics database that interprets DNA samples to identify relevant disease markers.

The practice and adoption of genomic medicine is accelerating as technologies improve, costs fall and new insights drive better patient care. While many companies are supporting this emerging field, a select few are providing the unique perspectives and capabilities to advance progress even faster.

NextCODE Health made headlines less than a year ago with the announcement of its launch and funding by major investors in healthcare and biotechnology. The company quickly gained recognition for its unique capabilities to address unmet needs in the genomics space through a massive genomics database that interprets DNA samples to identify relevant disease markers. (See the features in Xconomy, Bio-IT World and PLOS Blog.) The company was later mentioned in Nature Biotechnology News for its potential contributions to genome studies by leveraging key reference data from deCODE’s work in Iceland.

Its rapid trajectory since launch and the utility of its genomic analysis technology were featured in BioCentury in May, with testimonials from clinicians using NextCODE capabilities to diagnose patients at Boston Children’s Hospital, the Baylor College of Medicine, and the Sanford School of Medicine. In June, the company was featured in a major interview with Bio-IT World, and it continues to expand. Since then, NextCODE has announced several programs through which global pioneers in clinical genomics research are applying its interpretation and analysis technology to support research and diagnosis in rare diseases.

As more organizations employ genomics in major research initiatives, NextCODE’s interpretation technology will be an increasingly important asset in delivering meaningful insights from the wealth of genomic data being produced. Visit NextCODE for the latest on how the future of genomics-based medicine continues to evolve.

Pioneering Genome Sequencing Effort in England Aims to Shape the Future of Global Medicine

£300 million in new investments for Genomics England

Genomics England was set up by the UK Department of Health to deliver the 100,000 Genomes Project. Initially the focus will be on rare disease, cancer, and infectious disease. The project is currently in its pilot phase and will be completed by the end of 2017.

These are exciting times for large-scale sequencing projects. Last week, U.K. Prime Minister David Cameron announced over £300 million ($509.4 million) in new investments for Genomics England, which aims to sequence, analyze, and store the genomes of 100,000 UK National Health Service (NHS) patients by 2017. The investments include about £162 million ($275.1 million) from Illumina Inc. (NASDAQ:ILMN), the partner for the sequencing element of the project. In turn, Genomics England will pay Illumina about £78 million ($132.4 million) for its services.

At the same time, the Wellcome Trust will put £27 million ($45.8 million) into a new sequencing hub at its genome campus in Cambridge; the Medical Research Council, or MRC, is investing £24 million ($40.7 million) to support data analysis and interpretation, and the NHS will make £20 million ($34 million) available for the establishment of patient sequencing centers.

This is a prime example of how the implementation of sequencing technologies promises to drive a revolution in the structure of medical research. These new projects aim to capture more data on human DNA than ever before, with the goal of advancing care and solving healthcare challenges.

The 100,000 Genomes Project, developed by the NHS, has the potential to significantly influence the global community through its plans to integrate sequencing data into standard medical practice.

Genomics England plans to generate 100,000 whole genome sequences from NHS patients with cancer, rare diseases, and other conditions, and to share the resulting data for research and development purposes. In the early phases, the program will also seek to develop standards for consent, sample storage, data generation and variant analysis that may be useful for many other organizations conducting large-scale projects within public health systems.

The project is enlisting the help of organizations from around the world to undertake this significant effort. In fact, it recently selected Illumina to conduct the sequencing efforts and is evaluating technologies for storing, annotating, and interpreting the data so that it can be used both for clinical diagnostics and for drug discovery, development, and delivery to the right patients.

The challenges of analyzing data on such a large scale are formidable, but the end result carries great potential to address some of the significant unmet medical needs. NextCODE’s technology has already accomplished analytics on this scale based on its work with the Icelandic population through deCODE genetics. It’s an exciting prospect for advancing the future of genomics-driven medicine and one to watch.

Personalized Medicine: The Future is Almost Here

The achievement of low-cost genome sequencing and the use of genomic data to better understand diseases are advancing the exciting new era of personalized medicine.

It’s been more than a decade since the human genome was first sequenced. Since then, we have been on the journey of applying this profound new discovery to create personalized medicine and advance human health.

Two significant triumphs along this human genome journey:

  • Using genomic data to better understand diseases; and
  • Achieving low-cost genome sequencing.

Each of these accomplishments has been a stepping stone into the exciting new era that is dawning now: where genomic information is becoming integrated into medical care.

Using Genomic Data to Better Understand Diseases

Let’s take a look back at the early days of using genomic data to connect the dots between genetic mutations and disease. From 1997 to 2004, I was part of the leadership team at deCODE, the Icelandic genomics company. This was the period when deCODE was building the world’s most productive human genomics platform, with a database of tens of thousands of individuals who participated in genetic studies, including what remains the largest database of genomes to this day. deCODE’s genomic engine was able to successfully identify the genetic variations associated with human disease. This resulted in dozens of groundbreaking discoveries that were published in major, peer-reviewed journals.

The legacy of deCODE was the creation of an industrialized platform capable of massive storage and analysis. This enabled researchers to crunch genomic data to gain insights about genetic variants, or risk factors, associated with many common diseases. deCODE’s premise was that once the genetics of disease was better understood, that information could be used to create new ways to diagnose, treat and prevent disease. However, when I left deCODE in 2004, there were still barriers to overcome before this genomic information could be widely applied at the level of an individual patient. Chief among them was that the cost of genome sequencing was still prohibitively high. (deCODE was subsequently acquired by Amgen.)

Achieving Low-Cost Genome Sequencing

Back in 2004, the cost to sequence a single human genome was hundreds of thousands of dollars. Today that cost is a few thousand dollars (and, in fact, fast approaching $1,000) for a whole genome sequence. DNA sequencing costs continue to fall, as speed and accuracy increase.

This means we are rapidly approaching a tipping point where, as the sequencing of human genomes becomes more economical, its adoption in the medical community becomes more widespread and genomic data can become more routine in medical care. This is why personalized medicine is becoming a reality.

The Era of Genome Sequencing in Medical Care

The steep drop in the costs of sequencing, combined with the explosion of research on gene variants and disease, means the time is fast approaching when genome sequencing will become routine in medical care. Today, pathologists perform blood cultures to decide which antibiotics will stop a patient’s bacterial infection. Soon a patient sample will be sequenced to analyze the patient’s genetic characteristics and determine ways a disease can be prevented or, if the patient is sick, which treatments might work best.

The body of genomic knowledge and the large databank of human genomes built by pioneers like deCODE established the key building blocks that enable genome sequencing to have predictive power for individual patients. As more human genomes are sequenced and more genetic variants are associated with disease, the predictive power of knowing about risk genes and effective treatments for each patient – a.k.a. personalized medicine – will become an essential part of medical care.

Genome Sequencing Being Implemented by Medical Centers

In preparation for the future of personalized medicine, major medical centers in the U.S., Europe and Asia are actively beginning to install DNA sequencers and supercomputers as important tools for integrating genome sequencing into medical care. These medical centers are taking initial steps toward the routine sequencing of every patient’s genome to define the ideal course of prevention and treatment based on variants found in a patient’s genes.

Evidence of this adoption of genome sequencing by medical centers appeared in an article in The New York Times in April 2013, which reported that:

  • Medical centers in New York City are spending more than $1 billion on new genomic research centers;
  • Several hospitals around the U.S. are undertaking systematic genome sequencing in patients;
  • Mount Sinai Medical Center has a program in which 24,000 patients participate in a biobank to include their DNA sequence and research over their lifetimes;
  • Memorial Sloan-Kettering Cancer Center sequenced 16,000 tumors from cancer patients in 2012; and
  • Phoenix Children’s Hospital opened a new institute in December 2012 to sequence the genomes of 30 percent of their childhood cancer patients.

For now, the use of whole genome sequencing in medical practice is still in its infancy, but the pace of progress continues to accelerate. Clearly, genome sequencing will soon become part of the nucleus of medical care. This will herald a new era in personalized medicine revolutionizing healthcare as we know it and transforming our lives. When do you think genome sequencing will become a part of the medical decisions in your life?