Genomics: Big Data Leading to Big Opportunities

The Big Data of Genomics

The big data of genomics will continue to expand, and our approaches to analyzing genomic data need to continue to evolve to meet the growing demands of clinicians and researchers. Cloud-based platforms such as WuXi NextCODE’s Exchange are essential to address the fundamental big data challenge of genomics.

Beyond question, we are in the midst of an explosion of “Big Data” in many facets of human endeavor. In fact, data-storage leader IBM asserts that roughly 2.5 quintillion bytes of data are generated every day and that 90% of the world’s data was created in the last two years.

An outpouring of articles in scientific journals and major newspapers has highlighted the promise of big data in medicine, including a special section in the current issue of Nature. Genomics has become a major source of this growth, particularly as the cost of sequencing genomes has plummeted. The raw sequence data for just one person’s whole genome can take up as much as 100 GB, and hundreds of thousands of individual genomes have already been sequenced. With more than 2,500 high-throughput sequencing instruments currently in use in 55 countries, more genomes are added every day. The aggregate amount of genomic data is growing explosively: next-generation sequencing (NGS) data are estimated to have doubled in volume annually since 2007.
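To put those figures in perspective, here is a rough back-of-the-envelope sketch. The genome count (the low end of "hundreds of thousands") and the end year are assumptions chosen for illustration, not figures from this post; only the 100 GB per genome and the 2007 start year come from the text.

```python
# Back-of-the-envelope arithmetic for the figures quoted above.
GB_PER_GENOME = 100   # upper figure quoted in the text
GENOMES = 100_000     # assumption: low end of "hundreds of thousands"
START_YEAR = 2007     # from the text
END_YEAR = 2015       # assumption: the post does not state a year


def total_petabytes(genomes: int, gb_per_genome: int) -> float:
    """Raw storage in petabytes, using decimal units (1 PB = 1,000,000 GB)."""
    return genomes * gb_per_genome / 1_000_000


def doubling_factor(start_year: int, end_year: int) -> int:
    """Growth multiple when volume doubles once per year."""
    return 2 ** (end_year - start_year)


print(total_petabytes(GENOMES, GB_PER_GENOME))  # 10.0
print(doubling_factor(START_YEAR, END_YEAR))    # 256
```

Under these assumptions, raw data alone would come to about 10 petabytes, and eight years of annual doubling would multiply the starting volume 256-fold.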

The accumulation of genomic data is a worldwide phenomenon. Impressive population-wide sequencing efforts are leading the way, from 100,000 genomes in England, Saudi Arabia, and Iceland to 350,000 in Qatar to a million each in China and the U.S.

And earlier this month, the CEO of the Cleveland Clinic predicted that children will soon routinely have their whole genomes sequenced at birth, implying a near future in which tens of millions of new genomes are sequenced annually.

Turning Data into Resources

But sequencing genomes is not enough, and the creation of genomic big data is just the beginning.

Thanks to the analysis of big data in genomics and associated informatics, we are seeing meaningful progress in cancer care and the diagnosis of rare diseases, as I have discussed here and here. We clearly have a tremendous opportunity to use the big data of genomics to continue to drive a revolution in healthcare.

Yet there is a broad consensus that a ‘data bottleneck’ is hampering collaboration and discovery. Not all researchers and physicians confronting the current onslaught of genomic big data can readily determine how to use genetic information to prevent or treat disease. To succeed, researchers and physicians clearly need resources that:

  • Draw together useful data from disparate sources;
  • Facilitate analysis and collaboration; and
  • Improve clinical practice.

The power of genomic analysis needs to expand outward from major research centers and hospitals to the myriad clinics and community hospitals where many patients receive care. To have the greatest impact on the broadest population, clinicians throughout the world’s health systems need access to the big data generated by DNA sequencing, even—or perhaps especially—if they are not affiliated with research institutions. They also need to be able to make sense of the data they have access to.

Answers in the Cloud

Sequencing provides the raw data to uncover the genetic variants that contribute to disease. But the datasets are too big to transfer repeatedly—and too big even for smaller hospitals, labs, or clinics to store onsite. Key medical advancements require not only big data, but also tools and resources to generate, interpret, and share analysis of millions of genomes.

Cloud-based platforms such as WuXi NextCODE’s Exchange are essential to address the fundamental big data challenge of genomics. Collaboration in the cloud works to dismantle existing “data silos,” that is, genomic information hosted only on local servers and analyzed on idiosyncratic, closed platforms. The NextCODE Exchange, in contrast, is a browser-based hub that affords secure, seamless collaboration with colleagues around the world. Moreover, users get access to NextCODE’s tools for making the critical links between variation in the genome and disease and other phenotypes, backed by harmonized links to the most important public reference data.

And cloud-based computing is inherently scalable: resources for data storage and analysis expand as needed, allowing researchers and physicians to leverage massive datasets to improve patient care in the clinic. The big data of genomics will continue to expand, and our approaches to analyzing genomic data need to continue to evolve to meet the growing demands of clinicians and researchers.

At WuXi NextCODE, we have built upon our heritage of conducting the largest analysis of genomic data (deCODE’s path-breaking Icelandic analysis) by assembling an ever-growing database of human genomes. We are committed to driving the movement of sequence data into patient diagnosis and care through user-friendly, leading-edge analysis and informatics. I am confident that data analysis and collaboration in the cloud will revolutionize healthcare, and exceptionally proud that WuXi NextCODE’s Exchange is at the forefront of this exciting advancement.

2015: An Inflection Point for Genomics Adoption Around the Globe

2015 genomics hannes smarason

2015 is shaping up to be a significant year in the advancement and adoption of genome sequencing and personalized medicine around the globe.

The year 2015 is shaping up to be an inflection point in the advancement and adoption of genome sequencing and personalized medicine.  While private initiatives are often the centerpiece of media coverage, leading governments clearly have advanced a number of important initiatives this year.  Indeed, many governments around the globe are actively promoting widespread utilization of genomics, supporting academic research, establishing industry guidelines, and raising public awareness.

Governments Serving as Catalysts for Genomics Progress

The efforts of officials worldwide to engage with and support the private sector’s tremendous potential have helped to make 2015 a significant year for expanding the use of genomics in clinical care.  A few highlights of 2015 include:

— In the U.S., President Obama made precision health one of the centerpieces of his State of the Union address in January. Obama’s administration kicked this effort off by requesting a $215M investment in a Precision Medicine Initiative with the following key attributes:

  • The cornerstone of Obama’s proposal is the plan to collect and analyze genomic data from a million or more volunteers;
  • The initiative further supports genomics through expanded research into the genetic mutations that drive cancer;
  • Additional funding is earmarked to maintain databases and develop industry standards.

— Germany and the U.K. expanded eligibility for government-funded genetic testing for breast cancer patients.

— Israel announced its intent to establish a government-sponsored genetic database.

— Through the National Institutes of Health and the National Cancer Institute, the U.S. federal government proposed dozens of new funding opportunities to support research in genetic sequencing and analysis.

— Japan launched an Initiative on Rare and Undiagnosed Diseases to provide genomic analysis and expert consultation for up to 1,000 individuals with childhood onset of undiagnosed conditions.

— Through Genomics England (which I described in further detail here), the U.K. Department of Health tapped WuXi NextCODE and others to begin interpretation in its groundbreaking 100,000 Genomes Project.

In news today, the trend toward globalization of genomics continues, as private sector leaders aligned to meet the needs of the forward-looking government health initiatives of Qatar:

— WuXi NextCODE and the Sidra Medical and Research Center partner to power population genomics and precision medicine in Qatar. Our partnership will:

  • Facilitate clinical diagnostics;
  • Accelerate research; and
  • Support the Qatar Genome Project.

As I have discussed in an earlier post, large-scale population studies are an essential step in harnessing the power of genomics to improve health worldwide.  Since WuXi NextCODE’s foundational heritage as part of deCODE Genetics’ landmark analysis of Icelanders, we have always developed the tools to help translate sequence data into precision medicine on a large scale.  In our work with Genomics England, our collaboration with Fudan Children’s Hospital to diagnose rare diseases in China, and now our partnership with Sidra, the team at WuXi NextCODE is leading the effort to realize the potential of genomics on a truly global scale. The increasing interest in supporting those efforts shown by leading governments across the globe is helping to drive the successful use and application of genomics worldwide.

A New Era, New Vision for WuXi and NextCODE Health

WuXi PharmaTech has acquired NextCODE Health to create WuXi NextCODE Genomics, a global leader in genomic medicine. Pairing WuXi’s technology and existing reach with NextCODE’s leading analytics and database promises to advance the pace of genomics research today.

In the fast-paced genomics community, we continually look for new opportunities and strategies to enhance the value of genomics and use the increasingly robust body of genomic data for the advancement of clinical medicine.

We’re excited to announce a new, ambitious vision to do just that, with WuXi’s acquisition of NextCODE Health. NextCODE will be merged with WuXi’s existing Genome Center into a wholly owned subsidiary called WuXi NextCODE Genomics, with unique, comprehensive and global capabilities for using genomic data to deliver better medicine and improve healthcare.

WuXi, a Shanghai-based genomic laboratory service partner for companies in the pharma and biotech community, has already been collaborating with NextCODE to provide analysis services to customers of the WuXi Genome Center. Now, NextCODE’s industry-leading genome sequence analysis platform gives WuXi the in-house capability to analyze, store, and manage vast amounts of genomic data, expanding WuXi’s core next-generation sequencing benefits and services.

Pairing WuXi’s technology and existing reach with NextCODE’s leading analytics and database promises to advance the pace of genomics research today. More importantly, however, this new era for NextCODE brings exciting opportunities to maximize the most advanced tools available today and contribute to major advances in genomic medicine.

Advancing Autism Research By Sharing Genomic Data Online: The Simons Simplex Collection

The NextCODE Exchange is hosting the Simons Simplex Collection (SSC), a global resource for research on autism spectrum disorders comprising genomic data from nearly 2,800 families.

Autism research is underway around the world to better understand the genetic basis for the disease, which is difficult to diagnose and has limited treatment options. With vast amounts of data being generated, the answers to this challenging disease may lie in the consolidation of this global data.

The newly launched NextCODE Exchange (read the release here) may be a critical solution in changing how autism is diagnosed and treated. The Exchange is hosting the Simons Simplex Collection (SSC), a global resource for research on autism spectrum disorders comprising genomic data from nearly 2,800 families.

With the Exchange, the SSC will be accessible to the world’s autism researchers to harmonize the growing body of relevant genomic data. By enabling the rapid analysis of massive amounts of sequencing data followed by instant collaboration and validation of findings, the availability of the SSC and other hosted data will accelerate the pace of discovery in this field.

This simple concept is likely to help usher in a new era of genomic medicine, offering global access to data that can answer questions about some of today’s most challenging diseases.

Learn more about the NextCODE Exchange and the Simons Simplex Collection here.

Maintaining Momentum Post-ASHG: Maximizing the Value of Large Genomic Databases

The newly launched NextCODE Exchange provides a browser-based hub for multi-center sharing and collaboration on collective data from massive whole-genome databases like the Haplotype Reference Consortium (HRC).

The American Society of Human Genetics (ASHG) meeting convened this week in San Diego, bringing together genetics experts from around the world to discuss programs with great potential to advance genomic-based medicine in the years to come.

To maintain the momentum generated this week, we need to find ways to integrate these important ideas, insights and programs, and to maximize the use of the massive databases that have been launched to support research on cancer, rare diseases and other pressing health topics.

One of the databases unveiled during the meeting was the Haplotype Reference Consortium, which aims to become the world’s most comprehensive database of genetic variations. Large databases like the HRC, along with several others already underway, can be tremendously helpful to researchers finding answers to some of the most challenging diseases. But there remains a significant bottleneck: these large, cumbersome databases cannot easily be shared and manipulated, limiting their utility for broad, multi-center genomic research.

The solution lies in the newly launched NextCODE Exchange (see release here). This browser-based hub allows for the sharing and harmonizing of massive whole-genome databases like the HRC to accelerate research. The integrated architecture allows users to visually confirm and validate findings in raw sequences, collaborating and sharing with others around the world who may have complementary research underway.

The momentum generated during ASHG will be multiplied by sharing and learning from the world’s collective genomic data on the NextCODE Exchange. Learn more here.

Imagine the Potential: The World’s First Online Hub for Global Genomic Data Access

The NextCODE Exchange, a new browser-based hub, allows for real-time sharing of whole genome collections in a simple, consistent format.

The field of genomic medicine is rapidly advancing as the research community becomes more comfortable manipulating genomic data with the goal of discovering insights about disease causes and risks. Yet each database is hosted within a separate organization, organized in its own way, and far too cumbersome to share easily with others who may be working on similar research.

This weekend a new tool launched to enable just that. The NextCODE Exchange (see release here), a new browser-based hub, allows for real-time sharing of whole genome collections in a simple, consistent format.

The availability of this Exchange is a critical advance in extending the utility of genomic data by allowing organizations around the world to access and harmonize large complementary datasets, potentially multiplying their study data sets to gain more reliable insights than ever before.

Already, numerous organizations are participating in the NextCODE Exchange to add and share their genomic data, including clinicians and researchers affiliated with Boston Children’s Hospital, University College Dublin, Queensland Institute of Medical Research (Australia), and Saitama Medical University (Japan).

As new institutions look to the Exchange to share genomic data, this hub holds significant potential to help advance progress in genomic-based medicine.

Learn more about the NextCODE Exchange here.

Genome Data Interpretation: How to Ease the Bottleneck

Bloomberg BNA Business’ “Diagnostic Testing & Emerging Technologies” highlights how NextCODE is providing a qualitatively different way to store and analyze genomic information to meet growing opportunities in personalized medicine.

With advances in sequencing technology and reduced costs, more and more data are generated every day on the genetic basis of disease. The challenge has become how to derive meaningful information from these mountains of data.

While various systems have been established in recent years to store the large amounts of genomic data from patients’ DNA, a remaining obstacle is to “break the bottleneck” so that researchers can process the vast data in multiple human genomes in order to identify and isolate a small, useful piece of information about disease. Conventional databases and algorithms have not been able to efficiently and reliably identify subset information among the millions of genetic markers in order to inform clinical decisions. This has become a major data management roadblock.

The key is to find new approaches for databases and algorithms that accommodate the unique ways that genomic information is analyzed and interpreted. As discussed in Bloomberg BNA, Diagnostic Testing & Emerging Technologies, NextCODE is already easing this bottleneck by providing a qualitatively different way to store and analyze genomic information and apply it to meet the growing opportunities for personalized medicine.

NextCODE’s Genomically Ordered Relational (or GOR) database infrastructure is a truly different way of storing this huge amount of data. The principle is very simple: rather than store sequence and reference data in vast unwieldy files, it ties data directly to its specific genomic position. As a result, the algorithms are vastly more efficient compared to a traditional relational database because they can isolate by location in the genome. That makes analysis faster, more powerful, and radically more efficient, both in terms of clinicians’ and researchers’ time, as well as computer infrastructure, I/O, and CPU usage.
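The idea of isolating data by location can be illustrated with a minimal sketch. This is not NextCODE's GOR implementation or API; the rows, the `VARIANTS` table, and the `query_region` helper are hypothetical, and the only technique shown is keeping records sorted by (chromosome, position) so that a region can be found by binary search rather than by scanning everything.

```python
from bisect import bisect_left, bisect_right

# Hypothetical rows: (chromosome, position, ref, alt),
# kept sorted by (chromosome, position).
VARIANTS = sorted([
    ("chr1", 1_500, "A", "G"),
    ("chr1", 20_000, "C", "T"),
    ("chr2", 300, "G", "A"),
    ("chr2", 9_000, "T", "C"),
    ("chr2", 9_500, "G", "C"),
])

# Parallel list of sort keys so bisect can compare (chromosome, position) pairs.
KEYS = [(chrom, pos) for chrom, pos, _, _ in VARIANTS]


def query_region(chrom, start, end):
    """Return rows on `chrom` with start <= position <= end via binary search."""
    lo = bisect_left(KEYS, (chrom, start))
    hi = bisect_right(KEYS, (chrom, end))
    return VARIANTS[lo:hi]


print(query_region("chr2", 1_000, 9_500))
```

Because the rows are ordered by position, the lookup touches only the slice that falls inside the requested region, which is the intuition behind the efficiency claim above.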

This holistic approach applies broadly to the priorities of genome scientists around the world, helping them eliminate the data management bottleneck to identify more culprits behind many inherited diseases, more quickly and cost-effectively.

Read more about NextCODE’s work here.

Trends in Sequencing and Analysis Today Leading to Tomorrow’s Clinical Advances

The insights we’re gaining from sequencing and analysis techniques are delivering new advances in healthcare with ever greater speed and precision.

The challenge for programs seeking to accelerate their research discoveries with genomic data is how to analyze the wealth of information—to make it clinically relevant and rapidly deliver reliable insights to better inform patient care.

The insights we’re gaining from sequencing and analysis techniques are delivering new advances in healthcare with ever greater speed and precision. It’s a particularly exciting time to be a part of this evolving industry, with continual opportunities for new clinical applications of these technologies and platforms.

Companies like Illumina and others who are delivering next-generation sequencing technologies are gaining global exposure. New partnerships and programs are placing these advanced techniques into the hands of the world’s leading clinicians and researchers, who are then applying them to some of today’s greatest medical challenges. Recently, plans to integrate sequencing technologies have been announced by world-renowned organizations like the Baylor College of Medicine in the U.S., Genomics England, and Sidra Medical and Research Center in Qatar.

The challenge for these and other programs seeking to accelerate their research discoveries with genomic data is how to analyze this wealth of information – to make it clinically relevant and rapidly deliver reliable insights to better inform patient care.

NextCODE Health is working to advance this piece of the puzzle with its Genomically Ordered Relational (GOR) database and its clinical and discovery interfaces (the Clinical Sequence Analyzer™ and Sequence Miner™). Combining next-generation sequencing techniques with increasingly robust analysis tools, NextCODE Health is helping to accelerate global research progress today to deliver unprecedented advances in patient care in the years just ahead.

A Standard Database Architecture Will Build a Stronger Foundation for Genome Discoveries

The general adoption of the Genomically-Ordered Relational database (GOR) as a data standard for storing genomic data may greatly accelerate the spread of sequencing and its effectiveness as a tool for advancing medicine.

It is widely accepted that the ability to share the analysis and insights from DNA sequencing will be a key driver of discovery and innovation. But one current limitation to extending this knowledge is that sequencing and analysis platforms, as well as samples, are often proprietary to and stored at different institutions. Perhaps more important, the structures and formats in which genomic data has customarily been stored (the relational databases developed by the likes of IBM and Oracle) make it unwieldy to analyze as the amount of data grows, and very difficult to share. The upshot is that institutions cannot easily share and consolidate information to generate more robust analyses and clinically relevant insights. This presents a serious hurdle to discovery both in rare disorders, where samples need to be gathered in order to generate adequate analytical power, and in complex ones, where truly massive studies can tease apart different facets of disease and reveal their causes.

Over the past decade, a novel and comprehensive database model has been developed to relieve this bottleneck, offering a flexible and fast means of overcoming these problems. It is called the Genomically-Ordered Relational database, or GOR, and was designed to manage and query the detailed genomic data amassed by deCODE genetics in Iceland – the world’s first and still by far the largest and most comprehensive population-based genomic database.

The thinking behind the GOR is as simple as it is revolutionary. Genomic data is a sort of big data but one with an important difference: It is divided up in distinct packets—the chromosomes—and then arranged within each chromosome in linear fashion. The GOR makes use of this by storing and querying sequence data according to its unique position in the genome, rather than as huge files as long as the sequence. This radically reduces the data burden of querying even large numbers of whole genomes, at the same time making it possible to store and visualize instantly the raw sequence underlying an analysis.

In practice, the GOR thereby enables researchers to home in on specific variants without having first to call up entire patient genomes, and separates raw data from annotations to focus in on only the most relevant search components. It’s these types of functions and features that can be consistently applied across data storing systems to allow for more multi-institutional, collaborative research and consistency in outcomes worldwide.
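One way to picture keeping raw data separate from annotations is a position-ordered join. The sketch below is illustrative only and is not drawn from the GOR itself: the tables, the gene names, and the `join_by_position` helper are hypothetical, and it assumes both tables are sorted by (chromosome, position) with at most one annotation per position.

```python
def join_by_position(variants, annotations):
    """Merge-join two lists sorted by (chromosome, position).

    Each variant row is (chrom, pos, change); each annotation row is
    (chrom, pos, label). Returns variant rows extended with the label
    of the annotation at the same position, in a single forward pass.
    """
    i = j = 0
    joined = []
    while i < len(variants) and j < len(annotations):
        vkey = variants[i][:2]
        akey = annotations[j][:2]
        if vkey < akey:
            i += 1          # no annotation at this variant's position
        elif vkey > akey:
            j += 1          # annotation lies before the current variant
        else:
            joined.append(variants[i] + (annotations[j][2],))
            i += 1          # keep j: later variants may share this position
    return joined


# Hypothetical, separately stored tables.
variants = [
    ("chr1", 1_500, "A>G"),
    ("chr1", 20_000, "C>T"),
    ("chr2", 9_000, "T>C"),
]
annotations = [
    ("chr1", 1_500, "hypothetical_gene_A"),
    ("chr2", 300, "hypothetical_gene_B"),
    ("chr2", 9_000, "hypothetical_gene_C"),
]

print(join_by_position(variants, annotations))
```

Because both inputs share the same genomic ordering, neither table has to be loaded or indexed in full before the two can be combined, and annotations can be updated without rewriting the underlying variant data.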

Leaders in the genomic research community are now beginning to create coalitions and working groups to underpin and coordinate the adoption of standards for sharing genomic data. As these groups create flexible and efficient policy frameworks, the GOR is tested and ready to support the fundamental data requirements of global data sharing and the acceleration of discoveries in genome-based medicine. The general adoption of the GOR as a data standard for storing genomic data may greatly accelerate the spread of sequencing and its effectiveness as a tool for advancing medicine around the world.

Genomics-Based Medicine Coming Into View

NextCODE Health has quickly gained recognition for its unique capabilities to address unmet needs in the genomics space through a massive genomics database that interprets DNA samples to identify relevant disease markers.

The practice and adoption of genomic medicine is accelerating as technologies improve, costs fall and new insights drive better patient care. While many companies are supporting this emerging field, a select few are providing the unique perspectives and capabilities to advance progress even faster.

NextCODE Health made headlines less than a year ago with the announcement of its launch and funding by major investors in healthcare and biotechnology. The company quickly gained recognition for its unique capabilities to address unmet needs in the genomics space through a massive genomics database that interprets DNA samples to identify relevant disease markers. (See the features in Xconomy, Bio-IT World and PLOS Blog.) The company was later mentioned in Nature Biotechnology News for its potential contributions to genome studies by leveraging key reference data from deCODE’s work in Iceland.

Its rapid trajectory since launch and the utility of its genomic analysis technology were featured in BioCentury in May, with testimonials from clinicians using NextCODE capabilities to diagnose patients at Boston Children’s Hospital, the Baylor College of Medicine, and the Sanford School of Medicine. In June, the company was featured in a major interview with Bio-IT World, and it continues to expand. Since then, NextCODE has announced several programs through which global pioneers in clinical genomics research are applying its interpretation and analysis technology to support research and diagnosis in rare diseases.

As more organizations employ genomics in major research initiatives, NextCODE’s interpretation technology will be an increasingly important asset in delivering meaningful insights from the wealth of genomic data being produced. Visit NextCODE for the latest on how the future of genomics-based medicine continues to evolve.