Genome Data Interpretation: How to Ease the Bottleneck

Bloomberg NextCODE Hannes Smarason

Bloomberg BNA Business’ “Diagnostic Testing & Emerging Technologies,” highlights how NextCODE is providing a qualitatively different way to store and analyze genomic information to meet growing opportunities in personalized medicine.

With advances in sequencing technology and reduced costs, more and more data are generated every day on the genetic basis of disease. The challenge has become how to derive meaningful information from these mountains of data.

While various systems have been established in recent years to store the large amounts of genomic data from patients’ DNA, a remaining obstacle is to “break the bottleneck” so that researchers can process the vast data in multiple human genomes in order to identify and isolate a small, useful piece of information about disease. Conventional databases and algorithms have not been able to efficiently and reliably identify subset information among the millions of genetic markers in order to inform clinical decisions. This has become a major data management roadblock.

The key is to find new approaches for databases and algorithms that accommodate the unique ways that genomic information is analyzed and interpreted. As discussed in Bloomberg BNA, Diagnostic Testing & Emerging Technologies, NextCODE is already easing this bottleneck by providing a qualitatively different way to store and analyze genomic information and apply it to meet the growing opportunities for personalized medicine.

NextCODE’s Genomically Ordered Relational (or GOR) database infrastructure is a truly different way of storing this huge amount of data. The principle is very simple: rather than store sequence and reference data in vast unwieldy files, it ties data directly to its specific genomic position. As a result, the algorithms are vastly more efficient compared to a traditional relational database because they can isolate by location in the genome. That makes analysis faster, more powerful, and radically more efficient, both in terms of clinicians’ and researchers’ time, as well as computer infrastructure, I/O, and CPU usage.

This holistic approach applies broadly to the priorities of genome scientists around the world, helping them eliminate the data management bottleneck to identify more culprits to many inherited diseases, more quickly and cost effectively.

Read more about NextCODE’s work here.


Four Factors for Improving Genomic Data for Personalized Medicine

advancing the use of genomic data for personalized medicine

The pace of progress has been astounding with advances in the use of genomic information to provide faster, more accurate, and more in-depth information to enable personalized patient care.

We’ve come a long way in improving the way that a patient’s genome sequence data is analyzed and interpreted to realize the full potential of personalized medicine. Here are four factors helping to overcome barriers and achieve new milestones for using genomic data to provide faster, more accurate, and more in-depth information to guide clinicians in delivering personalized care for patients.

Factor #1: Fast database query of the genome

Problem: Relational database architectures make it possible to store large quantities of sequencing data, but querying whole genome data can be time-consuming and take days to weeks.

Solution: The GOR (Genomic Ordered Relations) database is able to query whole-genome sequences in real time. The reason is that GOR understands the genome in terms of chromosomes, its natural structure, rather than as a continuous string of sequence. That’s both intuitive and innovative. When searching for a variant, tools in the GOR architecture don’t have to scan each individual’s entire sequence; they retrieve the variant straight from its location. Annotation data – information on what diseases or conditions variants have been linked to – are also stored in the same way. The GOR database was pioneered a decade ago by deCODE genetics, one of the first organizations to manage truly large genetic datasets, and is now being used by NextCODE for clinical applications of genomic data.

Factor #2: Fast, reliable identification of disease-causing variants

Problem: Many sequencing analysis pipelines are only powered to process data in a compressed format called Variant Call Format (VCF) files. These comprise only a tiny fraction of the genome, and being working only with VCF files makes it difficult to correct common alignment and allele-calling errors. That can result in both false positive and negative results, or to missing the key causative variants altogether.

Solution: The foundation for improved sensitivity and specificity is the ability to use VCF data on top of the raw sequence data from which it was derived. NextCODE’s pipeline and clinical interfaces, powered by GOR, give users the ability to go back to and visualize raw sequence data at a click. This approach enables genomic analysis and interpretation by seeking out disease-causing genetic variants, either in specific patients, or for research studies in a clinical setting.

Factor #3: Patient genomic information at the fingertips of the clinician

Problem: Many of today’s genomic interpretation tools are too complex and difficult to use by clinicians who may have minimal experience with genetic informatics tools.

Solution: All of the complex informatics required by a clinical analysis tool should disappear at the fingertips of a clinician. It starts by having a robust foundation to the informatics platform, and using the GOR database architecture enables rapid cycling between personal sequence data and broad clinical knowledge. The result is the Clinical Sequence Analyzer (CSA) in which clinicians can simply type in a patient’s symptoms, and CSA will search the patient’s whole genome for variants that may be relevant.

Factor #4: Applying the full power of whole-genome sequencing to cancer tumor analysis

Problem: Many of today’s approaches to the analysis of cancer genomes only look at the immediate next step for a course of treatment, an important capability but only part of a holistic view of a the genetic profile of a patient’s cancer and what can be done to fight it.

Solution: The Tumor Mutation Analyzer makes a more holistic approach possible, analyzing a whole exome or whole genome sequence from a patient’s own genome and from tumor cells. Comparing the two it is possible to isolate the variants likely to be cancer drivers. The distinguishing feature of TMA is the depth of the data it stores and the unprecedented level of detail it provides to more accurately identify variations. This level of detail is especially important in cancer genetics, where the chances of finding previously unknown variants are very high, and even if a mutation is successfully targeted with a course of treatment, another potential driver is often waiting in the wings.

The pace of progress has been astounding with advances in the use of genomic information for patient care. How will the path continue in the future? Stay tuned.


Personalized Medicine: The Future is Almost Here

The new era of personalized medicine.

The achievement of low-cost genome sequencing and the use of genomic data to better understand diseases are advancing the exciting new era of personalized medicine.

It’s been more than a decade since the human genome was first sequenced. Since then, we have been on the journey of applying this profound new discovery to create personalized medicine and advance human health.

Two significant triumphs along this human genome journey:

  • Using genomic data to better understand diseases; and
  • Achieving low-cost genome sequencing.

Each of these accomplishments has been a stepping stone into the exciting new era that is dawning now: where genomic information is becoming integrated into medical care.

Using Genomic Data to Better Understand Diseases

Let’s take a look back at the early days of using genomic data to connect the dots between genetic mutations and disease. From 1997-2004, I was part of the leadership team at deCODE, the Icelandic genomic company. This was the period when deCODE was building the world’s most productive human genomics platform, with a database of  tens of thousands of individuals who participated in genetic studies and including the largest database of genomes to this day. deCODE’s genomic engine was able to successfully identify the genetic variations associated with human disease. This resulted in dozens of groundbreaking discoveries that were published in major, peer-reviewed journals.

The legacy of deCODE was the creation of an industrialized platform capable of massive storage and analysis capabilities. This enabled researchers to crunch genomic data to gain insights about genetic variants, or risk factors, associated with many common diseases. deCODE’s premise was that once the genetics of disease was better understood that information could be used to create new ways to diagnose, treat and prevent disease. However, when I left deCODE in 2004, there were still barriers to overcome before this genomic information could be widely applied to the level of an individual patient. Chief among them was that the cost of genome sequencing was still prohibitively high. (deCODE was subsequently acquired by Amgen).

Achieving Low-Cost Genome Sequencing

Back in 2004, the cost to sequence a single human genome was hundreds of thousands of dollars. Today that cost is a few thousand dollars (and, in fact, fast approaching $1,000) for a whole genome sequence. DNA sequencing costs continue to fall, as speed and accuracy increase.

This means we are rapidly approaching a tipping point where, as the sequencing of human genomes becomes more economical, its adoption in the medical community becomes more widespread and genomic data can become more routine in medical care. This is why personalized medicine is becoming a reality.

The Era of Genome Sequencing in Medical Care

The steep drop in the costs of sequencing, combined with the explosion of research on gene variants and disease, mean the time is fast approaching when genome sequencing will become routine in medical care. Today, pathologists perform blood cultures to decide which antibiotics will stop a patient’s bacterial infection. Soon a patient sample can be taken to perform a genome sequencing to analyze the genetic characteristics of a patient to determine ways a disease can be prevented or, if they are sick, which treatments might work best for their disease.

The body of genomic knowledge and the large databank of human genomes built by pioneers like deCODE established the key building blocks that enable genome sequencing to have predictive power for individual patients. As more human genomes are sequenced and more genetic variants are associated with disease, the predictive power of knowing about risk genes and effective treatments for each patient – a.k.a. personalized medicine – will become an essential part of medical care.

Genome Sequencing Being Implemented by Medical Centers

In preparation for the future of personalized medicine, major medical centers in the U.S., Europe and Asia are actively beginning to install DNA sequencers and supercomputers as important tools for integrating genome sequencing into medical care. These medical centers are taking initial steps toward the routine sequencing of every patient’s genome to define the ideal course of prevention and treatment based on variants found in a patient’s genes.

Evidence of this adoption of genome sequencing by medical centers appeared in an article in The New York Times in April 2013 citing that:

  • Medical centers in New York City are spending more than $1 billion on new genomic research centers;
  • Several hospitals around the U.S. are undertaking systematic genome sequencing in patients;
  • Mount Sinai Medical Center has a program in which 24,000 patients participate in a biobank to include their DNA sequence and research over their lifetimes;
  • Memorial Sloan-Kettering Cancer Center sequenced 16,000 tumors from cancer patients in 2012; and
  • Phoenix Children’s Hospital opened a new institute in December 2012 to sequence the genomes of 30 percent of their childhood cancer patients.

For now, the use of whole genome sequencing in medical practice is still in its infancy, but the pace of progress continues to accelerate. Clearly, genome sequencing will soon become part of the nucleus of medical care. This will herald a new era in personalized medicine revolutionizing healthcare as we know it and transforming our lives. When do you think genome sequencing will become a part of the medical decisions in your life?