Genome Data Interpretation: How to Ease the Bottleneck

Bloomberg NextCODE Hannes Smarason

With advances in sequencing technology and reduced costs, more and more data are generated every day on the genetic basis of disease. The challenge has become how to derive meaningful information from these mountains of data.

While various systems have been established in recent years to store the large amounts of genomic data generated from patients' DNA, a remaining obstacle is to "break the bottleneck": researchers must process the data from many human genomes in order to identify and isolate a small, useful piece of information about disease. Conventional databases and algorithms have not been able to efficiently and reliably pick out the relevant subset among millions of genetic markers to inform clinical decisions. This has become a major data management roadblock.

The key is to find new approaches to databases and algorithms that accommodate the unique ways in which genomic information is analyzed and interpreted. As discussed in Bloomberg BNA's Diagnostic Testing & Emerging Technologies, NextCODE is already easing this bottleneck by providing a qualitatively different way to store and analyze genomic information and apply it to meet the growing opportunities for personalized medicine.

NextCODE's Genomically Ordered Relational (GOR) database infrastructure is a fundamentally different way of storing this huge amount of data. The principle is simple: rather than storing sequence and reference data in vast, unwieldy files, it ties data directly to its specific genomic position. As a result, its algorithms are far more efficient than those of a traditional relational database, because they can retrieve data by its location in the genome. That makes analysis faster and more powerful, and it saves both the time of clinicians and researchers and computing resources such as storage I/O and CPU.
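To make the positional principle concrete, here is a minimal Python sketch, assuming the records for one chromosome are kept sorted by position. The class and field names are hypothetical and this is not NextCODE's GOR implementation; it only illustrates why genomic ordering lets a query jump straight to a region with two binary searches instead of scanning every record.

```python
from bisect import bisect_left, bisect_right
from typing import NamedTuple


class Variant(NamedTuple):
    pos: int   # 1-based position on a single chromosome
    ref: str   # reference allele
    alt: str   # alternate allele


class PositionSortedTable:
    """Hypothetical toy table kept in genomic order (not the GOR file format)."""

    def __init__(self, variants: list[Variant]) -> None:
        self.rows = sorted(variants, key=lambda v: v.pos)
        # Parallel list of positions, built once and reused for binary search.
        self.positions = [v.pos for v in self.rows]

    def query(self, start: int, end: int) -> list[Variant]:
        """Return rows with start <= pos <= end in O(log n + k) time."""
        lo = bisect_left(self.positions, start)
        hi = bisect_right(self.positions, end)
        return self.rows[lo:hi]


table = PositionSortedTable([
    Variant(5_000, "A", "G"),
    Variant(1_200, "C", "T"),
    Variant(9_800, "G", "A"),
    Variant(5_050, "T", "C"),
])
print(table.query(4_000, 6_000))
```

The region query touches only the rows at positions 5,000 and 5,050; the cost of finding them grows with the logarithm of the table size rather than with the length of the genome.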

This holistic approach applies broadly to the priorities of genome scientists around the world, helping them eliminate the data management bottleneck and identify the causes of many more inherited diseases, more quickly and cost-effectively.

Trends in Sequencing and Analysis Today Leading to Tomorrow’s Clinical Advances

The insights we’re gaining from sequencing and analysis techniques are delivering new advances in healthcare with ever greater speed and precision. It’s a particularly exciting time to be a part of this evolving industry, with continual opportunities for new clinical applications of these technologies and platforms.

Illumina and other companies delivering next-generation sequencing technologies are gaining global exposure. New partnerships and programs are placing these advanced techniques in the hands of the world's leading clinicians and researchers, who are applying them to some of today's greatest medical challenges. Recently, world-renowned organizations such as Baylor College of Medicine in the U.S., Genomics England, and Sidra Medical and Research Center in Qatar have announced plans to integrate sequencing technologies.

The challenge for these and other programs seeking to accelerate their research discoveries with genomic data is how to analyze this wealth of information – to make it clinically relevant and rapidly deliver reliable insights to better inform patient care.

NextCODE Health is working to advance this piece of the puzzle with its Genomically Ordered Relational (GOR) database and its clinical and discovery interfaces (the Clinical Sequence Analyzer™ and Sequence Miner™). By combining next-generation sequencing with increasingly robust analysis tools, NextCODE Health is helping to accelerate global research progress today and to deliver unprecedented advances in patient care in the years ahead.

Genomics and Rare Diseases: Hope for Solving Unanswered Questions

As we learn more about disease biology and uncover new insights thanks to the availability of genomic technologies, we are making meaningful progress in identifying means to address many rare diseases for which there is little medical hope today.

With these new genomic tools and insights, a wide range of opportunities has emerged to improve diagnosis and treatment of rare diseases. Over the past few years, DNA sequencing has begun to uncover the causes of rare diseases and, at the heart of each case solved is a patient and a family that has gained new understanding about their condition. With time, these success stories in diagnosis will lead to more successes in treatment.

Now more than ever, there is hope that identifying the key mutations will lead to a better understanding of disease biology and then to novel therapies. Leaders in genomics are promoting better and faster technologies that enable much more rapid analysis and interpretation of a patient's genome. The critical first step is to obtain sufficient data, compare it against a robust database of reference data, and gain an accurate understanding of the potential mutations associated with these rare conditions.
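The comparison against reference data can be pictured as a frequency filter: a variant that is common in a reference population is unlikely to explain a rare disease, so it is set aside. This is a minimal Python sketch under stated assumptions; the 1% cutoff, the data layout, the coordinates, and the function name are all hypothetical, and real pipelines apply many additional filters.

```python
from typing import NamedTuple


class Call(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def rare_candidates(
    patient_calls: list[Call],
    reference_af: dict[Call, float],
    max_af: float = 0.01,  # hypothetical rarity cutoff
) -> list[Call]:
    """Keep calls that are unseen or rare in a reference population.

    `reference_af` maps a variant to its allele frequency in a reference
    cohort (hypothetical data). A variant missing from the map is treated
    as unseen, i.e. frequency 0.0.
    """
    return [c for c in patient_calls if reference_af.get(c, 0.0) < max_af]


calls = [
    Call("chr1", 1_000, "A", "G"),
    Call("chr2", 2_000, "C", "T"),
    Call("chrX", 3_000, "G", "A"),
]
reference = {
    Call("chr1", 1_000, "A", "G"): 0.35,    # common in the reference cohort
    Call("chr2", 2_000, "C", "T"): 0.0004,  # rare in the reference cohort
}
for call in rare_candidates(calls, reference):
    print(call)
```

The common chr1 variant is dropped, while the rare chr2 variant and the chrX variant absent from the reference are kept as candidates for closer review.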

As researchers focus on specific areas, new partnerships are extending access to data and accelerating progress with rare diseases around the world. Recently, ACoRD at University College Dublin initiated a genomic analysis collaboration to implement NextCODE's proprietary database and analytical tools to mine whole-genome data for variants linked to autism spectrum disorders. Another genomic analysis program, with the ANZAC Research Institute in Australia, applies advanced sequencing analysis technology to better understand X-linked Charcot-Marie-Tooth Syndrome, a rare and progressively debilitating neurodegenerative disorder. More collaborations are in the works, and we'll be talking about them as soon as we can.

We look forward to the results of these and other collaborations as leading institutions around the world make efforts to leverage the power of advanced sequencing technology to solve some of the greatest unanswered questions in medicine.

A Standard Database Architecture Will Build a Stronger Foundation for Genome Discoveries

It is widely accepted that the ability to share the analysis and insights from DNA sequencing will be a key driver of discovery and innovation. But one current limitation is that sequencing and analysis platforms, as well as samples, are often proprietary to and stored at different institutions. Perhaps more important, the structures and formats in which genomic data has customarily been stored (the relational databases developed by the likes of IBM and Oracle) become unwieldy to analyze as the amount of data grows, and make the data very difficult to share. The upshot is that institutions cannot easily share and consolidate information to generate more robust analyses and clinically relevant insights. This is a serious hurdle to discovery both in rare disorders, where samples must be pooled to generate adequate analytical power, and in complex ones, where truly massive studies are needed to tease apart different facets of disease and reveal their causes.

Over the past decade, a novel and comprehensive database model has been developed to break this bottleneck, offering a flexible and fast means to overcome these problems. It is called the Genomically Ordered Relational database, or GOR, and it was designed to manage and query the detailed genomic data amassed by deCODE genetics in Iceland, the world's first, and still by far the largest and most comprehensive, population-based genomic database.

The thinking behind the GOR is as simple as it is revolutionary. Genomic data is a kind of big data, but with an important difference: it is divided into distinct packets, the chromosomes, and arranged linearly within each chromosome. The GOR takes advantage of this by storing and querying sequence data according to its position in the genome, rather than as huge files spanning the entire sequence. This radically reduces the data burden of querying even large numbers of whole genomes, and at the same time makes it possible to instantly retrieve and visualize the raw sequence underlying an analysis.
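Treating chromosomes as distinct packets, each laid out linearly, amounts to a two-part sort key: chromosome first, then position. The Python sketch below assumes UCSC-style chromosome names ("chr1" to "chr22", "chrX", "chrY", "chrM"); the ranking table and function name are hypothetical and are not taken from any GOR specification.

```python
# Hypothetical chromosome ranking: autosomes 1-22, then X, Y and mitochondrial.
_CHROM_RANK = {f"chr{i}": i for i in range(1, 23)}
_CHROM_RANK.update({"chrX": 23, "chrY": 24, "chrM": 25})


def genomic_key(chrom: str, pos: int) -> tuple[int, int]:
    """Sort key that orders records chromosome by chromosome, then by position."""
    return (_CHROM_RANK[chrom], pos)


records = [("chr10", 500), ("chr2", 900), ("chrX", 10), ("chr2", 100), ("chr1", 7_000)]

# Plain string sorting puts "chr10" before "chr2", which breaks genomic order.
print(sorted(records))
# Ranking chromosomes first, then positions, restores the linear layout.
print(sorted(records, key=lambda r: genomic_key(*r)))
```

With a key like this, every record has exactly one place in a single genome-wide ordering, which is what allows position-based lookups to work across all chromosomes.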

In practice, the GOR enables researchers to home in on specific variants without first having to call up entire patient genomes, and it keeps raw data separate from annotations so that a query touches only the most relevant components. Functions and features like these can be applied consistently across data storage systems, enabling more multi-institutional, collaborative research and more consistent outcomes worldwide.
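Keeping raw calls and annotations in separate tracks works because both share the same genomic coordinates, so they can be combined on the fly with a merge join. A minimal Python sketch, assuming both tracks are position-sorted tuples on a single chromosome with at most one annotation per position; the function name, tuple layout, and annotation labels are illustrative only.

```python
from typing import Iterable, Iterator, Optional


def annotate(
    variants: Iterable[tuple[int, str]],
    annotations: Iterable[tuple[int, str]],
) -> Iterator[tuple[int, str, Optional[str]]]:
    """Merge-join two position-sorted streams (hypothetical sketch).

    Each input is read once, front to back, so neither track has to be
    loaded fully into memory. Variants without an annotation get None.
    """
    ann_iter = iter(annotations)
    ann = next(ann_iter, None)
    for pos, genotype in variants:
        # Skip annotations that lie before the current variant.
        while ann is not None and ann[0] < pos:
            ann = next(ann_iter, None)
        label = ann[1] if ann is not None and ann[0] == pos else None
        yield pos, genotype, label


variants = [(100, "0/1"), (250, "1/1"), (400, "0/1")]
annotations = [(50, "intergenic"), (250, "missense"), (400, "synonymous"), (900, "stop_gained")]
for row in annotate(variants, annotations):
    print(row)
```

Because the join only ever moves forward through both tracks, an annotation set can be updated or swapped without touching the raw calls.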

Leaders in the genomic research community are now beginning to create coalitions and working groups to underpin and coordinate the adoption of standards for sharing genomic data. As these groups create flexible and efficient policy frameworks, the GOR is tested and ready to support the fundamental data requirements of global data sharing and the acceleration of discoveries in genome-based medicine. The general adoption of the GOR as a data standard for storing genomic data may greatly accelerate the spread of sequencing and its effectiveness as a tool for advancing medicine around the world.

Genomics-Based Medicine Coming Into View

NextCODE Health

The practice and adoption of genomic medicine is accelerating as technologies improve, costs fall and new insights drive better patient care. While many companies are supporting this emerging field, a select few are providing the unique perspectives and capabilities to advance progress even faster.

NextCODE Health made headlines less than a year ago with the announcement of its launch and funding by major investors in healthcare and biotechnology. The company quickly gained recognition for its unique capabilities to address unmet needs in the genomics space through a massive genomics database that interprets DNA samples to identify relevant disease markers. (See the features in Xconomy, Bio-IT World, and PLOS Blog.) The company was later mentioned in Nature Biotechnology News for its potential contributions to genome studies by leveraging key reference data from deCODE's work in Iceland.

Its rapid trajectory since launch and the utility of its genomic analysis technology were the subject of a BioCentury feature in May, which included testimonials from clinicians using NextCODE capabilities to diagnose patients at Boston Children's Hospital, Baylor College of Medicine, and the Sanford School of Medicine. In June, the company was the subject of a major interview in Bio-IT World, and it continues to expand. Since then, NextCODE has announced several programs through which global pioneers in clinical genomics research are applying its interpretation and analysis technology to support research and diagnosis in rare diseases, including:

As more organizations employ genomics in major research initiatives, NextCODE's interpretation technology will be an increasingly important asset in delivering meaningful insights from the wealth of genomic data being produced. Visit NextCODE for the latest on how the future of genomics-based medicine continues to evolve.

Pioneering Genome Sequencing Effort in England Aims to Shape the Future of Global Medicine

£300 million in new investments for Genomics England

Genomics England 100,000 Genomes Project

Genomics England was set up by the UK Department of Health to deliver the 100,000 Genomes Project. Initially, the focus will be on rare disease, cancer, and infectious disease. The project is currently in its pilot phase and will be completed by the end of 2017.

These are exciting times for large-scale sequencing projects. Last week, U.K. Prime Minister David Cameron announced over £300 million ($509.4 million) in new investments for Genomics England, which aims to sequence, analyze, and store the genomes of 100,000 UK National Health Service (NHS) patients by 2017. The investments include about £162 million ($275.1 million) from Illumina Inc. (NASDAQ:ILMN), the partner for the sequencing element of the project. In turn, Genomics England will pay Illumina about £78 million ($132.4 million) for its services.

At the same time, the Wellcome Trust will put £27 million ($45.8 million) into a new sequencing hub at its genome campus in Cambridge; the Medical Research Council, or MRC, is investing £24 million ($40.7 million) to support data analysis and interpretation, and the NHS will make £20 million ($34 million) available for the establishment of patient sequencing centers.

This is a prime example of how the implementation of sequencing technologies promises to drive a revolution in the structure of medical research. These new projects aim to capture more data on human DNA than ever before, with the goal of advancing care and solving healthcare challenges.

The 100,000 Genomes Project, developed by the NHS, has the potential to significantly influence the global community through its plans to integrate sequencing data into standard medical practice.

Genomics England plans to generate 100,000 whole genome sequences from NHS patients with cancer, rare diseases, and other conditions, and to share the resulting data for research and development purposes. In the early phases, the program will also seek to develop standards for consent, sample storage, data generation and variant analysis that may be useful for many other organizations conducting large-scale projects within public health systems.

The project is enlisting the help of organizations from around the world to undertake this significant effort. It recently selected Illumina to conduct the sequencing and is evaluating technologies for storing, annotating, and interpreting the data so that it can be used for both clinical diagnostics and drug discovery, development, and delivery to the right patients.

The challenges of analyzing data on such a large scale are formidable, but the end result carries great potential to address some significant unmet medical needs. NextCODE's technology has already performed analyses at this scale, based on its work with the Icelandic population through deCODE genetics. It's an exciting prospect for advancing the future of genomics-driven medicine, and one to watch.

Seeking Genomic Answers to Autism and Rare, Idiopathic Diseases

As more is learned about autism spectrum disorders, more questions seem to arise. Yet with DNA sequencing, researchers are able to investigate the genetic roots of this and other diseases that are not yet well understood. It's another instance in which genomics can shed light on the workings of the most important and most difficult to study of our organs: the brain.

Institutions around the world have sought to fill in pieces of the autism puzzle with links to other disorders and diagnostic insights, and these efforts have in recent years uncovered a number of possible genetic triggers and pathways. Yet the causes and manifestation of these diseases remain largely elusive.

University College Dublin's Academic Centre on Rare Diseases (ACoRD) in Ireland, which is renowned worldwide for its discoveries in rare genetic diseases, is using NextCODE's genome analysis technology to power large-scale, sequencing-based diagnostics programs and genome discovery efforts to study autism and rare pediatric disorders.

Recognizing the enormous potential of large-scale sequencing to mine whole genomes and accelerate discoveries in rare genetic diseases, ACoRD will focus on some of the most challenging areas to inform and provide new directions for research that may help lead to diagnosis, treatment, and even prevention of these disorders. By using NextCODE technology for both analyzing and storing large-scale genomic data, ACoRD is well positioned to become a focal point for multinational research and clinical diagnosis in conditions that require the gathering and collective analysis of genomes from many participants in many countries.

Rare Disease Research Focuses on Charcot-Marie-Tooth Syndrome, Guided by DNA Sequencing

Genome sequencing is a relatively young technology and has been in active use in research for just over a decade. Yet it has already found very meaningful applications in clinical care, supporting the world's leading researchers in finding answers to some of the rarest and most confounding diseases. The interface between the research and clinical realms is seeing some of the most exciting and fruitful applications of the power of sequencing.

The ANZAC Research Institute in Sydney, Australia sits right at this nexus and is using the latest DNA sequencing and interpretation technology from NextCODE to mine genomes in search of genetic mutations associated with X-linked Charcot-Marie-Tooth syndrome (CMTX). CMTX is a rare, progressively debilitating neurodegenerative disorder that can be caused by mutations in many different places in the genome, including the X chromosome. At present there is no cure or drug treatment available.

The team at the ANZAC Research Institute, recognized for its expertise in familial genetics, sought out the unique capabilities of the NextCODE analysis platform to investigate regions outside the normal coding areas of genes. The aim is as pioneering as the technology: to identify not just single SNPs but also structural variants that conventional approaches have not been able to search for systematically and link to CMTX. With dedicated research minds and the latest technology, the program aims to better understand this disease and potentially find novel targets for the development of therapies. This is one great example of the many opportunities to improve lives that are being generated by insights gained through the rapidly evolving field of genome sequencing.

Four Factors for Improving Genomic Data for Personalized Medicine

We’ve come a long way in improving the way that a patient’s genome sequence data is analyzed and interpreted to realize the full potential of personalized medicine. Here are four factors helping to overcome barriers and achieve new milestones for using genomic data to provide faster, more accurate, and more in-depth information to guide clinicians in delivering personalized care for patients.

Factor #1: Fast database query of the genome

Problem: Relational database architectures make it possible to store large quantities of sequencing data, but querying whole genome data can be time-consuming and take days to weeks.

Solution: The GOR (Genomically Ordered Relational) database is able to query whole-genome sequences in real time. The reason is that GOR understands the genome in terms of its natural structure, chromosomes and the positions along them, rather than as a continuous string of sequence. That's both intuitive and innovative. When searching for a variant, tools in the GOR architecture don't have to scan each individual's entire sequence; they retrieve the variant straight from its location. Annotation data (information on what diseases or conditions variants have been linked to) are also stored in the same way. The GOR database was pioneered a decade ago by deCODE genetics, one of the first organizations to manage truly large genetic datasets, and is now being used by NextCODE for clinical applications of genomic data.

Factor #2: Fast, reliable identification of disease-causing variants

Problem: Many sequencing analysis pipelines are only able to process data in a condensed summary format, Variant Call Format (VCF) files. These files record only the sites where a sample appears to differ from the reference, a tiny fraction of the genome, and working only with VCF files makes it difficult to correct common alignment and allele-calling errors. That can lead to false positives and false negatives, including missing the key causative variants altogether.

Solution: The foundation for improved sensitivity and specificity is the ability to use VCF data together with the raw sequence data from which it was derived. NextCODE's pipeline and clinical interfaces, powered by GOR, give users the ability to go back to and visualize the raw sequence data with a click. This approach supports analysis and interpretation aimed at finding disease-causing genetic variants, whether in individual patients or in research studies in a clinical setting.
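Going back to the raw sequence means recounting the evidence at a position instead of trusting the summary call alone. The Python sketch below is a deliberately simplified pileup, assuming gap-free alignments given as hypothetical (start, bases) pairs; real aligned data (for example BAM files) carries CIGAR strings, base qualities, and mapping qualities that this toy ignores. All names and reads are made up for illustration.

```python
from collections import Counter


def pileup_counts(reads: list[tuple[int, str]], pos: int) -> Counter:
    """Count the bases that aligned reads place at `pos`.

    Simplifying assumption: each read is (start, bases) with a 1-based start
    and no insertions, deletions, or clipping, so base i sits at start + i.
    """
    counts: Counter = Counter()
    for start, bases in reads:
        offset = pos - start
        if 0 <= offset < len(bases):
            counts[bases[offset]] += 1
    return counts


def alt_fraction(reads: list[tuple[int, str]], pos: int, alt: str) -> float:
    """Fraction of covering reads that support `alt` (0.0 if no coverage)."""
    counts = pileup_counts(reads, pos)
    depth = sum(counts.values())
    return counts[alt] / depth if depth else 0.0


reads = [
    (96, "TTAAG"),
    (98, "AAGCC"),
    (99, "AACCT"),
    (100, "ATTGC"),
    (101, "TTGCA"),  # starts after position 100, so it does not cover it
]
print(pileup_counts(reads, 100))
print(alt_fraction(reads, 100, "A"))
```

Here the four reads covering position 100 split evenly between G and A, the kind of raw evidence a reviewer would want to see before accepting or rejecting a heterozygous call.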

Factor #3: Patient genomic information at the fingertips of the clinician

Problem: Many of today's genomic interpretation tools are too complex for clinicians who may have minimal experience with genetic informatics tools.

Solution: All of the complex informatics behind a clinical analysis tool should be invisible to the clinician using it. That starts with a robust informatics foundation: the GOR database architecture enables rapid cycling between personal sequence data and broad clinical knowledge. The result is the Clinical Sequence Analyzer (CSA), in which clinicians can simply type in a patient's symptoms, and CSA will search the patient's whole genome for variants that may be relevant.
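The symptom-driven search can be pictured as a ranking step: look up genes linked to each entered symptom, then rank the patient's variants by how many symptoms their gene accounts for. The Python sketch below is hypothetical; it is not the CSA algorithm, the gene and variant names are placeholders, and the symptom-to-gene table stands in for curated clinical knowledge.

```python
def prioritize(
    symptoms: list[str],
    symptom_genes: dict[str, set[str]],
    patient_variants: list[tuple[str, str]],
) -> list[tuple[str, str, int]]:
    """Rank (variant_id, gene) pairs by how many entered symptoms the gene matches.

    `symptom_genes` is a hypothetical stand-in for a curated knowledge base.
    Variants in genes linked to none of the symptoms are dropped.
    """
    hits: dict[str, int] = {}
    for symptom in symptoms:
        for gene in symptom_genes.get(symptom, set()):
            hits[gene] = hits.get(gene, 0) + 1
    ranked = [
        (variant_id, gene, hits[gene])
        for variant_id, gene in patient_variants
        if gene in hits
    ]
    # Most symptoms explained first; break ties by variant id for stable output.
    ranked.sort(key=lambda row: (-row[2], row[0]))
    return ranked


symptom_genes = {
    "seizures": {"GENE_A", "GENE_B"},
    "hypotonia": {"GENE_A", "GENE_C"},
    "hearing loss": {"GENE_D"},
}
patient_variants = [("var1", "GENE_C"), ("var2", "GENE_A"), ("var3", "GENE_E")]
print(prioritize(["seizures", "hypotonia"], symptom_genes, patient_variants))
```

The variant in the placeholder gene linked to both symptoms ranks first, and the variant in a gene linked to neither symptom is filtered out.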

Factor #4: Applying the full power of whole-genome sequencing to cancer tumor analysis

Problem: Many of today's approaches to the analysis of cancer genomes look only at the immediate next step in a course of treatment. That is an important capability, but it is only part of a holistic view of the genetic profile of a patient's cancer and what can be done to fight it.

Solution: The Tumor Mutation Analyzer (TMA) makes a more holistic approach possible, analyzing whole-exome or whole-genome sequence from both the patient's own normal cells and the tumor cells. By comparing the two, it is possible to isolate the variants likely to be cancer drivers. The distinguishing feature of TMA is the depth of the data it stores and the unprecedented level of detail it provides to more accurately identify variations. This level of detail is especially important in cancer genetics, where the chances of finding previously unknown variants are very high, and where, even if one mutation is successfully targeted with a course of treatment, another potential driver is often waiting in the wings.
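The tumor-versus-normal comparison can be sketched as a subtraction: variants present in the tumor at a meaningful allele fraction but essentially absent from the patient's normal sample are candidate somatic events, some of which may be drivers. A minimal Python sketch; the thresholds, coordinates, and names are hypothetical, and this is not TMA's method. Production somatic callers model sequencing error, tumor purity, and copy number statistically rather than using fixed cutoffs.

```python
from typing import NamedTuple


class Site(NamedTuple):
    chrom: str
    pos: int
    alt: str


def candidate_somatic(
    tumor: dict[Site, float],
    normal: dict[Site, float],
    min_tumor_vaf: float = 0.05,   # hypothetical cutoff
    max_normal_vaf: float = 0.01,  # hypothetical cutoff
) -> list[Site]:
    """Return sites seen in the tumor but essentially absent from the normal sample.

    Both inputs map a site to its variant allele fraction (VAF). Sites that
    also appear in the patient's normal (germline) sample are inherited,
    not acquired by the tumor, so they are excluded.
    """
    return [
        site
        for site, vaf in tumor.items()
        if vaf >= min_tumor_vaf and normal.get(site, 0.0) <= max_normal_vaf
    ]


tumor = {
    Site("chr7", 1_000, "T"): 0.42,   # tumor only: candidate somatic
    Site("chr17", 2_000, "A"): 0.48,  # also in normal: germline
    Site("chr12", 3_000, "C"): 0.03,  # too little support in the tumor
}
normal = {Site("chr17", 2_000, "A"): 0.50}
print(candidate_somatic(tumor, normal))
```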

The pace of progress has been astounding with advances in the use of genomic information for patient care. How will the path continue in the future? Stay tuned.

Early Adopters of Sequencing in the Clinic

It is increasingly evident that sequencing and analyzing genomic information can contribute to more informed healthcare decisions, and major research institutions and medical centers around the world seem to agree.

Leaders in the medical community are actively enhancing their facilities with DNA sequencers and supercomputers, recognizing the efficiencies of having this advanced technology at their disposal for innovative research programs. And as they look to the future, they are taking steps toward the routine sequencing of patient genomes that will inform the full spectrum of care decisions, from defining risk, to diagnosing disease, to defining the ideal course of treatment for the best possible outcome.

Just a few examples of the major advances in the use of sequencing technologies that have been announced recently…

From medical centers:

  • Mount Sinai Medical Center in New York initiated a biobank program, called BioMe™, in which 24,000 patients contribute their DNA sequence and participate in research over their lifetimes. The program is among the largest in the United States.
  • Memorial Sloan-Kettering Cancer Center researchers are active in a range of collaborations that seek to understand the molecular changes that characterize cancer, the largest of which is The Cancer Genome Atlas (TCGA), a project jointly funded by the NCI and the National Human Genome Research Institute. MSK currently houses one of TCGA’s Genome Data Analysis Centers.
  • Phoenix Children’s Hospital launched a new molecular and personalized medicine research institute that will “bring genomics research to the forefront of pediatrics.” The infrastructure will include a range of capabilities, such as a biospecimen repository, DNA sequencing and analysis, and a CLIA lab for genomic profiling.

And research institutions:

  • The Wellcome Trust Sanger Institute is dramatically upgrading its storage and data management capacity. The Institute already operates 30 DNA sequencers, each of which generates roughly a terabyte of data every day. New upgrades will double its capacity and improve its data management and organization software.
  • Harvard Medical School's Center for Biomedical Informatics conducts informatics research with a strong emphasis on translational science informed by innovative computational strategies; its researchers use mathematical modeling to predict when genetic information could lead to more effective treatment.
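The Sanger Institute figures above imply a substantial data volume. As a back-of-the-envelope check, assuming exactly 30 sequencers at about 1 TB each per day, running every day of the year, before any upgrade, and using decimal units (1 PB = 1,000 TB):

```latex
30 \times 1\ \text{TB/day} \approx 30\ \text{TB/day},
\qquad
30\ \text{TB/day} \times 365\ \text{days} \approx 10{,}950\ \text{TB} \approx 11\ \text{PB per year}.
```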

By members of industry:

  • Google is jumping into the genomics industry with the launch of “Calico,” a new company that will focus on genomic sequencing and advanced analytics to identify solutions for some of the most challenging diseases today.
  • “N-of-One” is a company offering personalized cancer treatment strategies as a new employee benefit tool for innovative, health-minded employers. Through the service, the company provides interpretation of molecular profiling to employees, their family members fighting cancer and their physicians to help inform treatment decisions.

And even the U.S. government:

  • The National Institutes of Health is one of the greatest proponents of genomic sequencing for research purposes. In fact, a recently initiated program is funding research teams to examine whether sequencing newborn genomes or exomes may provide useful information beyond what is currently captured in newborn screening programs.
  • Further, in the fight against infectious diseases and "super-bugs," the National Institute of Allergy and Infectious Diseases established the Genomic Sequencing Centers for Infectious Diseases (GSCID) to sequence priority pathogens, microorganisms responsible for emerging and re-emerging infectious diseases, and related organisms.

With such a broad array of innovative research underway within the halls of the world's leading institutions, there is no doubt that sequencing is on the verge of delivering exciting breakthroughs in medicine. In fact, we're seeing evidence of this with NextCODE, which has engaged with several "early adopter" organizations around the globe.