Domain Expertise: Jumpstarting Artificial Intelligence in Biomedicine

Is artificial intelligence the “single most transformative technology in modern history?” That’s the view of Tom Chittenden, who leads WuXi NextCODE’s AI program. And Tom is not alone in his enthusiasm: numerous analysts predict that this technology will be one of the fastest-growing fields in the world.

In recent talks at Boston’s Bio-IT World and the EmTech conference in Hong Kong, Tom described some of the strides we’ve been making with our DeepCODE AI tools. Their power comes in part from a novel, causal statistical-learning method and a deep-learning classification strategy. But another advantage is that they were built on our global platform for genomic data, and are extending that platform’s reach. That gives Tom’s team a rare combination of the two key ingredients for AI to make an impact in biomedicine: cutting-edge algorithms, AND deep domain expertise with access to the biggest datasets.

Tom—who also holds appointments at Harvard, MIT, and Boston Children’s Hospital—and his growing team have the former in spades; our platform and expertise in genomics provide a key edge in the latter. Our platform has been built over more than 20 years and today underpins the majority of the world’s largest genomics efforts and includes all major global reference databases. It stores, manages, and integrates any type of genomic data and correlates it with phenotype, ‘omics’, biology, outcome, and virtually any other type of data that may be relevant to a particular medical challenge.

That means we can routinely train and test our AI tools on some of the most comprehensive datasets in the world, such as The Cancer Genome Atlas (TCGA). “Today we can take ‘omics data and clinical information and map those to curated resources such as SNOMED CT and biomedical ontologies, and then use AI to identify patterns that lead us to novel findings,” Tom says.

This is a powerful approach to tease out which of hundreds of genetic variants are really involved in a particular disease, based on which ones are actually associated with aberrant expression pathways. You may find hundreds of genetic mutations in a single type of breast cancer tumor, for example, but it is determining which ones are drivers of the disease that matters.

Put simply, AI can lead us to both better diagnoses and easier discovery of more and better drug targets, by taking a range of genomic data and marrying it to clinical information and scientific knowledge. AI is not just going to better match patients to the right drugs, it is going to help further our understanding of the relationships between genes and complex molecular signaling networks, one of the most challenging arenas in our field and the most sought-after starting point for discovering validated pathways and targets.

Valuable insights into real-world medical challenges are already emerging from this AI effort, uniquely developed on and applied to the genomic and medical data that counts.

WuXi NextCODE recently presented preliminary data from analyses using our novel AI technology to diagnose tumor subtypes. Our DeepCODE tools were validated on six patient-derived tumor xenografts from mouse models, and then tested against approximately 8,200 human tumors spanning 22 cancer types in the National Cancer Institute’s TCGA collection, using five types of ‘omics data. We achieved 98% accuracy overall, and our classification of human breast and lung cancer subtypes was accurate in 96% and 99% of cases, respectively. These results point to an improvement over current methods for matching patients to treatments for their particular cancer, and we have since refined that accuracy further. This capability is also going to be central to the development of liquid biopsies.

In another oncology study, using the same multi-omics data, DeepCODE identified a signal predictive of survival across 21 cancers, pointing to novel and holistic pathways for developing broad oncotherapies.

A recent study published in Nature, meanwhile, describes a potential new role for a well-known growth factor. That report, led by Yale University scientist Michael Simons, looked at the regulation of blood vessel growth, a crucial process in some very common conditions, including cardiovascular disease and cancer. Our Shanghai team provided RNA sequencing for the study, and our Cambridge AI team drove some of the key insights pointing to novel disease mechanisms.

Simons’ team studied knockout mice in which the fibroblast growth factor (FGF) receptor genes had been turned off. They showed, for the first time, that FGFs play a key role in blood vessel growth, uncovering metabolic processes that were “a complete surprise,” according to scientists on the team. They also mapped out pathways that could help provide new drug leads.

Our AI team is just getting started. We’re looking forward to many more intriguing findings from this group as they leverage their expertise and massive amounts of the relevant data to improve medicine and healthcare.


As Cancer Databases Grow, A Global Platform Leaps the Big Data Hurdle


As massive cancer databases like The Cancer Genome Atlas (TCGA) proliferate and expand worldwide, WuXi NextCODE expects to see—and to drive—a boom in discoveries of cancer biomarkers that will advance our ability to treat cancer and improve outcomes for patients.

One of the fastest-growing areas in medicine today is the creation of massive cancer databases. Their aim is to provide the scale of data required to unravel the complexity and heterogeneity of cancer—the key to getting patients more precise diagnoses faster, and to getting them the best treatments for their particular disease.

In short, this data has the potential to save lives.

Such databases are not new, but they are now proliferating and expanding at an unprecedented pace. Driven by governments, hospitals, and pharmaceutical companies, they catalogue a growing range of genetic data and biomarkers together with clinical information about their effects on disease, therapy, and outcomes.

Only with such data can we answer the key questions: Does a certain marker suggest that a cancer will be especially aggressive? Does it signal that the tumor responds best to particular treatments? Are there new pathways involved in particular cancers that we can target to develop new drugs?

It’s the cutting edge of oncology, but to be powered to answer these questions, these databases have to be very, very big. They have to bring together whole-genome sequence data on patients and their tumors as well as a host of other ‘omics and biological data. One of the biggest challenges to realizing this potential is to manage and analyze datasets of that scale around the world. It’s one we are addressing in a unique manner through our global platform.

One of the most renowned and widely used of these is The Cancer Genome Atlas, a collaboration between the National Cancer Institute (NCI) and National Human Genome Research Institute (NHGRI). TCGA data is freely available to those who qualify, and there is a lot of it. It already comprises 2.5 petabytes of data describing tumor tissue and matched normal tissues for 33 tumor types from more than 11,000 patients. Researchers all over the world can apply to use this data for their own studies, and many have.

Yet asking questions of TCGA alone can take months for most groups and requires sophisticated tools. At Boston’s recent Bio-IT World conference, WuXi NextCODE’s director of tumor product development, Jim Lund, explained how we have put TCGA on our global platform—providing a turnkey solution with integrated analytics to transform the data into valuable findings.

Jim and his team have imported virtually all key TCGA data into WuXi NextCODE’s cloud platform: raw whole-exome sequence data from patients and tumors, as well as variant calls made with MuTect2 and VarScan2; RNA and microRNA sequence and expression data; and data on copy number variation, methylation arrays, and some 150 different clinical attributes. But this data isn’t just hosted in the cloud: it can all be queried directly and at high speed online, enabling researchers to quickly ask and answer highly complex questions without downloading any data or supplying their own bioinformatics software.

To demonstrate the power of this approach, Jim’s team ran the same queries as a recently published study that looked at sequence data from the exons of 173 genes in 2,433 primary breast tumors (Pereira et al., Nature Communications 2016). That study was specifically looking for driver mutations of the cancer’s growth and spread. In a matter of minutes, rather than months, Jim’s team replicated key mutations identified in the study. That analysis was then extended to all cancer genes, and additional driver genes were found. More important, because the team could correlate these mutations with clinical outcomes data, they were also able to begin systematically matching specific mutation patterns to patient outcomes.

Next, Jim’s team looked at the genomics of lung adenocarcinoma, the most common form of lung cancer, which is the leading cause of cancer death worldwide. Following up on the findings of another published study (Collisson et al., Nature 2014), they profiled the 230 samples examined in the paper and immediately made several observations: eighteen genes were significantly mutated; EGFR mutations (which are well known) were more common in samples from women; and RBM10 mutations were more common in samples from men. These results were extended to 613 samples and shown to be robust. And because the team had a wide range of data, including mRNA, microRNA, DNA sequencing, and methylation, they were able to suggest some of the actual biological processes that may be fueling the origin and growth of lung adenocarcinomas.

What’s making this type of research possible? It’s our global platform for genomic data. The platform spans everything required to make the genome useful for helping patients around the world, from CLIA/CAP sequencing to the world’s most widely used system for organizing, mining, and sharing large genomic datasets. At its heart is our database—the Genomically Ordered Relational database (GORdb). Because it references sequence data according to its position on the genome, it makes queries of tens of thousands of samples computationally efficient, enabling the fast, online mining of vast datasets stored in multiple locations.
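The post doesn’t spell out how GORdb works internally. As a rough illustration of the general principle only, the Python sketch below assumes a toy, in-memory table of variant calls kept sorted by (chromosome, position). The `Variant` record, the `GenomicallyOrderedTable` class, and the integer chromosome encoding are all hypothetical and are not GORdb’s actual design or API. Because rows are ordered by genomic coordinate, a region query becomes two binary searches plus a slice of the matching rows, rather than a scan of every record.

```python
from bisect import bisect_left, bisect_right
from typing import List, NamedTuple, Tuple


class Variant(NamedTuple):
    """Hypothetical variant record for illustration only."""
    chrom: int   # chromosome encoded as an integer so rows sort simply (assumption)
    pos: int     # genomic position
    sample: str  # sample identifier
    ref: str     # reference allele
    alt: str     # alternate allele


class GenomicallyOrderedTable:
    """Toy table sorted by (chrom, pos); NOT the actual GORdb implementation."""

    def __init__(self, rows: List[Variant]) -> None:
        # Sort once by genomic coordinate; every later query relies on this order.
        self.rows = sorted(rows, key=lambda v: (v.chrom, v.pos))
        # Parallel list of sort keys so bisect can locate boundaries directly.
        self.keys: List[Tuple[int, int]] = [(v.chrom, v.pos) for v in self.rows]

    def region(self, chrom: int, start: int, end: int) -> List[Variant]:
        """Return rows on `chrom` with start <= pos <= end, in O(log n + k)."""
        lo = bisect_left(self.keys, (chrom, start))  # first key >= (chrom, start)
        hi = bisect_right(self.keys, (chrom, end))   # first key > (chrom, end)
        return self.rows[lo:hi]


# Usage with made-up toy data (not real patient variants or gene coordinates).
rows = [
    Variant(7, 1_500, "S1", "T", "G"),
    Variant(7, 1_000, "S2", "C", "T"),
    Variant(12, 1_200, "S3", "C", "A"),
    Variant(7, 5_000, "S4", "A", "T"),
]
table = GenomicallyOrderedTable(rows)
hits = table.region(7, 900, 2_000)
print([v.sample for v in hits])  # → ['S2', 'S1']
```

Note that the chromosome 12 variant at position 1,200 is excluded even though its position falls inside the window, because the sort key puts chromosome first. The same ordering would also let two sorted tables be merged by position in a single linear pass; that extension is omitted to keep the sketch short.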

That’s how we are making the TCGA—and every major reference dataset in the world—available and directly minable by any researcher using our platform. Those users can combine all that data with their own to conduct original research at massive scale.

These breast and lung cancer studies are just two of more than a thousand that have been carried out so far on TCGA data. As more such datasets become available, we expect to see—and to drive—a boom in discoveries of cancer markers that will advance our ability to treat cancer and improve outcomes for patients. For those who want to go further still, our proprietary DeepCODE AI tools offer a means of layering in even more datasets to drive insights even deeper into the biology of cancer and other diseases. And that’s a topic I’ll return to in the weeks ahead.

Genomics in Cancer: Continuing to Push the Leading Edge


Genomics is helping to prevent and treat cancer at an accelerating rate, supporting the goal of oncologists to dramatically improve cancer patient outcomes.

Progress in using genomics to help prevent and treat cancer continues at an impressive pace. A growing number of oncologists are using genomics to drive patient care and improve outcomes across an ever-expanding range of cancers.

Genomic Knowledge Can Clearly Drive Better Care

Applying genomics to cancer treatment is a powerful clinical application, as genomics can provide a window into how to best treat a patient’s particular cancer as it:

  1. may help reveal the genetics of the tumor itself, and
  2. can provide insight into how cancerous tumors may grow and spread over time.

With a genomics-based approach to cancer care, oncologists can tailor anti-cancer treatments more precisely to an individual tumor’s mutations, molecularly targeting that specific cancer’s Achilles heel. There are already well-documented successes with molecularly targeted anti-cancer agents, such as drugs directed at HER2, EGFR, ALK, and other targets.

In 2015, the adoption of genomics in clinical oncology has accelerated significantly. Recent evidence of the growing use of genomics to help fight cancer includes:

  • Evolving from ‘why’ to ‘how’ to use genomics at leading cancer centers. At top cancer care facilities, genomics has become part of the programmatic approach to providing certain cancer patients with optimal care, designed to lead to the best outcomes. Over the last few years, the question for leading medical centers globally has evolved from “do we need genomics?” to “for which cancer types, and at what stages of diagnosis and treatment, can we best use genomic sequencing and analysis?” This is an evolution from “why?” to “how?” at a very fundamental level. The accelerating deployment of genomics by leading medical facilities shows that they are deriving significant value from it, value that is ultimately translating into meaningfully better care for cancer patients.
  • Expanding potential applications of genomics across cancer types. Researchers and clinicians continue to publish a wealth of findings supporting the potential of genomics to improve outcomes for certain cancer patients, steadily broadening the range of cancers and tumors that may benefit. Highlights from 2015 alone include advances in certain prostate cancers, brain cancers, and rare pediatric kidney cancers, and even potential targets in certain non-small cell lung cancers.
  • Broadening acceptance in cancer prevention. Driven in part by the education of oncologists and physicians generally, and in part by the empowerment of knowledgeable patients, people are seeking out and benefiting from genetic tests that reveal their personal risk for certain cancers (such as BRCA testing for breast or ovarian cancer). For certain individuals with specific family histories and genetics, using genomic analysis to predict cancer risk, by comparing their genome with databases of confirmed disease-linked mutations, is driving appropriate medical decisions for patients who may be at high risk.
  • Powering clinical trials with genomics. The use of genomics in cancer clinical trials, whether for data gathering or for screening patients, has gone from rare to commonplace in recent years, and is improving knowledge of the safety and efficacy of drugs in cancer and beyond. Two large-scale cancer trials were initiated in 2015 with the bold goal of substantially advancing the understanding and use of genomics in cancer care. The anti-cancer treatments being tested in both trials were selected for their activity on a specific molecular target, independent of tumor location and histology. Both trials are actively enrolling: (1) TAPUR (Targeted Agent and Profiling Utilization Registry), sponsored by the American Society of Clinical Oncology (ASCO), and (2) NCI-MATCH (Molecular Analysis for Therapy Choice), from the National Cancer Institute (NCI). These trials, and any follow-on trials, will doubtless provide valuable information to drive the growing use of genomics in improving cancer care.

In summary, genomics is helping to prevent and treat cancer at an accelerating rate, supporting the goal of oncologists to dramatically improve cancer patient outcomes. We can see substantial progress on at least four frontiers in the use of genomics in cancer care: expanded use in leading medical centers, more potential applications across cancer types, broadening acceptance in cancer prevention, and increased use of genomics in clinical trials. I am personally committed to driving and accelerating this genomic revolution, to bring true progress in cancer care to patients in need around the world.

2015: An Inflection Point for Genomics Adoption Around the Globe


2015 is shaping up to be a significant year in the advancement and adoption of genome sequencing and personalized medicine around the globe.

The year 2015 is shaping up to be an inflection point in the advancement and adoption of genome sequencing and personalized medicine. While private initiatives are often the centerpiece of media coverage, leading governments clearly have advanced a number of important initiatives this year. Indeed, many governments around the globe are actively promoting widespread utilization of genomics, supporting academic research, establishing industry guidelines, and raising public awareness.

Governments Serving as Catalysts for Genomics Progress

Efforts by officials worldwide to engage with and support the private sector, and its tremendous potential, have helped to make 2015 a significant year for expanding the use of genomics in clinical care. A few highlights of 2015 include:

— In the U.S., President Obama made precision medicine one of the centerpieces of his State of the Union address in January. His administration kicked off the effort by requesting a $215M investment in a Precision Medicine Initiative with the following key attributes:

  • The cornerstone of Obama’s proposal is the plan to collect and analyze genomic data from a million or more volunteers;
  • The initiative further supports genomics through expanded research into the genetic mutations that drive cancer;
  • Additional funding is earmarked to maintain databases and develop industry standards.

— Germany and the U.K. expanded eligibility for government-funded genetic testing for breast cancer patients.

— Israel announced its intent to establish a government-sponsored genetic database.

— Through the National Institutes of Health and the National Cancer Institute, the U.S. federal government proposed dozens of new funding opportunities to support research in genetic sequencing and analysis.

— Japan launched an Initiative on Rare and Undiagnosed Diseases to provide genomic analysis and expert consultation for up to 1,000 individuals with childhood onset of undiagnosed conditions.

— Through Genomics England (which I described in further detail in an earlier post), the U.K. Department of Health tapped WuXi NextCODE and others to begin interpreting genomes in its groundbreaking 100,000 Genomes Project.

In today’s news, the globalization of genomics continues, as private-sector leaders align to meet the needs of Qatar’s forward-looking government health initiatives:

— WuXi NextCODE and the Sidra Medical and Research Center partner to power population genomics and precision medicine in Qatar. Our partnership will:

  • Facilitate clinical diagnostics;
  • Accelerate research; and
  • Support the Qatar Genome Project.

As I have discussed in an earlier post, large-scale population studies are an essential step in harnessing the power of genomics to improve health worldwide. Since WuXi NextCODE’s origins in deCODE Genetics’ landmark analysis of the Icelandic population, we have developed the tools to help translate sequence data into precision medicine on a large scale. Through our work with Genomics England, our collaboration with Fudan Children’s Hospital to diagnose rare diseases in China, and now our partnership with Sidra, the team at WuXi NextCODE is leading the effort to realize the potential of genomics on a truly global scale. The growing interest that leading governments across the globe are showing in supporting those efforts is helping to drive the successful use of genomics worldwide.