WuXi NextCODE Takes on Cancer: Breakthroughs and Innovation in Sequencing using TCGA and AI

Sequencing reads of a sample prepared by the traditional whole-genome sequencing workflow for fresh-frozen samples and data generated using the SeqPlus whole-genome FFPE method. The green and purple indicate reads sequenced in the forward and reverse directions, respectively, and yellow represents bases with non-reference sequence. The center of the image shows a C to A mutation in each of the tumor samples.

Cancer is one of the most active fields in genomics, spurring mountains of research papers and scores of clinical trials. WuXi NextCODE (WXNC) is committed to pushing this field forward and so we had a special “Genomes for Breakfast” session devoted to this topic at the recent ASHG17 event. Featured talks addressed our pathbreaking work in how to extract impactful findings from the renowned TCGA dataset; get better sequencing results from FFPE samples; and apply deep learning to drug discovery, drug repurposing, and identifying subtypes for diagnostics and clinical trials.

The Cancer Genome Atlas (TCGA) is one of the most useful public genomic cancer databases available and has already led to numerous critical discoveries, including entirely new drug targets as well as better insights into tumor origination, development, and spread. It includes data from approximately 11,000 patients and covers 33 cancer types. Data types include WES, RNA-Seq, miRNA, CNV, methylation array, and clinical sample data. The data is big and complex, and can include multiple samples from one patient, which is crucial to know when doing analyses.
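To make the point about multiple samples per patient concrete, here is a minimal sketch of grouping samples by patient before an analysis. It relies on the public TCGA barcode convention, in which the first 12 characters identify the patient and the two digits after the third hyphen give the sample type. The barcodes below are made up for illustration, and the function name `group_by_patient` is hypothetical; it is not part of any WXNC tool.

```python
from collections import defaultdict

# Subset of sample-type codes from the TCGA barcode convention.
SAMPLE_TYPES = {
    "01": "primary solid tumor",
    "02": "recurrent solid tumor",
    "06": "metastatic",
    "10": "blood derived normal",
    "11": "solid tissue normal",
}

def group_by_patient(barcodes):
    """Map each patient ID (first 12 characters) to its list of sample types."""
    patients = defaultdict(list)
    for barcode in barcodes:
        patient_id = barcode[:12]    # e.g. "TCGA-AA-0001"
        type_code = barcode[13:15]   # two digits after the third hyphen
        patients[patient_id].append(SAMPLE_TYPES.get(type_code, "other"))
    return dict(patients)

# Made-up barcodes, for illustration only.
example_barcodes = [
    "TCGA-AA-0001-01A",
    "TCGA-AA-0001-10A",
    "TCGA-AA-0001-06A",
    "TCGA-BB-0002-01A",
]
grouped = group_by_patient(example_barcodes)

# Patients with more than one sample need care, e.g. to avoid counting
# the same person several times in a cohort-level statistic.
multi_sample = {p: s for p, s in grouped.items() if len(s) > 1}
print(multi_sample)
```

Grouping first and then choosing one sample per patient (or modeling the pairing explicitly) keeps cohort counts honest.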

During his ASHG talk, Jim Lund, WXNC’s Director of Tumor Product Development, shared some insights into how we put this rich data source to work in concert with our own unique data and analytical tools, in a process he dubs “multiomics analysis.” He described how we specially process the data and use our unique analytical platform to help scientists find just what they are looking for. Researchers can search the data by cancer type, age of diagnosis, sex, ethnicity, year of diagnosis, sample type (e.g. metastatic, new primary), and more.

Multiple pivotal studies using this dataset have already been published, including some examining the prevalence of specific mutations across human cancer types as well as in-depth profiling of specific tumors, such as breast cancer and lung adenocarcinoma. Layering different types of data, such as reads from DNA and RNA, allows much more accurate detection of features such as variants with allele-specific effects on gene expression. The user-friendly but sophisticated data interface makes it easier to see such findings. Over the years, our own database and our capabilities have both grown exponentially, creating a powerful tool for multiomics cancer research. You can see Jim putting the portal through its paces in a recent webinar.
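One common way to layer DNA and RNA reads, shown here only as an illustrative sketch and not as the method used in the portal, is to take a site that the DNA data calls heterozygous and ask whether the RNA reads favor one allele. Under balanced expression, each RNA read carries the alternate allele with probability 0.5, so an exact two-sided binomial test applies. The function name and read counts are assumptions for the example.

```python
from math import comb

def allelic_imbalance_p(ref_reads, alt_reads, p=0.5):
    """Exact two-sided binomial p-value for RNA reads at a heterozygous site."""
    n = ref_reads + alt_reads
    probs = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = probs[alt_reads]
    # Sum the probability of every outcome no more likely than the observed one.
    # The small tolerance guards against floating-point ties.
    tail = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, tail)

# Hypothetical counts: 18 reference and 2 alternate RNA reads at a site
# that the DNA data shows to be heterozygous.
p_skewed = allelic_imbalance_p(18, 2)
p_balanced = allelic_imbalance_p(10, 10)
print(p_skewed, p_balanced)
```

A small p-value suggests allele-specific expression at that site. In practice many sites are tested at once, so a multiple-testing correction would be needed.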

In his talk, Shannon Bailey described how Whole Genome Sequencing (WGS) can be applied to formalin-fixed paraffin-embedded (FFPE) tumor samples, which are stored by the hundreds of thousands in repositories around the world. Shannon is the Associate Director of our Cancer Genetics division. He pointed out that while these samples are abundant and often paired with extensive clinical and outcome data, there are specific hurdles to using these for the type of large-scale retrospective studies many groups are eager to carry out.

For one thing, the genetic material in such samples can be degraded, crosslinked, or present only in low quantities. Of all these problems, the biggest is getting a sufficient quantity of quality DNA for sequencing. Numerous studies have found that these types of samples are difficult to work with and often provide very low success rates for gene sequencing studies. Clearly, fresh frozen samples provide much better results, but they are also much harder to obtain.

In response, our team has developed the WXNC SeqPlus FFPE extraction method, which provides substantially improved coverage compared to traditional methods and even approximates the results obtained with fresh frozen samples at 10X depth, with similar numbers of heterozygous and homozygous calls.

We tested SeqPlus in a study that comprised 516 tumor-normal pairs (i.e., 1,032 samples) that had been stored for 3 to 6 years. The targeted sequencing depth was 30X for the normal tissue and 70X for tumor tissue. The starting amount of DNA was 400 ng. The results were excellent, with SeqPlus delivering a coverage analysis just about 1% below what the fresh frozen control samples achieved. Further, a comparison of our analyses to results from the TCGA, using fresh frozen samples, showed striking similarity. These study results give us confidence that SeqPlus is a new “power tool” for FFPE sequencing studies. This webinar describes the process.
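As a rough illustration of what a coverage comparison involves, the sketch below computes mean depth and the fraction of positions covered at or above a threshold for two samples. The depth values are toy numbers, not study data, and the 10X threshold and the function name `coverage_summary` are assumptions made for the example.

```python
def coverage_summary(depths, min_depth=10):
    """Return (mean depth, fraction of positions with depth >= min_depth)."""
    mean_depth = sum(depths) / len(depths)
    breadth = sum(d >= min_depth for d in depths) / len(depths)
    return mean_depth, breadth

# Study design from the text: 516 tumor-normal pairs.
pairs = 516
total_samples = pairs * 2

# Toy per-position depths, for illustration only.
fresh_frozen = [30, 32, 28, 31, 29, 30, 33, 27, 30, 30]
ffpe = [29, 31, 9, 30, 28, 30, 32, 26, 30, 29]

ff_mean, ff_breadth = coverage_summary(fresh_frozen)
ffpe_mean, ffpe_breadth = coverage_summary(ffpe)
print(total_samples, ff_mean, ff_breadth, ffpe_mean, ffpe_breadth)
```

Real pipelines read depths from alignment files across the whole genome, but the two summary numbers are the same idea.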

Another area of great interest at WXNC is artificial intelligence (AI). We have been pioneers in using AI to pull novel insights out of multiple massive datasets. Leading this effort is Tom Chittenden, our Vice President of Statistical Sciences, Founding Director of the Advanced AI Research Labs, and a Lecturer on Pediatrics and Biological Engineering at Harvard Medical School and MIT. He also spoke at the breakfast series.

Our AI capabilities improve the tools we have and expand their capabilities. For example, using our AI tools, we can improve functional annotation of missense variants to an accuracy of >99%, integrate multiple types of data to discover new genes and elaborate pathways, and improve tumor subtype and drug-response classification accuracy by combining DNA- and RNA-seq, among other data types. These tools can be used for such varied purposes as target discovery, drug repurposing, and defining responders and non-responders in clinical trials.

We’ve already helped to develop breakthrough results, such as identifying an intriguing new target for both cardiovascular and cancer drug discovery. We’ve also classified breast and lung cancer subtypes with 97% to 100% accuracy, classified 8,200 tumors of 22 TCGA cancer types with >99% accuracy, and discovered a completely novel pan-cancer molecular survival signature.

The power of our deepCODE AI tools is in part thanks to a novel, causal statistical-learning method and deep-learning classification strategy. But another advantage is that they were built on our global platform for genomic data, which underpins the majority of the world’s largest genomics efforts and includes all major global reference databases. Our database stores, manages, and integrates any type of genomic data and correlates it with phenotype, ‘omics’, biology, outcome, and virtually any other type of data that may be relevant to a particular medical challenge.

If you want to know more, I recently gave an interview to WXpress outlining WXNC’s AI strategy. As we continue to deepen our commitment to this field, I’m sure we’ll have more exciting results to share.

News Flash: Drawing a “Molecular Portrait” of Mutations in Brain Disease

WuXi NextCODE’s AI group is helping to advance cutting-edge applications across the breadth of our platform and with partners across the life sciences. Recently, they put some of their toolkit to work supporting exciting work by our colleagues at Boston Children’s Hospital and Harvard Medical School. Together, they have generated sequence data of unprecedented accuracy from single neurons, and we’ve been able to help with the analysis and the discovery of some very compelling mechanisms underlying neurodegenerative disease. Kudos to the BCH and HMS teams and to our AI group on this latest collaborative publication. That report is described below and on our new WuXi NextCODE blog.

WuXi NextCODE AI Team Helps to Draw Molecular Portrait of How Somatic Mutations May Contribute to Neurodegenerative Disease

  • Boston Children’s Hospital and Harvard Medical School-led study in Science leverages WXNC expertise in feature selection and pathway enrichment
  • Study shows how individual neurons accumulate mutations over time and how this process differs between normally aging people and those with early-onset disease

A study published yesterday provides the most direct and detailed picture to date of how single-letter mutations accumulate in the DNA sequence of neurons as we age, and how this process differs between neurologically healthy individuals and those with early-onset neurodegenerative disease. Entitled “Aging and neurodegeneration are associated with increased mutations in single human neurons,” the study is published in the online edition of Science.

Led by scientists from Boston Children’s Hospital, Harvard Medical School, MIT, and the Howard Hughes Medical Institute, the study analyzed sequence data from 161 single neurons taken postmortem from 15 neurologically normal people, ranging in age from four months to 82 years, and nine individuals with the early-onset neurodegenerative diseases Cockayne syndrome and xeroderma pigmentosum. A press release from Boston Children’s on the study and its impact is available here.

At a first level, this study builds on important advances by the authors in techniques for accurately sequencing and reading mutations in the DNA of individual neuronal cells, a hurdle that has until now prevented directly testing the theory that such somatic mutations build up in neurons over time. With this data, the lead scientists were able, for the first time, to observe directly in a substantial dataset the patterns of accumulation of somatic mutations in individual neurons in relation to age, region of the brain (in the prefrontal cortex and hippocampus), and disease state. From this they developed broad signatures for these three different types of variation.

The scientists’ next question was whether they could further tease apart the associational signature for early-onset disease to discover something further about the biological processes that were contributing to neurodegeneration. For that task, they called upon the expertise of their longtime collaborators at WuXi NextCODE’s Advanced Artificial Intelligence Laboratory. Tom Chittenden, WXNC’s vice president of statistical sciences, and Chandri Yandava and Pengwei Yang, senior bioinformatician and senior computational statistician, respectively, are co-authors on the study. They used techniques developed in our AI and deep-learning program to identify the most informative mutations from the vast original datasets, to map mutations onto the most informative genes, and to identify the biological pathways those genes are involved in.
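For readers unfamiliar with pathway enrichment, a standard textbook approach is the hypergeometric (one-sided Fisher) test: given a set of selected genes, how surprising is their overlap with a pathway? The sketch below illustrates that general idea only. It is not the functional enrichment model used in the study, and the function name and gene counts are hypothetical.

```python
from math import comb

def enrichment_p_value(universe, pathway, selected, overlap):
    """P(X >= overlap) when `selected` genes are drawn at random from a
    universe of `universe` genes, `pathway` of which belong to the pathway."""
    upper = min(pathway, selected)
    total = comb(universe, selected)
    tail = sum(
        comb(pathway, k) * comb(universe - pathway, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical numbers: 20,000 genes in total, a 150-gene DNA-repair
# pathway, 300 genes selected as informative, 12 of them in the pathway.
p = enrichment_p_value(universe=20000, pathway=150, selected=300, overlap=12)
print(p)
```

A small p-value means the selected genes hit the pathway more often than chance would predict. As with any screen across many pathways, the results would need multiple-testing correction.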

“This extraordinary group, including Chris Walsh and Mike Lodato as well as their talented teams, has enabled us to take another step forward and to see better than ever before the progressive mutational burden in individual neurons,” said Tom Chittenden. “We’ve used our toolkit and functional enrichment models to identify the pathways being most impacted by these mutations. This has pointed the group to the importance of oxidative mutations affecting DNA repair and, particularly, in genes that are heavily transcribed.”

“What Tom’s group has done is helped us to model how, as the somatic mutation burden increases, the brain loses function. What we see is that the more genes are transcribed, the more likely they are to be damaged and lose function,” said Mike Lodato of Boston Children’s Hospital and Harvard Medical School, one of the six first authors on the paper. “At the same time, because genes interact through these pathways, linear increases in the number of mutations appears to lead to exponential loss of brain function. It is essentially a scenario of use it and lose it.”

The study authors note that the identification of these pathways and the apparently important role of oxidative mutations points to potential novel therapeutic approaches for neurodegenerative diseases. This study also paves the way for the group’s next challenge: to take these discoveries in severe early-onset neurodegenerative disease and apply them to improve our understanding of the mechanisms and pathways involved in other related conditions, including Alzheimer’s disease.

Tom Chittenden says this is a challenge that is going to call on his full arsenal of AI and deep-learning capabilities. “To address Alzheimer’s disease, we are looking not only at early-onset disease but at subtler phenotypes around mild cognitive impairment. We are going to have to bring in not just sequence data but also methylation data, mRNA, and many other data types. The results we are presenting today are a step in the right direction, however—going from association to causal inference models to identify dysregulated pathways involved in disease. This is how AI is going to help to provide novel understanding of disease and progression.”

Domain Expertise: Jumpstarting Artificial Intelligence in Biomedicine

Is artificial intelligence the “single most transformative technology in modern history”? That’s the view of Tom Chittenden, who leads WuXi NextCODE’s AI program. And Tom is not alone in his enthusiasm, as numerous analysts are predicting this technology will be one of the fastest growing fields in the world.

In recent talks at Boston’s BioIT World and the EmTech conference in Hong Kong, Tom described some of the strides we’ve been making with our DeepCODE AI tools. Their power is in part thanks to a novel, causal statistical-learning method and deep-learning classification strategy. But another advantage is that they were built on—and are extending the reach of—our global platform for genomic data. That means that Tom’s team has that rare combination of both of the key ingredients to AI making an impact in biomedicine: cutting-edge algorithms AND deep domain expertise and access to the biggest datasets.

Tom—who also holds appointments at Harvard, MIT, and Boston Children’s Hospital—and his growing team have the former in spades; our platform and expertise in genomics provide a key edge in the latter. Our platform has been built over more than 20 years and today underpins the majority of the world’s largest genomics efforts and includes all major global reference databases. It stores, manages, and integrates any type of genomic data and correlates it with phenotype, ‘omics’, biology, outcome, and virtually any other type of data that may be relevant to a particular medical challenge.

That means that we can routinely train and test our AI tools on some of the most comprehensive data sets in the world, such as that in The Cancer Genome Atlas (TCGA). “Today we can take ‘omics data and clinical information and map those to curated resources such as SNOMED CT and biomedical ontologies, and then use AI to identify patterns that lead us to novel findings,” Tom says.

This is a powerful approach to tease out which of hundreds of genetic variants are really involved in a particular disease, based on which ones are actually associated with aberrant expression pathways. You may find hundreds of genetic mutations in a single type of breast cancer tumor, for example, but it is determining which ones are drivers of the disease that matters.

Put simply, AI can lead us to both better diagnoses and easier discovery of more and better drug targets, by taking a range of genomic data and marrying it to clinical information and scientific knowledge. AI is not just going to better match patients to the right drugs, it is going to help further our understanding of the relationships between genes and complex molecular signaling networks, one of the most challenging arenas in our field and the most sought-after starting point for discovering validated pathways and targets.

Valuable insights in real-world medical challenges are already emerging from this AI effort uniquely developed on and applied to the genomic and medical data that counts.

WuXi NextCODE recently presented preliminary data from analyses using our novel AI technology to diagnose subtypes of tumors. Our DeepCODE tools were validated on six patient-derived tumor xenografts from mouse models, and then tested against approximately 8,200 human tumors from a collection of 22 cancer types in The National Cancer Institute’s TCGA collection. That study included five ‘omics data types. We achieved 98% accuracy overall, and our analyses of human breast and lung cancer subtypes were accurate in 96% and 99% of cases, respectively. This points to an improvement over current methods for matching patients to treatments for their particular cancer, and we have refined that accuracy further still. This capability is also going to be central to the development of liquid biopsies.

http://hannessmarason.com/blog/2017/04/04/a-perfect-pairing-ai-and-precision-medicine/

In another oncology study, using the same multi-omics data, DeepCODE identified a signal predictive of survival across 21 cancers, pointing to novel and holistic pathways for developing broad oncotherapies.

A recent study published in Nature, meanwhile, describes a potential new role for a well-known growth factor. That report, led by Yale University scientist Michael Simons, looked at blood vessel growth regulation—a crucial process in some very common conditions, including cardiovascular disease and cancer. Our Shanghai team provided RNA sequencing for this study. Our Cambridge AI team drove some of the key insights pointing to novel disease mechanisms.

Simons’ team studied knockout mice, whose fibroblast growth factor (FGF) receptor genes were turned off. They proved, for the first time, that FGFs have a key role in blood vessel growth, uncovering some metabolic processes that were “a complete surprise,” according to scientists on the team. Further, they mapped out pathways that could help provide new drug leads.

http://hannessmarason.com/blog/2017/05/15/bringing-artificial-intelligence-cardiovascular-medicine-cancer-genomics-action/

Our AI team is just getting started. We’re looking forward to many more intriguing findings from this group as they leverage their expertise and massive amounts of the relevant data to improve medicine and healthcare.

Bringing Artificial Intelligence to Cardiovascular Medicine and Cancer: Genomics in Action

A Yale research team, with contributions from WuXi NextCODE’s artificial intelligence (AI) and sequencing teams, has discovered a novel mechanism regulating how blood vessels grow.

Artificial Intelligence (AI) can already catch a criminal and identify the right patients for certain types of surgery. But those challenges involve relatively few parameters compared to the number of parameters or features involved in linking the 3 billion bases in the human genome with other ‘omics data and all the complexity of human biology. For that very reason, the promise of AI in genomics is as necessary as it is enticing, and WuXi NextCODE is committed to pushing the frontier of this emerging field.

This week, I am encouraged by results from a study published in the latest edition of Nature, which describes how a well-known growth factor may play a previously unknown role in some important diseases. That report, led by Yale University scientist Michael Simons, investigates blood vessel growth regulation, a crucial process in some very common conditions, including cardiovascular disease and cancer. Our Shanghai team provided RNA sequencing for this study. Our Cambridge AI team applied some of the most advanced statistics in their toolset to take the data analysis to the next level.

Simons’ team studied knockout mice, whose fibroblast growth factor (FGF) receptor genes were turned off. The scientists were able to prove, for the first time, that FGFs have a key role in blood vessel growth, uncovering some metabolic processes that were “a complete surprise,” according to scientists on the team. Further, they mapped out pathways that could help provide new drug leads.

It’s inspiring to see scientists from around the world using top-notch technology to collaborate on pivotal research questions. This study involved scientists in six different countries.

This FGF study also comes on the heels of our recent announcement about how our deepCODE approach classified 27 different tumor types with greater than 95% accuracy when applied across approximately 9,000 human tumors from The Cancer Genome Atlas (TCGA) collection. (https://www.wuxinextcode.com/highlights/posters-at-aacr/#/brief–using-ai)

With the rapid rate of progress, it’s not surprising that AI is finding success in genomics. Today’s informatics capabilities allow for assimilating larger and larger datasets with AI applications, and the field is evolving at a rapid pace. Google alone published more than 200 papers on AI in 2016. Like us, they use a deep learning approach.

From facial recognition to genomic solutions

Each AI problem has a different scale. In facial recognition, AI applications analyze relatively few features in the human face (about two dozen). Digital scans of the human eye that use AI techniques are able to segment patients before eye surgeries, and this entails algorithms that consider hundreds of features.

Genomics, of course, involves looking at any number of feature sets among billions of possibilities. It’s an immense challenge, but I think it’s perfectly suited to AI. And with deep-learning tools, we can fish out many more insights than with traditional analyses.

Our goal is to see how AI can help researchers achieve better results in identifying and evaluating new medicines, pinpointing risk factors and disease drivers, finding new combination therapies that work better than single drugs, and more. Our deepCODE tools comprise a novel, multinomial statistical-learning method and deep learning classification strategy. It’s an advanced approach to AI.

This week’s Nature paper is another encouraging sign.

Many of the stickiest problems in medicine are longstanding. The role of FGFs in blood vessel development was poorly understood until now. This group’s findings may help open new avenues of research.

Our team is always seeking to tackle problems with the latest approaches and technologies. Now, in the age of big data, it makes sense to start letting computers do more and more of the work, even some of the actual thinking. Certainly, we pick the questions and frame them. But then, let’s load the data and let the machines help us find the answers. If we can polish this process, and apply it to a growing number of problems, new answers and insights are sure to come.

A Perfect Pairing? AI and Precision Medicine

Let’s start with one of the fastest-growing fields in science today: artificial intelligence (AI). Now, let’s apply another technology that has profound potential for improving patient care: precision medicine. Some of us think the integration of these two arenas could be a “sweet spot” that leads to some of the decade’s biggest advances in healthcare.

As someone who has worked in genomics for two decades, I am a believer in the combined power of AI and precision medicine. And in my current work, I have the pleasure of pioneering both technologies.

Cancer has been one of the early beachheads for precision medicine. Now, AI is also following that path, with the aim of advancing individualized treatment.

For example, just today, WuXi NextCODE presented preliminary data from analyses using our novel AI technology at the American Association for Cancer Research annual meeting in Washington, D.C. We tested the accuracy of our new deepCODE deep learning tools to diagnose subtypes of tumors. Our results suggest these tools do a better job than traditional approaches for classifying tumors and helping determine which patients will respond to which drugs. Our new AI technology can incorporate all types of omic data, and can also help with drug discovery and finding the best uses for drugs.

How can AI technologies achieve better results in identifying precision treatments in cancer and other diseases?  In the case of our new deepCODE tools, it is in part thanks to a novel, multinomial statistical-learning method and deep learning classification strategy. This approach is designed to support dramatic improvements in drug discovery and development, as well as medical care. But we need to prove the technology’s potential by testing it on real problems in genomic medicine. So, that’s what we are doing.

The initial results are promising. Our deepCODE tools were validated on six patient-derived tumor xenografts from mouse models, and then tested against approximately 9,000 human tumors from a collection of 27 types in The National Cancer Institute’s Cancer Genome Atlas (TCGA) collection. (https://cancergenome.nih.gov/)  We achieved 95% accuracy overall in this test. In analyses of human breast and lung cancer subtypes, deepCODE was accurate in 96% and 99% of cases, respectively. That study included DNA- and RNA-seq data.
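Overall and per-subtype accuracy figures like those above are computed by comparing predicted labels with known ones. The sketch below shows that bookkeeping on a handful of made-up labels. It is not deepCODE code, and the function name `accuracy_report` is hypothetical.

```python
from collections import Counter

def accuracy_report(true_labels, predicted_labels):
    """Return (overall accuracy, {label: per-class accuracy})."""
    totals = Counter(true_labels)
    correct = Counter(
        t for t, p in zip(true_labels, predicted_labels) if t == p
    )
    overall = sum(correct.values()) / len(true_labels)
    per_class = {label: correct[label] / totals[label] for label in totals}
    return overall, per_class

# Made-up labels, for illustration only.
true_labels = ["breast"] * 4 + ["lung"] * 4
predicted = ["breast", "breast", "breast", "lung",
             "lung", "lung", "lung", "lung"]

overall, per_class = accuracy_report(true_labels, predicted)
print(overall, per_class)
```

Reporting per-class accuracy alongside the overall figure matters because a classifier can score well overall while doing poorly on a rare tumor type.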

These findings are very encouraging.  Breast and lung cancer are both very common malignancies that are increasingly being “divided” into subtypes that have significantly different outcomes and need different treatment regimens. These preliminary data are by no means definitive, but they suggest that AI could bring new certainty to cancer diagnosis.

But why is it even so important to get a fast, accurate molecular diagnosis of a tumor?

Well, here’s the challenge: Today patients who have suspected cancers are typically biopsied. A snip of the tumor is examined under a microscope and then may be tested for common biological receptors. It can take a while for that to occur. Next, the patient undergoes treatment, and whatever drugs they receive could actually change the tumor’s biology. After that, the drugs initially prescribed might not be the best option anymore.

So how can we know when to switch treatments, and what to switch to?

In the ideal world, anyone diagnosed with cancer would be followed up with an extensive molecular biopsy. In other words, once the initial diagnosis is made, the patient would undergo follow-up tests that involve relatively painless blood draws. From these blood samples (liquid biopsies), the tumor’s DNA would be read, and that would determine how to best monitor and prescribe for that particular patient going forward.

It is an exciting time to be working on integrating AI technology with the primary tools for improving precision medicine in cancer and other diseases.  We’re just at the start of this journey, and we’ll likely find many other ways that AI technology can impact patient healthcare.

Join us here as we follow this intriguing program’s progress.