Rice University computer scientist Todd Treangen has received a C3.ai Digital Transformation Institute (C3.ai DTI) Award for computational biology research that applies artificial intelligence (AI) models to COVID-19 mitigation. Treangen is developing novel bioinformatics algorithms and conducting comparative genomic analyses to determine how SARS-CoV-2 is changing over time.
C3.ai DTI is a research consortium of universities and technology companies that funds scientists in a coordinated effort to curb current and future pandemics. After a rigorous peer review of more than 200 proposals from the world’s leading scientists, 26 projects were awarded more than $5.4 million in cash. The selected projects included multidisciplinary and multi-institution efforts taking novel approaches to their research. Scientists will also have access to large unified coronavirus datasets, along with cloud, software and supercomputing resources from the National Center for Supercomputing Applications (NCSA).
The accepted proposal, “Mining Diagnostic Sequences for SARS-CoV-2 Using Variation-Aware, Graph-Based Machine Learning Approaches Applied to SARS-CoV-1, SARS-CoV-2, and MERS Datasets,” is a collaboration with researchers from the University of Illinois at Urbana-Champaign (UIUC): Nancy Amato, Computer Science Department Head and Professor of Engineering, and Lawrence Rauchwerger, Professor of Computer Science. The joint award to Rice and UIUC totals $225,000 over a 12-month period, of which Rice will receive $75,000.
Treangen’s research will focus on studying viral mutations within a single patient to see what they reveal. As he explains, “When a person is infected with a virion, infected cells are coerced into allowing the virus to rapidly begin copying itself within the host. A single infected cell can produce several hundred thousand copies of [the virus]. From there, [a] SARS-CoV-2 population is established containing a vast number of SARS-CoV-2 infected cells.” The larger this viral population, the higher the host’s viral load. And as the virus replicates, it can also mutate.
“We want to understand what’s going on behind the scenes within a single COVID-19 positive patient,” Treangen said. To that end, his approach is distinctive because it focuses on the entire population of SARS-CoV-2 viruses rather than only the so-called “consensus genome,” which can be thought of as an amalgamation of all of the SARS-CoV-2 copies within a single person. Treangen describes this as “an under-explored avenue for what might be driving biological differences of COVID-19 across different hosts.”
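To make the distinction between a consensus genome and the full within-host population concrete, here is a toy sketch, not the project's method. It assumes short, already-aligned sequencing reads of equal length; the function name `consensus_and_minor_variants` and the 10% reporting threshold are hypothetical choices for illustration. The consensus keeps only the most common base at each position, while the minor variants it discards are exactly the intra-host diversity the research targets.

```python
from collections import Counter


def consensus_and_minor_variants(reads, min_freq=0.1):
    """Hypothetical toy example: build a consensus and list minor variants.

    Assumes every read is a string of the same length, already aligned
    to the same coordinates (real pipelines work from alignments to a
    reference genome and handle gaps, quality scores, and coverage).
    """
    length = len(reads[0])
    consensus = []
    minor = []  # (position, base, frequency) for non-consensus bases
    for pos in range(length):
        counts = Counter(read[pos] for read in reads)
        total = sum(counts.values())
        # The consensus keeps only the single most common base here.
        top_base, _ = counts.most_common(1)[0]
        consensus.append(top_base)
        # Bases the consensus hides, reported if common enough.
        for base, n in counts.items():
            freq = n / total
            if base != top_base and freq >= min_freq:
                minor.append((pos, base, freq))
    return "".join(consensus), minor


reads = ["ACGTA", "ACGTA", "ACTTA", "ACGTA", "ACTTA"]
consensus, minor = consensus_and_minor_variants(reads)
print(consensus)  # ACGTA
print(minor)      # [(2, 'T', 0.4)]
```

In this example the consensus "ACGTA" suggests a uniform infection, even though 40% of the reads carry a T at position 2.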
A question that Treangen seeks to answer is, “What is the combination of host and viral factors that are driving the biology behind COVID-19?” After completing the intra-host comparative genomic analyses of SARS-CoV-2, he and his team will conduct the same analyses on SARS-CoV-1 and MERS-CoV and compare the results to determine how the viruses differ.
Although viruses change over time, most COVID-19 tests detect SARS-CoV-2 by exactly matching common signature sequences in its genome. As a result, patients can receive false-negative results when mutations in SARS-CoV-2 mean its genome no longer exactly matches the regions a test targets. In addition, if two patients share the same distinctive mutations, that information could help determine whether one infected the other directly and help track and prevent further spread. Research that expands the basic understanding of the virus’ biology could help reduce false-negative rates, improve testing and potentially inform the development of vaccines.
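Both ideas in the paragraph above can be illustrated with a toy sketch. It is not how any real diagnostic test or the project's tools work. All sequences are made up, and the function names (`exact_match`, `match_with_mismatches`, `shared_mutations`) are hypothetical. The sketch also assumes that the genomes being compared are equal-length and already aligned. A single substitution inside the targeted region defeats an exact-match check, while a mismatch-tolerant search still finds it. Mutations shared by two patients can point toward a possible transmission link, though they are not proof of one.

```python
def exact_match(target, genome):
    """Toy detection: succeeds only if the target appears verbatim."""
    return target in genome


def match_with_mismatches(target, genome, max_mismatches=1):
    """Toy detection that tolerates up to max_mismatches substitutions."""
    k = len(target)
    for start in range(len(genome) - k + 1):
        window = genome[start:start + k]
        mismatches = sum(1 for a, b in zip(target, window) if a != b)
        if mismatches <= max_mismatches:
            return True
    return False


def shared_mutations(reference, genome_a, genome_b):
    """Positions where both genomes carry the same non-reference base.

    Assumes all three sequences are aligned and of equal length.
    """
    return [
        (i, a)
        for i, (r, a, b) in enumerate(zip(reference, genome_a, genome_b))
        if a == b and a != r
    ]


reference = "TTGACCGTAAGG"
mutated = "TTGACCTTAAGG"   # one substitution (G -> T) inside the target
target = "ACCGTA"          # hypothetical test target region

print(exact_match(target, reference))        # True
print(exact_match(target, mutated))          # False: a false negative
print(match_with_mismatches(target, mutated))  # True

print(shared_mutations("ACGTACGT", "ACCTACGA", "ACCTACGT"))  # [(2, 'C')]
```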
According to Treangen, one of the project’s computational challenges is the sheer volume of data. In March, there were only a few hundred SARS-CoV-2 genomes, but soon there will be more than 100,000 genomes and sequencing datasets to investigate. His team aims to combine machine learning with novel bioinformatics methods that scale to the growing number of SARS-CoV-2 genomes and support efficient intra- and inter-host comparisons. These tools will allow researchers to examine the intra-host viral population in depth and track transmission between hosts.
Given the urgency of the pandemic, all results, science and findings from C3.ai DTI-awarded projects will be open-sourced and placed in the public domain, so that other scientists can build on the discoveries of Treangen’s team.