NEW YORK – A team led by researchers at Weill Cornell Medicine has developed a machine learning algorithm that can identify patients likely infected with SARS-CoV-2 based on routine lab tests.
Detailed in a paper published last month in Clinical Chemistry, the method could prove useful as a tool for prioritizing patients for molecular SARS-CoV-2 testing, especially in situations where testing resources are limited or turnaround times are lengthy, said Sarina Yang, first author on the study and an assistant professor of pathology and laboratory medicine at Weill Cornell.
Yang, who is also assistant director of the hospital's central laboratory, noted that Weill Cornell Medicine is typically able to return SARS-CoV-2 test results within 24 hours, but that turnaround times at other hospitals can run two to three days, and facilities without onsite molecular testing capabilities can wait even longer for results.
Routine lab blood tests, on the other hand, take only a few hours, which led Yang and her colleagues to explore whether those tests might be useful in identifying patients likely to be infected with SARS-CoV-2.
"When patients come to the [emergency department] and the doctor orders several panels of routine lab [tests] and also the [SARS-CoV-2] RT-PCR test, generally the routine test results come back in a couple of hours," she said. "So we thought it could be useful to use the routine labs to predict whether the RT-PCR results would be positive or negative to improve the triage process."
Fei Wang, senior author on the study and assistant professor of healthcare policy and research at Weill Cornell Medicine, added that the algorithm could also help compensate for false negative results, the rate of which by some estimates could be as high as 30 percent for molecular SARS-CoV-2 testing.
"We are thinking that those potentially false negative patients may demonstrate a different routine lab test profile that might be more similar to those that test positive," he said. "So it offers us a chance to capture those patients who are false negatives."
Much of Wang's research focuses on the use of machine learning and artificial intelligence in addressing various medical and healthcare questions, for instance, using data from electronic medical records to identify patients at risk of various conditions or applying machine learning to medical imaging.
"This is a general topic of interest to my lab," he said.
In the study, the researchers developed a model based on results from 27 routine lab tests along with patient age, sex, and race, using a cohort of 3,356 patients tested for SARS-CoV-2 with RT-PCR at New York Presbyterian Hospital/Weill Cornell Medicine between March 11 and April 29, 2020. Of the patients in the cohort, 1,402 were RT-PCR positive and 1,954 were negative.
The 27 tests were chosen from a total of 685 different routine tests ordered for the patients in the cohort. Criteria for selection included a significant difference in test results between PCR positive and negative patients and results for a test being available for at least 30 percent of the patients in the cohort.
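The two selection criteria can be sketched as a simple filter. This is a minimal illustration and not the study's code: the article does not say which statistical test or significance threshold the researchers used, so the p-values are treated here as precomputed inputs, and the `alpha` cutoff, the function name, and the test names and values are all hypothetical.

```python
from typing import Dict, List, Optional


def select_tests(
    results: Dict[str, List[Optional[float]]],
    p_values: Dict[str, float],
    min_coverage: float = 0.30,  # "at least 30 percent of the patients"
    alpha: float = 0.05,         # assumed threshold; not stated in the article
) -> List[str]:
    """Keep tests that are both widely available and significantly different."""
    selected = []
    for name, values in results.items():
        if not values:
            continue
        # Share of patients in the cohort with a recorded result for this test.
        coverage = sum(v is not None for v in values) / len(values)
        # p-value for the PCR-positive vs. PCR-negative comparison, computed
        # elsewhere by whatever test is appropriate (hypothetical input).
        significant = p_values.get(name, 1.0) < alpha
        if coverage >= min_coverage and significant:
            selected.append(name)
    return selected


# Made-up data for ten patients; None marks a test that was not ordered.
results = {
    "test_a": [1.2, None, 0.9, 1.1, None, 1.4, 1.0, None, 1.3, 1.2],
    "test_b": [None] * 8 + [5.0, 4.8],
    "test_c": [3.1, 3.0, None, 2.9, 3.2, None, 3.0, 3.1, 2.8, 3.3],
}
p_values = {"test_a": 0.001, "test_b": 0.001, "test_c": 0.40}

selected = select_tests(results, p_values)
print(selected)
```

In this toy data, `test_b` is dropped because too few patients have a result, and `test_c` is dropped because its values do not differ significantly between the two groups.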
Using a gradient-boosted decision tree approach, the Weill Cornell Medicine team developed a model that identified patients' SARS-CoV-2 infection status with 76 percent sensitivity and 81 percent specificity. Among emergency department patients, who made up 54 percent of the cohort, the model performed with 80 percent sensitivity and 83 percent specificity.
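For readers less familiar with the two metrics, sensitivity is the share of RT-PCR positive patients the model flags as positive, and specificity is the share of RT-PCR negative patients it flags as negative. The sketch below shows the calculation on made-up labels. It is illustrative only; the function name and the data are hypothetical and do not come from the study.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels, 1 = positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    return tp / (tp + fn), tn / (tn + fp)


# Made-up example: RT-PCR results (y_true) vs. model predictions (y_pred).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```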
The researchers also validated the model in an independent set of patients seen at New York Presbyterian Hospital/Lower Manhattan Hospital during the same time period. In that cohort of 496 RT-PCR positive and 968 negative patients, the model performed with 74 percent specificity at the same 76 percent sensitivity achieved in the initial cohort.
The model also showed potential utility for identifying positive patients with initial false negative results. For instance, 32 patients initially tested negative for SARS-CoV-2 but tested positive upon repeat testing within the next two days. The Weill Cornell Medicine model flagged 21 of those 32 patients (about 66 percent) as likely positive for the virus.
Wang said that he and his colleagues are currently working to improve the robustness of the model, looking to test it across a variety of geographies and conditions.
"Our model in the paper was built on data from when New York was at its COVID peak," he said. "At that time, we were not doing wide PCR testing, and the patients who were getting tested were pretty sick."
Back then, the hospital was seeing test positivity rates in the 40 percent to 50 percent range, Wang said, noting that currently, the positivity rate is in the 2 percent to 3 percent range.
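A rough calculation shows why such a shift matters. The sketch below applies Bayes' rule to estimate the positive predictive value, or the share of patients flagged by the model who are truly positive, at two positivity rates. It rests on assumptions the article does not make: that the reported 76 percent sensitivity and 81 percent specificity hold unchanged in both settings, that test positivity can stand in for prevalence, and that the midpoints of the quoted ranges (45 percent and 2.5 percent) are representative.

```python
def positive_predictive_value(
    sensitivity: float, specificity: float, prevalence: float
) -> float:
    """Bayes' rule: P(truly positive | flagged positive)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Sensitivity and specificity as reported for the initial cohort.
SENSITIVITY = 0.76
SPECIFICITY = 0.81

# Assumed midpoints of the positivity ranges quoted in the article.
ppv_peak = positive_predictive_value(SENSITIVITY, SPECIFICITY, 0.45)
ppv_now = positive_predictive_value(SENSITIVITY, SPECIFICITY, 0.025)

print(f"PPV at 45% positivity:  {ppv_peak:.3f}")
print(f"PPV at 2.5% positivity: {ppv_now:.3f}")
```

Under these assumptions, roughly 77 percent of flagged patients would be true positives at the peak-era positivity rate, compared with about 9 percent at the current rate.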
"So you can imagine the population risk has drifted a lot," he said.
He added that different geographies would also have different characteristics that could influence the model's performance.
"This model we built in a population in New York in a certain time period, so we can't guarantee that it will work well universally," he said.
Wang said that the researchers have continued to analyze patient data from the New York City area to see how the model responds to changing conditions there, but that they have yet to obtain patient data from other geographies.
Another key question, particularly with the flu and cold season coming, is how specific the model is to SARS-CoV-2 compared to other respiratory viruses.
"We are trying to see if using the model, we can separate COVID from other respiratory viruses," Yang said, though she noted that patient data that could be used to address this question is limited, given that Weill Cornell Medicine largely stopped running respiratory panels in March when the pandemic hit the US. She added, though, that it has recently resumed running the panels for some high-risk patients, particularly those who are immunosuppressed.