
Harvard Researchers Develop AI-Based Model for Cancer Dx, Prognosis


NEW YORK – Researchers from Harvard Medical School have developed a new artificial intelligence-based model for systematically diagnosing multiple cancer types and predicting patients' prognoses from digital pathology images.

Described in a paper published this week in Nature, the Clinical Histopathology Imaging Evaluation Foundation (CHIEF) model was developed to help clinicians evaluate pathology samples routinely collected from cancer patients, according to Kun-Hsing Yu, an assistant professor of biomedical informatics at Harvard and the lead author on the paper. The foundation model was built using two types of pretraining: self-supervised machine learning on 15 million unlabeled pathology image tiles cropped from whole-slide images to learn tile-level microscopic features, and additional pretraining on more than 60,000 whole-slide images to understand the context of the whole tissue.

The self-supervised pretraining required the model to identify repeating signals in the pathology images to determine what a typical pathology sample would look like and establish a general framework for cancer-related microscopic features, Yu said.

Meanwhile, the weakly supervised training then required the model to pick out those features in much larger, more complex, and higher resolution whole-slide images and characterize the similarities and differences between cancer types. The model continued to update its "knowledge about what typical pathology manifestations would look like across different cancer types and across samples collected from different hospitals," Yu added. By aggregating tissue signals from different regions within the same pathology sample, the model better understands visual context and can provide a "holistic evaluation for each patient," Yu said.
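The aggregation step described above, pooling tile-level signals into one slide-level representation, is commonly implemented in weakly supervised pathology models as attention-weighted pooling. The following is a minimal illustrative sketch of that general idea, not CHIEF's actual architecture: the embeddings, the attention scoring function, and the toy inputs are all invented for illustration.

```python
import math

def softmax(xs):
    """Convert raw scores into weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_slide(tile_embeddings, score_fn):
    """Combine tile-level feature vectors into a single slide-level vector
    using attention weights. In a real model, score_fn would be a small
    learned network; here it is a stand-in function."""
    scores = [score_fn(e) for e in tile_embeddings]
    weights = softmax(scores)
    dim = len(tile_embeddings[0])
    slide_vec = [sum(w * e[i] for w, e in zip(weights, tile_embeddings))
                 for i in range(dim)]
    return slide_vec, weights

# Toy example: three 2-D tile embeddings; attention score = first feature,
# so the third tile (score 2.0) receives the largest weight.
tiles = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
vec, w = aggregate_slide(tiles, score_fn=lambda e: e[0])
```

Because the weights sum to one, the slide-level vector is a convex combination of the tile embeddings, dominated by whichever regions the scoring function deems most informative.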

Foundation models are large, general-purpose AI models that can be tailored for different functions, Yu said, and the CHIEF model can be applied to a variety of different tasks, including cancer detection, tumor origin prediction, genomic profile identification, and survival prediction. Yu and his colleagues externally validated the model's ability to detect cancer with more than 13,000 whole-slide images, including those from public databases like the Clinical Proteomic Tumor Analysis Consortium and those from specific hospitals. The whole-slide images contained biopsy and surgical resection slides and encompassed 11 different primary cancer sites, such as the breast, skin, prostate, kidneys, and lungs, the researchers noted in the Nature paper.

The team compared CHIEF to three other weakly supervised whole-slide image classification methods, and CHIEF outperformed all three with an average area under the receiver operating characteristic curve (AUROC) of 0.94, 10 to 14 percent higher than the comparator methods. Yu noted that the model had near-perfect performance in many cancer types, such as colorectal and esophageal cancers, but that its performance in some cancers, such as kidney cancer, was lower in some cohorts, possibly due to the quality of the scanned slides and the different techniques used for collecting samples.
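AUROC, the metric reported here, can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative one (1.0 is perfect ranking, 0.5 is chance). A minimal sketch of that calculation via the Mann-Whitney pairwise count; the toy labels and scores are invented for illustration:

```python
def auroc(labels, scores):
    """AUROC as the fraction of positive-negative pairs in which the
    positive case is scored higher (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example: a classifier that ranks most cancer slides above
# non-cancer slides; 8 of the 9 pairs are ordered correctly.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = auroc(labels, scores)
```

On this toy data the result is 8/9, roughly 0.89; a value like CHIEF's reported 0.94 means the model orders the vast majority of such pairs correctly.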

The team focused on common cancers because of the larger amount of data available to train and validate the model. As a result, the performance of CHIEF for detecting particularly rare cancers has not been validated. However, the researchers are considering approaches for diagnosing rarer cancers that rely on using fewer samples, Yu said.

Beyond cancer detection, the model was also tested on its ability to predict molecular profiles of cancer samples. The researchers focused on four specific tasks: systematic prediction of prevalent genetic mutations across cancer types, identification of mutations related to targeted therapies, isocitrate dehydrogenase status prediction for the classification of glioma, and microsatellite instability prediction for determining the benefits of immune checkpoint blockade in patients with colorectal cancer.

The team conducted the analysis using more than 13,000 whole-slide images across 30 types of cancer and 53 genes with the top-five highest mutation rates in each cancer type, the researchers noted in the paper. CHIEF was able to predict the mutation status of nine genes with AUROCs above 0.80. The researchers also used CHIEF to predict genes associated with US Food and Drug Administration-approved targeted therapies across 18 genes and 15 cancer types, and the model was able to predict the mutation status of all 18 genes with AUROCs above 0.60, with one as high as 0.96.

In addition, the researchers used CHIEF to establish stage-stratified survival prediction models, utilizing more than 9,000 whole-slide images in 17 datasets for seven cancer types. In all cancer types and study cohorts, CHIEF was able to distinguish patients with longer-term survival from those with shorter-term survival, the researchers wrote.
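Survival stratification of this kind typically assigns each patient a model-derived risk score and compares survival between higher- and lower-risk groups. A simplified sketch assuming a median split on invented toy data (the paper's analysis uses formal survival statistics rather than this shortcut):

```python
from statistics import median

def stratify_by_risk(risk_scores, survival_months):
    """Split patients at the median predicted risk and return the
    median survival time in the higher- and lower-risk groups."""
    cut = median(risk_scores)
    high = [m for r, m in zip(risk_scores, survival_months) if r > cut]
    low = [m for r, m in zip(risk_scores, survival_months) if r <= cut]
    return median(high), median(low)

# Toy cohort: higher predicted risk tends to mean shorter survival.
risk = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
months = [10, 14, 12, 40, 55, 60]
high_med, low_med = stratify_by_risk(risk, months)
```

A well-calibrated risk model should show a clear gap between the two groups' survival, which is what the researchers report CHIEF achieved across all cancer types and cohorts tested.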

The team is also interested in investigating if the model can be used to predict patients who will respond to immunotherapies and predict potential side effects, Yu said.

Bias can be a significant problem in artificial intelligence models, and the Harvard team wanted to address it. The researchers accounted for bias by using data from multiple hospitals in multiple countries during pretraining and validation, and by evaluating CHIEF's performance on different datasets without adjusting or fine-tuning the model, which helps ensure that it is not skewed by differences in sample preparation or collection procedures, Yu added. The researchers are also investigating the model's performance in patients of different ancestries and have developed a method to address that potential bias, which they plan to submit for publication soon, he said.

Now that CHIEF has been validated for multiple cancers and for multiple diagnostic tasks, Yu said the team is initiating a study to build clinical data for regulatory approval in the US. It also plans to partner with biotechnology companies to collect more samples and conduct further clinical trials, eventually packaging the model into a product or system that clinicians can use. The team is also focused on developing additional techniques to further improve the performance and reliability of CHIEF, Yu said.

Nigam Shah, a professor of medicine at Stanford University and chief data scientist for Stanford Health Care who was not involved in CHIEF's development, said that while foundation models have been around for a couple of years, what Yu's group has done is "pretty innovative."

However, he noted that evaluating AI models is not always the main hurdle in determining clinical utility. The harder questions are where the models fit into a clinical workflow, the resource constraints of running them, and their long-term sustainability.

To that end, other researchers, including a group from the University of Washington and Microsoft Research that published a paper in Nature in May, are also working on foundation models for digital pathology. According to Shah, as more groups develop foundation models or other AI-based tools and their clinical use becomes more widespread, sustainability and reimbursement concerns will become an even bigger area of focus.

Addressing where the CHIEF model will fit in the clinical workflow, Yu said he sees the model "initially serving as an aid to pathologists, providing second opinions to support specialists in their final evaluations, thereby reducing the time needed to diagnose each case." However, in the long term CHIEF could evolve into an "autonomous pathology evaluation" system for routine cases, which would reduce pathologists' workload.

He added that resource and economic constraints "could actually facilitate the adoption of open-source models like CHIEF." As long as a hospital has digital pathology scanners and electronic medical record systems for pathology images, incorporating CHIEF or another similar model "would involve minimal costs, primarily for hiring one to two developers to integrate our tools into the existing medical record system and covering computer server expenses," he added. "These costs would be significantly lower than onboarding additional medical specialists and technicians to manage the growing clinical workload."

Yu also mentioned that his team is working on a separate project to enable low-cost digital pathology imaging for hospitals and clinics without standard pathology scanners.