AI-Based Pathology Method Developed to Potentially Uncover Cancers of Unknown Primary Origin


CHICAGO – Computational pathologists at Brigham and Women's Hospital in Boston and the affiliated Harvard Medical School and the Broad Institute have developed an algorithm that they said reduces the need for genetic sequencing of tumors to help identify the potential original sources of cancers of unknown primary origin.

In an article published last week in Nature, the authors, led by Faisal Mahmood, who runs the pathology image analysis laboratory at Brigham and Women's, said that research into cancers of unknown primary origin lately has involved genomics and transcriptomics. But this can only work when genetic testing is performed, which is rare in low-resource settings and even in some affluent ones.

Further, they noted that the majority of cancer cases in which physicians cannot pinpoint a likely primary origin are treated with combination chemotherapies and generally have a poor prognosis.

The researchers thus created a deep-learning algorithm called Tumor Origin Assessment via Deep Learning, or TOAD, which they said can produce a differential diagnosis to identify a tumor as primary or metastatic and predict the site of origin by analyzing histology slides, which are routinely used for much cancer diagnostic work. They described TOAD as a high-throughput, interpretable, deep-learning-based software tool that analyzes hematoxylin and eosin whole-slide images to help identify the likely origin of primary tumors.

TOAD makes "reasonably accurate predictions," according to the authors, by assigning the top three or top five differential diagnoses to metastatic tumors. The top prediction from TOAD, they said, "can potentially be used to assign a primary differential" in areas where ancillary testing, high-resolution imaging, and pathologists are not widely available.
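To make the idea of top-three and top-five differentials concrete, the sketch below shows how a top-k accuracy figure is typically computed from per-slide class probabilities. It is an illustration only, not the authors' evaluation code: the function names, the candidate sites, and the probabilities are all hypothetical.

```python
def top_k_labels(scores, k):
    """Return the k labels with the highest scores, best first."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


def top_k_accuracy(cases, k):
    """Fraction of cases whose true origin is among the top-k predictions."""
    hits = sum(1 for scores, truth in cases if truth in top_k_labels(scores, k))
    return hits / len(cases)


# Hypothetical per-slide probabilities over candidate sites of origin,
# paired with the (hypothetical) confirmed origin for each slide.
cases = [
    ({"lung": 0.55, "breast": 0.30, "colorectal": 0.10, "renal": 0.05}, "lung"),
    ({"lung": 0.40, "breast": 0.35, "colorectal": 0.20, "renal": 0.05}, "colorectal"),
    ({"lung": 0.10, "breast": 0.15, "colorectal": 0.25, "renal": 0.50}, "breast"),
]

top1 = top_k_accuracy(cases, 1)  # only the single best guess counts
top3 = top_k_accuracy(cases, 3)  # any of the three best guesses counts
```

In this toy data only the first slide is right on the first guess, but all three true origins appear within the top three, which is why a short ranked list can still narrow follow-up testing when the top prediction is wrong.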

"The top three predictions can then be used to guide further tests, so you can use those predictions to order ancillary tests, reduce the sites that are sampled, and get to an answer for where the origin is much quicker and reduce the time it takes to make a diagnosis and improve the quality of the diagnostics in general," Mahmood said.

Metastatic cancers often are missed in lower-resource settings because clinicians lack access to advanced imaging technologies, and "this [algorithm] can be used to assign the primary differential that can then be used for subsequent treatment," he said.

According to the Brigham and Women's team, TOAD can serve as an "assistive tool" for pathologists to evaluate difficult cases of metastatic cancers and unknown primary tumors that require multiple clinical and ancillary tests in order to narrow a differential diagnosis.

Mahmood noted that cancer therapeutic decisions tend to be based on the primary tumor. "Without knowing the primary tumor, the course of treatment for the patient might be quite unclear and they might also not be eligible for a lot of clinical trials because [they] often require that the primary is determined," he said.

The diagnostic process typically begins with biopsy at the location of the tumor, and the sample goes through pathology and immunohistochemistry analysis. If these steps cannot identify the source of the primary cancer, the patient might get additional biopsies, radiology imaging, and in a very small number of cases, next-generation sequencing assays to predict the origin of the cancer.

"What we have developed is an origin prediction classifier that uses conventional histology slides, regular H&E slides that have been the diagnostic standard for over a century," Mahmood said.

Notably, the predictive model incorporates what Mahmood called weakly supervised multitask learning, trained on tens of thousands of multigigapixel images, with only H&E histology and the patient's sex as input. The study outlined in Nature analyzed 32,537 gigapixel whole-slide images from both Brigham and Women's and public data repositories.
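As a rough illustration of what multitask prediction from a slide plus the patient's sex can look like, the sketch below appends a sex flag to a slide-level feature vector and feeds the result to two classification heads, one for site of origin and one for primary versus metastatic status. Everything here is an assumption made for illustration: the feature values, the tiny linear heads, the encoding of sex, and the equal weighting of the two losses are hypothetical and are not taken from the TOAD code.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """One linear layer: a row of weights and a bias term per output class."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def predict(slide_feature, sex, origin_head, status_head):
    """Append the sex flag to the slide feature and run both task heads."""
    x = slide_feature + [1.0 if sex == "F" else 0.0]
    origin_probs = softmax(linear(*origin_head, x))
    status_probs = softmax(linear(*status_head, x))
    return origin_probs, status_probs


def multitask_loss(origin_probs, status_probs, origin_label, status_label):
    """Sum of the two cross-entropy losses (equal weighting is an assumption)."""
    return -math.log(origin_probs[origin_label]) - math.log(status_probs[status_label])


# Hypothetical 2-dimensional slide feature and hand-picked head parameters.
origin_head = ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], [0.0, 0.0, 0.0])
status_head = ([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]], [0.0, 0.0])

origin_probs, status_probs = predict([0.5, -1.0], "F", origin_head, status_head)
loss = multitask_loss(origin_probs, status_probs, origin_label=2, status_label=0)
```

Sharing one slide representation across both heads is what makes the setup multitask: a single training pass updates the shared features using the error from both questions.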

"We can use the data that has been collected for clinical purposes to train these models without having to manually annotate these large images," Mahmood explained. "That's weakly supervised because the link between the label and how much data there is in each file is weak."

The researchers said in the paper that the method "automatically locates regions in the slide that are of high diagnostic relevance and aggregates their information to make the final predictions."
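The locate-and-aggregate behavior described in that quote is commonly implemented with attention-based pooling over image patches. The sketch below is a simplified stand-in rather than the published model: it scores each patch with a plain dot product (real systems generally use a small learned network), and the patch features and attention vector are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(patch_features, attention_vector):
    """Score each patch, turn scores into weights, return the weighted average."""
    scores = [
        sum(f * a for f, a in zip(feat, attention_vector))
        for feat in patch_features
    ]
    weights = softmax(scores)
    dim = len(patch_features[0])
    slide_feature = [
        sum(w * feat[d] for w, feat in zip(weights, patch_features))
        for d in range(dim)
    ]
    return slide_feature, weights


# Three hypothetical patch embeddings from one slide; the third stands in
# for a region with strong diagnostic signal.
patches = [[0.1, 0.0], [0.2, 0.1], [3.0, 2.0]]
attention_vector = [1.0, 1.0]

slide_feature, weights = attention_pool(patches, attention_vector)
most_relevant_patch = max(range(len(weights)), key=weights.__getitem__)
```

The weights double as an interpretability aid: mapped back onto the slide, they indicate which regions contributed most to the slide-level prediction.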

According to Mahmood, deep-learning algorithms in pathology often have a programmer tell the computer where the tumor is, which requires all the slides to be annotated. Weakly supervised deep learning does not need someone to annotate the slides.

"You can use the whole-slide image and the slide-level label that was collected for diagnostic purposes, and you can harness that to train the model," he said.

Mahmood noted that earlier studies have shown how to predict genomic alterations in tumors from histology images. The paper cited a 2020 article published in Nature Cancer by German researchers.

He said that histology slides contain more information than pathologists typically need for diagnosing cancer, but some of that data has been impossible to get at without this kind of weakly supervised deep learning.

In part, the method used for the Nature study was enabled by another method called clustering-constrained-attention multiple-instance learning, or CLAM, for weakly supervised computational pathology that Mahmood and his team at Brigham and Women's described in a paper published in March in Nature Biomedical Engineering.

The cancer diagnostic method described in Nature essentially puts CLAM into practical use.

Mahmood said he expects TOAD to support sequencing data in the future.

John Tomaszewski, chair of pathology and anatomical sciences at the University at Buffalo Jacobs School of Medicine and Biomedical Sciences, called the Nature paper "important" because it not only mirrors the way pathologists think but also has an eye toward what should be the primary goal of finding the most effective therapies so patients can get better.

"This feels very much like the pathologist's approach to what is this tumor, where did it come from, and therefore, what's its likelihood of disease progression or how might it behave biologically," Tomaszewski said.

Tomaszewski, who co-chaired an online American Society for Investigative Pathology workshop last month, during which Mahmood presented some of his findings, said that the Brigham and Women's approach simplifies analysis by training computers to look for variations in colors within medical images, whether from histology slides or visualizations of genetic sequences.

"We've shown that one can model multigene arrays from pink and blue pixels," Tomaszewski said. Through machine learning, computational pathologists can create models that can capture 80 to 90 percent of variance that gene models pick up, he said, citing research from his own lab.

"The fundamental understanding is that embedded in this spectral data, embedded in the pink and blue pixels from a tissue slide, are the relationships that will allow you to model gene clusters," he said.

The Nature paper is also significant because it creates an "aha! moment" for readers, when they realize that the TOAD algorithm can produce risk scores in most cases where the primary tumor origin is unknown without the need for molecular testing. The research shows "that you can do a lot with these high-resolution cell and tissue images," Tomaszewski said. Mahmood "just happened to use standard histology, but you could do this with all kinds of microscopy."

Mahmood said that this method also could be applied to telepathology, since resource-poor settings often lack pathologists.

"We're working on a cloud-based setup where people would be able to upload their slides and get a diagnosis," Mahmood said. His group also has built a prototype of a small microscope with a built-in graphics processing unit that can run the TOAD algorithm to facilitate primary cancer diagnoses even in clinics without specialists or ready access to tumor sequencing.

The TOAD code has been released to the open-source community for other researchers to try. Mahmood's laboratory also plans on collaborating with other institutions to add many more cases of metastatic cancer to the database to refine the algorithm and ready the software for wide clinical use.

Additionally, his group wants to make TOAD multimodal, to include sequencing data, other kinds of lab test results, and patient and family histories, all with the goal of making the predictive model more accurate.

In the Nature article, the researchers added that TOAD "could be used in conjunction with or in lieu of ancillary tests and extensive diagnostic work-ups to reduce the occurrence of [cancers of unknown primary origin]."

They said that the algorithm should help reduce the number of tests and tissue samples needed while producing more accurate predictions and saving time in diagnosing metastatic cancers.

They called the study a "proof of concept for developing large-scale, weakly supervised AI models for origin prediction" that opens the door to potential clinical trials to study artificial intelligence-based tumor origin prediction from conventional medical imaging.