Artificial Intelligence Outperforms Humans in Cervical Cancer Screening

CHICAGO (360Dx) – An artificial intelligence-based algorithm has been shown to be far more accurate at identifying early signs of cervical cancer in low-resource areas than conventional cytology or human image review, according to recently published research.

In a paper that appeared in the Journal of the National Cancer Institute last week, researchers from the National Institutes of Health and the Global Good fund found that their "automated visual evaluation" method identified cervical precancer with an area under the curve (AUC) of 0.91, compared with 0.69 for human review of archived images and 0.71 for conventional cytology.
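For readers unfamiliar with the metric: AUC measures how well a classifier ranks cases, and equals the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative one. The sketch below illustrates the calculation on made-up scores; the data and the `auc` helper are invented for illustration and are not from the study.

```python
# Illustration of the "area under the curve" (AUC) metric cited in the
# study. All scores below are invented, not study data.

def auc(scores_neg, scores_pos):
    """AUC = probability that a random positive case scores higher than
    a random negative case (ties count as half a win)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical precancer-risk scores: 1.0 = certain precancer, 0.0 = healthy.
healthy = [0.1, 0.2, 0.3, 0.4]     # true negatives
precancer = [0.35, 0.7, 0.8, 0.9]  # true positives

print(auc(healthy, precancer))  # → 0.9375
```

A perfect ranker scores 1.0 and a coin flip scores 0.5, which is why the gap between 0.91 (the algorithm) and roughly 0.70 (human review or cytology) is substantial.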

"Here we have a technology that can do better than experts can do," said senior author Mark Schiffman of the National Cancer Institute's Division of Cancer Epidemiology and Genetics.

Cervical cancer screening guidelines are risk-based, which makes machine learning a natural fit for this type of screening. "They're numeric. They're probabilities," Schiffman said. "So, here we have artificial intelligence giving a probability-like score of who has precancer that needs to be treated to prevent cancer."

Global Good is a collaboration between Bellevue, Washington-based Intellectual Ventures Lab and the Bill & Melinda Gates Foundation. The organization is seeking to improve cervical screening and human papillomavirus vaccination in low- and middle-income countries, where 80 percent of cervical cancer cases and 90 percent of deaths from the disease occur worldwide, according to a 2012 study in The Lancet Oncology.

While those figures are several years old, more recent anecdotal evidence suggests that the disparity remains about the same.

The NIH and Global Good team built an AI algorithm based on some 60,000 archived, digitized cervical images from more than 9,400 women in Guanacaste, Costa Rica, taken between 1993 and 2000. The images were captured with fixed-focus cameras, a now-obsolete technique called cervicography, as part of a previous NCI study.

For that earlier study, researchers also conducted cytology testing and HPV screening.

"The objective of this [new] study was to develop a 'deep learning'-based visual evaluation algorithm that automatically recognizes cervical precancer/cancer," the NIH and Global Good team wrote in the JNCI article.

They said that their method potentially could be groundbreaking in low-resource settings, where healthcare workers currently screen for cervical abnormalities with a technique called visual inspection with acetic acid, or VIA. It is far from perfect.

"VIA is not accurate in distinguishing precancer from much more common minor abnormalities, leading to both overtreatment and undertreatment," the researchers wrote. "Increasingly, it is recognized that the visual identification of precancer by health workers, even by experienced nurses and doctors using a colposcope, the reference standard visual tool, is too often unreliable and inaccurate. Thus, we currently still lack a practical, accurate visual screening approach."

Schiffman noted that current standards of care for detecting precancer of the cervix are perhaps 60 to 70 percent accurate. "They're missing 30 to 40 percent each time that it's done, whereas this [algorithm] was twice as sensitive as human interpretation of the image, given the same specificity," he said.
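Comparing sensitivity "given the same specificity" means fixing how many healthy cases a screen may falsely flag, then asking what fraction of true precancers it catches at that operating point. The sketch below shows one way to read that number off a set of risk scores; the function name and all values are hypothetical, not from the study.

```python
# Hypothetical sketch of "sensitivity at a matched specificity".
# Predict positive when score >= threshold; pick the lowest threshold
# achieving the target specificity, then report sensitivity there.
# All scores are invented for illustration.

def sensitivity_at_specificity(scores_neg, scores_pos, target_spec):
    for t in sorted(set(scores_neg + scores_pos)):
        # Specificity: fraction of healthy cases correctly below threshold.
        spec = sum(n < t for n in scores_neg) / len(scores_neg)
        if spec >= target_spec:
            # Sensitivity: fraction of precancer cases at or above it.
            return sum(p >= t for p in scores_pos) / len(scores_pos)
    return 0.0

healthy = [0.1, 0.2, 0.3, 0.4, 0.5]      # true negatives
precancer = [0.35, 0.6, 0.7, 0.8]        # true positives

print(sensitivity_at_specificity(healthy, precancer, 0.8))  # → 0.75
```

Holding specificity fixed is what makes the "twice as sensitive" comparison fair: both the algorithm and the human reviewers are allowed the same false-positive rate.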

Indeed, Schiffman said, the most common result of cytology is atypical squamous cells of unknown significance, or ASCUS. In other words, he explained, "I don't know what this is."

The algorithm also caught more cases earlier than three different kinds of Pap smear, including a modern, liquid-based Pap test. "It approached the degree of sensitivity of an HPV test," Schiffman added.

The researchers said that the automated visual evaluation algorithm "could theoretically be used to triage women testing positive for HPV rather than for primary screening" as well.

Schiffman reported being somewhat shocked by the accuracy of the new technology.

"When we first saw the result, our first response, all of us in the group was, this is an error. There's a bias here," he said. "Over the last year, we've tried in every way to debunk it. … Now we have three different experiments and they all confirm that it works."

A 2016 article in Procedia Computer Science from researchers at Pondicherry University in India also found that applied artificial neural networks produced more accurate results than manual methods of screening for cervical cancer. That work, however, compared different types of algorithms rather than focusing on a longitudinal study of a specific population.

Now that the NIH algorithm has been shown to be effective, Schiffman, a physician with a background in public health, is excited about the possibilities.

"Basically, we can now go anywhere and with a couple of dollars, we can provide an accurate assessment of whether treatment is necessary and then deliver it at point of care anywhere with a little mobile unit that it combines [with] treatment," Schiffman said. This mobile unit could be a smartphone, tablet, or a digital camera paired with a computer.

"We can go anyplace. You don't have to assemble reagents. All it needs is vinegar. It's just a profoundly convenient and pragmatic and accurate option [to address] current inaccessibility and cancer health disparities."

For now, Schiffman's team is controlling access to the algorithm, and plans on assembling a nonprofit consortium to develop open-source standards for automated visual evaluation. Members of that community will be able to create apps based on the core technology, following those standards.

"The algorithmic quality will keep improving as we continue to collect [data]," Schiffman said. "So we're setting up a collaboration, a consortium of interested parties that will be a nonprofit, good-for-all kind of public-health venture. And we're organizing an international consortium of open-source folks who will basically create standards."

There is room for improvement, though. While the technology could run through the camera on a cell phone, Schiffman noted that each model of phone processes images differently. "A lot of cell phones will take the forefront and then they'll blur the background or they'll change the color tone to make it hyper-realistic, things that would ruin the accuracy of the assessment," he said.

He said that his team at NCI is testing various phones as well as standalone camera elements that go into phones in search of a good match for the automated visual evaluation algorithm. Some researchers involved in this project are building their own devices with camera components, while others are experimenting with widely available phones using different lighting techniques or simply working to optimize the algorithm on specific phone models.