AI Technology Can Classify Skin Cancer As Well As Dermatologists, Stanford Study Finds

Skin cancer screening

NEW YORK (360Dx) – A team of artificial intelligence researchers and dermatologists at Stanford University has shown that a deep learning algorithm can distinguish skin cancer from benign conditions with the same accuracy as dermatologists looking at the same skin images, paving the way for a smartphone app that could help detect skin cancer early.

For their study, published in Nature today, the scientists used almost 130,000 clinical images to train a deep convolutional neural network (CNN) and compared its performance against that of more than 20 board-certified dermatologists for differentiating between keratinocyte carcinomas and a benign condition, and between malignant melanomas and benign moles. They found that the algorithm performed at least as well as the doctors.

While further studies are needed to validate the technology for clinical use, the deep learning algorithm approach could be applied not only in dermatology but also in other medical specialties that rely on diagnostic imaging data, such as ophthalmology, otolaryngology, radiology, and pathology, the researchers wrote.

“It seems very promising,” said Ali Hendi, a board-certified dermatologist and clinical assistant professor of dermatology at Georgetown University Hospital, who was not involved in the study. Hendi said there have been previous attempts to automate image-based skin lesion analysis, but none of these have become clinically relevant so far.

One potential application he sees for the technology is for triaging. Patients or their primary care physicians often call in because they are concerned about a skin lesion, which turns out to be benign in the majority of cases, but appointments with a dermatologist are often booked out several months in advance, he said. If patients and their doctors had access to an app that could tell them how likely the lesion is to be cancerous, patients with a high probability for a malignancy could be prioritized for an appointment, he added.
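
The triage idea Hendi describes can be illustrated with a minimal sketch. It assumes a hypothetical app that returns a malignancy probability between 0 and 1 for each photographed lesion; the `Referral` class, the `prioritize` function, and the 0.5 cutoff are illustrative assumptions and do not come from the study.

```python
from dataclasses import dataclass


@dataclass
class Referral:
    patient_id: str
    # Hypothetical app output: estimated probability that the lesion is malignant.
    malignancy_probability: float


def prioritize(referrals, urgent_threshold=0.5):
    """Split referrals into (urgent, routine) lists.

    Urgent referrals are sorted so the highest probability comes first.
    The threshold is an arbitrary illustrative value, not a clinical cutoff.
    """
    urgent = sorted(
        (r for r in referrals if r.malignancy_probability >= urgent_threshold),
        key=lambda r: r.malignancy_probability,
        reverse=True,
    )
    routine = [r for r in referrals if r.malignancy_probability < urgent_threshold]
    return urgent, routine


# Made-up example data, for illustration only.
waiting_list = [
    Referral("A", 0.12),
    Referral("B", 0.91),
    Referral("C", 0.55),
]
urgent, routine = prioritize(waiting_list)
print([r.patient_id for r in urgent])   # patients to schedule first
print([r.patient_id for r in routine])  # patients who keep a regular slot
```

In practice any such cutoff would have to be chosen and validated clinically, which is the kind of follow-up work the researchers say is still needed.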

“This is work that’s intended to extend the reach of medical practice, potentially outside of the clinic,” said Andre Esteva, a graduate student of Sebastian Thrun in Stanford’s Artificial Intelligence Laboratory and one of the lead authors of the study. He stressed that the technology is not intended as a diagnostic tool but as a screening tool that could help patients and healthcare providers decide whether a diagnostic biopsy is necessary.

Other groups have been working on developing confocal microscopy and multiphoton microscopy approaches to diagnose skin cancer, but those aim primarily at replacing invasive biopsies.

The hope is that the artificial intelligence technology could help detect skin cancer earlier, in particular melanoma, for which survival rates drop quickly as the cancer progresses. “So if you could use our technology to find melanoma in individuals sooner, you could save a lot of lives,” Esteva said.

It is also unlikely to make skin specialists redundant. “Far from replacing dermatologists, I think this technology will make them more effective,” said Roberto Novoa, a Stanford dermatologist and one of the authors of the study.

Roughly 5 million Americans are diagnosed with non-melanoma skin cancer every year, and most of them do well, although a subset goes on to develop incurable disease. Melanoma, on the other hand, accounted for only 80,000 cases of skin cancer last year but led to 10,000 deaths. “Given that skin cancer is so curable in its early stages, this technology could theoretically have an impact on the early diagnosis, but extensive study is required before making any such claims,” Novoa said. 

In a commentary published alongside the Nature paper, Sancy Leachman of the Department of Dermatology at Oregon Health and Science University and Glenn Merlino of the National Cancer Institute pointed out that although a smartphone app implementing the technology might increase the number of individuals that are assessed for skin cancer, artificial intelligence-driven diagnostics might also have “unintended adverse consequences,” for example because patients at risk of skin cancer may no longer visit dermatologists for a full body screen.

“Patients may be falsely reassured or falsely alarmed by an app that has been implemented in a suboptimal fashion,” Novoa said. “Furthermore, the lesions that are queried by the patients with an app may be the ‘wrong ones’, with patients worrying about their benign age spots while the unseen melanomas on their backs continue to grow.” Numerous articles demonstrate, for example, that dermatologists detect melanomas in patients who came to see them for a different reason. “Finally, even if an image is appropriately classified, it serves the patient not at all if he or she cannot obtain access to medical care for the lesion in question,” he added.

According to Esteva, the study published today was initiated by Stanford dermatologists, who contacted the AI group after reading about a deep neural network algorithm that could distinguish between different breeds of dogs, a difficult classification task. “So they approached us and asked ‘What if we could do this for skin diseases as well?’”

While computer-based analysis of skin images is not novel (Esteva said the literature on this probably goes back 20 years), the new Stanford study used state-of-the-art artificial intelligence, as well as a much larger set of images than previous studies, to train the algorithm.

Specifically, the researchers used a pre-trained version of the Google Inception v3 CNN architecture and trained it on a dataset of almost 130,000 images of skin lesions, including more than 3,000 dermoscopy images, which are taken by dermatologists using a special instrument to get a better view. In total, the images comprised more than 2,000 diseases that were divided into almost 800 disease classes.

The researchers then tested the algorithm on a set of skin images with biopsy-confirmed diagnoses, asking it to recognize keratinocyte carcinoma and melanoma — the latter either from ordinary photos or from dermoscopy images — and to distinguish these cancers from benign skin lesions. They compared the results with the classifications of more than 20 board-certified dermatologists, who were asked whether they would recommend a biopsy or other further treatment, or whether they would just reassure the patient.
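
A comparison of this kind is commonly summarized with sensitivity (the share of biopsy-confirmed cancers that get flagged) and specificity (the share of benign lesions that are correctly left alone). The sketch below shows how the two numbers could be computed for an algorithm and for a single dermatologist on the same images. The function names, the 0.5 threshold, and all of the data are made up for illustration; the article does not state which metrics or cutoffs the study used.

```python
def flag_by_threshold(probabilities, threshold=0.5):
    """Turn an algorithm's malignancy probabilities into biopsy/no-biopsy calls."""
    return [p >= threshold for p in probabilities]


def sensitivity_specificity(truth, flagged):
    """Compute (sensitivity, specificity).

    truth:   list of bool, True if the biopsy confirmed a malignancy
    flagged: list of bool, True if a biopsy or further treatment was recommended
    """
    if len(truth) != len(flagged):
        raise ValueError("truth and flagged must have the same length")
    tp = sum(1 for t, f in zip(truth, flagged) if t and f)
    fn = sum(1 for t, f in zip(truth, flagged) if t and not f)
    tn = sum(1 for t, f in zip(truth, flagged) if not t and not f)
    fp = sum(1 for t, f in zip(truth, flagged) if not t and f)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one malignant and one benign case")
    return tp / (tp + fn), tn / (tn + fp)


# Made-up example: six lesions with biopsy-confirmed labels.
biopsy_malignant = [True, True, False, False, True, False]
algorithm_probs = [0.9, 0.4, 0.2, 0.6, 0.8, 0.1]
dermatologist_calls = [True, True, False, True, False, False]

algo = sensitivity_specificity(biopsy_malignant, flag_by_threshold(algorithm_probs))
derm = sensitivity_specificity(biopsy_malignant, dermatologist_calls)
print("algorithm:", algo)
print("dermatologist:", derm)
```

Raising the threshold makes the algorithm flag fewer lesions, which trades sensitivity for specificity, so a fair comparison has to look at both numbers together.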

As it turned out, the algorithm outperformed almost all of the dermatologists in classifying the images correctly. “Whilst we acknowledge that a dermatologist’s clinical impression and diagnosis is based on contextual factors beyond visual and dermoscopic inspection of a lesion in isolation, the ability to classify skin lesion images with the accuracy of a board-certified dermatologist has the potential to profoundly expand access to vital medical care,” the researchers wrote in their paper.

According to the commentary by Leachman and Merlino, the study does not address how well the algorithm handles “some thorny issues that can plague dermatologists,” for example distinguishing between some skin diseases that look very similar. Novoa said the team did look at its performance for three-way classifications — a smaller comparison reported in the supplementary material — and again found that it was as good as dermatologists. “Nevertheless, adequate evaluation of this and many other questions will require real-world studies,” he said. 

Regarding further research, Esteva said he and his colleagues are “still deliberating what the next steps should be.” He declined to comment on whether any aspects of the technology have been patented, and whether the group is considering commercial development.

To develop the algorithm for use in a clinical setting, they would need to build an app and test its accuracy in a clinical trial, he said. “But right now, we’re just reporting scientific results.”

At least one company, MetaOptima, already offers a device, called MoleScope, that attaches to a smartphone and allows patients to take photos of suspicious moles, which they can then “self-check” with an app or send to their dermatologist. The company also markets a software platform called DermEngine to medical professionals that includes “advanced algorithms for image analytics.”

Also, two years ago, the Federal Trade Commission challenged several marketers of mobile apps for claiming without scientific evidence that their programs could calculate melanoma risk from images of moles and other information.

The deep learning approach used by the Stanford team could also be applied to other types of medical images in different specialties. “The exciting thing about this technique is that it’s rather agnostic to the data used,” Esteva said. “It works very well with images, so if you have data at your disposal, you can apply this technique to a number of other imaging modalities” in order to make classifications.

In a study published last month in the Journal of the American Medical Association, for example, researchers at Google applied a deep CNN to detect diabetic retinopathy from photos of the back of patients’ eyes.

There is also a growing literature on the use of artificial intelligence to improve the detection of mitoses in cancer, which indicate cell division and can help with cancer prognosis, Novoa said.