NEW YORK (GenomeWeb) – A New York University-led team has developed a deep-learning method for identifying non-small cell lung cancer subtypes and predicting a handful of common gene mutations based on histopathology imaging.
The researchers trained their convolutional neural network-based deep-learning model with histopathology images generated from whole slides of hematoxylin- and eosin-stained lung tissues prepared for The Cancer Genome Atlas (TCGA) project, demonstrating that it could discern between normal lung tissue, lung squamous cell carcinoma, and lung adenocarcinoma. They subsequently validated the approach in hundreds more lung cancer slides prepared from frozen, formalin-fixed paraffin-embedded (FFPE), or biopsy samples.
In the case of lung adenocarcinoma, the team also went on to train the deep-learning network to predict the presence or absence of mutations in 10 commonly altered genes based on imaging data. That approach produced promising results for picking up mutations in six of the genes, including TP53, EGFR, and KRAS, based on images produced for pathology analyses.
"Overall, this study demonstrates that deep-learning convolutional neural networks could be a very useful tool for assisting pathologists in their classification of whole-slide images of lung tissues," corresponding authors Aristotelis Tsirigos, an applied bioinformatics and pathology researcher at NYU, and Narges Razavian, a population health researcher at NYU, and their colleagues wrote in a study published online today in Nature Medicine.
"This information can be crucial in applying the appropriate and tailored targeted therapy to patients with lung cancer," they added, "increasing thereby the scope and performance of precision medicine that aims at developing a multiplex approach with patient-tailored therapies."
The researchers began by tapping into a set of whole-slide images representing 609 lung squamous cell carcinoma tumors, 567 lung adenocarcinoma tumors, and 459 normal lung tissues. After splitting the slides into training, validation, and test sets, they tiled the slides into small, non-overlapping pixel windows and filtered out tiles made up largely of slide background.
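In rough outline, that tiling and background-filtering step might look like the sketch below. The 512-pixel window size, the intensity cutoff, and the tile_slide helper are illustrative assumptions, since the exact parameters are not spelled out here.

```python
import numpy as np
import openslide  # assumes the openslide-python bindings are installed

TILE_SIZE = 512          # example window size, not necessarily the study's value
BACKGROUND_CUTOFF = 0.5  # example rule: drop tiles that are mostly empty glass

def tile_slide(path):
    """Yield (x, y, tile) for non-overlapping tiles that contain enough tissue."""
    slide = openslide.OpenSlide(path)
    width, height = slide.dimensions
    for y in range(0, height - TILE_SIZE + 1, TILE_SIZE):
        for x in range(0, width - TILE_SIZE + 1, TILE_SIZE):
            region = slide.read_region((x, y), 0, (TILE_SIZE, TILE_SIZE))
            tile = np.asarray(region.convert("RGB"))
            # Background pixels on an H&E slide are near-white; keep the tile
            # only if enough of it is darker, i.e., stained tissue.
            tissue_fraction = (tile.mean(axis=2) < 220).mean()
            if tissue_fraction >= BACKGROUND_CUTOFF:
                yield x, y, tile
```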
Then, using Google's Inception v3 architecture, the team trained its deep-learning model to discern normal from tumor tissue based on per-tile classifications, demonstrating that the tumor-normal distinctions achieved by this approach compared favorably to automated pathology approaches reported previously in non-small cell lung cancer.
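As a hedged illustration of that setup, the snippet below builds a per-tile tumor-versus-normal classifier on top of the Inception v3 implementation in TensorFlow's Keras API. The input resolution, optimizer, and loss here are assumptions for the sketch, not the study's exact training recipe.

```python
import tensorflow as tf

# Per-tile binary classifier built on Inception v3.
base = tf.keras.applications.InceptionV3(
    weights="imagenet",         # start from ImageNet weights; training from scratch is also possible
    include_top=False,
    input_shape=(299, 299, 3),  # Inception v3's native input resolution
    pooling="avg",
)

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(2, activation="softmax"),  # tumor vs. normal, per tile
])

model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# train_tiles and integer train_labels would come from the tiling step above:
# model.fit(train_tiles, train_labels, epochs=..., batch_size=...)
```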
The researchers also trained and validated the method not only to tease out normal samples, but also to distinguish between the two lung cancer subtypes considered. They found that this image-based computational approach produced classifications that were on par with those made by three pathologists reading the same test set of slides.
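Turning per-tile predictions into a single per-slide call requires some aggregation rule. One simple possibility, averaging the tiles' softmax outputs, is sketched below; the classify_slide helper and the class ordering are assumed for illustration.

```python
import numpy as np

CLASSES = ["normal", "LUAD", "LUSC"]  # assumed ordering for this sketch

def classify_slide(tile_probs):
    """Aggregate per-tile softmax outputs into one slide-level call.

    tile_probs: array of shape (n_tiles, 3), one softmax vector per tile.
    Averaging tile probabilities is one simple rule; alternatives such as
    counting the fraction of tiles that vote for each class are also possible.
    """
    slide_probs = np.asarray(tile_probs).mean(axis=0)
    return CLASSES[int(slide_probs.argmax())], slide_probs
```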
Moreover, the authors noted that "around half of the TCGA whole-slide images misclassified by the algorithms have also been misclassified by the pathologists, highlighting the intrinsic difficulty in distinguishing [lung adenocarcinoma] from [lung squamous cell carcinoma] in some cases."
The team further validated the image-based classification scheme using another 98 slides prepared from frozen lung tumor samples, 140 FFPE-based slides, and 102 lung biopsy slides. They also trained a convolutional neural network to search for signs of mutations involving 10 recurrently altered genes in lung adenocarcinoma — an approach that showed promise for predicting mutations in the TP53, KRAS, STK11, EGFR, FAT1, and SETBP1 genes.
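Because a single tumor can carry mutations in several of these genes at once, mutation prediction is naturally framed as a multi-label problem, with one independent sigmoid output per gene. The sketch below assumes that framing; only the six genes named here are included, as illustrative stand-ins for the full 10-gene panel, and the training details are assumptions.

```python
import tensorflow as tf

# The six genes named above; the study's full panel covers ten.
GENES = ["TP53", "KRAS", "STK11", "EGFR", "FAT1", "SETBP1"]

base = tf.keras.applications.InceptionV3(
    weights="imagenet",
    include_top=False,
    input_shape=(299, 299, 3),
    pooling="avg",
)

mutation_model = tf.keras.Sequential([
    base,
    # One independent probability per gene, so a tile can be
    # positive for several mutations simultaneously.
    tf.keras.layers.Dense(len(GENES), activation="sigmoid"),
])

mutation_model.compile(
    optimizer="adam",
    loss="binary_crossentropy",  # independent per-gene binary targets
    metrics=[tf.keras.metrics.AUC(multi_label=True)],
)
```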
"This network has already been successfully adapted to other specific types of classifications like skin cancers and diabetic retinopathy detection," the authors wrote, noting that the code for the approach is available online.