Key Points
Question
Are deep neural networks trained on data from a single institution for colorectal polyp classification on digitized histopathologic slides generalizable across multiple external institutions?
Findings
In this prognostic study of a deep neural network for classifying the 4 most common polyp types on digitized histopathologic slides from a single institution (internal test set) and 24 US institutions (external test set), mean accuracy was 93.5% on the internal test set and 87.0% on the external test set.
Meaning
Deep neural networks may provide a generalizable approach for the classification of colorectal polyps on digitized histopathologic slides.
Importance
Histologic classification of colorectal polyps plays a critical role in screening for colorectal cancer and care of affected patients. An accurate and automated algorithm for the classification of colorectal polyps on digitized histopathologic slides could benefit practitioners and patients.
Objective
To evaluate the performance and generalizability of a deep neural network for colorectal polyp classification on histopathologic slide images using a multi-institutional data set.
Design, Setting, and Participants
This prognostic study used histopathologic slides collected from January 1, 2016, to June 30, 2016, from Dartmouth-Hitchcock Medical Center, Lebanon, New Hampshire, with 326 slides used for training, 157 slides for an internal test set, and 25 slides for a validation set. For the external data set, 238 slides from 179 distinct patients were obtained from 24 institutions across 13 US states. Data analysis was performed from April 9 to November 23, 2019.
Main Outcomes and Measures
Accuracy, sensitivity, and specificity of the model to classify 4 major colorectal polyp types: tubular adenoma, tubulovillous or villous adenoma, hyperplastic polyp, and sessile serrated adenoma. Performance was compared with that of local pathologists at the point of care, identified from the corresponding pathology laboratories.
Results
For the internal evaluation on the 157 slides with ground truth labels from 5 pathologists, the deep neural network had a mean accuracy of 93.5% (95% CI, 89.6%-97.4%) compared with local pathologists’ accuracy of 91.4% (95% CI, 87.0%-95.8%). On the external test set of 238 slides with ground truth labels from 5 pathologists, the deep neural network achieved an accuracy of 87.0% (95% CI, 82.7%-91.3%), which was comparable with local pathologists’ accuracy of 86.6% (95% CI, 82.3%-90.9%).
Conclusions and Relevance
The findings suggest that this model may assist pathologists by improving the diagnostic efficiency, reproducibility, and accuracy of colorectal cancer screenings.
In the US, colorectal cancer was estimated to cause 51 020 deaths in 2019, making it the second most common cause of death due to cancer.1 This death rate, however, has decreased in the past several decades, likely because of successful cancer screening programs.2-5 Colonoscopy is the most common test in these screening programs in the US.6 During colonoscopies, practitioners excise colorectal polyps and visually examine them on histopathologic slides for neoplasia. Detection of cancer at an early, curable stage and removal of preinvasive adenomas or serrated lesions during this procedure are associated with a reduced mortality rate.7-9 Furthermore, the numbers and types of polyps detected are associated with the risk of malignant tumors and are therefore used as the basis for subsequent screening recommendations.6 An algorithm for automated classification of colorectal polyps could potentially benefit cancer screening programs by improving efficiency, reproducibility, and accuracy as well as reducing the access barrier to pathology services.10
In recent years, a class of computational models known as deep neural networks has driven substantial advances in the field of artificial intelligence. Comprising many processing layers, deep neural networks take a data-driven approach to automatically learning the most relevant features of input data for a given task, markedly improving the state of the art in computer vision,11 natural language processing,12 and speech recognition.13 For medical image analysis in particular, deep learning has achieved strong performance in image classification, including chest radiographs,14 retinal fundus photographs,15 head computed tomography scans,16 lung histopathologic slides,17 and skin cancer images.18
This study evaluated the performance and generalizability of a deep neural network for colorectal polyp classification on histopathologic slide images using a multi-institutional data set. To our knowledge, this study is the first to comprehensively evaluate a deep learning algorithm for colorectal polyp classification and assess the generalizability of this model across multiple institutions.
This prognostic study used histopathologic slides from Dartmouth-Hitchcock Medical Center (DHMC), a tertiary academic care center in Lebanon, New Hampshire, to train a deep neural network for colorectal polyp classification. Internal and external data sets of hematoxylin and eosin–stained, formalin-fixed, paraffin-embedded colorectal polyp whole-slide images were collected. Each slide could contain 1 or more tissue sections or polyps. This study and the use of human participant data in this project were approved by the Dartmouth-Hitchcock Health Institutional Review Board with a waiver of informed consent. The research reported in this article is in accordance with this approved Dartmouth-Hitchcock Health Institutional Review Board protocol and the World Medical Association Declaration of Helsinki on Ethical Principles for Medical Research Involving Human Subjects.19 In addition, the study followed the Standards for Reporting of Diagnostic Accuracy (STARD) reporting guideline.20
The internal data set was collected from January 1, 2016, to June 30, 2016, at DHMC. This data set included 508 slides from the 4 most common polyp types according to local diagnoses parsed from pathology reports: tubular adenoma, tubulovillous or villous adenoma, hyperplastic polyp, and sessile serrated adenoma. The slides were scanned (Aperio AT2, Leica Biosystems) at 40× resolution (0.25 μm per pixel) at DHMC. In this internal data set, each whole-slide image was from a different patient and colonoscopy procedure. We partitioned these slides into a training set of 326 slides, a validation set of 25 slides, and an internal test set of 157 slides. The distribution of polyp types was balanced in the validation and internal test sets, whereas slides were oversampled for hyperplastic polyps and sessile serrated adenomas in the training set to improve model training for these classes (Figure 1).
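Oversampling of minority classes of this kind is commonly implemented at the data-loader level with inverse-frequency sample weights. The sketch below is our illustration of that general technique in PyTorch, not the study's released code; the `dataset` and `labels` inputs are hypothetical.

```python
from collections import Counter

import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

def make_balanced_loader(dataset, labels, batch_size=32):
    """Draw each example with probability inversely proportional to its
    class frequency, so rarer classes (e.g., hyperplastic polyps and
    sessile serrated adenomas) are oversampled during training."""
    counts = Counter(labels)
    weights = torch.tensor([1.0 / counts[y] for y in labels], dtype=torch.double)
    sampler = WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler)
```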
For the external data set, we collaborated with investigators from a randomized clinical trial on the effect of supplementation with calcium and/or vitamin D for the prevention of colorectal adenomas21 as well as their network of laboratories. Through this collaboration, we were given access to 1182 whole-slide images along with the diagnoses given by local pathologists. These slides were borrowed from various US pathology laboratories (eTable 1 in the Supplement) by one of us (E.L.B.) from January 1, 2016, to December 31, 2017, and digitized by scanners (Aperio AT2, Leica Biosystems) at 40× resolution at DHMC (as for the internal data set) before they were returned to the original laboratories. We randomly sampled up to 95 of these slides for each of the 4 polyp types as diagnosed by the local pathologist. Of note, 15 of the randomly selected slides were removed because of poor slide quality as determined by our study's lead expert pathologist (A.A.S.). In total, the final external validation set comprised 238 slides from 24 institutions in 13 US states. In this external test set, some slides corresponded to the same patients: the 238 slides came from 179 distinct patients. All slides from the internal and external test sets were excluded from model development until final evaluation of the model. Each slide in the data set was the most diagnostic slide for the corresponding patient, and slides from the same patient were not from the same lesion.
We did not include any slides with a diagnosis of high-grade dysplasia or adenocarcinoma because we did not have enough samples of these cases in the external validation set. We also did not include normal as a whole-slide class because normal slides were not routinely scanned for the internal or multi-institutional data sets. Moreover, we did not distinguish regenerative epithelial hyperplasia and inflammatory polyps from hyperplastic polyps and tubular adenomas because of the small number of these cases in our training set. All diagnoses made by DHMC pathologists were based on World Health Organization criteria as of April 2019.22
The annotation process involved 5 gastrointestinal pathologists (A.A.S., L.J.V., B.R., X.L., M.L.) from the Department of Pathology and Laboratory Medicine at DHMC: 3 (A.A.S., B.R., M.L.) with gastrointestinal pathology fellowship training and 2 (L.J.V., X.L.) who gained gastrointestinal pathology expertise through years of gastrointestinal pathology service. For the whole-slide images in the training set, 2 of the gastrointestinal pathologists (A.A.S. and L.J.V.) identified the polyps on the slides and used the RectLabel23 annotation tool to manually annotate rectangular bounding boxes around polyps and normal tissue as regions of interest for model training. In total, 3848 regions of interest were identified and labeled as 1 of the 4 polyp classes.
We also collected a smaller number of annotations from 25 separate whole-slide images as the validation set for hyperparameter tuning of the model. In this validation set, the same 2 pathologists (A.A.S. and L.J.V.) annotated nonoverlapping patches of 224 × 224 pixels (or 448 × 448 μm) of classic examples of each polyp type. Because this data set was used to guide model development, all fixed-size patches were confirmed with high confidence by both pathologists, and patches with disagreements were discarded.
For the internal test set, the 5 gastrointestinal pathologists independently and retrospectively diagnosed each slide as 1 of the 4 polyp types. For this internal set, the local diagnoses given at DHMC may have been from 1 of the 5 study gastrointestinal pathologists, but the original diagnosis and the identity of the pathologist at the point of care were hidden during the retrospective annotation phase.
For the external test set, the 5 gastrointestinal pathologists from DHMC also retrospectively made diagnoses based on all slides in the test set in the same fashion as for the internal test set. In total, 5 complete sets of diagnoses from gastrointestinal pathologists and the diagnoses given by local pathologists at the point of care were recorded. For both the internal and the external test sets, ground truth diagnoses were assigned by taking the majority vote of the 5 gastrointestinal pathologists. Figure 1 depicts the data flow for the study design. eFigure 1 in the Supplement shows the statistics on polyp types, number of patches, and slide sizes for the internal and external test sets.
In this study, we implemented the deep residual network (ResNet), a neural network architecture that outperformed all other models on the ImageNet and Common Objects in Context image recognition benchmarks.24 For model training, we applied a sliding window method to the 3848 variable-size regions of interest labeled by pathologists in the training set, extracting approximately 7000 fixed-size 224 × 224-pixel patches per polyp type. We then initialized ResNet with the MSRA (Microsoft Research Asia) weight initialization11 and trained the neural network for 200 epochs with an initial learning rate of 0.001, which decayed by a factor of 0.9 every epoch. Throughout training, we applied standard image augmentation techniques, including rotations and flips as well as color jittering of the brightness, contrast, saturation, and hue of each image. Our final model was an ensemble of 5 ResNets of 18, 34, 50, 101, and 152 layers. Overall, training these networks took approximately 96 hours using a single graphics processing unit (NVIDIA Tesla K40c). Once the model was trained, there was no further modification of the model based on the pathologists' examination of the results.
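As a rough illustration of this training configuration, a PyTorch setup following the hyperparameters reported above might look as follows. This is a sketch, not the released implementation: the optimizer choice, jitter magnitudes, rotation range, and `train_loader` are our assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models, transforms

# Augmentations described above: rotations and flips plus color jittering.
# The magnitudes here are illustrative assumptions.
train_transform = transforms.Compose([
    transforms.RandomRotation(degrees=90),
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.05),
    transforms.ToTensor(),
])

def build_model(constructor=models.resnet18, num_classes=4):
    """One member of the 5-ResNet ensemble (18/34/50/101/152 layers)."""
    model = constructor(num_classes=num_classes)
    # MSRA (He) initialization for convolutional layers.
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            nn.init.kaiming_normal_(m.weight, mode="fan_out", nonlinearity="relu")
    return model

def train(model, train_loader, epochs=200, lr=1e-3):
    """train_loader is assumed to yield batches of 224 x 224 patches."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # optimizer assumed
    # Learning rate decays by a factor of 0.9 every epoch, as reported.
    scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)
    for _ in range(epochs):
        for patches, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(patches), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()
    return model
```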
For the deep learning model to infer the overall diagnosis of a whole-slide image, we designed a hierarchical classification algorithm to match the nature of the classification task. Each slide was initially broken down into many patches using a sliding window algorithm, and each patch was classified by the neural network.
Using the diagnoses predicted by the neural network for all patches in a given slide, the model first determined whether a polyp was adenomatous (tubular, tubulovillous, or villous) or serrated (hyperplastic or sessile serrated) by comparing the numbers of predicted patches of the adenomatous and serrated types. Adenomatous polyps with more than a threshold proportion of tubulovillous or villous tissue (>30%) were classified as overall tubulovillous or villous adenoma, whereas the remaining adenomatous polyps were classified as tubular adenoma. For serrated polyps, the algorithm classified polyps with more than a threshold proportion of sessile serrated patches (>1.5%) as overall sessile serrated adenomas and the remaining polyps as hyperplastic. All thresholds were determined using a grid search over the internal training set. The hierarchical nature of the inference heuristic allowed us to imitate the schema used by pathologists for this classification task without training a separate machine learning classifier.
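A minimal sketch of this hierarchical heuristic, as we read the description above, follows; the exact denominator for each threshold is our assumption, and the class names are illustrative.

```python
from collections import Counter

ADENOMATOUS = {"tubular", "tubulovillous_villous"}
SERRATED = {"hyperplastic", "sessile_serrated"}

def classify_slide(patch_predictions, tv_threshold=0.30, ss_threshold=0.015):
    """Infer a whole-slide diagnosis from per-patch class predictions.

    patch_predictions: list of class names, one per sliding-window patch.
    Thresholds correspond to the reported grid-searched values (>30% and >1.5%).
    """
    counts = Counter(patch_predictions)
    n_adenomatous = sum(counts[c] for c in ADENOMATOUS)
    n_serrated = sum(counts[c] for c in SERRATED)

    if n_adenomatous >= n_serrated:
        # Within adenomatous polyps: more than 30% tubulovillous/villous
        # tissue yields an overall tubulovillous or villous call.
        if counts["tubulovillous_villous"] / max(n_adenomatous, 1) > tv_threshold:
            return "tubulovillous_villous"
        return "tubular"
    # Within serrated polyps: more than 1.5% sessile serrated patches
    # yields an overall sessile serrated adenoma call.
    if counts["sessile_serrated"] / max(n_serrated, 1) > ss_threshold:
        return "sessile_serrated"
    return "hyperplastic"
```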
For final evaluation, we compared the performance of the model with that of local pathologists as originally assessed at the point of care on the internal test set and the multi-institutional external test set. Local pathologist performance measures were averaged over all samples because information about individual pathologists' performance was anonymized. To assess the quality of annotations in our study, we measured the agreement of our gastrointestinal pathologists in terms of multiclass Cohen κ. The application of the final model to a whole-slide image in the test sets took a mean of less than 60 seconds using a single graphics processing unit (NVIDIA Tesla K40c). For the model's classifications, we calculated accuracy, sensitivity, and specificity in comparison with ground truth diagnoses and compared these metrics with those of local pathologists. Furthermore, we calculated confusion matrices for local pathologists and the model and conducted error analysis.
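These agreement and performance measures are standard. The sketch below, using scikit-learn, shows one common convention: averaging Cohen κ over annotator pairs and deriving per-class sensitivity and specificity from the confusion matrix. It is our illustration, not the study's analysis code, and the study's exact averaging convention is not specified here.

```python
from itertools import combinations

import numpy as np
from sklearn.metrics import cohen_kappa_score, confusion_matrix

def mean_pairwise_kappa(annotations):
    """Mean multiclass Cohen kappa over all pairs of annotators.

    annotations: list of equal-length label sequences, one per pathologist.
    """
    kappas = [cohen_kappa_score(a, b) for a, b in combinations(annotations, 2)]
    return float(np.mean(kappas))

def per_class_metrics(y_true, y_pred, labels):
    """Per-class sensitivity and specificity from the confusion matrix."""
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    for i, label in enumerate(labels):
        tp = cm[i, i]
        fn = cm[i].sum() - tp
        fp = cm[:, i].sum() - tp
        tn = cm.sum() - tp - fn - fp
        print(f"{label}: sensitivity={tp / max(tp + fn, 1):.3f}, "
              f"specificity={tn / max(tn + fp, 1):.3f}")
```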
The algorithms in this study were implemented in Python software, version 3.6 (Python Software Foundation). We used OpenSlide software, version 3.4.1 (Carnegie Mellon University School of Computer Science) to convert the digitized image format and PyTorch software, version 0.4 (Facebook’s AI Research Lab) for training the deep neural network models. The statistical analysis and 95% CIs were calculated using the Statistics, version 3.4 library in Python. The source code for this study is publicly available.25
We used a 2-tailed t test for proportions with a significance level of P ≤ .05 to compare the performance of local pathologists and the model on the internal and external test sets. R, version 3.3.3 (R Foundation for Statistical Computing) was used for this statistical analysis. Data analysis was performed from April 9 to November 23, 2019.
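For illustration, an equivalent two-sided two-proportion test can be run in Python with statsmodels (the study performed these tests in R). This sketch treats accuracy as a simple proportion of correctly classified slides, which simplifies the unweighted per-class mean reported in the Results; the counts below are back-calculated from the external test set figures as an assumption.

```python
from statsmodels.stats.proportion import proportions_ztest

def compare_accuracies(correct_a, correct_b, n_slides):
    """Two-sided test comparing two proportions of correctly
    classified slides (e.g., model vs local pathologists)."""
    stat, p_value = proportions_ztest(
        count=[correct_a, correct_b], nobs=[n_slides, n_slides]
    )
    return p_value

# External test set: 87.0% of 238 slides (model) vs 86.6% (local pathologists).
p = compare_accuracies(round(0.870 * 238), round(0.866 * 238), 238)
print(f"P = {p:.2f}")  # approximately .90, consistent with the reported value
```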
The Table gives the per-class and mean performance metrics of local pathologists and the proposed model for the internal and external test sets. For the internal test set from DHMC, interobserver agreement, measured by Cohen κ, was in the substantial range (0.61-0.80), with the 5 study gastrointestinal pathologists achieving a mean multiclass Cohen κ of 0.72 (95% CI, 0.64-0.80). The model achieved a mean accuracy (the unweighted mean of individual polyp type accuracies) of 93.5% (95% CI, 89.6%-97.4%) compared with local pathologists' accuracy of 91.4% (95% CI, 87.1%-95.8%) on the internal data set. A 2-tailed t test for proportions revealed, however, that the differences in performance were not significant (pathologist, 91.4%; deep neural network, 93.5%; P = .50 for accuracy; pathologist, 80.7%; deep neural network, 86.8%; P = .14 for sensitivity; and pathologist, 95.1%; deep neural network, 95.7%; P = .80 for specificity).
Multi-institutional External Evaluation
Agreement was lower on the external data set for both the pathologists and the model. The 5 study gastrointestinal pathologists achieved a mean multiclass Cohen κ of 0.67 (95% CI, 0.60-0.75). With an accuracy of 87.0% (95% CI, 82.7%-91.3%) on the external test set, the model performed at a level of accuracy, sensitivity, and specificity similar to that of local pathologists on this data set (pathologist, 86.6%; deep neural network, 87.0%; P = .90 for accuracy; pathologist, 78.4%; deep neural network, 77.7%; P = .86 for sensitivity; and pathologist, 91.6%; deep neural network, 91.6%; P = .99 for specificity). The Table gives the performance metrics for local pathologists and the deep neural network for each polyp class on the internal and external test sets. eTable 2 in the Supplement gives the performance of local pathologists and the deep learning model stratified by the agreement of DHMC pathologists in determining ground truth labels.
Confusion Matrices and Error Analysis
In Figure 2, we present confusion matrices for local pathologists and the model on the external test set to determine which polyp types were the most challenging to diagnose. Local pathologists often classified tubular adenomas as tubulovillous or villous adenomas (46.3%) and hyperplastic polyps as sessile serrated adenomas (12.9%). The deep neural network similarly classified many tubular adenomas as tubulovillous or villous adenomas (23.2%) and hyperplastic polyps as sessile serrated adenomas (27.3%). For further analysis of the model's errors, eFigure 2 in the Supplement shows violin plots of the predicted percentage areas of each polyp type on slides.
The results of the model were visualized on digitized slides by highlighting the regions that contributed to the whole-slide classification. Figure 3 shows examples of slides with the lead gastrointestinal pathologist's (A.A.S.) annotations, the heat map generated by the model, and the visualization of our model's results.
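A heat map of this kind can be produced by tiling the slide, recording each patch's predicted confidence, and rendering the resulting grid. The minimal sketch below is our illustration; the per-patch probability array is assumed to come from applying the trained network in sliding-window order.

```python
import matplotlib.pyplot as plt

def prediction_heatmap(patch_probs, class_index):
    """Render per-patch confidence for one polyp type as a heat map.

    patch_probs: array of shape (rows, cols, n_classes) holding the
    softmax output for each 224 x 224 patch in sliding-window order.
    """
    heat = patch_probs[..., class_index]  # confidence for one polyp type
    plt.imshow(heat, cmap="inferno", vmin=0.0, vmax=1.0)
    plt.colorbar(label="patch confidence")
    plt.axis("off")
    plt.show()
```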
To our knowledge, this study is the first to evaluate a deep neural network for colorectal polyp classification on a large multi-institutional data set with comparison against local diagnoses made at the point of care. On a test set of 238 images from 24 external institutions, the model achieved an accuracy of 87.0%, which was on par with the local pathologists' accuracy of 86.6% at the α = .05 level. With regard to annotation agreement, the 5 study gastrointestinal pathologist annotators had a mean Cohen κ of 0.72 on the internal test set and 0.67 on the external test set, which were higher than the previously reported Cohen κ scores of 0.46,26 0.31,27 0.55,28 and 0.54.29 This difference in agreement is likely attributable to differences in polyp type distributions among data sets, interlaboratory variations in tissue processing and staining, and institutional biases in polyp classification criteria. Of note, although including the external slides for training would likely improve the performance of the model on the external test set, the deep neural network was intentionally trained only on the internal data set to examine its generalizability to external institutions.
In terms of error analysis, the deep neural network made misclassifications similar to those of local pathologists, as shown by the similarities in their confusion matrices. Both the model and the local pathologists distinguished adenomatous (tubular, tubulovillous, or villous) from serrated (hyperplastic or sessile serrated) polyps with high accuracy, whereas the model made a greater number of mistakes within those 2 categories. Of note, the model used a simple hierarchical heuristic based on the number of predicted patches to distinguish adenomatous and serrated polyps on a whole slide, which is not as nuanced as a pathologist's line of reasoning in real-world settings. Further subclassification of adenomatous and serrated polyps was relatively more challenging for the model. We hypothesize that many of the mistakes occurred because thresholds for detection of tubulovillous or villous growths and of sessile serrated crypts vary among pathologists; the lead gastrointestinal pathologist's manual inspection of discordances found that many of the errors made by the deep neural network were similar to mistakes made by pathologists in practice. For example, a common mistake made by both the model and the local pathologists was distinguishing hyperplastic polyps from sessile serrated adenomas, potentially reflecting the data imbalance of the sessile serrated adenoma class in the training set.
This study not only showed the utility of a deep learning model for classification of colorectal polyps but also advanced the previous literature14-18,30-33 in terms of model evaluation and study design. Previous studies on deep learning for colorectal polyp classification30,31 demonstrated good performance on internal data sets but used simpler approaches and did not include pathologist-level performance or local diagnoses. The present study, on the other hand, evaluated a deep neural network on a multi-institutional external data set and demonstrated diagnostic performance of deep neural networks comparable with that of local pathologists at the point of care. Many previous studies14-18,32,33 demonstrated practitioner-level performance of deep neural networks on various medical classification tasks. All of these studies,14-18,30-33 however, measured practitioner-level performance with a predetermined number of practitioners from a few medical institutions in a controlled setting. Although it is important to measure retrospective practitioner performance on classification tasks, we used diagnoses made by local pathologists in clinical practice at the point of care in 24 external institutions for comparison against the deep neural network.
A deep learning model for colorectal polyp classification, if validated through clinical trials, has potential for widespread application in clinical settings. Our model could be implemented in laboratory information systems to guide pathologists by identifying areas of interest on digitized slides, which could improve work efficiency, reproducibility, and accuracy for colorectal polyp classification. Although expert practitioner confirmation of diagnoses will still be required, the model could help triage slides indicating diagnoses that are more likely to be preinvasive for subsequent review by pathologists. Because the US Preventive Services Task Force recommends that all adults aged 50 to 75 years undergo screening for colorectal cancer, an automated model for classification could be useful in relieving pathologists’ burden in slide review and ultimately reduce the barrier of access for colorectal cancer screening.
Moving forward, further work can be performed in deep learning for analysis of colorectal polyp images. Foremost, we plan to implement the model prospectively in a clinical setting to measure its ability to enhance pathologists' classification of colorectal polyps and improve outcomes in a clinical trial. In terms of technical improvements to the model, more data can be collected and used for training to increase the model's performance, especially for sessile serrated adenomas and for new, less common classes, such as high-grade dysplasia, adenocarcinoma, regenerative epithelial hyperplasia, and inflammatory polyps. Moreover, related work has found that deep learning can identify hidden features in histopathologic images that can be used to detect gene mutations17 and predict patient survival,34-36 tasks that pathologists do not perform. To this end, we plan to collect more patient outcome data to train the model to predict polyp recurrence and patient survival in colorectal cancer.
This study has limitations. Although the model performed on par with local pathologists on the external test set, it did not perform as well as in the internal evaluation. The results suggest that there is a higher level of variability among slides from various institutions and that the model could be further improved by training on larger, more diverse data sets. Furthermore, although the model identified the most common polyp types, the study was performed on well-sectioned, clearly stained slides and did not include less common classes, such as traditional serrated adenoma or sessile serrated adenoma with cytologic dysplasia. In addition, the model was not evaluated on entirely normal slides. Our team plans to collect further data and extend the model and its evaluation to these additional cases as future work. Finally, local pathologists might have had access to additional slides and patient information, such as patient colonoscopy history and polyp biopsy location, that may have influenced their diagnoses. Access to this additional information might explain some of the discrepancies between local diagnoses and ground truth labels, which were based only on digitized slides.
In this study, the performance of the deep learning model was similar to that of local pathologists on the internal and external test sets. If confirmed in clinical trials, this model could improve the efficiency, reproducibility, and accuracy of colorectal cancer screening.
Accepted for Publication: February 19, 2020.
Published: April 23, 2020. doi:10.1001/jamanetworkopen.2020.3398
Open Access: This is an open access article distributed under the terms of the CC-BY License. © 2020 Wei JW et al. JAMA Network Open.
Corresponding Author: Saeed Hassanpour, PhD, Department of Biomedical Data Science, Dartmouth College, One Medical Center Dr, HB 7261, Lebanon, NH 03756 (Saeed.Hassanpour@dartmouth.edu).
Author Contributions: Dr Hassanpour and Mr Wei had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: Wei, Suriawinata, Tomita, Abdollahi, Hassanpour.
Acquisition, analysis, or interpretation of data: Wei, Suriawinata, Vaickus, Ren, Liu, Lisovsky, Kim, Snover, Baron, Barry, Hassanpour.
Drafting of the manuscript: Wei, Tomita, Abdollahi, Hassanpour.
Critical revision of the manuscript for important intellectual content: Wei, Suriawinata, Vaickus, Ren, Liu, Lisovsky, Kim, Snover, Baron, Barry, Hassanpour.
Statistical analysis: Wei, Hassanpour.
Obtained funding: Barry, Hassanpour.
Administrative, technical, or material support: Suriawinata, Ren, Tomita, Abdollahi, Barry, Hassanpour.
Supervision: Suriawinata, Vaickus, Hassanpour.
Conflict of Interest Disclosures: Dr Suriawinata reported receiving grants from the National Library of Medicine, National Institutes of Health (NIH) during the conduct of the study. Dr Ren reported grants from NIH during the conduct of the study. Dr Snover reported receiving personal fees from Dartmouth Medical Center during the conduct of the study. Dr Baron reported receiving grants from the National Cancer Institute, NIH during the conduct of the study. Dr Barry reported receiving grants from the National Cancer Institute, NIH during the conduct of the study. Dr Hassanpour reported having a patent to Attention-Based Classification of High Resolution Microscopy Images pending and receiving grants from NIH during the conduct of the study. No other disclosures were reported.
Funding/Support: This work was supported by grants R01CA098286 (Dr Baron), R01LM012837 (Dr Hassanpour), and P20GM104416 (Dr Hassanpour) from the NIH, the Geisel School of Medicine at Dartmouth, and the Norris Cotton Cancer Center.
Role of the Funder/Sponsor: The funding sources had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Additional Contributions: Thomas H. Cormen, PhD, and Lamar Moss, BA, Dartmouth College, provided feedback on this article; Leila Mott, MS, Dartmouth College, helped with the data set; and Minnesota Gastroenterology helped with data collection. These individuals were not compensated for their contribution.
References
3. Cress RD, Morris C, Ellison GL, Goodman MT. Secular changes in colorectal cancer incidence by subsite, stage at diagnosis, and race/ethnicity, 1992-2001. Cancer. 2006;107(5)(suppl):1142-1152. doi:10.1002/cncr.22011
4. Edwards BK, Ward E, Kohler BA, et al. Annual report to the nation on the status of cancer, 1975-2006, featuring colorectal cancer trends and impact of interventions (risk factors, screening, and treatment) to reduce future rates. Cancer. 2010;116(3):544-573. doi:10.1002/cncr.24760
9. Citarda F, Tomaselli G, Capocaccia R, Barcherini S, Crespi M; Italian Multicentre Study Group. Efficacy in standard clinical practice of colonoscopic polypectomy in reducing colorectal cancer incidence. Gut. 2001;48(6):812-815. doi:10.1136/gut.48.6.812
12. Jean S, Cho K, Memisevic R, Bengio Y. On using very large target vocabulary for neural machine translation. Proceedings of the Association for Computational Linguistics International Joint Conference on Natural Language Processing. December 5, 2014. Accessed May 12, 2019. https://arxiv.org/abs/1412.2007
13. Mikolov T, Deoras A, Povey D, Burget L, Cernocky J. Strategies for training large scale neural network language models. Proceedings of the Automatic Speech Recognition and Understanding Conference. March 5, 2011. Accessed May 12, 2019. https://arxiv.org/abs/1512.04906
14. Irvin J, Rajpurkar P, Ko M, et al. CheXpert: a large chest radiograph dataset with uncertainty labels and expert comparison. Proceedings of the Association for the Advancement of Artificial Intelligence Conference. January 21, 2019. Accessed April 28, 2019. https://arxiv.org/abs/1901.07031
22. Bosman FT, Carneiro F, Hruban R, Theise ND. WHO Classification of Tumours of the Digestive System. 4th ed. World Health Organization; 2010.
24. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the Conference on Computer Vision and Pattern Recognition. December 10, 2015. Accessed May 19, 2019. https://arxiv.org/abs/1512.03385
25. Wei JW, Tafe LJ, Linnik YA, et al. Pathologist-level classification of histologic patterns on resected lung adenocarcinoma slides with deep neural networks. Sci Rep. 2019;9:3358. https://github.com/BMIRDS/deepslide
26. Yoon H, Martin A, Benamouzig R, Longchampt E, Deyra J, Chaussade S; Groupe d'étude APACC. [Inter-observer agreement on histological diagnosis of colorectal polyps: the APACC study]. Gastroenterol Clin Biol. 2002;26(3):220-224.
27. Terry MB, Neugut AI, Bostick RM, Potter JD, Haile RW, Fenoglio-Preiser CM. Reliability in the classification of advanced colorectal adenomas. Cancer Epidemiol Biomarkers Prev. 2002;11(7):660-663.