Diabetic macular edema (DME) is the primary cause of vision loss among individuals with diabetes mellitus (DM). We developed, validated, and tested a deep learning (DL) system for classifying DME using images from three common commercially available optical coherence tomography (OCT) devices.
We trained and validated two versions of a multitask convolutional neural network (CNN) to classify DME (center-involved DME [CI-DME], non-CI-DME, or absence of DME) using three-dimensional (3D) volume scans and two-dimensional (2D) B-scans, respectively. For both the 3D and 2D CNNs, we used the residual network (ResNet) as the backbone. For the 3D CNN, we used a 3D version of ResNet-34 with the last fully connected layer removed as the feature extraction module. A total of 73,746 OCT images were used for training and primary validation. External testing was performed using 26,981 images across seven independent data sets from Singapore, Hong Kong, the U.S., China, and Australia.
In classifying the presence or absence of DME, the DL system achieved area under the receiver operating characteristic curves (AUROCs) of 0.937 (95% CI 0.920–0.954), 0.958 (0.930–0.977), and 0.965 (0.948–0.977) for the primary data set obtained from CIRRUS, SPECTRALIS, and Triton OCTs, respectively, in addition to AUROCs >0.906 for the external data sets. For further classification of the CI-DME and non-CI-DME subgroups, the AUROCs were 0.968 (0.940–0.995), 0.951 (0.898–0.982), and 0.975 (0.947–0.991) for the primary data set and >0.894 for the external data sets.
We demonstrated excellent performance with a DL system for the automated classification of DME, highlighting its potential as a second-line screening tool for patients with DM that may enable more effective triage to eye clinics.
Introduction
Diabetic macular edema (DME) is the primary cause of vision loss among individuals with diabetes mellitus (DM) and can develop at any stage of diabetic retinopathy (DR) (1). Although numerous international guidelines and national DR screening programs already exist to prevent vision loss in these individuals (2–7), such programs mostly rely on two-dimensional (2D) retinal fundus photographs, which have shown limited performance in screening for DME. Because DME is a 3D condition involving edematous thickening of the macula, screening for it with 2D retinal fundus photographs has reportedly led to very high false-positive rates (e.g., >86% in Hong Kong and >79% in the U.K.), increasing the number of non-DME cases unnecessarily referred to ophthalmologists and straining clinical resources (8,9). Furthermore, to obtain the most cost-effective outcomes for patients with DM, there is increasing awareness of the need to differentiate eyes with center-involved DME (CI-DME), which are more likely to have visual impairment and require more timely management (e.g., intravitreal injections of anti–vascular endothelial growth factor), from eyes with non-CI-DME, for which treatment needs may be less urgent (6).
Optical coherence tomography (OCT), particularly spectral-domain or Fourier-domain OCT, is a noninvasive technique for imaging 3D layered retinal structures within seconds. It has been proposed as an alternative screening tool for DME (10,11), particularly as a second-line screening tool for those who screen positive on 2D retinal fundus photographs (12). However, the identification of DME from OCT images, as well as its classification into the CI-DME and non-CI-DME subtypes, still requires human assessment, either by ophthalmologists or by professionally trained technicians and graders, who may need to manually review multiple cross-sectional OCT B-scan images from the volumetric scan slice by slice. Although OCT viewing platforms have some built-in automated features (e.g., macular thickness, central subfield thickness, comparison against normative databases), these measurements cannot be compared across different commercial OCT devices because of their proprietary manufacturer algorithms and normative databases (13,14).
Over the past few years, several automated deep learning (DL) systems for DME detection and fluid segmentation from OCT images have been developed (15–20). Studies investigating these systems demonstrate that DL algorithms can accurately detect DME from OCT images and that they have the potential to enhance and speed up clinical workflows through automated image interpretation (21). However, several critical gaps remain. First, most of the proposed DL algorithms have been trained and tested on OCT images obtained from a single commercial device in a single center, with a lack of external data sets to test generalizability. Second, and perhaps more importantly, no studies to date have tested these algorithms on their classification of DME into CI-DME and non-CI-DME subgroups, which is important for triaging patients into timely referral intervals or specialized clinics such as retina clinics.
To address these gaps, we developed a novel multitask DL system, applying a segmentation-free classification approach, for the automated classification of DME from OCT images obtained from three common commercially available OCT devices (CIRRUS OCT, SPECTRALIS OCT, and Triton OCT). Specifically, according to the different scanning protocols for each device, we first trained a deep convolutional neural network (CNN) to screen for DME using 3D volume-scans from CIRRUS OCT, followed by another CNN using a series of 2D B-scans from SPECTRALIS OCT and Triton OCT. Second, we developed algorithms to classify DME cases into CI-DME and non-CI-DME subgroups. Third, we trained CNNs to simultaneously detect retinal abnormalities other than DME using images from all three OCT devices.
Research Design and Methods
Data Sets
Primary Development, Testing, and Validation Data Set
The primary data set for training, testing, and primary validation was retrospectively drawn from the Chinese University of Hong Kong-Sight Threatening Diabetic Retinopathy (CUHK-STDR) study from November 2015 to June 2019 (22). Briefly, this is an ongoing prospective, observational cohort study aimed at identifying new risk factors for DR progression. Inclusion criteria were as follows: patients with type 1 or type 2 DM, age >18 years, and treatment naive at baseline. Exclusion criteria were eyes with prior retinal surgery, intravitreal injection, macular laser photocoagulation, or pan-retinal laser photocoagulation and eyes with pathology that interferes with imaging (e.g., dense cataract, corneal ulcer). We extracted macular OCT images, obtained with the following devices and protocols, from all participants in the CUHK-STDR study: 1) CIRRUS OCT (Carl Zeiss Meditec, Dublin, CA) with a 6 mm × 6 mm 3D macular cube (512 A-scans per B-scan; 128 B-scans over 1,024 samplings) scanning protocol; 2) SPECTRALIS OCT (Heidelberg Engineering, Heidelberg, Germany) with a high-resolution 6.3 mm × 6.3 mm (1,024 A-scans per B-scan; 25 B-scans) and high-speed 6.5 mm × 4.9 mm (1,024 A-scans per B-scan; 19 B-scans) scanning protocol; and 3) Triton OCT (Topcon Corp., Tokyo, Japan) with a high-resolution radial 9 mm × 30° (1,024 A-scans per B-scan; 12 B-scans) scanning protocol.
External Testing Data Sets
We identified seven independent, retrospectively collected data sets of macular OCT images of patients with DM from different centers to test the performance of the DL system in the classification of DME and non-DME retinal abnormalities. The images were selected by the site investigators from defined time periods considered representative of the DM cohort at each site. Data sets 1–3 (External 1, 2, and 3) were collected, respectively, from the Singapore Integrated Diabetic Retinopathy Program (SiDRP), Singapore; the Eye Clinic at Alice Ho Miu Ling Nethersole Hospital (AHNH), Hong Kong; and the Byers Eye Institute at Stanford, Stanford University Medical Center, U.S. All three centers used CIRRUS OCT with the same scanning protocol as the primary data set. Data sets 4–6 (External 4, 5, and 6) were collected, respectively, from the Eye Clinic of the Aier medical group in Guangzhou, China; the Westmead Institute for Medical Research, Australia; and the Eye Clinic at United Christian Hospital, Hong Kong. All three centers used SPECTRALIS OCT with the same scanning protocols as the primary data set. Finally, data set 7 (External 7) was collected from the Retina Clinic of Joint Shantou International Eye Center (JSIEC), China. This center used Triton OCT with the same scanning protocol as the primary data set.
Ground Truth Labeling
All anonymized OCT scans were labeled by well-trained graders (F.T. and Z.T.) on full-screen, high-resolution 27-inch monitors (Koninklijke Philips N.V.) in the CUHK Ophthalmic Reading Centre, following the reference standards and grading definitions listed below. The intragrader Cohen κ values of F.T. and Z.T. were 0.947 and 0.924, respectively, for the presence or absence of DME and 0.940 and 0.923 for the presence or absence of non-DME retinal abnormalities. The intergrader Cohen κ was 0.889 for the presence or absence of DME and 0.862 for the presence or absence of non-DME retinal abnormalities. A panel of retina specialists adjudicated the positive cases during ground truth labeling.
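For readers running similar grading-agreement checks, Cohen's κ for two graders can be computed directly from their label lists. The snippet below is an illustrative sketch with synthetic labels, not study data or the study's analysis code.

```python
def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters over the same items:
    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
    p_e is the agreement expected by chance from each rater's marginal rates."""
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    p_e = sum((rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)

# Synthetic example: two graders agree on 5 of 6 scans.
a = ["DME", "DME", "no DME", "no DME", "DME", "no DME"]
b = ["DME", "DME", "no DME", "DME", "DME", "no DME"]
print(round(cohen_kappa(a, b), 3))  # 0.667
```

A κ of 1.0 indicates perfect agreement and 0 indicates chance-level agreement; the high values reported above therefore reflect very consistent grading.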
A gradable scan was defined as being assessable for macular morphology, having good image quality, and being free of image artifacts. An acceptable scan was defined as being assessable for macular morphology and subsequent pathology labeling despite having fair image quality because of artifacts (e.g., low signal strength). An ungradable scan was defined as not being assessable for macular morphology because of insufficient image quality from artifacts. Ungradable scans were not included in system training.
The presence of DME was defined as either perceptible retinal thickening or the presence of DME features (e.g., intraretinal cystoid spaces, subretinal fluid, and hard exudates) in the macula. For eyes with DME, CI-DME was defined as either retinal thickening or the presence of DME features in the macula involving the central subfield zone (1 mm in diameter), whereas non-CI-DME was defined as retinal thickening or the presence of DME features in the macula not involving the central subfield zone. Retinal thickening was defined according to DRCR.net protocol–defined thresholds (≥320 µm for men and ≥305 µm for women on SPECTRALIS OCT; ≥305 µm for men and ≥290 µm for women on CIRRUS OCT) and the threshold of the Moorfields DME study (≥350 µm on a Topcon OCT) (23,24). The absence of DME was defined as the absence of both retinal thickening and any DME features. Finally, the presence of non-DME retinal abnormalities was defined as any abnormal appearance in the OCT scan other than DME (e.g., age-related macular degeneration, epiretinal membrane abnormalities, central serous chorioretinopathy, and macular holes).
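The device- and sex-specific thickening thresholds above lend themselves to a simple lookup. The sketch below encodes the stated values; the function and key names are illustrative, not part of the study's software.

```python
# Central subfield thickening thresholds (µm) from the grading definitions above:
# DRCR.net protocol thresholds for SPECTRALIS and CIRRUS; the Moorfields DME
# study threshold for Topcon devices (sex-independent).
THRESHOLDS_UM = {
    ("SPECTRALIS", "male"): 320,
    ("SPECTRALIS", "female"): 305,
    ("CIRRUS", "male"): 305,
    ("CIRRUS", "female"): 290,
    ("TOPCON", "male"): 350,
    ("TOPCON", "female"): 350,
}

def is_thickened(central_subfield_um: float, device: str, sex: str) -> bool:
    """Return True if the central subfield thickness meets the thickening threshold."""
    return central_subfield_um >= THRESHOLDS_UM[(device.upper(), sex.lower())]

print(is_thickened(310, "CIRRUS", "male"))      # 310 >= 305 -> True
print(is_thickened(310, "SPECTRALIS", "male"))  # 310 <  320 -> False
```

Note that the same measured thickness can be classified differently depending on the device, which is one reason thickness values cannot be compared directly across manufacturers.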
This study was conducted in accordance with the 1964 Declaration of Helsinki and was approved by the local research ethics committees. Because the study is a retrospective analysis of fully anonymized OCT images, the ethics committees waived the requirement for informed consent.
Development of the DL System
A detailed description of the development of the DL system can be found in Supplementary 1. Briefly, we built a 3D multitask CNN for analyzing 3D volume scans imaged by CIRRUS OCT and a 2D multitask CNN for analyzing a series of 2D B-scans imaged by SPECTRALIS OCT and Triton OCT (25). Fig. 1 illustrates the architecture for both versions of the CNN, which comprises three components: a shared feature extraction module, a “DME classification” module, and a “non-DME retinal abnormalities classification” module.
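As a rough illustration of this shared-backbone, two-head design, the following PyTorch sketch pairs a small 3D residual feature extractor with a three-class DME head and a binary abnormality head. It is a minimal stand-in under stated assumptions, not the authors' network: the study used a 3D version of ResNet-34, whereas this sketch uses a deliberately tiny backbone, and the input shape is illustrative.

```python
import torch
import torch.nn as nn

class BasicBlock3D(nn.Module):
    """3D analog of the ResNet basic block: two 3x3x3 convs plus an identity shortcut."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv3d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm3d(channels)
        self.conv2 = nn.Conv3d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm3d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)

class MultiTaskOCTNet(nn.Module):
    """Shared 3D feature extractor feeding two heads: a 3-class DME head
    (no DME / non-CI-DME / CI-DME) and a binary non-DME abnormality head."""
    def __init__(self, width=32):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv3d(1, width, kernel_size=7, stride=2, padding=3, bias=False),
            nn.BatchNorm3d(width), nn.ReLU(inplace=True))
        self.blocks = nn.Sequential(BasicBlock3D(width), BasicBlock3D(width))
        self.pool = nn.AdaptiveAvgPool3d(1)  # global average pooling over D, H, W
        self.dme_head = nn.Linear(width, 3)
        self.abnormality_head = nn.Linear(width, 2)

    def forward(self, x):
        f = self.pool(self.blocks(self.stem(x))).flatten(1)
        return self.dme_head(f), self.abnormality_head(f)

model = MultiTaskOCTNet().eval()
volume = torch.randn(1, 1, 32, 64, 64)  # (batch, channel, B-scans, height, width)
with torch.no_grad():
    dme_logits, abn_logits = model(volume)
```

Because both heads share one feature extractor, training on both tasks jointly lets DME classification and abnormality detection regularize each other, which is the essence of the multitask design.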
The proposed 3D multitask convolutional neural network (CNN) for 3D OCT volumetric scans (A) and 2D multitask CNN for 2D B-scan images (B). Both our networks have three components: a shared feature extraction module, a DME classification (no DME, non-CI-DME, and CI-DME) module, and an abnormality classification module. We used a 3D version of residual network (ResNet)-34 for the 3D CNN and ResNet-18 for the 2D CNN. We applied the following presence-based strategy to obtain per-scan (volume) level results for the 2D CNN: 1) If any B-scans are predicted as CI-DME, the whole scan is classified as CI-DME; 2) if 1 does not hold and at least one B-scan is predicted as non-CI-DME, the whole scan is classified as non-CI-DME; and 3) if both 1 and 2 do not hold, the whole scan is classified as non-DME. Conv, convolution; GAP, global average pooling; Max, maximum; Norm., normalization; ReLU, rectified linear unit.
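The presence-based per-scan aggregation rule described in the caption can be transcribed almost verbatim into code; the helper below is an illustrative sketch, not the authors' implementation.

```python
def aggregate_volume_label(bscan_labels):
    """Aggregate per-B-scan predictions into one volume-level label.
    The most severe prediction wins: 'CI-DME' > 'non-CI-DME' > 'no DME'."""
    if "CI-DME" in bscan_labels:      # rule 1: any B-scan predicted CI-DME
        return "CI-DME"
    if "non-CI-DME" in bscan_labels:  # rule 2: otherwise, any non-CI-DME B-scan
        return "non-CI-DME"
    return "no DME"                   # rule 3: neither label present

print(aggregate_volume_label(["no DME", "non-CI-DME", "CI-DME"]))  # CI-DME
print(aggregate_volume_label(["no DME"] * 25))                     # no DME
```

This "any positive B-scan makes the volume positive" rule favors sensitivity at the scan level, which suits a screening setting where missed DME is costlier than a false referral.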
Statistical Analysis
Numerical data were analyzed with the Wilcoxon rank sum test. The χ2 test was used for analysis of categorical data, including the demographic characteristics of all participants and the variances among different data sets. The discriminative performance of the DL system (classifying the presence or absence of DME, CI-DME vs. non-CI-DME, and the presence or absence of non-DME retinal abnormalities) was evaluated using area under the receiver operating characteristic curves (AUROCs), in addition to percentage rates for sensitivity, specificity, and accuracy. All statistical analyses were performed with RStudio (version 1.1.463, 2009–2018; RStudio, Inc.). We verified that the sample size of each retrospective external testing data set provided sufficient power to estimate the performance of the DL system; positive cases ranged from 2% to 92%, 57% to 94%, and 3% to 26% for any DME, CI-DME, and non-DME retinal abnormalities, respectively. To detect an AUROC of ≥0.7 with >80% power (α = 0.05), we estimated that 17–174, 36–252, and 17–23 cases would be required from each data set for each classification task.
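For readers reproducing these metrics, AUROC (via its Mann–Whitney formulation) and sensitivity/specificity at a fixed operating threshold can be computed as below. This is a generic Python sketch on synthetic scores, whereas the study's analyses were run in R.

```python
def auroc(scores, labels):
    """Mann-Whitney formulation of AUROC: the probability that a random positive
    scores higher than a random negative (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(s >= threshold and y == 1 for s, y in zip(scores, labels))
    fn = sum(s < threshold and y == 1 for s, y in zip(scores, labels))
    tn = sum(s < threshold and y == 0 for s, y in zip(scores, labels))
    fp = sum(s >= threshold and y == 0 for s, y in zip(scores, labels))
    return tp / (tp + fn), tn / (tn + fp)

# Synthetic predicted probabilities and ground truth labels (1 = DME present).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
print(round(auroc(scores, labels), 3))  # 0.938
```

Unlike sensitivity and specificity, AUROC summarizes performance over all possible thresholds, which is why it is the headline metric for comparing the DL system across devices and data sets.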
Results
A total of 100,727 OCT images, representing 4,261 eyes from 2,329 subjects with DM, were used for development, primary validation, and external testing. These images comprised 7,006 volume scans from CIRRUS OCT, 48,810 B-scans from SPECTRALIS OCT, and 44,911 B-scans from Triton OCT. Table 1 shows the characteristics of the study participants in both the primary data set and the external testing data sets.
Summary of primary and external testing data sets for training, validating, and testing the multitask DL system
| Data set | OCT device | Measure (total sample, n = 100,727 images) | DME module: No DME | DME module: Non-CI-DME | DME module: CI-DME | Abnormalities module: Absence | Abnormalities module: Presence |
|---|---|---|---|---|---|---|---|
Primary | CIRRUS HD-OCT | No. of OCT volumes | 2,580 | 413 | 795 | 3,405 | 383 |
No. of eyes | 655 | 69 | 215 | 828 | 111 | ||
No. of subjects | 295 | 38 | 141 | 402 | 72 | ||
Male sex, n (%) | 136 (46.1) | 21 (55.3) | 82 (58.2) | 209 (52.0) | 30 (41.7) | ||
Age (years), mean (SD) | 60.8 (15.4) | 62.7 (11.6) | 63.7 (9.2) | 60.6 (13.8) | 68.0 (10.6) | ||
SPECTRALIS OCT | No. of OCT volumes | 864 | 203 | 328 | 1,040 | 355 | |
No. of OCT B-scans | 25,073 | 4,392 | 1,050 | 25,815 | 4,700 | ||
No. of eyes | 368 | 74 | 103 | 383 | 162 | ||
No. of subjects | 164 | 36 | 74 | 171 | 103 | ||
Male sex, n (%) | 84 (51.2) | 17 (47.2) | 50 (67.6) | 95 (55.6) | 56 (54.4) | ||
Age (years), mean (SD) | 59.5 (15.6) | 64.2 (10.1) | 62.2 (11.2) | 58.3 (14.9) | 65.4 (10.7) | ||
Triton OCT | No. of OCT volumes | 2,121 | 421 | 790 | 2,163 | 1,169 | |
No. of OCT B-scans | 31,639 | 2,171 | 5,633 | 30,280 | 9,163 | ||
No. of eyes | 575 | 89 | 296 | 538 | 427 | ||
No. of subjects | 267 | 37 | 186 | 263 | 227 | ||
Male sex, n (%) | 133 (49.8) | 16 (43.2) | 92 (49.5) | 117 (44.5) | 124 (54.6) | ||
Age (years), mean (SD) | 60.4 (14.9) | 63.5 (12.7) | 64.5 (12.7) | 61.9 (12.9) | 62.3 (13.6) | ||
External 1 (Singapore, SERI) | CIRRUS HD-OCT | No. of OCT volumes | 2,320 | 19 | 26 | 2,292 | 73 |
No. of eyes | 1,349 | 11 | 18 | 1,330 | 48 | ||
No. of subjects | 663 | 9 | 25 | 654 | 43 | ||
Male sex, n (%) | 395 (59.6) | 5 (55.6) | 11 (44.0) | 381 (58.3) | 30 (69.8) | ||
Age (years), mean (SD) | 59.5 (9.2) | 53.8 (6.1) | 58.3 (7.9) | 58.3 (9.1) | 61.6 (9.8) | ||
External 2 (Hong Kong, AHNH) | CIRRUS HD-OCT | No. of OCT volumes | 412 | 69 | 178 | 486 | 173 |
No. of eyes | 355 | 64 | 154 | 422 | 151 | ||
No. of subjects | 141 | 34 | 117 | 186 | 106 | ||
Male sex, n (%) | 86 (61.0) | 23 (67.6) | 72 (61.5) | 116 (62.4) | 65 (61.3) | ||
Age (years), mean (SD) | 69.1 (9.81) | 61.2 (15.6) | 64.8 (9.0) | 65.5 (11.2) | 69.2 (8.4) | ||
External 3 (U.S., Stanford) | CIRRUS HD-OCT | No. of OCT volumes | 96 | 42 | 56 | 151 | 43 |
No. of eyes | 96 | 42 | 56 | 151 | 43 | ||
No. of subjects | 48 | 18 | 40 | 71 | 35 | ||
Male sex, n (%) | 22 (45.8) | 6 (33.3) | 16 (40.0) | 24 (33.8) | 20 (57.1) | ||
Age (years), mean (SD) | 60.44 (12.0) | 62.8 (10.7) | 65.5 (10.8) | 65.9 (11.3) | 68.8 (9.2) | ||
External 4 (China, Aier) | SPECTRALIS OCT | No. of OCT volumes | 172 | 3 | 27 | 172 | 30 |
No. of OCT B-scans | 3,352 | 309 | 81 | 3,541 | 201 | ||
No. of eyes | 172 | 3 | 27 | 172 | 30 | ||
No. of subjects | 104 | 3 | 21 | 101 | 25 | ||
Male sex, n (%) | 61 (58.7) | 1 (33.3) | 10 (47.6) | 58 (57.4) | 14 (56.0) | ||
Age (years), mean (SD) | 57.2 (10.0) | 69.8 (5.41) | 63.0 (11.6) | 58.2 (10.2) | 61.2 (13.1) | ||
External 5 (Australia, WIMR) | SPECTRALIS OCT | No. of OCT volumes | 46 | 36 | 121 | 178 | 25 |
No. of OCT B-scans | 3,578 | 1,151 | 361 | 4,588 | 502 | ||
No. of eyes | 46 | 36 | 121 | 178 | 25 | ||
No. of subjects | 11 | 11 | 81 | 87 | 16 | ||
Male sex, n (%) | 6 (54.5) | 5 (45.5) | 53 (65.4) | 55 (63.2) | 9 (56.3) | ||
Age (years), mean (SD) | 54.2 (14.7) | 54.5 (5.7) | 62.2 (8.4) | 59.5 (9.7) | 65.6 (6.9) | ||
External 6 (Hong Kong, UCH) | SPECTRALIS OCT | No. of OCT volumes | 53 | 50 | 296 | 327 | 72 |
No. of OCT B-scans | 3,980 | 4,484 | 999 | 8,361 | 1,102 | ||
No. of eyes | 53 | 50 | 296 | 327 | 72 | ||
No. of subjects | 11 | 14 | 191 | 160 | 56 | ||
Male sex, n (%) | 5 (45.5) | 9 (64.3) | 119 (62.3) | 84 (52.5) | 35 (62.5) | ||
Age (years), mean (SD) | 66.5 (11.6) | 61.7 (9.1) | 67.2 (9.8) | 65.9 (10.5) | 69.4 (7.6) | ||
External 7 (China, JSIEC) | Triton OCT | No. of OCT volumes | 36 | 23 | 394 | 116 | 337 |
No. of OCT B-scans | 1,295 | 507 | 3,666 | 4,872 | 596 | ||
No. of eyes | 36 | 23 | 394 | 116 | 337 | ||
No. of subjects | 29 | 17 | 280 | 54 | 272 | ||
Male sex, n (%) | 11 (37.9) | 4 (23.6) | 128 (45.7) | 20 (37.0) | 120 (44.1) | ||
Age (years), mean (SD) | 57.2 (12.8) | 60.0 (7.7) | 58.9 (8.8) | 59.0 (10.4) | 58.7 (9.0) |
Aier, Aier School of Ophthalmology; SERI, Singapore Eye Research Institute; Stanford, Byers Eye Institute at Stanford; UCH, United Christian Hospital; WIMR, Westmead Institute for Medical Research.
Table 2 shows the discriminative performance of the DL system in DME classification (presence vs. absence of DME) for the primary validation and external testing data sets at volume scan level. For the primary data set, the DL system achieved AUROCs of 0.937 (95% CI 0.920–0.954), 0.958 (95% CI 0.930–0.977), and 0.965 (95% CI 0.948–0.977) among images obtained from the CIRRUS, SPECTRALIS, and Triton OCTs, respectively, with sensitivities of 87.4%, 92.7%, and 94.3%; specificities of 100%, 98.9%, and 98.6%; and accuracies of 96.4%, 96.3%, and 96.9%. For classifying CI-DME and non-CI-DME among eyes with any DME, the DL system achieved AUROCs of 0.968 (95% CI 0.940–0.995), 0.951 (95% CI 0.898–0.982), and 0.975 (95% CI 0.947–0.991) among images obtained from the CIRRUS, SPECTRALIS, and Triton OCTs, with sensitivities of 95.8%, 92.3%, and 98.9%; specificities of 97.8%, 97.9%, and 96.2%; and accuracies of 96.3%, 94.4%, and 98.0%. For the external data sets, the discriminative performance of the DL system with different OCT devices was similar to that for the primary data set. For the classification of any DME, the ranges for AUROCs, sensitivity, specificity, and accuracy were 0.906–0.956, 81.4–100.0%, 89.7–100.0%, and 92.6–99.5%, respectively. For the classification of CI-DME and non-CI-DME, the ranges for AUROCs, sensitivity, specificity, and accuracy were 0.894–1.000, 87.1–100.0%, 85.7–100.0%, and 91.3–100.0%.
Discriminative performance of the multitask DL system in the classification of DME and CI-DME across primary validation and external testing data sets
| Classification task and OCT device | Data set | AUROC (95% CI) | Sensitivity, % (95% CI) | Specificity, % (95% CI) | Accuracy, % (95% CI) |
|---|---|---|---|---|---|
Presence vs. absence of DME | |||||
CIRRUS HD-OCT | Primary | 0.937 (0.920–0.954) | 87.4 (82.7–91.6) | 100.0 (100.0–100.0) | 96.4 (95.1–97.6) |
External 1 | 0.906 (0.947–0.968) | 81.4 (69.8–93.0) | 99.8 (99.6–100.0) | 99.5 (99.2–99.7) | |
External 2 | 0.929 (0.907–0.947) | 89.5 (85.1–93.0) | 96.3 (93.9–97.9) | 93.6 (91.5–95.4) | |
External 3 | 0.930 (0.894–0.965) | 86.0 (78.5–92.5) | 100.0 (100.0–100.0) | 93.5 (90.0–96.5) | |
SPECTRALIS OCT | Primary | 0.958 (0.930–0.977) | 92.7 (86.9–96.4) | 98.9 (96.3–99.9) | 96.3 (93.7–98.1) |
External 4 | 0.956 (0.935–0.978) | 100.0 (100.0–100.0) | 91.3 (86.6–95.4) | 92.6 (88.6–96.0) | |
External 5 | 0.936 (0.879–0.994) | 97.6 (95.2–99.4) | 89.7 (75.9–100.0) | 96.4 (93.8–98.5) | |
External 6 | 0.949 (0.922–0.977) | 96.4 (94.2–98.4) | 93.4 (87.9–97.8) | 95.7 (93.7–97.5) | |
Triton OCT | Primary | 0.965 (0.948–0.977) | 94.3 (90.9–96.8) | 98.6 (96.9–99.5) | 96.9 (95.4–98.1) |
External 7 | 0.954 (0.930–0.971) | 99.3 (97.9–99.9) | 91.7 (77.5–98.3) | 98.6 (97.1–99.5) | |
CI-DME vs. non-CI-DME | |||||
CIRRUS HD-OCT | Primary | 0.968 (0.940–0.995) | 95.8 (92.3–95.6) | 97.8 (93.3–100.0) | 96.3 (93.1–98.9) |
External 1 | 0.939 (0.851–1.000) | 95.5 (86.4–100.0) | 92.3 (76.9–100.0) | 94.3 (85.7–100.0) | |
External 2 | 0.894 (0.847–0.931) | 87.1 (81.1–91.8) | 91.7 (81.6–97.2) | 88.3 (83.5–92.2) | |
External 3 | 1.000 (1.000–1.000) | 100.0 (100.0–100.0) | 100.0 (100.0–100.0) | 100.0 (100.0–100.0) | |
SPECTRALIS OCT | Primary | 0.951 (0.898–0.982) | 92.3 (84.0–97.1) | 97.9 (88.9–99.9) | 94.4 (88.9–97.7) |
External 4 | 0.929 (0.863–0.995) | 88.9 (71.0–97.6) | 100.0 (29.2–100.0) | 90.0 (73.5–97.9) | |
External 5 | 0.899 (0.851–0.947) | 94.0 (89.5–97.7) | 85.7 (76.2–93.7) | 91.3 (87.2–94.9) | |
External 6 | 0.934 (0.905–0.962) | 94.2 (91.4–96.6) | 92.5 (86.9–97.2) | 93.7 (91.5–96.0) | |
Triton OCT | Primary | 0.975 (0.947–0.991) | 98.9 (95.9–99.9) | 96.2 (89.2–99.2) | 98.0 (95.4–99.4) |
External 7 | 0.975 (0.955–0.988) | 100.0 (99.1–100.0) | 95.0 (75.1–99.9) | 99.8 (98.7–100.0) |
For identification of External 1, 2, 3, 4, 5, 6, and 7, see Table 1.
Table 3 shows the performance of the DL system in classifying the presence or absence of non-DME retinal abnormalities at volume scan level. For the primary data set, the AUROCs were 0.948 (95% CI 0.930–0.963), 0.949 (95% CI 0.901–0.996), and 0.938 (95% CI 0.915–0.960) for images obtained from the CIRRUS, SPECTRALIS, and Triton OCTs, respectively, with sensitivities of 93.0%, 93.1%, and 97.2%; specificities of 89.4%, 96.6%, and 90.3%; and accuracies of 89.9%, 96.3%, and 91.0%. The performance in external data sets remained excellent, with the ranges for AUROCs, sensitivity, specificity, and accuracy being 0.901–0.969, 84.2–99.6%, 80.6–98.8%, and 91.0–98.0%, respectively.
Discriminative performance of the multitask DL system in the classification of presence vs. absence of non-DME retinal abnormalities across primary validation and external testing data sets
| OCT device | Data set | AUROC (95% CI) | Sensitivity, % (95% CI) | Specificity, % (95% CI) | Accuracy, % (95% CI) |
|---|---|---|---|---|---|
CIRRUS HD-OCT | Primary | 0.948 (0.930–0.963) | 93.0 (86.8–96.9) | 89.4 (86.7–91.7) | 89.9 (87.6–92.0) |
External 1 | 0.969 (0.941–0.996) | 90.4 (83.6–97.3) | 97.9 (89.4–96.3) | 97.7 (89.6–99.3) | |
External 2 | 0.915 (0.891–0.935) | 91.0 (85.2–95.1) | 88.7 (85.4–91.3) | 89.2 (86.7–91.5) | |
External 3 | 0.898 (0.830–0.966) | 84.2 (71.1–94.7) | 92.6 (88.3–96.3) | 91.0 (87.0–94.5) | |
SPECTRALIS OCT | Primary | 0.949 (0.901–0.996) | 93.1 (82.8–100.0) | 96.6 (94.3–98.7) | 96.3 (94.2–98.2) |
External 4 | 0.940 (0.901–0.979) | 96.7 (90.0–100.0) | 91.3 (82.6–95.4) | 92.1 (85.2–95.5) | |
External 5 | 0.960 (0.912–1.000) | 93.1 (82.8–100.0) | 98.8 (97.0–100.0) | 98.0 (95.9–99.5) | |
External 6 | 0.901 (0.867–0.935) | 99.6 (98.9–100.0) | 80.6 (73.9–87.3) | 93.2 (90.5–95.5) | |
Triton OCT | Primary | 0.938 (0.915–0.960) | 97.2 (93.1–100.0) | 90.3 (87.8–92.6) | 91.0 (88.9–93.1) |
External 7 | 0.926 (0.897–0.955) | 90.5 (85.3–95.7) | 94.7 (92.0–96.7) | 93.6 (91.4–95.8) |
For identification of External 1, 2, 3, 4, 5, 6, and 7, see Table 1.
Figure 2 and Videos 1–3 show examples of images from eyes with DME for each of the three OCT devices, together with their corresponding heat maps, demonstrating our DL system’s ability to attend to features relevant to DME identification. In additional analyses (Supplementary Material), because the volume scan–level results for SPECTRALIS OCT and Triton OCT were derived from 2D CNN predictions at the B-scan level using a presence-based strategy, we further tested performance in classifying any DME (Supplementary Table 1) and non-DME retinal abnormalities (Supplementary Table 2) at the B-scan level. We also tested performance when only one scan per eye was included in the primary data set (Supplementary Tables 3 and 4). Furthermore, we tested performance in classifying any DME among eyes with non-DME retinal abnormalities (Supplementary Tables 5 and 7) and in classifying non-DME retinal abnormalities among eyes with DME (Supplementary Tables 6 and 8) in the primary data set. Performance was similar to that for the entire primary data set. Heat maps of each false-negative and false-positive case at the per-scan level for SPECTRALIS OCT and Triton OCT in the primary testing data set for classifying DME were reviewed, and examples are presented in Supplementary Figs. 1 and 2.
Examples of eyes with DME and corresponding heat maps. DME was identified among images from CIRRUS HD-OCT (A), SPECTRALIS OCT (B), and Triton OCT (C). In the heat maps, the colored area indicates the gradient of discriminatory power for classifying the presence or absence of DME. An orange-red color indicates the greatest relative discriminatory power, whereas a green-blue color indicates the least relative discriminatory power.
Video 1. A case with DME on Cirrus OCT scan. Available from https://bcove.video/3kNBT8X.
Video 2. A case with DME on Spectralis OCT scan. Available from https://bcove.video/3iELLPv.
Video 3. A case with DME on Triton OCT scan. Available from https://bcove.video/3y2eCnh.
Conclusions
Regular screening for DR remains a cornerstone of the management of diabetic eye disease, having been shown to reduce blindness at the population level. A major shift in the last decade has been the widespread use of 2D retinal fundus photography for DR screening. However, given the understanding that DME is the primary cause of vision loss among patients with DM, the use of OCT has been suggested for the timely identification and treatment of DME, particularly CI-DME, to prevent vision loss in patients with DM (3,7,26). In the current study, we developed and validated a novel DL system for the fully automated classification of DME based on both 3D and 2D OCT images from three commonly used devices, yielding volume scan–level results for each eye. We externally tested the DL system using diverse, independent data sets collected from different centers, across different racial/ethnic groups, and in different settings (i.e., community-based screening and tertiary care settings). We showed that the proposed DL system had excellent discriminative performance in classifying any DME, distinguishing CI-DME from non-CI-DME, and identifying other non-DME retinal abnormalities on OCT images captured by the three widely used devices.
Our study substantially extends existing research as follows. First, most reported DL algorithms for DME detection on OCT have been trained and tested using cross-sectional B-scans (16–20,27,28). In the current study, we trained a CNN model to detect DME using 3D volume scans obtained from CIRRUS OCT, achieving comparable performance. Using 3D volume scans has substantial merits for training DL algorithms (29,30). For instance, labeling numerous B-scans for supervised training is laborious and time-consuming for experts; using labels at the volume scan level to train the proposed CNN reduced this labeling burden while maintaining excellent performance. En face slab imaging is also available from OCT; however, en face images may not be informative for DL-based DME detection, as DME is a 3D condition. It should be noted that the volume scan–level results for SPECTRALIS OCT and Triton OCT were obtained from predictions made for a series of B-scans according to the presence-based strategy described in Research Design and Methods, as only individual B-scans could be exported from the raw files. By contrast, entire volumetric cubes could be exported directly from CIRRUS OCT.
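As a rough illustration of such a presence-based rule (the function and label names below are hypothetical sketches, not taken from the paper's code), a volume scan–level call can be made by reporting the most clinically urgent class predicted for any constituent B-scan:

```python
def aggregate_volume_label(bscan_labels):
    """Combine per-B-scan predictions into one volume scan-level label
    using a presence-based rule: the most urgent class found in any
    B-scan determines the call for the whole volume.

    bscan_labels: iterable of 'ci_dme', 'non_ci_dme', or 'no_dme'
    (hypothetical label strings for illustration).
    """
    labels = set(bscan_labels)
    if "ci_dme" in labels:        # any center-involved B-scan -> CI-DME
        return "ci_dme"
    if "non_ci_dme" in labels:    # otherwise any DME B-scan -> non-CI-DME
        return "non_ci_dme"
    return "no_dme"               # no B-scan shows DME
```

A single positive B-scan is thus enough to flag the eye, which matches the screening goal of minimizing missed DME at the volume level.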
Our study’s second novel feature is the classification of DME into CI-DME and non-CI-DME subgroups by the DL system. This subgroup categorization of DME is clinically important for DR screening, as it determines the reexamination frequency, the necessity and timing of referral to ophthalmologists, and the treatment recommendations in different resource settings, according to the International Council of Ophthalmology (ICO) Guidelines for Diabetic Eye Care (6). For example, in low- to intermediate-resource settings, patients with CI-DME identified by the DL system should be referred to ophthalmologists and considered for either intravitreal anti–vascular endothelial growth factor therapy or laser photocoagulation as soon as possible, whereas patients with non-CI-DME can be referred to ophthalmologists at less busy clinics (6). The subclassification of DME will help to triage patients more effectively in DR screening programs, reducing false positives, conserving resources (especially in low- or intermediate-resource regions or countries), and enabling ophthalmologists to prioritize patients who need prompt treatment to prevent vision loss, allowing for better use of costly specialist care resources and shorter hospital wait times. However, it should be noted that DME is only one of many factors that determine whether a given patient requires treatment or what that treatment should be; the current DL system alone is not intended for making any therapeutic decisions. In addition to the subgroup classification of DME, we intend to further develop the DL system to include other outputs related to visual acuity and retinal structure (e.g., the ellipsoid zone and external limiting membrane). The system may then further indicate the possibility of visual recovery and improve the prioritization of patients who need prompt treatment, making it more valuable for screening and for assisting clinicians.
Our third novelty is the ability of the DL system to detect non-DME retinal conditions. Most previously reported DL systems focus on only one disease besides DME (17,18,27,31). Our study goes beyond previously published work by detecting non-DME retinal abnormalities among individuals with DM, enhancing the applicability of our DL system in real-world screening settings. Training such a system to achieve good performance using OCT images with multiple diseases besides DME may be difficult, given that some diseases are uncommon and some ocular changes share features with DME. However, our proposed approach to detecting multiple diseases is relevant to, and representative of, the populations seen in DR screening. Our DL system showed excellent performance in detecting DME not only among all cases but also among those with non-DME retinal abnormalities, and vice versa (detecting retinal abnormalities among eyes with DME; results shown in Supplementary Tables 3–6). Both results suggest that the system has great potential for real-world clinical utility.
The fourth novel aspect of our DL algorithm is its applicability to three commonly used OCT devices. Previous studies have focused on one or, at most, two commercial OCT devices (15,16,18,20,27,28,32). In this study, we trained CNNs to detect DME using images obtained from three commercial OCT devices, making the screening more generalizable. A common research challenge is that, while the Digital Imaging and Communications in Medicine (DICOM) standard ensures reasonable consistency among OCT images from different manufacturers, OCT images are often stored in a compressed format that may result in loss of information. Therefore, we trained the DL system using raw data (i.e., IMG files from CIRRUS OCT, E2E files from SPECTRALIS OCT, and FDS files from Triton OCT) exported from each OCT manufacturer’s software. Our DL system thus represents a machine-agnostic platform applicable to a wider range of OCT modalities.
Finally, in most previous studies, DL systems were trained to detect retinal pathologies by focusing either on the segmentation of relevant pathological markers such as macular fluid (15,18,20) or on the classification of the presence of specific pathologies on OCT scans (16,17,19). Although the segmentation approach to fluid quantification provides visualized and quantitative outcomes, it requires a vast number of B-scan–level ground truths (i.e., pathologies delineated on each B-scan) labeled by skilled technicians. Such detailed labels are often unavailable, limiting the usability of this approach. In the current study, we trained our DL system to classify DME and non-DME retinal abnormalities by applying a segmentation-free classification approach, reducing the time and personnel required for labeling and making it possible to train and validate the DL system using large data sets (>100,000 OCT images) while maintaining performance standards.
Our DL system showed excellent performance in detecting DME in the primary validation data set and all external data sets; all AUROC values were >0.90, demonstrating high generalizability across the data sets. In our study, sensitivities and specificities dropped slightly for certain external data sets. We identified two possible reasons for these decreases. First, the drops might be due to inter–data set variation in OCT images, including differences in calibration and image intensity (e.g., background noise and brightness), although scanning protocols were kept the same. Second, discrepancies might arise from the racial/ethnic diversity of the primary and external data sets. Previously reported ethnic variability in retinal structure (e.g., foveal architecture [33] and vascular morphology [34]) might influence the DL system’s DME classification performance. Despite these issues, the performance of our proposed DL system, with sensitivities >80%, should be adequate for screening purposes (26,35).
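For readers unfamiliar with these metrics, the sketch below shows one standard way to compute an AUROC (via the Mann–Whitney rank statistic) and a percentile-bootstrap 95% CI using only NumPy. This is an illustrative reimplementation under common conventions, not the authors’ actual analysis code; the function names are our own.

```python
import numpy as np

def auroc(labels, scores):
    """AUROC via the Mann-Whitney U statistic: the probability that a
    randomly chosen positive case scores higher than a randomly chosen
    negative case (ties count as 0.5)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for the AUROC: resample cases with
    replacement, recompute the statistic, and take the alpha/2 and
    1 - alpha/2 quantiles of the bootstrap distribution."""
    rng = np.random.default_rng(seed)
    labels, scores = np.asarray(labels), np.asarray(scores)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(labels), len(labels))
        if labels[idx].min() == labels[idx].max():
            continue  # a resample must contain both classes
        stats.append(auroc(labels[idx], scores[idx]))
    return np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
```

The same percentile-bootstrap idea extends directly to sensitivity, specificity, and accuracy at a fixed operating threshold.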
The current study has several strengths. First, our DL system can analyze OCT images from three commercially available OCT devices and has been successfully tested on unseen multicenter data sets comprising images from different racial/ethnic backgrounds and geographical locations. Second, the DL system generated heat maps to visualize discriminative image regions, allowing for a better understanding of the features the DL system used to perform its classification tasks. Several limitations should also be considered. First, we trained and tested our DL system only on gradable OCT images. An initial gradability assessment is essential before disease classification, as it combines quality checks of the acquired images with decisions on image inclusion. We are beginning to develop a separate 3D CNN for the automated filtering of ungradable OCT volume scans obtained from CIRRUS OCT (36). In our preliminary analysis of macular OCT scans, the CNN for CIRRUS OCT achieved an AUROC of 0.884, with an accuracy of 0.873, in distinguishing gradable OCT volumes from ungradable ones (unpublished data, A.R. Ran, Z.Q. Tang, F.Y. Tang, J. Shi, A.K. Ngai, V. Yuen, N. Kei, and C.Y. Cheung). We will incorporate this CNN into the next version of the DL system. Second, the types of non-DME retinal abnormalities considered were relatively limited, and their numbers in the primary training data set were small; indeed, performance in the external data sets dropped slightly. The DL system thus needs to be further expanded and refined to cover a greater variety of abnormalities. Third, distinguishing DME from macular edema caused by other retinal abnormalities on the basis of OCT images alone may be difficult. Nevertheless, the proposed DL system is intended not as a diagnostic tool but as a second-line screening tool for more effective patient triage in DR screening programs.
It is worth noting that all data sets were retrospectively collected from patients with DM in eye clinics or DR screening programs; we therefore believe that the majority of macular edema was due to DM or DR and that confusion over the cause of macular edema would be minimized. Fourth, spectral domain OCT remains specialized equipment in most countries, and its limited availability in the community may constrain implementation of the DL system. In the next phase, further evidence will be essential on how the DL system performs as a second-line screening tool, or even as a first-line screening or diagnostic support tool, to significantly improve clinical workflows for eventual real-world use. In addition, future development is needed to unify all types of OCT scans into one framework and achieve effective DME classification in a device-agnostic manner.
In summary, we developed, validated, and externally tested a multitask DL system to identify any DME, the subtypes of CI-DME and non-CI-DME, and non-DME retinal abnormalities from images obtained using three commercial OCT devices. The system showed excellent performance across diverse study populations in different settings. Our study extends the promise of incorporating OCT into current retinal fundus photography–based DR screening programs as a second-line screening tool, allowing for the efficient and reliable detection of DME, which may lead to reductions in overreferrals and increased clinical use of DL systems.
This article contains supplementary material online at https://doi.org/10.2337/figshare.14710284.
Article Information
Acknowledgments. The authors thank the study participants and staff of the following institutes: Department of Ophthalmology and Visual Sciences, CUHK; Department of Computer Science and Engineering, CUHK; Hong Kong Eye Hospital; Department of Ophthalmology and Visual Sciences, Prince of Wales Hospital; AHNH; United Christian Hospital; Singapore Eye Research Institute; JSIEC, Aier School of Ophthalmology; Byers Eye Institute at Stanford; Department of Ophthalmology, Westmead Institute for Medical Research; and Macquarie University Hearing, Department of Linguistics, Macquarie University.
Funding. This study was funded by the Research Grants Council General Research Fund, Hong Kong (no. 14102418); the Innovation and Technology Fund, Hong Kong (MRP/056/20X); Research to Prevent Blindness; and the National Institutes of Health (P30-EY-026877).
The funder had no role in study design, data collection, data analysis, data interpretation, or report writing.
Duality of Interest. No potential conflicts of interest relevant to this article were reported.
Author Contributions. F.T. and C.Y.C. contributed to literature search. F.T., C.P.P., C.C.T., and C.Y.C. contributed to the study design. X.W. developed and validated the DL system, under the supervision of H.C. and P.-A.H. F.T., A.-r.R., C.K.M.C., M.H., W.Y., A.L.Y., J.L., S.S., J.C., F.Y., R.W., Z.T., D.Y., D.S.N., L.J.C., M.B., V.C., K.L., T.H.T.L., G.S.T., D.S.W.T., H.H., H.C., J.H.M., T.L., S.K., S.S.M., R.T.C., G.L., B.G., T.Y.W., S.B.T., and C.Y.C. contributed to data collection. F.T. and A.-r.R. contributed to data analysis. F.T. and X.W. contributed to figure design. F.T., X.W., T.Y.Y.L., P.H.S., and C.Y.C. contributed to data interpretation. The manuscript was critically revised and approved by all authors. C.Y.C. obtained funding. C.Y.C. is the guarantor of this work and, as such, had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.