BACKGROUND

Diabetic macular edema (DME) is the leading cause of vision loss in people with diabetes. Application of artificial intelligence (AI) in interpreting fundus photography (FP) and optical coherence tomography (OCT) images allows prompt detection and intervention.

PURPOSE

To evaluate the performance of AI in detecting DME from FP or OCT images and to identify potential factors affecting model performance.

DATA SOURCES

We searched seven electronic libraries up to 12 February 2023.

STUDY SELECTION

We included studies using AI to detect DME from FP or OCT images.

DATA EXTRACTION

We extracted study characteristics and performance parameters.

DATA SYNTHESIS

Fifty-three studies were included in the meta-analysis. FP-based algorithms from 25 studies yielded a pooled area under the receiver operating characteristic curve (AUROC), sensitivity, and specificity of 0.964, 92.6%, and 91.1%, respectively. OCT-based algorithms from 28 studies yielded a pooled AUROC, sensitivity, and specificity of 0.985, 95.9%, and 97.9%, respectively. Potential factors improving model performance included use of deep learning techniques and larger, more diverse training data sets. Models demonstrated better performance when validated internally than externally, and those trained with multiple data sets showed better results upon external validation.

LIMITATIONS

Analyses were limited by unstandardized algorithm outcomes and insufficient data in patient demographics, OCT volumetric scans, and external validation.

CONCLUSIONS

This meta-analysis demonstrates satisfactory performance of AI in detecting DME from FP or OCT images. External validation is warranted for future studies to evaluate model generalizability. Further investigations may estimate the optimal sample size and evaluate the effects of class balance and patient demographics, as well as the additional benefits of OCT volumetric scans.

Diabetic retinopathy (DR) is a neurovascular complication of diabetes (1,2). Diabetic macular edema (DME), which can develop at any stage of DR, is the primary cause of irreversible vision loss in people with diabetes (1,3), overtaking proliferative DR as the most frequent cause of visual impairment among people with diabetes in developed countries (4). As blindness from DME is preventable with timely treatment, it is important to precisely identify DME among people with diabetes. Given that the population with diabetes is projected to approach 600 million worldwide by 2035 (5), DME is likely to be responsible for substantial vision loss in the future unless detected early and treated adequately.

DR screening programs are currently implemented at the primary care level in many countries using two-dimensional (2D) nonstereoscopic digital fundus photography (FP) captured with fundus cameras (6–9). When signs of DME or other sight-threatening DR are identified, patients are referred to ophthalmologists for further clinical examination and management. However, the diagnosis of DME requires identification of retinal thickening, a three-dimensional (3D) change that is difficult to diagnose reliably from 2D FP. Notably, in screening settings, manual interpretation of FP images for DME has been reported to have high false-positive rates (e.g., >86% in Hong Kong [10] and >79% in the U.K. [11]), causing unnecessary referral of suspected DME to ophthalmologists and leading to a substantial increase in medical costs and waiting time for patients.

Spectral domain optical coherence tomography (SD-OCT) is a noninvasive imaging modality that provides 3D volumetric scans of the layered retinal structures. It has been widely used as the gold standard for DME diagnosis in clinical settings and clinical trials (12,13), for monitoring treatment response, and for providing prognostic information. Its role as a screening tool for DME has been investigated in pilot studies (13,14), which showed that implementing OCT in DR screening programs reduced referrals for diabetic maculopathy by 40% (15). Nevertheless, commercially available OCT devices can be at least three times more expensive than fundus cameras, so further studies are required to demonstrate the feasibility and cost-effectiveness of implementing OCT in DME screening. More importantly, one common and critical issue for using FP or OCT in DR screening is the requirement for professionals to review a tremendous number of images. Taking OCT as an example, a volumetric data cube usually comprises >100 images per eye, and manual slice-by-slice assessment is required to ensure that no positive cases are missed. Considering the large population of individuals with diabetes, this is a time- and labor-intensive task.

Artificial intelligence (AI), particularly deep learning, is a major area of research in medical image analysis (16). Recently, deep learning has made remarkable breakthroughs in the DR field, enabling automated, convenient, and efficient DME detection from FP and OCT images (17,18). Despite the excellent diagnostic performance, several gaps remain. First, whether FP-based AI can achieve satisfactory performance for screening DME has not yet been evaluated comprehensively. Second, although OCT is regarded as the gold standard for DME diagnosis, whether implementing OCT-based AI in DR screening provides significantly better performance requires validation studies. Third, the factors that determine AI's discriminative performance in detecting DME also require further elucidation.

We conducted a meta-analysis to evaluate the synthesized discriminative performance of AI in detecting DME from FP or OCT images and to identify factors that affect a model’s performance. In addition, on the basis of current literature, we further discuss potential research directions, aiming to facilitate clinical translation of AI models to real-world practice for DR screening.

We conducted a systematic review and meta-analysis to assess the diagnostic performance of AI for detecting DME from FP and OCT images in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (19). Institutional review board approval and informed consent were not required, as all data were extracted from past publications. All research adhered to the principles of the Declaration of Helsinki.

Eligibility Criteria

Studies that used AI algorithms to detect DME from FP or OCT macular scans, published from 1 January 1991 to 12 February 2023, were included. We excluded records for the following reasons: 1) did not specify patients' diabetes status; 2) did not provide a qualitative outcome of DME (i.e., presence or absence of DME); 3) used traditional computer vision methods instead of AI techniques; 4) did not report sensitivity, specificity, and area under the receiver operating characteristic (AUROC) curve; 5) used ocular images other than FP or OCT macular scans as input; 6) were conference abstracts, ongoing studies, reviews, meta-analyses, comments, editorials, book chapters, theses, non–English-language, or nonhuman studies; or 7) did not have full text available. We also excluded any studies rated as low quality using the QUADAS-2 tool (20).

Electronic Search Strategy

Two independent reviewers (C.L. and Y.L.W.) conducted the literature search in seven electronic libraries (PubMed, Embase, Web of Science, Google Scholar, Scopus, the Cochrane Library, and CINAHL) using a hierarchical search strategy with combinations of keywords related to AI (e.g., artificial intelligence, AI, machine learning, deep learning, automated detection), the target condition (e.g., DME), and image modalities (e.g., FP, OCT), together with Medical Subject Headings terms as appropriate. Full details of the search strategy for each database are described in Supplementary Table 1.

Study Selection and Quality Assessment

We ran our search strategy through the electronic libraries and collected all search results using EndNote. Duplicates were removed before selection. The selection process consisted of three phases. First, two reviewers (C.L. and Y.L.W.) independently screened the titles and abstracts of the records and identified studies related to the topic. Second, the same two reviewers independently assessed the full texts and excluded articles that met the exclusion criteria. Third, we evaluated the quality of the included studies using the QUADAS-2 tool and eliminated studies of low quality. Each study was evaluated for risk of bias and applicability across four key domains: patient selection, index test, reference standard, and flow and timing. Reference lists of the selected studies were manually searched and screened. Throughout the process, discrepancies between the two reviewers were resolved by discussion with a senior reviewer (Z.T.).

Data Collection

Relevant data were extracted from the included studies, including 1) study characteristics (author names, year of publication, and country); 2) data set characteristics (source of database; imaging modality and device; imaging protocol; image resolution; number of participants, eyes, and images; number of participants and eyes with DME; number and experience of graders; and DME definition [i.e., FP, OCT, or ophthalmoscopy]); 3) algorithm characteristics (types of networks; data splitting and data distribution in training, testing, and validation sets; and outcomes); and 4) performance metrics (AUROC; sensitivity, recall, or true-positive rate; specificity or true-negative rate; false-positive rate; accuracy; false-negative rate; precision or positive predictive value; negative predictive value; and other reported performance parameters).
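
For illustration, the extracted items above can be organized as one structured record per study. The following R sketch shows one possible layout; all field names and values are our own hypothetical shorthand, not the actual extraction form used in this review:

# Hypothetical per-study extraction record (illustrative fields and values only)
extraction_record <- data.frame(
  author         = "Example et al.",
  year           = 2021,
  country        = "Example",
  modality       = "OCT",                 # FP or OCT
  device         = "Heidelberg Spectralis",
  n_images       = 10000,                 # total images in developmental data set
  n_dme_images   = 2000,                  # images labeled as DME
  dme_definition = "OCT",                 # FP, OCT, or ophthalmoscopy
  network_type   = "deep learning",       # vs. machine learning
  TP = 1900, FN = 100, FP = 150, TN = 7850,  # 2 x 2 counts for meta-analysis
  auroc          = 0.970
)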

Data Synthesis and Statistical Analysis

We constructed a 2 × 2 contingency table for each study in RevMan (version 5.4.1) with the built-in calculator, using the collected study characteristics and performance parameters. We imported the 2 × 2 contingency tables into R (version 4.2.0) and used the mada package (21) and VassarStats (22) to perform statistical analyses. To demonstrate the discriminative performance of FP- and OCT-based AI for DME detection, we calculated the pooled AUROC using a hierarchical model. Pooled sensitivity and specificity were calculated with a bivariate random-effects model to measure the AI's ability to identify DME-positive and DME-negative eyes. Bivariate analysis produces informative summary measures in diagnostic reviews. As a single indicator, the diagnostic odds ratio (DOR) evaluates how much greater the odds of DME are for patients with test-positive versus test-negative results, providing a comprehensible overview of the models' overall diagnostic performance (23). We generated heterogeneity parameters, forest plots, and summary receiver operating characteristic (SROC) curves. Subgroup analyses were performed according to the type of AI (machine learning vs. deep learning), data set size (smaller than median vs. larger than median), data set diversity (single vs. multiple data sets), and testing sets (internal vs. external). A small developmental data set was defined as one smaller than the median number of images among studies included in this meta-analysis, and multiple training data sets were defined as data from different institutions or with systematically different population characteristics (18). We also created funnel plots to assess for publication bias.
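
To make this workflow concrete, a minimal R sketch of the pipeline using the mada package is shown below. The 2 × 2 counts are invented for six hypothetical studies and are not data from this review; the actual analysis code is not reproduced here.

# Minimal sketch of the bivariate meta-analysis, assuming per-study
# 2 x 2 counts (TP, FN, FP, TN) exported from RevMan
library(mada)

dat <- data.frame(
  TP = c(160,  95, 210, 310, 120,  85),  # true positives
  FN = c( 14,  10,  15,  22,   9,  11),  # false negatives
  FP = c( 12,  20,  25,  18,  14,   9),  # false positives
  TN = c(180, 150, 300, 420, 160, 140)   # true negatives
)

# Per-study sensitivity, specificity, and DOR = (TP x TN) / (FP x FN)
desc <- madad(dat)
print(desc)

# Bivariate random-effects (Reitsma) model for pooled sensitivity
# and false-positive rate
fit <- reitsma(dat)
summary(fit)

# SROC curve and the area under it
plot(fit, sroclwd = 2, main = "SROC curve")
AUC(fit)

# Forest plot of study-level sensitivities
forest(desc, type = "sens")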

Study Selection

The selection process for the 53 included studies is shown in Fig. 1. Initially, 4,776 search results were identified, and 1,338 duplicates were removed before selection. We screened the titles and abstracts of the remaining 3,438 records, and 2,796 were excluded because of irrelevant topics. Of the 642 studies undergoing full-text review, 589 were excluded according to the exclusion criteria. We assessed the quality of the remaining 53 studies, and none were excluded for low quality (Supplementary Fig. 1). Finally, 53 studies were included in the meta-analysis (full citations provided in the Supplementary Material).

Figure 1

Flow diagram for literature selection. AUC, area under the curve; DM, diabetes mellitus.

Study Characteristics

Basic characteristics of the included studies are presented in Table 1. Among the 53 studies, 25 used FP as input images, while 28 used OCT B scans, among which 2 also included volumetric scans as input data for algorithm development. Of the FP studies, 8 developed machine learning algorithms and 17 developed deep learning algorithms. Of the OCT studies, 3 developed machine learning algorithms and 25 developed deep learning algorithms. The Heidelberg Spectralis SD-OCT was most commonly adopted (21 studies), while the Cirrus HD-OCT, Cirrus SD-OCT, Topcon Triton OCT, Topcon 1000 SD-OCT, Topcon 3D OCT-1 Maestro, and Optovue RTVue-XR Avanti were adopted in other studies. Regarding publicly available data sets for training, testing, and validation, the Methods to Evaluate Segmentation and Indexing Techniques in the Field of Retinal Ophthalmology (MESSIDOR) and Kermany et al. (24) data sets were the most common for FP and OCT studies, respectively.

Table 1

Study characteristics

Study | Year | Training database | Imaging modality (device) | Imaging protocol | Image resolution, pixels | Total images | DME images | Outcome | Type of network (best result) | AUROC
FP           
 Agurto et al. 2011 RIST Topcon TRC-50EX 45° mydriatic images centered on the macula, optic disc, and superior temporal region of the retina 1,888 × 2,224 238 174 Yes/no CSME ML PLS regression classifier 0.980 
   UTHSCSA Canon CF-60UV 45° mydriatic images centered on the macula, optic disc, and superior temporal region of the retina 2,048 × 2,392 323 207 Yes/no CSME ML PLS regression classifier 0.970 
 Akram et al. 2014 HEI-MED Not specified Not specified Not specified 169 54 Normal/CSME Hybrid GMM SVM 0.940 
   MESSIDOR Topcon TRC-NW6 45° mydriatic and nonmydriatic centered between the macula and the disc 1,440 × 960, 2,240 × 1,488, and 2,304 × 1,536 1,200 229 Normal/Non-CSME/CSME Hybrid GMM SVM 0.970 
 Bressler et al. 2022 EyePACS Topcon, Canon, CenterVue, Crystalvue, Zeiss, unassigned 45° mydriatic and nonmydriatic images centered on the macula and between the macula and the disc Not specified 32,049 15,595 Yes/no DME DL neural network 0.954 (EyePACS), 0.971 (MESSIDOR-2) 
 Chalakkal et al. 2021 MESSIDOR Topcon TRC-NW6 45° mydriatic and nonmydriatic images centered between the macula and disc 1,440 × 960, 2,240 × 1,488, and 2,304 × 1,536 1,187 150 Non-CSME/CSME CNN ResNet-50 0.962** 
   UoA-DR Zeiss VISUCAM 500 45° centered on macular and optic disc 2,124 × 2,056 200 74 Non-CSME/CSME CNN ResNet-50  
   IDRiD Kowa VX-10α 50° nonmydriatic images centered between the macula and the disc 4,288 × 2,848 516 243 Non-CSME/CSME CNN ResNet-50  
 Dai et al. 2021 SIM Canon CR-1 Mark II/CR-2, Topcon TRC-NW200, Zeiss VISUCAM 200 45° nonmydriatic images centered on optic disc and macula Not specified 666,383 3,926 Yes/no DME ResNet, Mask R-CNN 0.946 
 Deepak et al. 2012 HEI-MED Not specified Not specified Not specified 122 54 Normal/hard exudate (moderate/severe) Gaussian data description, PCA data description 0.990 (DMED), 0.960 (MESSIDOR) 
 Gulshan et al. 2019 Aravind Eye Hospital/Sankara Nethralaya Topcon NM TRC, Forus 3nethra 40°–45° nonmydriatic images centered on the macula Not specified 140,853 20,002 Yes/no RDME* CNN Inception v4 0.984 
 He et al. 2019 IDRiD Topcon TRC-NW6 50° nonmydriatic images centered between the macula and the disc 4,288 × 2,848 516 284 0, 1, 2# CNN VGG-16, XGBoost classifier 0.964 
   MESSIDOR Topcon TRC-NW6 45° mydriatic and nonmydriatic centered between the macula and the disc 1,440 × 960, 2,240 × 1,488, and 2,304 × 1,536 1,200 226 0, 1, 2# CNN VGG-16, XGBoost classifier 0.982 
 Li et al. 2021 Shanghai First People’s Hospital, MESSIDOR-2 Shanghai: not specified
MESSIDOR-2: Topcon TRC-NW6 
Shanghai: 45° centered between the macula and the disc
MESSIDOR-2: 45° nonmydriatic images centered on the macula 
Shanghai: 1,488 × 1,488
MESSIDOR-2: 1,440 × 960, 2,240 × 1,488, and 2,304 × 1,536 
45,806 5,158 NRDME/RDME CNN Inception v4 Shanghai: 0.994
MESSIDOR: 0.948 
 Li et al. 2020 MESSIDOR and 2018 ISBI IDRiD challenge data set MESSIDOR: Topcon TRC-NW6
IDRiD: Kowa VX-10α 
MESSIDOR: 45° mydriatic and nonmydriatic centered between the macula and the disc
IDRiD: 50° nonmydriatic images centered between the macula and the disc 
MESSIDOR: 1,440 × 960, 2,240 × 1,488, and 2,304 × 1,536
IDRiD: 4,288 × 2,848 
1,716 545 0, 1, 2# CNN ResNet-50 0.942 
 Li et al. 2018 LabelMe Topcon, Canon, CenterVue, Heidelberg Not specified 2,480 × 3,280, 576 × 768, 1,900 × 2,285, 1,958 × 2,588, 1,900 × 2,265, 1,956 × 2,448, and 1,944 × 2,464 71,043 14,598 Yes/no DME CNN Inception v3 0.986 
 Liu et al. 2022 Lerdsin and Rajavithi Hospital, Thailand; Moorfields Eye Hospital, U.K.; Alameda County Health System, U.S. Topcon DRI OCT Triton, Kowa VX-10, Canon CR-DGi CFP 45° centered between the macula and the disc Not specified 1,167,791 852,437 Yes/no thickness-based, IRF-based CI-DME, or thickness-based DME Deep CNN trained on TensorFlow 0.860–0.960 
 Mo et al. 2018 HEI-MED Not specified Not specified 2,196 × 1,958 169 54 Yes/no DME Convolutional residual network 0.971** 
   E-Ophtha EX Not specified 45° centered on the macula and between the macula and the disc 1,440 × 960 to 2,544 × 1,696 82 47 Yes/no DME Convolutional residual network  
 Mookiah et al. 2015 MESSIDOR Topcon TRC-NW6 45° mydriatic and nonmydriatic centered between the macula and the disc 1,440 × 960, 2,240 × 1,488, and 2,304 × 1,536 300 185 No DME, Non-CSME, CSME NB SVM-linear 0.969 
   Kasturba Medical College Topcon TRC-NW200 45° FOV 480 × 382 300 200 No DME, Non-CSME, CSME NB SVM-linear 0.975 
 Rajput et al. 2020 MESSIDOR Topcon TRC-NW6 45° mydriatic and nonmydriatic centered between the macula and the disc 1,440 × 960, 2,240 × 1,488, and 2,304 × 1,536 94 Not specified Yes/no DME Color edge detection and mathematical morphology 0.971 
 Raumviboonsuk et al. 2019 Nationwide screening program in Thailand 3nethra; Canon CR2; Kowa VX-10, VX-20; Nonmyd 7, Nonmyd WD, Nonmyd α-DIII 8300; Nidek AFC-210, AFC-230, AFC-300; Topcon TRC-NW8; Zeiss VISUCAM 200 45° centered on the macula 779 × 779 29,985 1,868 Yes/no DME CNN Inception v4 0.993 
 Sahlsten et al. 2019 Digifundus Ltd. Canon CR2 45° mydriatic images centered on the macula and optic disc 3,888 × 2,592 to 5,184 × 3,456 35,630 5,536 NRDME/RDME CNN Inception v3 0.992 
 Singh et al. 2020 IDRiD Kowa VX-10α 50° centered on the posterior pole 4,288 × 2,848 516 284 0, 1, 2# HE-CNN 0.965 
   MESSIDOR Topcon TRC-NW6 45° mydriatic and nonmydriatic centered between the macula and the disc 1,440 × 960, 2,240 × 1,488, and 2,304 × 1,536 1,200 226 0, 1, 2# HE-CNN 0.965 
 Stevenson et al. 2019 MESSIDOR Topcon TRC-NW6 45° mydriatic and nonmydriatic centered between the macula and the disc 1,440 × 960, 2,240 × 1,488, and 2,304 × 1,536 2,283 226 Normal/DME (/AMD, /DR, /RVO, /glaucoma) CNN Inception v3 0.746 
 Sundaresan et al. 2015 Local hospital Not specified Not specified Not specified 181 Not specified M0, M1, M2^ GMM 0.950 
 Tariq et al. 2012 MESSIDOR Topcon TRC-NW6 45° mydriatic and nonmydriatic centered between the macula and the disc 1,440 × 960, 2,240 × 1,488, and 2,304 × 1,536 1,200 226 Healthy, non-CSME, CSME SVM 0.967 
   STARE Topcon TRV-50 35° FOV with varying imaging settings 700 × 605 81 50 Healthy, non-CSME, CSME SVM 0.973 
 Tariq et al. 2013 MESSIDOR Topcon TRC-NW6 45° mydriatic and nonmydriatic centered between the macula and the disc 1,440 × 960, 2,240 × 1,488, and 2,304 × 1,536 1,200 226 Healthy, non-CSME, CSME Filter bank and GMM 0.961 
   STARE Topcon TRV-50 35° FOV with varying imaging settings 700 × 605 81 50 Healthy, non-CSME, CSME Filter bank and GMM 0.976 
 Varadarajan et al. 2020 Thailand Rajavithi Hospital Kowa VX-10 50° centered on the macula 4,288 × 2,848 7,072 1,990 Yes/no CI-DME CNN Inception v3 0.890 (Thailand), 0.840 (EyePACS) 
 Wang et al. 2022 3 Taiwan medical centers Zeiss VISUCAM 200; Nidek AFC-330; Canon CF-1, CR-DGI, CR2, CR2-AF 45° macula-centered on macula and between the macula and optic disc 724 × 722 to 4,288 × 2,848 35,001 14,001 DME/non-DME EfficientDet-D1, bidirectional feature pyramid network 0.981 (Taiwan), 0.952 (MESSIDOR-1), 0.958 (MESSIDOR-2) 
 Yu et al. 2022 MESSIDOR Topcon TRC-NW6 45° mydriatic and nonmydriatic centered between the macula and the disc 1,440 × 960, 2,240 × 1,488, 2,304 × 1,536, and 4,288 × 2,848 1,716 462 Yes/no DME CNN + residual attention network 0.882 
   IDRiD Kowa VX-10α 50° centered on the posterior pole 4,288 × 2,848 516 284 Yes/no DME CNN + residual attention network 0.772 
OCT           
 Ai et al. 2022 Kermany data set Heidelberg Spectralis Not specified 256 × 256 62,488/2,000 (limited) 11,348/1,000 (limited) Normal/DME Inception v3 /Inception-ResNet v2 /Xception + convolutional block attention mechanism 0.773–1.000 
 Alqudah et al. 2020 Kermany data set Heidelberg Spectralis Not specified 256 × 256 62,489 11,599 Normal/DME CNN 19 layer 1.000 
 Altan et al. 2021 Kermany data set, SERI Heidelberg Spectralis, Cirrus SD-OCT Not specified 512 × 1,024 66,585 13,397 Normal/DME Lightweight CNN DeepOCT 0.983 
 Bhatia et al. 2020 Noor Eye Hospital Heidelberg SD-OCT Not specified 224 × 224 100 volumes 50 volumes Normal/DME CNN VGG16, 21 layers 0.980** 
   Ophthalmology and Microsurgery Institute Heidelberg SD-OCT Not specified 224 × 224 50 volumes 25 volumes Normal/DME CNN VGG16, 21 layers  
 Das et al. 2019 Kermany data set Heidelberg SD-OCT Not specified 256 × 256 38,163 11,598 Normal/DME CNN multiscale deep feature fusion-based classifier 0.990 
 Dash et al. 2018 Merry Eye Care, Puducherry Not specified Not specified Not specified 150 90 Normal/DME SVM 0.980 
 Fang et al. 2019 Kermany data set Heidelberg Spectralis Not specified 224 × 224 38,163 11,598 Normal/DME LACNN Inception v3 0.974 
 Hassan et al. 2020 Kermany data set Heidelberg Spectralis Not specified 256 × 256 62,489 11,599 Healthy/DME Recurrent Residual Inception Network 0.986 
 Hecht et al. 2019 Munk and Israel data set combined Heidelberg Spectralis Centered on macula, including B scan (a minimum of 10 frames) 1,563 × 1,563 153 96 DME/PCME Decision tree 0.937 
 Hussain et al. 2018 Duke, CERA, NYU, Tian Heidelberg SD-OCT, EDI-OCT Not specified 512 × 1,024 11,662 2,940 Normal/DME (/AMD) Random forest 0.990 
 Hwang et al. 2020 Taipei Veterans General Hospital Cirrus HD-OCT 4000, Optovue RTVue XR Avanti Not specified 3,499 × 2,329, 2,474 × 2,777, or 948 × 879 3,495 Not specified DME/non-DME CNN (MobileNet, using Sigmoid Cross-Entropy and RMSprop) 0.960 
 Joshi et al. 2020 Kermany data set Heidelberg Spectralis Not specified 256 × 256 16,440 7,118 Normal/DME CNN 0.990 
 Kermany et al. 2018 Kermany data set Heidelberg Spectralis Not specified 256 × 256 62,489 11,599 Healthy/DME Inception v3 0.999 
 Khaothanthong et al. 2023 Rajavithi Hospital Heidelberg Spectralis Centered on macula, including radial scans from six lines per eye Not specified 6,356 1,455 Yes/no DME CNN ResNet-50, RelayNet/graph cut 0.980 
 Li et al. 2019 Kermany data set Heidelberg Spectralis Not specified 256 × 256 62,489 11,599 Normal/DME VGG16 0.999 
 Li et al. 2019 Shanghai Zhongshan Hospital and the Shanghai First People’s Hospital Heidelberg Spectralis Not specified Not specified 9,674 3,238 Normal/DME CNN multi-ResNet-50 ensembling 0.996 
 Liu et al. 2022 Multiple centers Topcon 3D OCT-1 Maestro 6 mm × 6 mm Not specified >20,000 Not specified Yes/no ME R-CNN 0.944 
 Perdomo et al. 2018 SERI Cirrus SD-OCT Not specified 1,024 × 512 4,096 2,048 Normal/DME VGG16 0.927 
 Perdomo et al. 2019 SERI + CUHK Cirrus SD-OCT Not specified 1,024 × 512 9,600 5,352 Normal/DME VGG inspired 0.860 
 Rajagopalan et al. 2021 Kermany data set Heidelberg Spectralis Not specified 224 × 224 6,000 3,000 Normal/DME CNN 0.960 
 Rasti et al. 2018 Not specified Topcon 1000 SD-OCT Not specified 650 × 512 7,680 3,840 Normal/DME Wavelet-based CNN random forests 0.993 
   Duke data set Heidelberg SD-OCT Not specified 512 × 496, 768 × 496 30 volumes 15 volumes Normal/DME Wavelet-based CNN random forests 0.993 
 Rastogi et al. 2019 Kermany data set Heidelberg Spectralis Not specified 128 × 128 62,489 11,599 Normal/DME DenseNet 0.992 
 Saraiva et al. 2020 Kermany data set Heidelberg Spectralis Not specified 150 × 150 62,489 11,599 Normal/DME CNN 0.990 
 Tang et al. 2021 CUHK-STDR Cirrus HD-OCT 6 mm × 6 mm 1,024 × 512 × 128 3,788 volumes 1,208 volumes No DME/non-CI-DME/CI-DME CNN ResNet-34 0.964 
   CUHK-STDR Heidelberg Spectralis 6.3 mm × 6.3 mm and 6.5 mm × 4.9 mm 1,024 × 25, 1,024 × 19 30,515 5,542 No DME/non-CI-DME/CI-DME CNN ResNet-18 0.846 
   CUHK-STDR Topcon Triton OCT Radial 9 mm × 30° 1,024 × 12 39,443 7,804 No DME/non-CI-DME/CI-DME CNN ResNet-18 0.935 
 Togacar et al. 2022 Kermany data set, Duke data set, Noor data set Spectralis SD-OCT Not specified Not specified 91,969 13,803 Normal/DME 9 CNN models 1.000 
 Wang et al. 2020 Duke data set Spectralis SD-OCT Not specified 224 × 224 1,920 522 Normal/DME CNN VGG16 0.980 
 Wang et al. 2023 CUHK Eye Center Triton, Spectralis Radial 9 mm × 30° 1,024 × 992 × 12, 1,024 × 496 × 25 69,491 B scans, 4,644 volumes 2,910 volumes Yes/no DME Deep semisupervised multiple instance learning 0.934 and 0.963 for B scan; 0.926 and 0.950 for volume 
 Xu et al. 2021 Noor Eye Hospital (Tehran), Kermany data set Spectralis SD-OCT Not specified Not specified 87,738 9,720 Normal/DME Multibranch hybrid attention network 0.970 and 1.000, respectively 

For the full citation of each study, see the Supplementary Material. AMD, age-related macular degeneration; CERA, Center for Eye Research Australia; CI-DME, center-involved diabetic macular edema; CNN, convolutional neural network; CUHK, Chinese University of Hong Kong; DL, deep learning; EDI-OCT, enhanced depth imaging optical coherence tomography; EyePACS, Picture Archive Communication System for Eye Care; FOV, field of view; GMM, Gaussian mixture model; HE-CNN, hierarchical ensemble of convolutional neural networks; HEI-MED, Hamilton Eye Institute Macular Edema Dataset (formerly DMED); IDRiD, Indian Diabetic Retinopathy Image Dataset; IRF, intraretinal fluid; ISBI, International Symposium on Biomedical Imaging; LACNN, lesion-aware convolutional neural network; ML, machine learning; NA, not available; NB, naive Bayes; NRDME, nonreferable diabetic macular edema; NYU, New York University; PCA, principal component analysis; PCME, pseudophakic cystoid macular edema; PLS, partial least squares; R-CNN, region-based convolutional neural network; RDME, referable diabetic macular edema; RIST, Retina Institute of South Texas; RVO, retinal vein occlusion; SERI, Singapore Eye Research Institute; SIM, Shanghai Integration Model; STARE, Structured Analysis of the Retina; STDR, sight-threatening diabetic retinopathy; SVM, support vector machine; UoA-DR, University of Auckland Diabetic Retinopathy; UTHSCSA, The University of Texas Health Science Center at San Antonio.

*RDME is defined as hard exudates within 1 disc diameter of the macula.

#Grade 0 is defined as no visible hard exudate, grade 1 as exudate distance >1 papilla diameter, and grade 2 as exudate distance ≤1 papilla diameter.

Other pathologies detected by the models are listed in brackets.

^M0 (nil DME) is defined as no visible hard exudate, M1 (observable DME) as the presence of hard exudate within a circular zone of 3 optic disc diameters around the macula, and M2 (RDME) as the presence of hard exudate within a circular zone of 1 optic disc diameter around the macula.

**AUROC based on data sets for all studies combined.

Performance of AI in DME Detection

Detailed statistical results are shown in Table 2. FP-based AI from 25 studies yielded a pooled AUROC of 0.964 (95% CI 0.964–0.964), with a sensitivity of 92.6% (95% CI 90.2–94.4%), specificity of 91.1% (95% CI 88.0–93.4%), and DOR of 147.3 (95% CI 94.0–230.8), while OCT-based AI from 28 studies yielded a pooled AUROC of 0.985 (95% CI 0.985–0.985), with a sensitivity of 95.9% (95% CI 94.1–97.2%), specificity of 97.9% (95% CI 96.6–98.6%), and DOR of 1,154.6 (95% CI 691.0–1,929.1). The overall pooled AUROC, sensitivity, specificity, and DOR of the 53 studies were 0.979 (95% CI 0.979–0.979), 94.6% (95% CI 93.1–95.7%), 95.8% (95% CI 94.3–96.9%), and 456.1 (95% CI 311.9–667.1), respectively. Forest plots and SROC curves of the FP and OCT studies are shown in Fig. 2A–D.
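
As an illustrative plug-in check (ours, not a computation from the included studies), the DOR can be related to sensitivity and specificity as DOR = [sens/(1 − sens)] × [spec/(1 − spec)]. Using the pooled OCT estimates, (0.959/0.041) × (0.979/0.021) ≈ 23.4 × 46.6 ≈ 1,090, close to the directly pooled DOR of 1,154.6; the two need not match exactly because the bivariate model pools study-level log odds rather than plugging in pooled averages.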

Table 2

Statistical results of AI performance using FP and OCT images with subgroup analyses

Studies (validation data sets) | Pooled AUROC (95% CI) | P | Pooled sensitivity, % (95% CI) | Pooled specificity, % (95% CI) | DOR (95% CI)
Overall 53 (101) 0.979 (0.979–0.979)  94.6 (93.1–95.7) 95.8 (94.3–96.9) 456.1 (311.9–667.1) 
 FP 25 (45) 0.964 (0.964–0.964) <0.001 92.6 (90.2–94.4) 91.1 (88.0–93.4) 147.3 (94.0–230.8) 
 OCT 28 (56) 0.985 (0.985–0.985)  95.9 (94.1–97.2) 97.9 (96.6–98.6) 1,154.6 (691.0–1,929.1) 
Type of AI       
 Machine learning 11 (16) 0.966 (0.966–0.966) <0.001 96.7 (95.7–97.4) 94.5 (89.2–97.3) 723.7 (403.6–1,297.8) 
 Deep learning 42 (85) 0.979 (0.979–0.979)  94.2 (92.4–95.5) 96.0 (94.4–97.2) 421.1 (279.0–635.7) 
Developmental data set size#       
 Smaller than median (≤8,655) 27 (46) 0.975 (0.975–0.975) <0.001 93.6 (91.3–95.3) 95.5 (93.1–97.0) 353.4 (211.9–589.6) 
 Larger than median (>8,655) 28 (55) 0.981 (0.981–0.981)  95.3 (93.3–96.7) 96.1 (94.0–97.5) 575.0 (323.8–1,021.2) 
Studies with validation       
 Internal 10 (18) 0.985 (0.985–0.985) <0.001 96.9 (93.9–98.5) 97.5 (93.2–99.1) 1,089.5 (466.9–2,542.4) 
 External 13 (32) 0.967 (0.967–0.967)  91.7 (87.2–94.8) 93.4 (88.7–96.2) 162.6 (91.1–290.2) 
Data set diversity (performance upon external validation)       
 Single data set 8 (20) 0.965 (0.965–0.965) 0.002 90.8 (85.0–94.5) 93.7 (88.2–96.8) 158.3 (79.3–316.2) 
 Multiple data sets^ 5 (12) 0.971 (0.971–0.971)  93.3 (84.2–97.4) 92.7 (81.4–97.4) 187.9 (50.0–706.0) 
#The median number of images among studies included in this meta-analysis was 8,655.

^Multiple data sets is defined as data from different institutions or with population characteristics that are systematically different.

Figure 2

A: Forest plot for pooled sensitivity and specificity of FP studies. For a total of 25 studies, the pooled sensitivity and specificity were 92.6% and 91.1%, respectively. B: Forest plot for pooled sensitivity and specificity of OCT studies. For a total of 28 studies, the pooled sensitivity and specificity were 95.9% and 97.9%, respectively. C: SROC curve showing pooled performance of AI models using FP images, with a pooled area under the curve (AUC) of 0.964. D: SROC curve showing pooled performance of AI models using OCT images, with a pooled AUC of 0.985. Aus, Australia; EyePACS, Picture Archive Communication System for Eye Care; FNR, false-negative rate; FPR, false-positive rate; HEIMED, Hamilton Eye Institute Macular Edema Dataset (formerly DMED); IDRiD, Indian Diabetic Retinopathy Image Dataset; IRF, intraretinal fluid; RIST, Retinal Institute of South Texas; SIM, Shanghai Integration Model; STARE, Structured Analysis of the Retina; Thai, Thailand; TNR, true-negative rate; TPR, true-positive rate; UCSD, University of California, San Diego; UTHSCSA, The University of Texas Health Science Center at San Antonio; SERI, Singapore Eye Research Institute.

Subgroup Analyses

Regarding the type of AI, deep learning algorithms showed a higher pooled AUROC (0.979) than machine learning algorithms (0.966; P < 0.001). Regarding developmental data set size, the median image number (8,655) was used to stratify algorithms into those using smaller and those using larger data sets. For studies that proposed multiple algorithms or models using separate developmental data sets, each model was evaluated separately. Developmental data sets larger than the median (i.e., >8,655 images) were associated with better validation results, with higher pooled AUROC, sensitivity, specificity, and DOR (0.981, 95.3%, 96.1%, and 575.0, respectively) than data sets smaller than the median (0.975, 93.6%, 95.5%, and 353.4, respectively).
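
A median split of this kind can be sketched in R by continuing the hypothetical mada example from the statistical analysis section; the n_images column and its values below are our illustrative assumptions, and real subgroup fits need many more studies than this toy example:

# Subgroup analysis by developmental data set size (median = 8,655 images)
library(mada)

# Hypothetical image count per study, appended to the earlier `dat`
dat$n_images <- c(5000, 12000, 8000, 60000, 9000, 2000)

median_n <- 8655
small <- dat[dat$n_images <= median_n, ]  # at or below the median
large <- dat[dat$n_images >  median_n, ]  # above the median

# Fit the bivariate model within each subgroup and compare pooled AUROCs
fit_small <- reitsma(small)
fit_large <- reitsma(large)
AUC(fit_small)
AUC(fit_large)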

Regarding testing data sets, 50 studies were internally validated, but only 13 performed external validation. Among the externally validated studies, we compared performance upon internal and external validation. The algorithms showed lower pooled AUROC, sensitivity, specificity, and DOR when validated externally (0.967, 91.7%, 93.4%, and 162.6, respectively) than internally (0.985, 96.9%, 97.5%, and 1,089.5, respectively). Moreover, upon external validation, models trained on multiple data sets showed slightly better performance than those trained on a single data set (pooled AUROC 0.971 and DOR 187.9 for multiple data sets vs. 0.965 and 158.3 for a single data set; P = 0.002).

As shown in Supplementary Fig. 2A and B, the funnel plots are asymmetrical, with clustering toward the bottom right that could result from the large DORs and SEs of certain studies. Studies with small SEs were also scattered across a wide range of DORs rather than concentrating within the funnel. Therefore, the results should be interpreted with caution, and further rigorous research is needed to reduce bias.
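
For reference, a funnel plot of log DOR against its SE can be produced in R as sketched below; the metafor package here is our assumption, as the text does not state which package generated the funnel plots, and `dat` is the hypothetical data frame from the earlier sketches.

# Funnel plot for publication bias, using log DOR as the effect size
library(metafor)

# The DOR is the odds ratio of the 2 x 2 diagnostic table,
# (TP x TN) / (FN x FP); a 0.5 continuity correction guards zero cells
es <- escalc(measure = "OR",
             ai = dat$TP, bi = dat$FN,
             ci = dat$FP, di = dat$TN,
             add = 1/2, to = "all")

res <- rma(yi, vi, data = es)  # random-effects pooling of log DOR
funnel(res)                    # asymmetry suggests possible publication bias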

The present meta-analysis investigated the performance of AI in detecting DME using FP or OCT images. Overall, our results indicate good discriminative performance for both FP- and OCT-based AI models in terms of pooled AUROC, sensitivity, specificity, and DOR. Potential factors that may increase model performance include the type of AI, sample size, and diversity in the developmental data set.

FP-Based AI

Among the 25 FP-based AI studies, the pooled AUROC was 0.964, with sensitivity and specificity both >90%. This indicates the potential application of FP-based AI in primary care settings, considering the affordability and accessibility of FP. It may serve as an ideal tool to aid health care providers, especially in resource-constrained areas, by reducing the intensive human and time input otherwise required. For example, as FP is already used for DR screening, AI can serve as a first-pass tool that assists human graders' subsequent grading in existing screening programs (6,7). This benefits the population with diabetes and the public health infrastructure by enhancing clinical workflow and referrals.

However, we noticed incomplete reporting of ground truth labeling details in each study, i.e., whether labeling was based on FP alone or supplemented with clinical examinations (Supplementary Table 3). Since FP alone cannot serve as the gold standard for DME diagnosis, AI trained on these data may not overcome the previously reported high false-positive rates of current screening programs (10,11), despite the greater expertise (retinal specialists and ophthalmologists) applied to labeling. Thus, standardized reporting of the labeling gold standard and training materials is recommended for future AI studies. Developing models with OCT-based diagnoses as labels for paired FP images (25) may also improve the quality of ground truth labeling.

OCT-Based AI

The 28 OCT-based AI studies, which detected retinal thickening or DME features such as intraretinal or subretinal fluid, yielded a pooled AUROC of 0.985, with sensitivity and specificity >95%. Previous publications attributed the improved performance to OCT providing 3D scans of the whole retinal structure, which are more informative for AI models detecting eyes with DME, especially among cases with central subfield thickness <300 μm (26). In practice, such cases can be difficult to differentiate from normal eyes using 2D FP alone. The high discriminative performance obtained by OCT-based AI demonstrates its potential for application in tertiary settings and eye hospitals, especially in better-resourced settings where OCT machines are affordable, not only for detecting the presence of DME but also for further classifying center-involved DME (CI-DME) and non-CI-DME. This is of clinical value because eyes with CI-DME require urgent intervention, such as anti–vascular endothelial growth factor injection, whereas for eyes with non-CI-DME, initial observation is an acceptable option. Moreover, Tang et al. (17) developed deep learning models that used whole 3D volumetric scans not only to detect DME with classification of CI-DME or non-CI-DME but also to identify non-DME retinal abnormalities, such as epiretinal membrane and macular hole. This implies another advantage of OCT-based AI models: efficiently differentiating retinal abnormalities based on the comprehensive information in volumetric scans, which may facilitate more personalized treatment plans for patients.

On the other hand, analysis of 3D volumetric OCT images may be useful for future algorithm development. In our meta-analysis, only two studies (17,27) developed OCT-based algorithms that detected DME directly from 3D volumetric scans. As the data remain insufficient for subgroup analysis, we could not evaluate and compare the accuracy of OCT 2D B scans against 3D volumetric scans. Previous studies suggested substantial merits of deep learning training with 3D volumetric scans in terms of labeling effort (28,29); compared with volume-level annotation, ground truth labeling of individual B scans is more labor intensive and time consuming (17). Outputting a single result per volumetric scan may also facilitate clinical application by graders with less expertise. However, analysis of volumetric scans requires higher computational power. Therefore, future meta-analyses may compare the performance of AI-based DME detection using OCT B scans versus volumetric scans and evaluate whether volumetric scans as input provide additional benefits in terms of model development and clinical application.

Subgroup Analyses

We also conducted subgroup analyses to investigate how the type of AI, developmental data set size, validation methods, and number of training data resources affect the performance of AI algorithms.

Type of AI

Regarding AI type, we found that both deep learning and traditional machine learning algorithms obtained satisfactory performance, with pooled AUROC, sensitivity, and specificity >90%, while deep learning algorithms demonstrated a slightly higher pooled AUROC. Whereas traditional machine learning requires complex feature engineering (29), most included studies adopted deep learning, which allows automatic pattern recognition with high-volume modeling capability and may enable identification of subtle markers of early-stage DME. Since deep learning requires greater computational power and larger training data sets (30), some studies further used transfer learning and semisupervised learning techniques. Transfer learning allows pretrained networks to be retrained on specific tasks with fewer training examples and less computational power, while semisupervised learning, such as with generative adversarial networks, addresses the scarcity of labeled data. Self-supervised learning, a newer technique that combines high label efficiency with strong generalization capacity, further indicates the great potential of deep learning techniques to accelerate medical AI development.

Model Development

Our results show that developing AI with a larger data set provides better performance. This supports the outcomes of previous subsampling experiments, which showed that the AUROC of an FP-based deep learning model for DME detection increased with sample size (25) and that the performance of another FP-based deep learning model for referable DR increased with training data set size, plateauing at ∼60,000 images (18). However, we also noticed a discrepancy in individual studies. Ai et al. (30) developed two models for DME detection: a complete model trained on a large but imbalanced data set of >60,000 normal and DME images, and a limited model trained on a small but balanced data set of only 1,000 images from each class. The limited model outperformed the complete model despite being trained on far fewer images. In our meta-analysis, however, we could not compare the effects of class balance and sample size on model development and performance.

Model Validation

Apart from data acquisition, we noticed deficiencies in external validation among the included studies: it was performed in only 13 of 53 studies. External validation is important because overfitting is a common pitfall in AI training, wherein models learn patterns specific to the training data and perform strongly on similar internal testing data, resulting in large discrepancies between internal and external validation performance. This is supported by our subgroup analysis, in which algorithms showed a significant decline in all pooled performance parameters when validated with external unseen data. Therefore, guidelines on AI development and performance reporting emphasize the importance of external validation before clinical translation (31,32). Insufficient external validation among the included studies limits our assessment of the algorithms' real-world generalizability and may lead to overestimation of their performance in clinical settings. Future research should be directed toward external validation of algorithms.

Regarding training data diversity, we found better performance upon external validation among studies trained on data from multiple institutions or with systematically different population characteristics. Greater data diversity from multiple sources may enhance the resilience of algorithms over different baseline populations in unseen external data. This suggests that increasing developmental data diversity may enhance generalizability of AI upon clinical application to maintain its performance in detecting DME among patients with various baseline characteristics.

We also noticed discrepancies in baseline characteristics between validation data sets and real-world data. Epidemiological studies show that DME prevalence ranges from 1.4% to 12.8% among populations with type 2 diabetes worldwide (33,34). However, the median proportion of DME images in the validation data sets of the included studies was 27.5% (Supplementary Table 3), much higher than the general prevalence. Such a discrepancy may hinder our understanding of the models' realistic applicability in primary care settings, because predictive values depend strongly on prevalence. Validating AI using data sets with a DME prevalence that more closely resembles current epidemiology would therefore provide more representative statistics. Given the unrepresentative DME prevalence in retrospective data sets, real-world prospective validation studies are also essential before these models can be used clinically with confidence.
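
A short worked example illustrates why prevalence matters: holding sensitivity and specificity fixed at the pooled FP-based values reported in this meta-analysis (92.6% and 91.1%), the positive predictive value (PPV) falls steeply as prevalence drops from study-like levels toward epidemiological estimates. The Python sketch below is pure arithmetic:

    def ppv(sens, spec, prev):
        # Positive predictive value from Bayes' rule:
        # PPV = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
        true_pos = sens * prev
        false_pos = (1 - spec) * (1 - prev)
        return true_pos / (true_pos + false_pos)

    # Pooled FP-based sensitivity and specificity from this meta-analysis.
    for prev in (0.275, 0.128, 0.014):
        print(f"prevalence {prev:6.1%} -> PPV {ppv(0.926, 0.911, prev):5.1%}")
    # prevalence  27.5% -> PPV ~79.8%
    # prevalence  12.8% -> PPV ~60.4%
    # prevalence   1.4% -> PPV ~12.9%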

Strengths and Limitations

Our meta-analysis has several strengths. First, our study demonstrated that machine learning and deep learning can be used to develop AI with high diagnostic performance for detecting and classifying DME from FP or OCT images. From a clinical and public health standpoint, these advancements can enhance clinical workflow and optimize identification of patients requiring referral to specialists or retina clinics. FP-based AI could be incorporated into existing DR screening programs, while OCT-based AI could be used as a secondary tool to screen individuals with positive FP results and inform treatment options. Second, we identified factors that may enhance model performance, such as the use of deep learning, a larger developmental data set, and greater training data diversity. Third, regarding gaps in existing research, we recognized insufficient external validation and discrepancies between validation data and realistic primary care populations. Addressing these gaps may enhance the clinical translatability of these AI models and accelerate their implementation in real-life settings.

Our study also has several limitations. First, the outcomes of the algorithms were not standardized and were labeled based on different disease definitions. For example, algorithm outcomes variably included DME, clinically significant macular edema (CSME), non-clinically significant macular edema (non-CSME), center-involved DME (CI-DME), non-CI-DME, and referable DME; CSME and non-CSME were graded based on exudate location on FP and on retinal thickness or DME features on OCT, while CI-DME and non-CI-DME were graded based on retinal thickness on OCT. There is also no standard definition of referable DME. Clinically, this may complicate interpretation of results and, thus, referral procedures. Future guidelines should standardize model outcomes and disease definitions to facilitate clinical translation and allow fair performance evaluation. Second, there were insufficient data to compare model performance between OCT 2D B-scans and 3D volumetric scans, across OCT machines, and on external data sets. Third, the small sample size of some included studies may reduce the representativeness of their reported AI performance. Fourth, few articles reported detailed demographics of their databases, such as age, sex, and duration of diabetes, providing insufficient data for meta-regression to evaluate database diversity and its effect on AI development. Finally, there is also a gap in assessing whether AI performance is affected by DME severity. Future research stratified by DME grade is required to further elucidate performance discrepancies among AI algorithms.

Future Directions

For future development of AI models, data acquisition remains a major challenge in terms of data diversity and sample size. Collaboration between multiple centers or integration of multiple public data sets with different baseline characteristics would help increase the diversity and size of developmental data. Data-efficient techniques, such as generative adversarial networks (35), the synthetic minority oversampling technique (36), few-shot transfer learning (37), and self-supervised learning, may aid algorithm development when training data are limited. However, to date, no studies have estimated the optimal sample size or class balance for DME model development. Further studies on sample size determination methodologies are required to estimate a developmental sample size that balances diagnostic performance with resource availability (38). Standardized reporting of the labeling gold standard, patient demographics (such as age, sex, ethnicity, and type and duration of diabetes), and image acquisition settings would also facilitate assessment of the value of AI as a potential population screening tool for DME. Clinically, AI-relevant education of health care providers and development by multiple stakeholders of a systematic approach for AI implementation are crucial, including legislation addressing accountability and data sharing (39).
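
As one example of these data-efficient techniques, the following is a minimal sketch of the synthetic minority oversampling technique (36) applied to extracted feature vectors (SMOTE interpolates in tabular feature space rather than on raw images), assuming the imbalanced-learn package and purely synthetic data:

    from collections import Counter
    from imblearn.over_sampling import SMOTE
    from sklearn.datasets import make_classification

    # Synthetic imbalanced feature matrix (~5% minority class).
    X, y = make_classification(n_samples=2_000, weights=[0.95, 0.05],
                               random_state=0)

    # SMOTE synthesizes new minority-class samples by interpolating
    # between existing minority neighbors, balancing the classes.
    X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
    print("before:", Counter(y), "after:", Counter(y_res))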

In conclusion, AI algorithms show satisfactory discriminative performance in detecting DME from either FP or OCT images. Potential factors that may improve model performance include the type of AI and the size and diversity of the developmental data set. Significant gaps remain in external validation among current studies, limiting evaluation of models' generalizability for clinical translation. Further studies could evaluate the potential added value of AI analysis of OCT volumetric scans, estimate the optimal sample size and the effects of class balance and patient demographics, and compare the effectiveness of AI in DME detection with human evaluation.

This article contains supplementary material online at https://doi.org/10.2337/figshare.24518287.

PROSPERO reg. no. CRD4202127609, https://www.crd.york.ac.uk/prospero/

Duality of Interest. No potential conflicts of interest relevant to this article were reported.

Author Contributions. C.L. and Y.L.W. researched data; performed the systematic search, study selection, data extraction, and statistical analyses; and wrote the manuscript. Z.T. and A.R.R. researched data, contributed to the discussion, and reviewed and edited the manuscript. X.H., T.X.N., D.Y., and S.Z. conducted the data extraction and reviewed and edited the manuscript. J.D., S.K.H.S., and C.Y.C. contributed to the discussion and reviewed and edited the manuscript. All authors approved the decision to submit for publication. A.R.R. and C.Y.C. are the guarantors of this work and, as such, had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.

References

1. Cheung N, Mitchell P, Wong TY. Diabetic retinopathy. Lancet 2010;376:124–136
2. Solomon SD, Chew E, Duh EJ, et al. Diabetic retinopathy: a position statement by the American Diabetes Association. Diabetes Care 2017;40:412–418
3. Cheung CY, Ikram MK, Klein R, Wong TY. The clinical implications of recent studies on the structure and function of the retinal microvasculature in diabetes. Diabetologia 2015;58:871–885
4. Tan GS, Cheung N, Simó R, Cheung GCM, Wong TY. Diabetic macular oedema. Lancet Diabetes Endocrinol 2017;5:143–155
5. International Diabetes Federation. IDF Diabetes Atlas. 6th ed. Brussels, International Diabetes Federation, 2014
6. Wang LZ, Cheung CY, Tapp RJ, et al. Availability and variability in guidelines on diabetic retinopathy screening in Asian countries. Br J Ophthalmol 2017;101:1352–1360
7. Bhargava M, Cheung CY, Sabanayagam C, et al. Accuracy of diabetic retinopathy screening by trained non-physician graders using non-mydriatic fundus camera. Singapore Med J 2012;53:715–719
8. Scanlon PH. The English National Screening Programme for diabetic retinopathy 2003-2016. Acta Diabetol 2017;54:515–525
9. Jiao F, Fung CS, Wan YF, et al. Effectiveness of the multidisciplinary Risk Assessment and Management Program for Patients with Diabetes Mellitus (RAMP-DM) for diabetic microvascular complications: a population-based cohort study. Diabetes Metab 2016;42:424–432
10. Wong RL, Tsang CW, Wong DS, et al. Are we making good use of our public resources? The false-positive rate of screening by fundus photography for diabetic macular oedema. Hong Kong Med J 2017;23:356–364
11. Jyothi S, Elahi B, Srivastava A, Poole M, Nagi D, Sivaprasad S. Compliance with the quality standards of National Diabetic Retinopathy Screening Committee. Prim Care Diabetes 2009;3:67–72
12. Szeto SK, Hui VWK, Tang FY, et al. OCT-based biomarkers for predicting treatment response in eyes with centre-involved diabetic macular oedema treated with anti-VEGF injections: a real-life retina clinic-based study. Br J Ophthalmol 2023;107:525–533
13. Olson J, Sharp P, Goatman K, et al. Improving the economic value of photographic screening for optical coherence tomography-detectable macular oedema: a prospective, multicentre, UK study. Health Technol Assess 2013;17:1–142
14. Goh JK, Cheung CY, Sim SS, Tan PC, Tan GS, Wong TY. Retinal imaging techniques for diabetic retinopathy screening. J Diabetes Sci Technol 2016;10:282–294
15. Meredith S, Mourtzoukos S, Rennie C, et al. First year of implementing OCT into a diabetic eye screening service-quantification of the reduction in hospital eye service referrals. Eye (Lond) 2022;36:1840–1841
16. Yu KH, Beam AL, Kohane IS. Artificial intelligence in healthcare. Nat Biomed Eng 2018;2:719–731
17. Tang F, Wang X, Ran AR, et al. A multitask deep-learning system to classify diabetic macular edema for different optical coherence tomography devices: a multicenter analysis. Diabetes Care 2021;44:2078–2088
18. Gulshan V, Peng L, Coram M, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 2016;316:2402–2410
19. Browning DJ, Glassman AR, Aiello LP, et al.; Diabetic Retinopathy Clinical Research Network. Relationship between optical coherence tomography-measured central retinal thickness and visual acuity in diabetic macular edema. Ophthalmology 2007;114:525–536
20. Whiting PF, Rutjes AW, Westwood ME, et al.; QUADAS-2 Group. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med 2011;155:529–536
21. Doebler P, Holling H, Sousa-Pinto B. Meta-Analysis of Diagnostic Accuracy with mada. Vienna, R Project, 2015
22. Lowry R. VassarStats: website for Statistical Computation. Accessed 9 August 2022. Available from http://vassarstats.net
23. Glas AS, Lijmer JG, Prins MH, Bonsel GJ, Bossuyt PM. The diagnostic odds ratio: a single indicator of test performance. J Clin Epidemiol 2003;56:1129–1135
24. Kermany D, Zhang K, Goldbaum M. Large Dataset of Labeled Optical Coherence Tomography (OCT) and Chest X-Ray Images. Mendeley Data, 2018. Accessed 19 September 2023. Available from https://data.mendeley.com/datasets/rscbjbr9sj/3
25. Varadarajan AV, Bavishi P, Ruamviboonsuk P, et al. Predicting optical coherence tomography-derived diabetic macular edema grades from fundus photographs using deep learning. Nat Commun 2020;11:130
26. Hwang D-K, Chou YB, Lin TC, et al. Optical coherence tomography-based diabetic macula edema screening with artificial intelligence. J Chin Med Assoc 2020;83:1034–1038
27. Wang X, Tang F, Chen H, Cheung CY, Heng PA. Deep semi-supervised multiple instance learning with self-correction for DME classification from OCT images. Med Image Anal 2023;83:102673
28. Ran AR, Cheung CY, Wang X, et al. Detection of glaucomatous optic neuropathy with spectral-domain optical coherence tomography: a retrospective training and validation deep-learning analysis. Lancet Digit Health 2019;1:e172–e182
29. Ran AR, Tham CC, Chan PP, et al. Deep learning in glaucoma with optical coherence tomography: a review. Eye (Lond) 2021;35:188–201
30. Ai Z, Huang X, Feng J, et al. FN-OCT: disease detection algorithm for retinal optical coherence tomography based on a fusion network. Front Neuroinform 2022;16:876927
31. Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent Reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. Diabet Med 2015;32:146–154
32. Moons KGM, Kengne AP, Grobbee DE, et al. Risk prediction models: II. External validation, model updating, and impact assessment. Heart 2012;98:691–698
33. Lee R, Wong TY, Sabanayagam C. Epidemiology of diabetic retinopathy, diabetic macular edema and related vision loss. Eye Vis (Lond) 2015;2:17
34. Ding J, Wong TY. Current epidemiology of diabetic retinopathy and diabetic macular edema. Curr Diab Rep 2012;12:346–354
35. Das V, Dandapat S, Bora PK. A data-efficient approach for automated classification of OCT images using generative adversarial network. IEEE Sens Lett 2020;4:1–4
36. Chalakkal R, Hafiz F, Abdulla W, Swain A. An efficient framework for automated screening of Clinically Significant Macular Edema. Comput Biol Med 2021;130:104128
37. Ghosh A, Hossain ABMA, Raju SMTU. Classification of diabetic retinopathy using few-shot transfer learning from imbalanced data. In 2021 7th International Conference on Advanced Computing and Communication Systems (ICACCS). Coimbatore, India, 2021:78–83
38. Biau DJ, Kernéis S, Porcher R. Statistics in brief: the importance of sample size in the planning and interpretation of medical research. Clin Orthop Relat Res 2008;466:2282–2288
39. Petersson L, Larsson I, Nygren JM, et al. Challenges to implementing artificial intelligence in healthcare: a qualitative interview study with healthcare leaders in Sweden. BMC Health Serv Res 2022;22:850
Readers may use this article as long as the work is properly cited, the use is educational and not for profit, and the work is not altered. More information is available at https://www.diabetesjournals.org/journals/pages/license.