To measure the acceptability and diagnostic accuracy of commonly used depression screening measures to determine ideal cutoff scores that sensitively identify depressive disorders in adolescents with type 1 diabetes (T1D).
One hundred adolescents (12–17 years old) completed a reference standard, semistructured diagnostic interview and both long and short versions of five commonly used depression screening measures in the United States. To assess feasibility and acceptability, we used screener completion time and participant ratings, respectively. We used descriptive statistics, area under the receiver operating characteristic (ROC) curve analyses, and paired-sample area differences under the ROC curve to assess each measure’s diagnostic validity against our reference standard and to determine ideal cutoff scores for this sample.
Adolescents had a mean age of 15.0 ± 1.7 years, time since T1D diagnosis of 6.0 ± 4.1 years, and glycated hemoglobin (HbA1c) of 8.9 ± 1.8%. Sixty percent of adolescents were male, 15% endorsed a current depressive disorder, and 15% endorsed lifetime suicidality. Measures demonstrated low sensitivity (0.33–0.67) to detect current depressive disorders using preexisting cutoff scores. However, adjusted cutoff scores increased sensitivity and reduced false negatives. All depression screening measures demonstrated “good” to “excellent” predictive validity, and the Children’s Depression Inventory-2 Short version demonstrated significantly greater diagnostic accuracy than the Patient Health Questionnaire-2 item version for adolescents.
Clinics should consider using screening measures with the greatest diagnostic accuracy as identified in this study and adjusting measure cutoff scores to increase sensitivity and reduce false negatives.
Introduction
Between 22 and 30% of adolescents with type 1 diabetes (T1D) report clinically elevated depressive symptoms (1–3), compared with 15% prevalence of depressive disorders in community samples of adolescents (4). While these rates are complicated by comparing multiple screening measures, variable cut points, and differences in clinical diagnosis rates, they generally indicate that rates of depression are higher for youth with T1D than their peers. Youth with T1D and elevated depressive symptoms demonstrate less frequent self-monitoring of blood glucose, impaired glycemic levels, and more frequent hospitalizations for acute T1D complications (e.g., severe hypoglycemia and diabetic ketoacidosis [DKA]) than youth with T1D who do not endorse elevated depressive symptoms (3,5–8). Moreover, elevated depressive symptoms relate to increased suicidal ideation and sustained deficits in quality of life in youth with T1D (5,6,9–11). Thus, depressive symptoms in adolescents with T1D increase risk for both concurrent and long-term psychosocial issues and suboptimal T1D-related treatment outcomes.
Current American Diabetes Association and International Society of Pediatric and Adolescent Diabetes guidelines recommend screening all adolescents for depressive symptoms at time of T1D diagnosis and during routine follow-up care using available screening tools (12,13). However, existing screening tools may create unique challenges for screening adolescents with T1D because: 1) many depression screening tools were initially developed for use in adults, 2) no screening tools were specifically designed for people with T1D, and 3) existing screening tools lack confirmation of their ability to accurately detect depression in adolescents with T1D, particularly in relation to T1D-specific symptoms that can overlap with depressive symptoms.
Existing screening measures frequently used in research among community adolescents include the Beck Depression Inventory (BDI), Center for Epidemiological Studies Depression Scale (CESD), Children’s Depression Inventory (CDI), Patient Health Questionnaire (PHQ), and Reynolds Adolescent Depression Scale. Pooled comparisons of these common depression screening measures demonstrate adequate internal reliability (0.89), sensitivity (0.80), specificity (0.78), and diagnostic accuracy (0.86) (14). However, it is notable that studies used inconsistent cut points and thus had variable sensitivity, specificity, and poor positive predictive value (PPV) for detecting depression across samples. In addition, many of these studies used other screening measures as reference standards rather than a diagnostic interview or clinical diagnosis to confirm true depression, which may affect their accuracy (see Supplementary Table 1 for detailed psychometrics of these screening measures) (14).
In adults with T1D and type 2 diabetes (T2D), the BDI and CESD are the most frequently used depression screeners, followed by the PHQ and a diabetes distress scale (15). Studies of depression screener validity in adults with T1D and T2D demonstrate wide ranges of sensitivity and specificity and high rates of false positives based on established cutoff scores (15). Studies comparing depression screeners against diagnostic interviews found that screeners consistently overidentified adults who did not have a depressive disorder, possibly due to how somatic and cognitive/affective symptoms may be differentially related to diabetes distress, glycemic levels, and depression (16–18). In adolescents with T1D, the CDI is the most frequently used depression screener, followed by the CESD and PHQ (5). Despite wide ranges of sensitivity and specificity and high false-positive rates of these screeners in adults with T1D and T2D, to date, no study has evaluated depression screening measures against a diagnostic interview to establish their psychometric properties in adolescents with T1D.
Thus, our purpose in this study was to compare commonly used depression screening measures for adolescents with T1D against an empirically supported reference standard to examine their diagnostic accuracy and to identify ideal screening measure cutoff scores. We hypothesized that commonly used depression screening measures would demonstrate feasibility for use in adolescents with T1D, as characterized by completion times that fit within an average diabetes clinic wait time and high participant-reported acceptability. To adjust for possible effects related to somatic and cognitive/affective symptoms in T1D, we hypothesized that all measures would demonstrate improved sensitivity, specificity, and PPVs with numerically higher total cutoff scores than published cutoff scores in the general adolescent literature. We further predicted that there would be differences in the diagnostic accuracy of measures when compared with our diagnostic interview reference standard.
Research Design and Methods
Participants and Procedures
We recruited 100 adolescents with T1D from a network of tertiary care pediatric diabetes clinics in the Midwestern U.S. A parent/legal guardian provided written informed consent, and the adolescent provided assent prior to study participation. All procedures were approved by the local Institutional Review Board prior to study initiation. Eligible adolescents were between 12.00 and 17.99 years of age and had a diagnosis of T1D for at least 6 months. We excluded adolescents who did not have T1D or who could not complete the screening tools independently or in English.
Study procedures involved a single visit during which parents/legal guardians completed a brief demographics questionnaire, while adolescents independently completed five depression screeners and the diagnostic interview. We used Research Electronic Data Capture (REDCap) (19,20) to administer the depression screeners to adolescents in a random order to reduce potential bias from order effects or fatigue. A clinic social worker was available in person to meet with adolescents who endorsed significant risk of harm to themselves or others. All adolescents who endorsed elevated depressive symptoms and/or suicidal ideation received local treatment resources at the end of their visit. Adolescents received $45 for participating in the study.
Measures
BDI-II
The BDI-II is a 21-item self-report measure of symptoms. Items are summed to yield a total score (range 0–63) suggesting minimal (0–13), mild (14–19), moderate (20–28), or severe (29–63) risk of depression (21).
CESD Revised
CDI-2
The CDI-2 is a 28-item self-report measure of symptoms experienced in the past 2 weeks (25). Responses are summed to a total score (range 0–56) with clinical cutoffs of 15, 20, and 25 indicative of mild, moderate, and severe depression risk, respectively. Included within the CDI-2 is a validated 12-item short version that has a total score ≥3 (range 0–24) as the proposed clinical cutoff (26).
PHQ for Adolescents
The PHQ for Adolescents (PHQ-9A) is a nine-item self-report measure of symptoms in the past 2 weeks, plus four summary items that assess symptom severity over the past year (27). Responses are summed to a total score (range 0–27), with proposed cutoffs: minimal (0–4), mild (5–9), moderate (10–14), moderately severe (15–19), and severe (20–27) depression risk (27). Included within the PHQ-9A is a brief screening measure consisting of the first two items (PHQ-2A; range 0–6). Scores ≥2 or ≥3 have been proposed as clinical cutoffs for depression risk using the PHQ-2A.
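The scoring rule described above can be sketched in a few lines. This is a minimal illustration, not the instrument's official scoring software: the function names are hypothetical, and the 0 to 3 scoring of each item is inferred from the stated total range of 0–27 across nine items.

```python
def phq9a_severity(item_scores):
    """Sum nine PHQ-9A item scores and map the total to the bands quoted in the text.

    Assumption: each item is scored 0-3 (inferred from the 0-27 total range).
    """
    if len(item_scores) != 9 or any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("expected nine item scores, each between 0 and 3")
    total = sum(item_scores)
    # Upper bound of each band, taken from the proposed cutoffs in the text.
    bands = [
        (4, "minimal"),
        (9, "mild"),
        (14, "moderate"),
        (19, "moderately severe"),
        (27, "severe"),
    ]
    for upper, label in bands:
        if total <= upper:
            return total, label


def phq2a_total(item_scores):
    """Total of the first two PHQ-9A items (PHQ-2A, range 0-6)."""
    return sum(item_scores[:2])


if __name__ == "__main__":
    answers = [2, 1, 1, 0, 2, 1, 0, 1, 0]  # made-up responses
    print(phq9a_severity(answers))  # total and severity band
    print(phq2a_total(answers))     # PHQ-2A total from the same responses
```

Because the PHQ-2A reuses the first two PHQ-9A items, both totals come from a single administration.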
Short Mood and Feelings Questionnaire
The Short Mood and Feelings Questionnaire (SMFQ) is a 13-item self-report measure of symptoms in the past 2 weeks (28). Items are summed to yield a total score (range 0–26), with ≥11 proposed to indicate clinical depression risk.
Kiddie Schedule for Affective Disorders and Schizophrenia (DSM-5)
The Kiddie Schedule for Affective Disorders and Schizophrenia (K-SADS-PL) is a semistructured diagnostic interview for diagnosing depressive and bipolar-related disorders in youth (29). The K-SADS-PL has demonstrated high sensitivity and specificity and excellent concurrent validity for identifying depressive disorders in adolescents (30–33), and the semistructured format of the K-SADS-PL allows for inclusion of best clinical judgement to arrive at diagnoses by consensus. We used trained master’s level graduate students in clinical child psychology to administer the K-SADS-PL. To limit participant time burden, only screening interview items and the depressive and bipolar-related disorders supplement were administered. All interviewers received supervision from a licensed clinical psychologist and diabetes expert throughout the study, and all interview responses were discussed as a group to determine final ratings by consensus. Interviewers conducted additional reliability checks using recorded mock interviews at four times during data collection (at ∼0, 25, 50, and 75% of the sample collected) to maintain an interrater reliability of intraclass correlation coefficient ≥0.80 for the study (average 86.7% agreement).
Acceptability
We used a brief six-item study-specific self-report measure to assess acceptability (see Supplementary Material). Adolescents completed this measure after each depression screener (five times in total). This acceptability measure was designed to yield a total score, with higher scores indicating greater acceptability.
Feasibility
Participants selected a timestamp button at the end of each depression screening measure to record completion times in REDCap.
Demographics
We used a study-specific form to help characterize the sample based on adolescent (e.g., sex, age, race/ethnicity, and mental health history) and family (e.g., Hollingshead rating of socioeconomic status [SES] and presence of other family members with diabetes) characteristics. The Hollingshead SES index combines information on parent/caregiver sex, marital status, education, and occupation (34). Scale values for education are multiplied by 3, scale values for occupation are multiplied by 5, and then total scores for each caregiver are summed, with higher total scores indicating higher SES (range 8–66). Total scores were further classified into social strata groupings from 1 (lowest; unskilled labor, menial work; total scores 8–19) to 5 (highest; major business, professional; total scores 55–66).
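The weighting described above can be written as a short function. This is a sketch under stated assumptions: the function name is hypothetical, and the input ranges (education 1 to 7, occupation 1 to 9) come from the published Hollingshead four-factor index rather than from the text. With those ranges a single caregiver's weighted score spans 8 to 66, which matches the lowest and highest strata boundaries quoted above. How scores are combined across caregivers is left to the study's own procedure.

```python
def hollingshead_caregiver_score(education, occupation):
    """Weighted Hollingshead score for one caregiver: 3 x education + 5 x occupation.

    Assumptions (not stated in the text): education is coded 1-7 and
    occupation 1-9, as in the published four-factor index.
    """
    if not 1 <= education <= 7:
        raise ValueError("education scale value must be between 1 and 7")
    if not 1 <= occupation <= 9:
        raise ValueError("occupation scale value must be between 1 and 9")
    return 3 * education + 5 * occupation


if __name__ == "__main__":
    # Made-up scale values for one caregiver.
    print(hollingshead_caregiver_score(education=6, occupation=7))
```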
We used the electronic health record or the Children’s Mercy database on Type One Diabetes in Pediatrics (35) to collect adolescents’ date of T1D diagnosis, pump and/or continuous glucose monitor use, insulin dose/medications, listed mental health diagnoses, medical insurance status, body weight, BMI, HbA1c from their most recent clinic visit/standard of care measurement, number of hospitalizations in the last year due to DKA based on ICD-10 codes, family-reported severe hypoglycemic episodes, loss of consciousness, or coma, and most recent glucose data download.
Data Analyses
We used descriptive statistics to analyze our demographic data and our depression screener and interview results. We averaged adolescents’ satisfaction ratings to assess acceptability of each measure and used one-way ANOVA to explore differences in completion times and acceptability ratings by measure or child age. We used Pearson correlations to examine basic associations among demographics, T1D-related factors, and depressive scores. We calculated sensitivity, specificity, and PPVs for each screener compared with a dichotomous label of “depressed” (i.e., any current depressive disorder) versus “not depressed” based on results from the K-SADS-PL. The existing literature identifies ≥0.80 as the standard for both high sensitivity and high specificity (36). In our analyses, we elected to optimize sensitivity at ≥0.80 and applied a more flexible specificity standard of ≥0.75 to fit use of these measures as first-line screeners and reduce the possibility of false negatives. To explore if one or more screeners showed greater diagnostic accuracy, we used area under the receiver operating characteristic (ROC) curve analyses and a paired-sample design to compare area under the ROC curves for each measure pair (e.g., BDI-II vs. CESD-R).
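For readers less familiar with these quantities, the sketch below shows how sensitivity, specificity, PPV, and the area under the ROC curve can be computed from screener totals and a dichotomous reference label. It is an illustration only, not the statistical software used in this study. The function names and the toy data are hypothetical, and the area under the curve is computed with the rank-based (Mann-Whitney) formulation, which is one standard nonparametric estimate.

```python
def screen_metrics(scores, depressed, cutoff):
    """Sensitivity, specificity, and PPV when 'score >= cutoff' counts as a positive screen."""
    pairs = list(zip(scores, depressed))
    tp = sum(1 for s, d in pairs if d and s >= cutoff)
    fn = sum(1 for s, d in pairs if d and s < cutoff)
    fp = sum(1 for s, d in pairs if not d and s >= cutoff)
    tn = sum(1 for s, d in pairs if not d and s < cutoff)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp) if (tp + fp) else float("nan"),
    }


def roc_auc(scores, depressed):
    """Rank-based AUC: the chance that a depressed case outscores a non-depressed one.

    Ties count as half a win.
    """
    pos = [s for s, d in zip(scores, depressed) if d]
    neg = [s for s, d in zip(scores, depressed) if not d]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data, not study data: screener totals and interview-based labels.
    totals = [2, 9, 12, 4, 6, 1, 10, 3]
    labels = [False, True, True, False, True, False, False, False]
    print(screen_metrics(totals, labels, cutoff=8))
    print(round(roc_auc(totals, labels), 3))
```

Lowering the cutoff in `screen_metrics` trades specificity for sensitivity, which is the trade-off behind the adjusted cutoff scores.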
Results
Participants
We approached 324 adolescents, enrolled 128 adolescents, and 100 adolescents completed the study measures, yielding an enrollment rate of 39.5% and a completion rate of 78.1%. Adolescents had a mean age of 15.0 ± 1.7 years and time since diagnosis of 6.0 ± 4.1 years. Adolescents were 60% male, 87% White, 95% non-Hispanic, and 74% had a Hollingshead SES of ≥4 (range 1–5). Adolescents’ average HbA1c obtained from most recent clinic visit/standard of care measure was 8.9 ± 1.8%. For the T1D regimen, 72% used an insulin pump and 46% used continuous glucose monitoring. Fourteen adolescents were hospitalized for DKA during the year prior to study completion, and three experienced multiple DKA admissions. Eight adolescents reported severe hypoglycemic episodes the year prior to study completion, and four reported multiple severe hypoglycemic episodes. Ten adolescents reported a history of depression diagnosis, 15 reported a history of depression treatment (with or without a diagnosis), and 13 reported a history of diagnosis and treatment for another psychological disorder (e.g., anxiety, attention deficit hyperactivity disorder, anger/behavioral issues, and bipolar disorder).
Feasibility and Acceptability
Average screener completion times ranged from 1 to 3 min, and these data were available for 436 (of 500) individual measures, or 87.2%. Adolescents reported high acceptability for the screeners, particularly with respect to ease of understanding items and completion time. In contrast, adolescents reported lower scores for the item: “These items would be important to share with my doctor.”
Depressive Screener/Interview Outcomes
Based on the K-SADS-PL diagnoses, 15 adolescents met diagnostic criteria for any depressive disorder. We placed these 15 adolescents into the category “depressed,” leaving the remaining 85 adolescents in the category “not depressed.” Adolescents’ diagnostic category (i.e., “depressed” vs. “not depressed”) strongly and significantly associated with total scores on the depressive screeners as well as adolescent sex, self-reported history of depression or other mental health disorder, and Hollingshead SES (all P < 0.05) (Table 1). Table 2 provides descriptive statistics and cutoff scores for each depressive screener. Notably, based on a cutoff score ≥8 on the BDI-II, we could identify 14 of 15 adolescents in our “depressed” category, suggesting sensitivity of 0.93, specificity of 0.69, and PPV of 0.35. Applying a cutoff score ≥8 on the CESD-R, we could identify 12 of 15 adolescents in our “depressed” category, suggesting sensitivity of 0.80, specificity of 0.75, and PPV of 0.36. For the CDI-2 Long and using a cutoff score ≥11, we could identify 12 of 15 adolescents in our “depressed” category, suggesting sensitivity of 0.80, specificity of 0.76, and PPV of 0.38, while for the CDI-2 Short and using a cutoff score ≥5, we could identify 13 of 15 adolescents in our “depressed” category, suggesting sensitivity of 0.87, specificity of 0.82, and PPV of 0.46. Using the PHQ-9A and applying a cutoff score ≥5, we could identify 13 of 15 adolescents in our “depressed” category, suggesting sensitivity of 0.87, specificity of 0.80, and PPV of 0.43, while using the PHQ-2A and applying a cutoff score ≥1, we could identify 13 of 15 adolescents in our “depressed” category, suggesting sensitivity of 0.87, specificity of 0.73, and PPV of 0.36. Finally, applying a cutoff score ≥4 on the SMFQ, we could identify 12 of 15 adolescents in our “depressed” category, suggesting sensitivity of 0.80, specificity of 0.80, and PPV of 0.41.
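The reported values can be checked against one another with simple arithmetic. The sketch below rebuilds the approximate 2 × 2 counts implied by each true-positive count and specificity, given 15 “depressed” and 85 “not depressed” adolescents, and recomputes PPV. Because the published specificities are rounded to two decimals, the false-positive counts are approximations. The function and variable names are hypothetical.

```python
N_DEPRESSED = 15      # K-SADS-PL "depressed" group, from the text
N_NOT_DEPRESSED = 85  # K-SADS-PL "not depressed" group, from the text

# (true positives, reported specificity, reported PPV) at the adjusted cutoffs
REPORTED = {
    "BDI-II": (14, 0.69, 0.35),
    "CESD-R": (12, 0.75, 0.36),
    "CDI-2 Long": (12, 0.76, 0.38),
    "CDI-2 Short": (13, 0.82, 0.46),
    "PHQ-9A": (13, 0.80, 0.43),
    "PHQ-2A": (13, 0.73, 0.36),
    "SMFQ": (12, 0.80, 0.41),
}


def reconstruct(tp, specificity):
    """Rebuild the counts implied by a true-positive count and a rounded specificity."""
    fp = round((1 - specificity) * N_NOT_DEPRESSED)  # approximate: specificity is rounded
    return {
        "tp": tp,
        "fn": N_DEPRESSED - tp,
        "fp": fp,
        "tn": N_NOT_DEPRESSED - fp,
        "sensitivity": tp / N_DEPRESSED,
        "ppv": tp / (tp + fp),
    }


if __name__ == "__main__":
    for name, (tp, spec, ppv) in REPORTED.items():
        cells = reconstruct(tp, spec)
        print(
            f"{name}: about {cells['fp']} false positives, "
            f"sensitivity {cells['sensitivity']:.2f}, "
            f"PPV {cells['ppv']:.2f} (reported {ppv:.2f})"
        )
```

The low PPVs follow directly from the 15% base rate: even at a specificity of 0.80, roughly 17 of the 85 “not depressed” adolescents screen positive, which is more than the 12 to 14 true positives.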
Table 1. Correlations between K-SADS-PL diagnoses (any depressive disorder) and depression screening measure total scores
Measure | BDI-II | CDI-2 Long | CDI-2 Short | CESD-R | PHQ-9A | PHQ-2A | SMFQ |
---|---|---|---|---|---|---|---|
K-SADS-PL | 0.611** | 0.584** | 0.652** | 0.559** | 0.601** | 0.529** | 0.657** |
BDI-II | — | 0.865** | 0.820** | 0.863** | 0.868** | 0.657** | 0.847** |
CDI-2 Long | — | — | 0.934** | 0.856** | 0.878** | 0.652** | 0.862** |
CDI-2 Short | — | — | — | 0.826** | 0.840** | 0.628** | 0.834** |
CESD-R | — | — | — | — | 0.885** | 0.629** | 0.874** |
PHQ-9A | — | — | — | — | — | 0.783** | 0.860** |
PHQ-2A | — | — | — | — | — | — | 0.664** |
SMFQ | — | — | — | — | — | — | — |
**P < 0.001.
ROC Curve Analyses
Individual ROC curves suggested that all screeners differentiated between adolescents in our “depressed” versus “not depressed” categories at a clinically significant level. The CDI-2 Short, PHQ-9A, and SMFQ all demonstrated “excellent” predictive validity, whereas the BDI-II, CESD-R, CDI-2 Long, and PHQ-2A demonstrated “good” predictive validity (Table 3). Yet, based on ROC curve interpretations, it may be possible to optimize screener sensitivity by applying the following adjusted cutoff scores: BDI-II ≥8, CESD-R ≥8, CDI-2 Long ≥11, CDI-2 Short ≥5, PHQ-9A ≥5, PHQ-2A ≥1, and SMFQ ≥4 (see Supplementary Material for full data on all cutoff scores for each screener). Using these adjusted cutoffs, 28–40% of the sample was identified as endorsing elevated depressive symptoms.
Table 2. Descriptive statistics and cutoff scores for each depression screening measure (N = 100)
Statistic | BDI-II | CDI-2 Long | CDI-2 Short | CESD-R | PHQ-9A | PHQ-2A | SMFQ |
---|---|---|---|---|---|---|---|
Mean ± SD | 8.06 ± 8.01 | 8.58 ± 7.39 | 3.5 ± 3.34 | 7.2 ± 8.26 | 3.35 ± 3.64 | 0.69 ± 1.15 | 3.12 ± 4.17 |
Range | 0 to 40 | 0 to 35 | 0 to 14 | 0 to 34 | 0 to 17 | 0 to 5 | 0 to 20 |
ICC (α) | 0.91 | 0.90 | 0.81 | 0.91 | 0.81 | 0.73 | 0.90 |
Preexisting cutoff | ≥20 | ≥20 | ≥3 | ≥16 | ≥10 | ≥2 | ≥12 |
Sensitivity | 0.47 | 0.53 | 1.0 | 0.60 | 0.33 | 0.67 | 0.47 |
Specificity | 0.98 | 0.96 | 0.54 | 0.94 | 0.99 | 0.89 | 0.99 |
PPV | 0.78 | 0.73 | 0.28 | 0.64 | 0.83 | 0.53 | 0.88 |
Adjusted cutoff* | ≥8 | ≥11 | ≥5 | ≥8 | ≥5 | ≥1 | ≥4 |
Sensitivity | 0.93 | 0.80 | 0.87 | 0.80 | 0.87 | 0.87 | 0.80 |
Specificity | 0.69 | 0.76 | 0.82 | 0.75 | 0.80 | 0.73 | 0.80 |
PPV | 0.35 | 0.38 | 0.46 | 0.36 | 0.43 | 0.36 | 0.41 |
ICC, intraclass correlation coefficient.
*Adjusted cutoff goal selected for ≥0.80 sensitivity and ≥0.75 specificity, with some flexibility in specificity to prioritize achieving sensitivity ≥0.80.
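One way to operationalize the adjusted cutoff goal is to sweep every observed score as a candidate cutoff, keep the candidates that reach the sensitivity floor, and pick the one with the highest specificity. The sketch below shows this on toy data. It is one plausible reading of the selection rule, not the authors' exact procedure, and the function name and data are hypothetical.

```python
def choose_cutoff(scores, depressed, min_sensitivity=0.80):
    """Return (cutoff, sensitivity, specificity) for the best qualifying cutoff.

    A screen is positive when score >= cutoff. Among cutoffs whose
    sensitivity is at least min_sensitivity, the one with the highest
    specificity wins. Returns None if no cutoff qualifies.
    """
    pos = [s for s, d in zip(scores, depressed) if d]
    neg = [s for s, d in zip(scores, depressed) if not d]
    candidates = []
    for cutoff in sorted(set(scores)):
        sensitivity = sum(1 for s in pos if s >= cutoff) / len(pos)
        specificity = sum(1 for s in neg if s < cutoff) / len(neg)
        if sensitivity >= min_sensitivity:
            candidates.append((specificity, cutoff, sensitivity))
    if not candidates:
        return None
    specificity, cutoff, sensitivity = max(candidates)
    return cutoff, sensitivity, specificity


if __name__ == "__main__":
    # Toy data, not study data.
    totals = [0, 1, 2, 3, 3, 5, 6, 7, 8, 9, 12]
    labels = [False, False, False, False, True, False, True, True, True, False, True]
    print(choose_cutoff(totals, labels))
```

A clinic that cannot absorb the resulting false positives could add a specificity floor (for example 0.75) and accept that some screeners may then have no qualifying cutoff.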
Table 3. Area under the ROC curves for depression screening measures (N = 100)
Measure | AUC | 95% CI lower bound | 95% CI upper bound | P value | Qualitative description |
---|---|---|---|---|---|
BDI-II | 0.898 | 0.825 | 0.971 | <0.001 | Good |
CESD-R | 0.855 | 0.753 | 0.957 | <0.001 | Good |
CDI-2 Long | 0.881 | 0.789 | 0.973 | <0.001 | Good |
CDI-2 Short | 0.930 | 0.868 | 0.992 | <0.001 | Excellent |
PHQ-9A | 0.900 | 0.825 | 0.975 | <0.001 | Excellent |
PHQ-2A | 0.849 | 0.733 | 0.965 | <0.001 | Good |
SMFQ | 0.900 | 0.814 | 0.986 | <0.001 | Excellent |
AUC, area under the curve.
When all five measures, including long and short versions, were compared using paired-sample area differences under the ROC curve, the CDI-2 Short performed significantly better than the PHQ-2A for identifying adolescents in our “depressed” category (P = 0.032) and demonstrated a statistical trend toward higher predictive validity than the CDI-2 Long (P = 0.052) and CESD-R (P = 0.096). The SMFQ and PHQ-9A also demonstrated statistical trends toward higher predictive validity than the PHQ-2A (SMFQ, P = 0.069; PHQ-9A, P = 0.091) (Table 4).
Table 4. Paired-sample area difference under the ROC curves, asymptotic (N = 100)
Comparison | z | P value (a) | AUC difference | SE difference (b) | 95% CI lower bound | 95% CI upper bound |
---|---|---|---|---|---|---|
PHQ-9A vs. PHQ-2A | 1.692 | 0.091 | 0.051 | 0.303 | −0.008 | 0.110 |
PHQ-9A vs. CDI Long | 0.553 | 0.580 | 0.019 | 0.290 | −0.048 | 0.086 |
PHQ-9A vs. CESD-R | 1.185 | 0.236 | 0.045 | 0.298 | −0.029 | 0.120 |
PHQ-9A vs. BDI | 0.077 | 0.939 | 0.002 | 0.272 | −0.048 | 0.052 |
PHQ-9A vs. SMFQ | 0.000 | 1.000 | 0.000 | 0.283 | −0.042 | 0.042 |
PHQ-9A vs. CDI Short | −1.534 | 0.125 | −0.030 | 0.262 | −0.068 | 0.008 |
PHQ-2A vs. CDI Long | −0.638 | 0.524 | −0.032 | 0.319 | −0.131 | 0.067 |
PHQ-2A vs. CESD-R | −0.109 | 0.913 | −0.006 | 0.326 | −0.112 | 0.100 |
PHQ-2A vs. BDI | −1.174 | 0.240 | −0.049 | 0.302 | −0.131 | 0.033 |
PHQ-2A vs. SMFQ | −1.816 | 0.069 | −0.051 | 0.311 | −0.106 | 0.004 |
PHQ-2A vs. CDI Short | −2.140 | 0.032 | −0.081 | 0.293 | −0.155 | −0.007 |
CDI Long vs. CESD-R | 0.453 | 0.650 | 0.026 | 0.314 | −0.087 | 0.140 |
CDI Long vs. BDI | −0.560 | 0.575 | −0.017 | 0.287 | −0.076 | 0.042 |
CDI Long vs. SMFQ | −0.443 | 0.658 | −0.019 | 0.299 | −0.102 | 0.064 |
CDI Long vs. CDI Short | −1.945 | 0.052 | −0.049 | 0.277 | −0.098 | 0.000 |
CESD-R vs. BDI | −0.911 | 0.362 | −0.043 | 0.297 | −0.136 | 0.050 |
CESD-R vs. SMFQ | −0.962 | 0.336 | −0.045 | 0.307 | −0.137 | 0.047 |
CESD-R vs. CDI Short | −1.667 | 0.096 | −0.075 | 0.288 | −0.163 | 0.013 |
BDI vs. SMFQ | −0.063 | 0.950 | −0.002 | 0.282 | −0.063 | 0.059 |
BDI vs. CDI Short | −1.425 | 0.154 | −0.032 | 0.259 | −0.075 | 0.012 |
SMFQ vs. CDI Short | −1.029 | 0.303 | −0.030 | 0.272 | −0.087 | 0.027 |
AUC, area under the curve.
(a) Null hypothesis: true area difference = 0.
(b) Under nonparametric assumption.
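The P values in the paired-sample comparison follow from the z statistics under a standard normal reference distribution. The sketch below converts a z statistic to a two-sided P value using only the Python standard library. The function name is hypothetical, and the z values in the example are copied from the table.

```python
import math


def two_sided_p(z):
    """Two-sided P value for a z statistic under the standard normal distribution."""
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)).
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    comparisons = {
        "PHQ-2A vs. CDI Short": -2.140,
        "CDI Long vs. CDI Short": -1.945,
        "PHQ-9A vs. PHQ-2A": 1.692,
    }
    for name, z in comparisons.items():
        print(f"{name}: z = {z:.3f}, two-sided P = {two_sided_p(z):.3f}")
```

Only the PHQ-2A vs. CDI-2 Short comparison falls below the conventional 0.05 threshold, which is why the other differences are described as trends.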
Conclusions
Despite best practice guidelines for routine depression screening of all adolescents with T1D, there is limited guidance on which screening tools perform best or optimal cutoff scores for this population. Consistent with our hypothesis, adolescents reported high acceptability (∼87% acceptability) and low time burden (1–3 min, on average) to complete commonly used depression screeners. This finding suggests that any of these screeners might fit within typical clinic wait times and be successfully implemented into routine clinic screening programs (12,13). Although acceptability ratings did not differ significantly, clinic decisions to implement screeners based on shortest time burden would suggest the CESD-R, PHQ-9A, or SMFQ might be optimal. However, it is noteworthy that we could not collect acceptability ratings and completion times for the CDI-2 Short and the PHQ-2A because these screeners share items with their longer versions. Thus, it will be necessary to examine acceptability and completion times for these short screeners in the future. Overall, our findings suggest that these depression screeners are feasible and acceptable for use with adolescents, which is consistent with previous implementation studies that demonstrated successful integration of routine depression screening in clinic settings (5,37).
Our next analyses expanded on previous literature by examining the diagnostic accuracy of five common depression screeners against a reference standard, the K-SADS-PL, in adolescents with T1D. Based on findings in adults with diabetes, we hypothesized that we may need to raise cutoff scores to account for adolescents with T1D reporting high somatic symptoms of depression (e.g., irritability or fatigue) that can overlap with physiological symptoms of T1D. Similar to existing research in adults with diabetes, our results revealed wide variability in the percentages of adolescents falling above the cutoff for elevated depressive symptoms using preexisting cut points (15). Subsequent examination of screener sensitivity and specificity identified optimal cutoff scores that were disparate from those in the general adolescent literature. Only the CDI-2 Short captured 100% of adolescents categorized as “depressed” using prepublished cutoff scores, and this occurred at the expense of low specificity (0.54). All other screeners demonstrated low sensitivity (0.33 to 0.67). Thus, contrary to our hypothesis, our results indicated the need to decrease cutoff scores for most depression screeners in adolescents with T1D to optimize sensitivity.
While we were surprised to find that cutoff scores should be decreased across most depression screeners to improve sensitivity, one possible explanation could be that adolescents in our sample underreported depressive symptoms on screeners. The clinic we recruited adolescents from routinely screens for psychosocial concerns (e.g., depression, anxiety, and disordered eating), suggesting that adolescents could be primed to underreport symptoms to complete measures faster or avoid having to remain in clinic to meet with a mental health professional. We recognize the historical context of negative stigma and mistrust associated with mental health diagnoses, particularly for underserved youth (38). Although we took steps to encourage honest disclosure of symptoms (e.g., measures and interview completed in a private room with only the interviewer present), we cannot rule out the possibility that participants underreported symptoms to avoid negative stigma. Similarly, adolescents with T1D generally demonstrate low treatment-seeking behavior and poor follow through on traditional depression treatment referrals, which may support the possibility that youth underreported symptoms to avoid treatment (39).
Another surprising finding was that the percentage of adolescents in our sample categorized as “depressed” on the K-SADS-PL (15%) was substantially lower than the 22–30% of adolescents who endorsed clinically elevated depressive symptoms in previous studies (2,3,40). This may suggest our sample of recruited adolescents was substantially different from previous samples. However, it is equally possible this discrepancy may be due to previous studies overrelying on outcomes of single depression screeners and preexisting general adolescent cutoffs. Indeed, it is logical to assume that screening measures would identify more adolescents as “depressed” than adolescents who truly meet criteria for depression based on clinical diagnosis or diagnostic interview. Administering a self-report depression screener likely comprises only the first step in clinic-based screening in which a low false-negative rate is desirable. Accordingly, our K-SADS-PL results may indicate that true rates of depression in adolescents with T1D are lower than previously estimated and that clinics should expect to follow up with nearly twice as many adolescents as actually meet criteria for a depressive disorder based on screener results.
Finally, our area under the ROC curve analyses indicated that all screeners demonstrated good to excellent diagnostic accuracy for identifying “depressed” adolescents in this sample, contradicting our hypothesis that we would find differences. Though if we include trending differences, our results might suggest an order from highest to lowest diagnostic accuracy for screeners of CDI-2 Short, SMFQ, PHQ-9A, BDI-II, CESD-R, CDI-2 Long, and PHQ-2A, respectively. We tentatively offer evidence supporting continued use of the CDI-2 Short as an initial screener in diabetes clinics. For clinics needing a screener that is free for use, our results suggest the PHQ-9A or SMFQ may be the next best options based on their diagnostic accuracy in this sample. In contrast, the PHQ-2A may not be a viable option for screening purposes, as it demonstrated the lowest diagnostic accuracy in our sample and a very low cutoff point (≥1) to achieve adequate sensitivity would likely result in a high number of false positives.
We recognize there are many challenges in clinic implementation of routine depression screening in adolescents with T1D. In addition to selecting a feasible screening tool, clinics must consider the additional staff and time needed to follow up with adolescents who screen positive for depression in order to confirm the diagnoses and provide resources and/or treatment. The results of this study tentatively suggest that clinics should decrease cutoff scores for most depression screeners in adolescents with T1D to optimize sensitivity. However, given the current national shortage of pediatric medical subspecialists and limited access to mental health providers for many youth with T1D (41–43), we realize some clinics may not have the resources to tolerate a higher false-positive rate in their depression screening program. Consequently, future research and quality improvement projects are needed to better appreciate the costs and benefits of applying our adjusted cutoff scores in routine clinical screening. In addition, past research suggests that even subclinical symptoms can be significantly impairing and detrimentally impact T1D self-management (44,45). Therefore, future research into adaptive care models that provide multiple stages of intervention, such as primary education, brief interventions for positive screens that do not meet criteria for a depressive disorder, and more intensive interventions for those who meet full criteria for a depressive disorder, may be indicated to support the full range of screening outcomes for adolescents with T1D.
Our results should be interpreted while bearing in mind some limitations. First, our study had enrollment and completion rates of 39.5% and 78.1%, respectively. While our enrollment rate generally matches previously documented enrollment rates of 10–50% in clinical trials research with adolescents, we acknowledge that it is low (46). A potential barrier to participation was the requirement that youth complete the study visit in person to ensure safety follow-up for any participants who endorsed suicidality. Future studies could achieve a higher enrollment rate by offering study visits via telehealth. Second, our study sample was homogeneous in terms of adolescents’ race/ethnicity and SES. Although our sample generally matched the demographics for youth in the clinics where we conducted the study (35), it will be important for future research to replicate our results in larger, more diverse samples. Greater diversity may be achieved by prescreening or stratified recruitment in future studies. Third, although there are well-documented biological sex differences in rates of depression among adolescents (i.e., females have about twice the prevalence of males), given our relatively small sample size, we did not examine sex differences in screening measures or cut points. Future studies with larger sample sizes are needed to examine sex differences in depressive symptoms in adolescents with T1D. Fourth, we acknowledge that both the depression screeners and K-SADS-PL were based on adolescent self-report. Currently, self-reported symptoms provide the best available evidence for diagnosing internalizing symptoms, and reliable, objective assessments are not available. However, self-report data may be strengthened by collecting confirmatory parent-report data in future studies. Relatedly, it may be important for future studies to assess related constructs, such as diabetes distress. We did not screen for diabetes distress in this study, though it may be important to measure depressive symptoms and distress concurrently in adolescents with T1D to better understand overlap between these constructs. Fifth, few studies have examined divergent validity of the K-SADS-PL to differentiate depressive disorders from other psychological disorders. Given that multiple participants reported a history of other psychological disorders, future studies may consider administering the full K-SADS-PL and/or other broadband screening tools to better differentiate between depressive disorders and other psychological disorders in adolescents with T1D. Finally, we acknowledge that the CDI-2 and BDI-II were two of the self-report surveys used to establish convergent validity with the K-SADS-PL in validation studies (29,32), suggesting that it may be helpful to replicate this study using a different diagnostic interview. These limitations notwithstanding, we believe our study was scientifically rigorous with its use of a diagnostic interview reference standard, attention to interrater reliability for the K-SADS-PL, attempt to minimize order effects by administering the screeners in random order, and use of objective data collection (e.g., automatic timestamps for screeners) where possible.
Conclusion and Clinical Implications
In conclusion, this is the first known study to compare commonly used depression screening measures against a clinical diagnostic interview reference in adolescents with T1D. Our results preliminarily suggest clinics should consider adjusting depression screener cutoff scores to increase sensitivity and reduce false-negative rates. Specifically, based on our sample, suggested cutoff scores by measure are: BDI-II ≥8, CESD-R ≥8, CDI-2 Long ≥11, CDI-2 Short ≥5, PHQ-9A ≥5, PHQ-2A ≥1, and SMFQ ≥4. Moreover, with respect to diagnostic accuracy, our results suggest clinics should administer the CDI-2 Short form if it is feasible to use a measure with proprietary costs, or the PHQ-9A or SMFQ if clinics are unable to routinely administer a proprietary measure due to cost or low reimbursement. In contrast, our results suggest clinics should avoid using the PHQ-2A alone, as this very short measure had low sensitivity for detecting depressive disorders in our sample. We recognize that the next steps for this area of research are to examine these results in more diverse samples of adolescents. Quality improvement projects are also needed to better evaluate the costs and benefits of applying our adjusted cutoff scores that optimize sensitivity, which may increase the rate of false positives and clinic burden where mental health resources are already limited.
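For clinics that score screeners electronically, the suggested cutoffs could be encoded as a simple lookup, as in the minimal sketch below. The cutoff values are those suggested above for this sample; the dictionary and function names are hypothetical, and a positive screen is only a prompt for follow-up assessment, not a diagnosis.

```python
# Suggested sample-based cutoffs from this study: a total score at or above
# the listed value counts as a positive screen.
SUGGESTED_CUTOFFS = {
    "BDI-II": 8,
    "CESD-R": 8,
    "CDI-2 Long": 11,
    "CDI-2 Short": 5,
    "PHQ-9A": 5,
    "PHQ-2A": 1,
    "SMFQ": 4,
}


def screens_positive(measure: str, total_score: int) -> bool:
    """Return True if total_score meets the suggested cutoff for the measure.

    Raises KeyError for a measure that is not in SUGGESTED_CUTOFFS, so an
    unrecognized screener is never silently treated as a negative screen.
    """
    return total_score >= SUGGESTED_CUTOFFS[measure]
```

For example, `screens_positive("PHQ-9A", 5)` evaluates to `True`, whereas `screens_positive("CDI-2 Long", 10)` evaluates to `False`.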
This article contains supplementary material online at https://doi.org/10.2337/figshare.20409522.
Article Information
Acknowledgments. The authors thank Heather Feingold and Lara Simon of Children’s Mercy—Kansas City for providing social work support to maintain the safety of the participants; graduates of the University of Kansas Clinical Child Psychology Program, Amy Noser and Alex Monzon, for administering diagnostic interviews and leading study visits; the additional members of A.M.M.’s University of Kansas dissertation committee, Ric Steele, Michael Roberts, Christopher Cushing, and D. Crystal Coles, for the critiques supporting the success of this project; and the adolescents with T1D and the families who graciously volunteered to participate in this project.
Funding. This project was funded by a University of Kansas Medical Center Practice Innovations Grant.
Duality of Interest. M.A.C. is Chief Medical Officer of Glooko, Inc. and receives nonfinancial research support from Dexcom and Abbott Diabetes Care. No other potential conflicts of interest relevant to this article were reported.
Author Contributions. A.M.M. wrote the manuscript and collected and analyzed data. S.R.P. mentored the project, secured funding, and assisted with writing the manuscript. A.E.E. provided supervision during data collection and reviewed and edited the manuscript. M.A.C. and R.J.M. provided medical oversight for the project and reviewed and edited the manuscript. A.M.M. and S.R.P. are the guarantors of this work and, as such, had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Prior Presentation. This study was presented in oral form at the 45th Annual Conference of the International Society for Pediatric and Adolescent Diabetes, Boston, MA, 30 October–2 November 2019 and the 79th Scientific Sessions of the American Diabetes Association, San Francisco, CA, 7–11 June 2019.