Background: Studies of unknown diabetes are usually based on small population samples. Information on the entire population is accessible through health insurance databases such as the French National Health Data System (SNDS), which comprises reimbursement data for health care dispensed out of hospital and in public and private hospitals for the 66 million people living in France. The objective of our study was to develop an algorithm to identify unknown diabetes cases in the SNDS using artificial intelligence (AI).

Methods: Data from the Constances cohort were used to develop the algorithm, since individual information recorded in self-administered questionnaires, medical examinations and biological tests is linked with data from the SNDS. We applied a supervised machine learning method involving eight steps. First, the reference database was selected (n = 44,185 participants) after excluding known diabetes cases. Then, the unknown diabetes cases (fasting blood glucose ≥7 mmol/L) were identified as the positive target (n = 655). The following steps were: coding of the SNDS variables, splitting of the reference database into training and testing databases, selection of variables, and training, validation and selection of algorithms.
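
The target definition and the training/testing split described above can be sketched as follows. This is a minimal illustration only, not the study's code: the 7 mmol/L threshold comes from the abstract, while the function names, the 30% test fraction, the random seed and the toy glucose values are all assumptions made for the example.

```python
import random

FBG_THRESHOLD_MMOL_L = 7.0  # fasting blood glucose threshold stated in the abstract


def label_target(fasting_glucose_mmol_l):
    """Return 1 for an unknown diabetes case (FBG >= 7 mmol/L), else 0."""
    return 1 if fasting_glucose_mmol_l >= FBG_THRESHOLD_MMOL_L else 0


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Split row indices into training and testing sets, class by class,
    so both sets keep roughly the same share of positives.
    test_fraction and seed are illustrative assumptions."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


# Toy fasting glucose values (mmol/L), invented for illustration only.
glucose = [5.1, 7.4, 4.9, 6.8, 7.0, 5.5, 8.2, 5.0, 6.1, 7.9]
labels = [label_target(g) for g in glucose]
train_idx, test_idx = stratified_split(labels)
```

Stratifying the split matters when positives are rare, as here (655 of 44,185), because a purely random split could leave the testing database with too few positive cases to evaluate.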

Results: Among the 3,471 variables coded, 12 were selected based on their ability to discriminate the target: unknown diabetes cases versus no diabetes. The final algorithm is a logistic regression model based on the 5 most discriminating variables: age, sex and the number of out-of-hospital reimbursements in the previous year for blood lipid profile tests, general practitioner consultations and blood glucose tests. The specificity, sensitivity and precision of the algorithm were 70%, 71% and 69%, respectively.
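
For reference, the three reported measures follow the usual confusion-matrix definitions, sketched below. The function name and the counts are hypothetical and chosen only to show the arithmetic; they are not the study's data and do not reproduce its results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, precision) from confusion-matrix counts.

    tp: unknown diabetes cases flagged by the algorithm
    fp: non-diabetic participants flagged by the algorithm
    tn: non-diabetic participants not flagged
    fn: unknown diabetes cases not flagged
    """
    sensitivity = tp / (tp + fn)  # share of true cases that are detected
    specificity = tn / (tn + fp)  # share of non-cases correctly left unflagged
    precision = tp / (tp + fp)    # share of flagged participants who are true cases
    return sensitivity, specificity, precision


# Hypothetical counts, for illustration only.
sens, spec, prec = classification_metrics(tp=8, fp=3, tn=7, fn=2)
```

Unlike sensitivity and specificity, precision depends on the proportion of positives in the evaluated sample, so it should be read together with how the testing database was composed.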

Conclusion: AI opens up many possibilities for diabetes prevention. In particular, unknown diabetes cases could be identified in the SNDS to support the development and evaluation of prevention policies at the national, regional or local level.


S. Fuentes: None. R. Hrzic: None. S. Kab: Employee; Spouse/Partner; Sanofi. R. Haneef: None. S. Fosse-Edorh: None. E. Cosson: None.

Readers may use this article as long as the work is properly cited, the use is educational and not for profit, and the work is not altered. More information is available at