Background: Nonalcoholic fatty liver disease (NAFLD) and its progressive form, nonalcoholic steatohepatitis (NASH), are underrecognized because of low awareness and a lack of specific symptoms. Machine learning techniques allow algorithms to be derived from large databases with many interacting features in order to predict outcomes of interest. The objective of this study was to estimate the number of patients in electronic health records (EHR) who might have NASH, using a NASH machine learning model (NASHmap).

Methods: NASHmap is a predictive extreme gradient boosting model built and validated in two US databases. The model has a test area under the curve (AUC) of 0.82 in the training database and 0.76 in the validation database. It uses 14 features, listed in order of importance: HbA1c, AST, ALT, total protein, AST/ALT, BMI, triglycerides, height, platelets, WBC, hematocrit, albumin, hypertension, and gender. The model was applied to a cohort of patients in the Optum EHR who met predefined inclusion criteria (e.g., common NASH comorbidities) and exclusion criteria (e.g., any other liver disease) and who had all 14 features available.
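The requirement that a patient have all 14 features available can be illustrated with a minimal Python sketch. It is not the authors' implementation: the field names, the `build_feature_row` helper, and the assumption that AST/ALT is computed as the simple ratio of the two laboratory values are all hypothetical and chosen only for illustration.

```python
from typing import Optional

# Hypothetical field names for the 14 NASHmap features, in the order of
# importance given in the abstract. The names themselves are assumptions.
FEATURES = [
    "hba1c", "ast", "alt", "total_protein", "ast_alt_ratio", "bmi",
    "triglycerides", "height", "platelets", "wbc", "hematocrit",
    "albumin", "hypertension", "gender",
]


def build_feature_row(record: dict) -> Optional[list]:
    """Return the 14 features in model order, or None if any is missing."""
    row = dict(record)
    ast, alt = row.get("ast"), row.get("alt")
    # Assumption: AST/ALT is derived as a plain ratio when both values exist
    # and ALT is nonzero.
    if ast is not None and alt:
        row["ast_alt_ratio"] = ast / alt
    # Complete-case rule: drop the patient if any feature is unavailable.
    if any(row.get(name) is None for name in FEATURES):
        return None
    return [row[name] for name in FEATURES]
```

A record missing any one value (for example, no platelet count) returns `None` and would be excluded from scoring, which corresponds to the step that narrows the eligible cohort to patients with complete data.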

Results: Of 86 million (M) patients in the Optum EHR, 14M met the inclusion and exclusion criteria, and 3M had all 14 features. Among these, ∼23,000 patients had a diagnosis of NASH or NAFLD based on ICD codes; of those, 73% had type 2 diabetes (T2D), 65% were female, and 28% were older than 65 years. Among patients with no recorded NASH diagnosis, 56% had T2D, 54% were female, and 46% were older than 65 years. Among the 3M patients, NASHmap predicted ∼902,000 as potentially having NASH; of those, 70% had T2D, 52% were female, and 43% were older than 65 years, and median HbA1c was 7.1 for patients with diabetes and 5.8 for patients without diabetes.
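The cohort funnel reported above can be restated as step-to-step proportions. The short Python sketch below uses only the rounded counts given in the abstract, so the resulting percentages are approximate; the `funnel_shares` helper and the step labels are illustrative names, not part of the study.

```python
# Rounded counts as reported in the abstract (approximate by construction).
cohort = [
    ("All patients in Optum EHR", 86_000_000),
    ("Met inclusion and exclusion criteria", 14_000_000),
    ("Had all 14 features", 3_000_000),
    ("Predicted by NASHmap as potentially having NASH", 902_000),
]


def funnel_shares(steps):
    """Return each step's count as a fraction of the preceding step."""
    shares = []
    for (_, previous), (label, count) in zip(steps, steps[1:]):
        shares.append((label, count / previous))
    return shares


for label, share in funnel_shares(cohort):
    print(f"{label}: {share:.1%} of the previous step")
```

On these rounded figures, the patients flagged by NASHmap make up roughly 30% of the 3M patients with complete features.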

Conclusion: High-performing machine learning models can be used to flag patients likely to have NASH. In clinical practice, this could help physicians identify at-risk patients and direct them to appropriate diagnostic and therapeutic interventions.


M. Docherty: Consultant; Self; Novartis Pharmaceuticals Corporation. A. Tietz: Employee; Self; Novartis AG. Stock/Shareholder; Self; Novartis AG. S.A. Regnier: Employee; Self; Novartis AG. Stock/Shareholder; Self; Novartis AG. M. Balp: Employee; Self; Novartis AG. G. Capkun: Employee; Self; Novartis AG. J. Loeffler: Employee; Self; Novartis Pharmaceuticals Corporation. Stock/Shareholder; Self; Novartis Pharmaceuticals Corporation. M. Pedrosa: Employee; Self; Novartis AG. J.M. Schattenberg: Consultant; Self; GENFIT, Gilead Sciences, Inc., Intercept Pharmaceuticals, Inc., Novartis Pharmaceuticals Corporation, Pfizer Inc., Roche Foundation.

Readers may use this article as long as the work is properly cited, the use is educational and not for profit, and the work is not altered. More information is available at