Table_1_Use of Non-invasive Parameters and Machine-Learning Algorithms for Predicting Future Risk of Type 2 Diabetes: A Retrospective Cohort Study of Health Data From Kuwait.DOCX

Objective: In recent decades, the Arab population has experienced an increase in the prevalence of type 2 diabetes (T2DM), particularly within the Gulf Cooperation Council. In this context, early intervention programmes rely on an ability to identify individuals at risk of T2DM. We aimed to build prognostic models for the risk of T2DM in the Arab population using machine-learning algorithms vs. conventional logistic regression (LR) and simple non-invasive clinical markers over three different time scales (3, 5, and 7 years from the baseline).

Design: This retrospective cohort study used three models based on LR, k-nearest neighbours (k-NN), and support vector machines (SVM) with five-fold cross-validation. The models included the following baseline non-invasive parameters: age, sex, body mass index (BMI), pre-existing hypertension, family history of hypertension, and T2DM.
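As an illustration of the design above, the following is a minimal, standard-library-only sketch of five-fold cross-validation around a k-NN classifier. It is not the authors' implementation: the function names, the choice of k = 5 neighbours, the unstratified random split, and the synthetic two-feature data are all assumptions made for the example.

```python
import random

def make_folds(n_samples, n_folds=5, seed=0):
    """Shuffle the sample indices and deal them into n_folds disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]

def knn_score(train_X, train_y, x, k=5):
    """Fraction of positive labels among the k nearest training points."""
    by_distance = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    nearest = by_distance[:k]
    return sum(label for _, label in nearest) / len(nearest)

def cross_validated_scores(X, y, n_folds=5, k=5, seed=0):
    """Score every sample with a model fitted on the other folds only."""
    folds = make_folds(len(X), n_folds, seed)
    scores = [None] * len(X)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        for i in held_out:
            scores[i] = knn_score(train_X, train_y, X[i], k)
    return folds, scores

# Synthetic stand-in data (assumption): two already-standardised features,
# shifted by one unit for the positive class.
rng = random.Random(1)
y = [i % 2 for i in range(40)]
X = [(rng.gauss(0, 1) + label, rng.gauss(0, 1) + label) for label in y]

folds, scores = cross_validated_scores(X, y)
```

Each sample receives a score from a model that never saw it during fitting, which is the property that makes cross-validated performance estimates less optimistic than in-sample ones. With real covariates such as age and BMI, the features would need to be brought to a common scale before computing distances.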

Setting: This study was based on data from the Kuwait Health Network (KHN), which integrated primary health and hospital laboratory data into a single system.

Participants: The study included 1,837 native Kuwaiti Arab individuals (equal proportions of men and women) with a mean age of 59.5 ± 11.4 years. Among them, 647 developed T2DM within 7 years of the baseline non-invasive measurements.

Analytical methods: The discriminatory power of each model for classifying people at risk of T2DM within 3, 5, or 7 years was determined using the area under the receiver operating characteristic curve (AUC).
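For readers unfamiliar with the metric, the AUC can be read as the probability that a randomly chosen individual who develops the outcome receives a higher predicted score than a randomly chosen individual who does not. The sketch below computes it directly from that definition; the function name and the four-sample toy data are illustrative assumptions, not values from the study.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))

# Toy example (assumption): 1 = developed the outcome, 0 = did not.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)  # 3 of the 4 pairs are ranked correctly -> 0.75
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. The pairwise loop is quadratic in the number of samples; it is adequate for illustration, but a rank-based formulation is preferable for large cohorts.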

Outcome measures: Onset of T2DM at 3, 5, and 7 years.

Results: The k-NN machine-learning technique, which yielded AUC values of 0.83, 0.82, and 0.79 for 3-, 5-, and 7-year prediction horizons, respectively, outperformed the most commonly used LR method and other previously reported methods. Comparable results were achieved using the SVM and LR models with corresponding AUC values of (SVM: 0.73, LR: 0.74), (SVM: 0.68, LR: 0.72), and (SVM: 0.71, LR: 0.70) for 3-, 5-, and 7-year prediction horizons, respectively. For all models, the discriminatory power decreased as the prediction horizon increased from 3 to 7 years.

Conclusions: Machine-learning techniques represent a useful addition to the commonly reported LR technique. Our prognostic models for the future risk of T2DM could be used to plan and implement early prevention programmes for at-risk groups in the Arab population.