Table_1_Applying Machine Learning to Carotid Sonographic Features for Recurrent Stroke in Patients With Acute Stroke.docx (28.22 kB)

posted on 28.01.2022, 14:33 by Shih-Yi Lin, Kin-Man Law, Yi-Chun Yeh, Kuo-Chen Wu, Jhih-Han Lai, Chih-Hsueh Lin, Wu-Huei Hsu, Cheng-Chieh Lin, Chia-Hung Kao

Although carotid sonographic features have been used as predictors of recurrent stroke, few large-scale studies have explored the use of machine learning analysis of carotid sonographic features for the prediction of recurrent stroke.


We retrospectively collected electronic medical records of enrolled patients from the data warehouse of China Medical University Hospital, a tertiary medical center in central Taiwan, from January 2012 to November 2018. We included patients who underwent a documented carotid ultrasound within 30 days of experiencing a first acute stroke during the study period. We classified these participants into two groups: those with non-recurrent stroke (those who had not been diagnosed with acute stroke again during the study period) and those with recurrent stroke (those who had been diagnosed with acute stroke again during the study period). A total of 1,235 carotid sonographic parameters were analyzed. Data on the patients' demographic characteristics and comorbidities were also collected. Python 3.7 was used as the programming language, and the scikit-learn toolkit was used to derive and validate the machine learning models.
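The inclusion and grouping rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `label_patient`, the input format (lists of dates per patient), the exact study window boundaries (1 January 2012 to 30 November 2018), and the reading of "within 30 days" as 30 days on either side of the index stroke are all hypothetical and not taken from the study.

```python
from datetime import date

# Assumed boundaries: the source states only "January 2012 to November 2018".
STUDY_START = date(2012, 1, 1)
STUDY_END = date(2018, 11, 30)


def label_patient(stroke_dates, ultrasound_dates, window_days=30):
    """Return 'recurrent', 'non-recurrent', or None if the patient is excluded.

    stroke_dates: dates of acute stroke diagnoses for one patient.
    ultrasound_dates: dates of documented carotid ultrasound examinations.
    """
    # Keep distinct stroke diagnoses that fall inside the study period.
    in_period = sorted({d for d in stroke_dates if STUDY_START <= d <= STUDY_END})
    if not in_period:
        return None  # no stroke during the study period

    # Assumption: the earliest stroke in the period is the first (index) stroke.
    index_stroke = in_period[0]

    # Inclusion: a carotid ultrasound within `window_days` of the index stroke.
    has_ultrasound = any(
        abs((u - index_stroke).days) <= window_days for u in ultrasound_dates
    )
    if not has_ultrasound:
        return None

    # Any further acute stroke diagnosis in the period counts as recurrence.
    return "recurrent" if len(in_period) > 1 else "non-recurrent"


print(label_patient([date(2013, 5, 1)], [date(2013, 5, 10)]))
print(label_patient([date(2013, 5, 1), date(2015, 2, 3)], [date(2013, 5, 20)]))
```

In this sketch a patient with no qualifying ultrasound is simply excluded (`None`) rather than assigned to either group.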


In total, 2,411 patients were enrolled in this study, of whom 1,896 had non-recurrent stroke and 515 had recurrent stroke. After extraction, 43 features of carotid sonography (36 carotid sonographic parameters and seven transcranial color Doppler sonographic parameters) were analyzed. For predicting recurrent stroke, CatBoost achieved the highest area under the curve (AUC; 0.844, 95% CI 0.824–0.868), followed by the Light Gradient Boosting Machine (0.832, 95% CI 0.813–0.851), random forest (0.819, 95% CI 0.802–0.846), logistic regression (0.781, 95% CI 0.764–0.800), support-vector machine (0.759, 95% CI 0.739–0.781), and decision tree (0.735, 95% CI 0.717–0.755) models.
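To illustrate how an AUC with a 95% confidence interval can be obtained for several classifiers, the following is a minimal sketch using scikit-learn on synthetic data. It is not the study's pipeline: the source does not state how the intervals were computed, so the percentile bootstrap here is an assumption, as are the helper name `bootstrap_auc_ci`, the synthetic dataset, the train/test split, and all hyperparameters. Only three scikit-learn models are shown; CatBoost and LightGBM are separate packages and are left out to keep the example self-contained.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def bootstrap_auc_ci(y_true, y_score, n_boot=300, alpha=0.05, seed=0):
    """Return (auc, lower, upper) using a percentile bootstrap (assumed method)."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    n = len(y_true)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample test cases with replacement
        if len(np.unique(y_true[idx])) < 2:
            continue  # AUC is undefined when only one class is present
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    lower, upper = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return roc_auc_score(y_true, y_score), float(lower), float(upper)


# Synthetic stand-in data: 43 features, roughly 21% positive class,
# loosely mirroring the class balance reported above (515 of 2,411).
X, y = make_classification(
    n_samples=1000, n_features=43, n_informative=10,
    weights=[0.79, 0.21], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores = model.predict_proba(X_test)[:, 1]  # probability of the positive class
    results[name] = bootstrap_auc_ci(y_test, scores)
    auc, lower, upper = results[name]
    print(f"{name}: AUC {auc:.3f} (95% CI {lower:.3f}-{upper:.3f})")
```

The numbers printed by this sketch come from synthetic data and have no relation to the values reported in the study.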


In the CatBoost model, the top three features for predicting recurrent stroke were the use of anticoagulation medications, the use of nonsteroidal anti-inflammatory drugs (NSAIDs), and the resistive index of the left subclavian artery. Among the models compared, CatBoost was efficient and achieved the best performance in classifying non-recurrent versus recurrent stroke.
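One common way to rank features for a fitted classifier is permutation importance, sketched below with scikit-learn on synthetic data. This is an assumption for illustration only: the source does not say which importance measure was used with CatBoost, a random forest stands in for the CatBoost model, and the feature names (`feature_0`, `feature_1`, ...) are hypothetical placeholders rather than the study's variables.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with hypothetical feature names.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=4,
    weights=[0.79, 0.21], random_state=0,
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Random forest used as a stand-in for the gradient-boosted model.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Shuffle each feature in turn and measure the drop in held-out AUC.
result = permutation_importance(
    model, X_test, y_test, scoring="roc_auc", n_repeats=10, random_state=0
)

# Sort features from most to least important and keep the top three.
order = np.argsort(result.importances_mean)[::-1]
top_three = [(feature_names[i], float(result.importances_mean[i])) for i in order[:3]]

for name, importance in top_three:
    print(f"{name}: mean AUC drop {importance:.4f}")
```

Permutation importance is model-agnostic, so the same call would work with any fitted estimator that follows the scikit-learn interface.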