Data_Sheet_1_Machine learning-based prediction of symptomatic intracerebral hemorrhage after intravenous thrombolysis for stroke: a large multicenter study.PDF
This study aimed to compare the performance of different machine learning models in predicting symptomatic intracranial hemorrhage (sICH) after thrombolysis treatment for ischemic stroke.
MethodsThis multicenter study utilized the Shenyang Stroke Emergency Map database, comprising 8,924 acute ischemic stroke patients from 29 comprehensive hospitals who underwent thrombolysis between January 2019 and December 2021. An independent testing cohort was further established, including 1,921 patients from the First People’s Hospital of Shenyang. The structured dataset encompassed 15 variables, including clinical and therapeutic metrics. The primary outcome was the sICH occurrence post-thrombolysis. Models were developed using an 80/20 split for training and internal validation. Performance was assessed using machine learning classifiers, including logistic regression with lasso regularization, support vector machine (SVM), random forest, gradient-boosted decision tree (GBDT), and multilayer perceptron (MLP). The model boasting the highest area under the curve (AUC) was specifically employed to highlight feature importance.
ResultsBaseline characteristics were compared between the training cohort (n = 6,369) and the external validation cohort (n = 1,921), with the sICH incidence being slightly higher in the training cohort (1.6%) compared to the validation cohort (1.1%). Among the evaluated models, the logistic regression with lasso regularization achieved the highest AUC of 0.87 (95% confidence interval [CI]: 0.79–0.95; p < 0.001), followed by the MLP model with an AUC of 0.766 (95% CI: 0.637–0.894; p = 0.04). The reference model and SVM showed AUCs of 0.575 and 0.582, respectively, while the random forest and GBDT models performed less optimally with AUCs of 0.536 and 0.436, respectively. Decision curve analysis revealed net benefits primarily for the SVM and MLP models. Feature importance from the logistic regression model emphasized anticoagulation therapy as the most significant negative predictor (coefficient: −2.0833) and recombinant tissue plasminogen activator as the principal positive predictor (coefficient: 0.5082).
ConclusionAfter a comprehensive evaluation, the MLP model is recommended due to its superior ability to predict the risk of symptomatic hemorrhage post-thrombolysis in ischemic stroke patients. Based on decision curve analysis, the MLP-based model was chosen and demonstrated enhanced discriminative ability compared to the reference. This model serves as a valuable tool for clinicians, aiding in treatment planning and ensuring more precise forecasting of patient outcomes.