Table_1_Helix Matrix Transformation Combined With Convolutional Neural Network Algorithm for Matrix-Assisted Laser Desorption Ionization-Time of Flight Mass Spectrometry-Based Bacterial Identification.docx
Datasets usually provide raw data for analysis. This raw data often comes in spreadsheet form, but can be any collection of data, on which analysis can be performed.
Matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS) analysis is a rapid and reliable method for bacterial identification. Classification algorithms, as a critical part of the MALDI-TOF MS analysis approach, have been developed using both traditional algorithms and machine learning algorithms. In this study, a method that combined helix matrix transformation with a convolutional neural network (CNN) algorithm was presented for bacterial identification. A total of 14 bacterial species including 58 strains were selected to create an in-house MALDI-TOF MS spectrum dataset. The 1D array-type MALDI-TOF MS spectrum data were transformed through a helix matrix transformation into matrix-type data, which was fitted during the CNN training. Through the parameter optimization, the threshold for binarization was set as 16 and the final size of a matrix-type data was set as 25 × 25 to obtain a clean dataset with a small size. A CNN model with three convolutional layers was well trained using the dataset to predict bacterial species. The filter sizes for the three convolutional layers were 4, 8, and 16. The kernel size was three and the activation function was the rectified linear unit (ReLU). A back propagation neural network (BPNN) model was created without helix matrix transformation and a convolution layer to demonstrate whether the helix matrix transformation combined with CNN algorithm works better. The areas under the receiver operating characteristic (ROC) curve of the CNN and BPNN models were 0.98 and 0.87, respectively. The accuracies of the CNN and BPNN models were 97.78 ± 0.08 and 86.50 ± 0.01, respectively, with a significant statistical difference (p < 0.001). The results suggested that helix matrix transformation combined with the CNN algorithm enabled the feature extraction of the bacterial MALDI-TOF MS spectrum, which might be a proposed solution to identify bacterial species.
Read the peer-reviewed publication