Image3_Transcriptome-Wide Annotation of m5C RNA Modifications Using Machine Learning.PDF (121.54 kB)

figure

posted on 2018-12-03, 14:48 authored by Jie Song, Jingjing Zhai, Enze Bian, Yujia Song, Jiantao Yu, Chuang Ma

The emergence of the epitranscriptome has opened a new chapter in gene regulation. 5-methylcytosine (m⁵C), an important post-transcriptional modification, has been implicated in a variety of biological processes, such as subcellular localization and translational fidelity. Although high-throughput experimental technologies have been developed and applied to profile m⁵C modifications under certain conditions, transcriptome-wide studies of m⁵C modifications are still hindered by the dynamic nature of m⁵C and the lack of computational prediction methods. In this study, we introduce PEA-m5C, a machine learning-based m⁵C predictor trained with features extracted from the sequences flanking m⁵C modifications. PEA-m5C yielded an average AUC (area under the receiver operating characteristic curve) of 0.939 in 10-fold cross-validation experiments based on known Arabidopsis m⁵C modifications. Rigorous independent testing showed that PEA-m5C (accuracy [Acc] = 0.835, Matthews correlation coefficient [MCC] = 0.688) substantially outperforms the recently developed m⁵C predictor iRNAm5C-PseDNC (Acc = 0.665, MCC = 0.332). PEA-m5C has been applied to predict candidate m⁵C modifications in annotated Arabidopsis transcripts. Further analysis of these m⁵C candidates showed that the position 4 nt downstream of the translational start site is the most frequently methylated. PEA-m5C is freely available to academic users at https://github.com/cma2015/PEA-m5C.
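For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how accuracy and the Matthews correlation coefficient are computed from a binary confusion matrix. It uses the standard textbook definitions and is not taken from the PEA-m5C code. The function names and the example counts are hypothetical and do not correspond to the study's test set.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary classifier.

    Ranges from -1 (total disagreement) through 0 (no better than
    chance) to +1 (perfect prediction). By convention, returns 0.0
    when any marginal sum is zero and the ratio is undefined.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical counts chosen for illustration only.
    tp, tn, fp, fn = 40, 40, 10, 10
    print(f"Acc = {accuracy(tp, tn, fp, fn):.3f}")
    print(f"MCC = {matthews_corrcoef(tp, tn, fp, fn):.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is one reason the two metrics are often reported together for predictors of this kind.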