Table_2_Identifying Circular RNA and Predicting Its Regulatory Interactions by Machine Learning.DOCX (3.05 MB)

Table_2_Identifying Circular RNA and Predicting Its Regulatory Interactions by Machine Learning.DOCX

Download (3.05 MB)
dataset
posted on 21.07.2020, 05:07 by Guishan Zhang, Yiyun Deng, Qingyu Liu, Bingxu Ye, Zhiming Dai, Yaowen Chen, Xianhua Dai

Circular RNA (circRNA) is a closed long non-coding RNA (lncRNA) formed by covalently closed loops through back-splicing. Emerging evidence indicates that circRNA can influence cellular physiology through various molecular mechanisms. Thus, accurate circRNA identification and prediction of its regulatory information are critical for understanding its biogenesis. Although several computational tools based on machine learning have been proposed for circRNA identification, the prediction accuracy remains to be improved. Here, first we present circLGB, a machine learning-based framework to discriminate circRNA from other lncRNAs. circLGB integrates commonly used sequence-derived features and three new features containing adenosine to inosine (A-to-I) deamination, A-to-I density and the internal ribosome entry site. circLGB categorizes circRNAs by utilizing a LightGBM classifier with feature selection. Second, we introduce circMRT, an ensemble machine learning framework to systematically predict the regulatory information for circRNA, including their interactions with microRNA, the RNA binding protein, and transcriptional regulation. Feature sets including sequence-based features, graph features, genome context, and regulatory information features were modeled in circMRT. Experiments on public and our constructed datasets show that the proposed algorithms outperform the available state-of-the-art methods. circLGB is available at http://www.circlgb.com. Source codes are available at https://github.com/Peppags/circLGB-circMRT.

History

References

Licence

Exports