Table_1_PredPRBA: Prediction of Protein-RNA Binding Affinity Using Gradient Boosted Regression Trees.xls
Protein-RNA interactions play essential roles in many biological processes. Quantifying the binding affinity of protein-RNA complexes helps in understanding protein-RNA recognition mechanisms and in identifying strong binding partners. Because experimentally measured protein-RNA binding affinity data are still limited, there is a pressing demand for accurate and reliable computational approaches. In this paper, we propose PredPRBA, a computational approach that can effectively predict protein-RNA binding affinity using gradient boosted regression trees. We build a protein-RNA binding affinity dataset of 103 protein-RNA complex structures manually collected from the related literature. We then generate 37 kinds of sequence and structural features and explore the relationship between these features and protein-RNA binding affinity. We find that the binding affinity depends mainly on the structure of the RNA molecules. We therefore split the 103 protein-RNA complexes into six categories according to the type of RNA bound to the protein in each complex. For each category, we build a gradient boosted regression tree (GBRT) model based on the generated features. We evaluate the proposed method comprehensively on the binding affinity dataset using leave-one-out cross-validation. PredPRBA achieves correlations ranging from 0.723 to 0.897 across the six categories, significantly better than other typical regression methods and the pioneering protein-RNA binding affinity predictor SPOT-Seq-RNA. In addition, we have developed a user-friendly web server for predicting the binding affinity of protein-RNA complexes. The PredPRBA web server is freely available at http://PredPRBA.denglab.org/.
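The evaluation scheme described in the abstract (a gradient boosted regression model scored by leave-one-out cross-validation and a correlation coefficient) can be illustrated with a minimal sketch. This is not the PredPRBA implementation: the data are synthetic, the weak learners are depth-1 stumps rather than full regression trees, the correlation is assumed to be Pearson's, and every function name and parameter below (`fit_stump`, `fit_gbrt`, `loocv_predictions`, `n_trees`, `learning_rate`) is hypothetical and chosen only for illustration.

```python
import random
from statistics import mean


def fit_stump(X, residuals):
    """Return (feature, threshold, left_value, right_value) for the
    single split that minimizes squared error on the residuals."""
    best = None  # (error, feature, threshold, left_value, right_value)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds lie midway between adjacent distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lv, rv = mean(left), mean(right)
            err = (sum((r - lv) ** 2 for r in left)
                   + sum((r - rv) ** 2 for r in right))
            if best is None or err < best[0]:
                best = (err, j, t, lv, rv)
    if best is None:  # every feature is constant: predict the mean everywhere
        m = mean(residuals)
        return (0, float("inf"), m, m)
    return best[1:]


def fit_gbrt(X, y, n_trees=50, learning_rate=0.1):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = mean(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        j, t, lv, rv = fit_stump(X, residuals)
        stumps.append((j, t, lv, rv))
        pred = [p + learning_rate * (lv if row[j] <= t else rv)
                for row, p in zip(X, pred)]
    return base, learning_rate, stumps


def predict(model, row):
    """Sum the base value and the shrunken contribution of every stump."""
    base, lr, stumps = model
    return base + sum(lr * (lv if row[j] <= t else rv)
                      for j, t, lv, rv in stumps)


def loocv_predictions(X, y, **kwargs):
    """Leave-one-out: train on all samples but one, predict the held-out one."""
    preds = []
    for i in range(len(X)):
        model = fit_gbrt(X[:i] + X[i + 1:], y[:i] + y[i + 1:], **kwargs)
        preds.append(predict(model, X[i]))
    return preds


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (v - mb) for x, v in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5


# Synthetic stand-in for one category of complexes (NOT real affinity data):
# 20 samples, 3 made-up features, target is a noisy linear function of two of them.
random.seed(0)
X = [[random.random() for _ in range(3)] for _ in range(20)]
y = [3.0 * row[0] - 2.0 * row[1] + random.gauss(0.0, 0.1) for row in X]

preds = loocv_predictions(X, y)
r = pearson(y, preds)
print(f"LOOCV Pearson correlation on synthetic data: {r:.3f}")
```

Under these assumptions, the per-category setup in the abstract would correspond to running `loocv_predictions` once per category on that category's feature matrix and measured affinities. Leave-one-out is a natural choice for a dataset of about a hundred complexes split six ways, since each fold keeps nearly all of the scarce training data.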
Categories
- Gene and Molecular Therapy
- Biomarkers
- Genetics
- Genetically Modified Animals
- Developmental Genetics (incl. Sex Determination)
- Epigenetics (incl. Genome Methylation and Epigenomics)
- Gene Expression (incl. Microarray and other genome-wide approaches)
- Livestock Cloning
- Genome Structure and Regulation
- Genetic Engineering
- Genomics