Table4_Predicting Drosha and Dicer Cleavage Sites with DeepMirCut.XLSX
MicroRNAs are a class of small RNAs involved in post-transcriptional gene silencing, with roles in disease and development. Many computational tools have been developed to identify novel microRNAs. However, there have been no attempts to predict Drosha cleavage sites from primary sequence, or to identify cleavage sites using deep neural networks. Here, we present DeepMirCut, a recurrent neural network-based tool that predicts both Dicer and Drosha cleavage sites. We built a microRNA primary sequence database that includes flanking genomic sequences for 34,713 microRNA annotations. We compare models trained on sequence data alone, on sequence and secondary structure data, and on input data with annotated structures. Our best model predicts cleavage sites closer, on average, to the annotated sites than the results reported for other methods. We show that a guanine nucleotide before and a uracil nucleotide after Dicer cleavage sites on the 3′ arm of the microRNA precursor had a positive effect on predictions, whereas the opposite order (U before, G after) had a negative effect. Our analysis also identified several positions where bulges had either positive or negative effects on the score. We expect that our approach and the data we have curated will enable several future studies.
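As an illustration of what "average proximity" of predicted cuts could mean, the following minimal Python sketch computes the mean absolute distance, in nucleotides, between predicted and annotated cleavage positions. The function name `mean_cut_distance` and the example positions are hypothetical; the abstract does not specify the exact metric, so this is one plausible reading and not necessarily the evaluation DeepMirCut uses.

```python
from statistics import mean


def mean_cut_distance(predicted, annotated):
    """Mean absolute distance (in nucleotides) between paired cut positions.

    Assumption: each predicted position is paired with the annotated
    position at the same index (e.g. the same cleavage site of the same
    precursor). This is an illustrative metric, not the paper's own.
    """
    if len(predicted) != len(annotated):
        raise ValueError("predicted and annotated must have the same length")
    if not predicted:
        raise ValueError("need at least one cut site")
    return mean(abs(p - a) for p, a in zip(predicted, annotated))


# Hypothetical positions for four cut sites (not taken from the paper).
predicted = [31, 54, 60, 85]
annotated = [30, 54, 62, 85]
print(mean_cut_distance(predicted, annotated))
```

A lower value means predictions fall closer to the annotated sites; a value of 0 would mean every cut was predicted at exactly the annotated position.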