Data_Sheet_1_Transcriptomics Identifies Modules of Differentially Expressed Genes and Novel Cyclotides in Viola pubescens.FASTA

Viola is a large genus with a worldwide distribution and many traits not currently exemplified in model plants, including unique breeding systems and the production of cyclotides. Here we report a de novo draft genome assembly and transcriptomic analyses of the non-model species Viola pubescens, using short-read DNA sequencing data and RNA-Seq from eight diverse tissues. First, the V. pubescens genome size was estimated by flow cytometry, yielding an approximate haploid genome size of 455 Mbp. Next, the draft V. pubescens genome was sequenced and assembled: sequencing produced 264,035,065 read pairs, and assembly yielded 161,038 contigs with an N50 length of 3,455 base pairs (bp). RNA-Seq data were then assembled into tissue-specific transcripts. Together, the DNA and transcript data yielded 38,081 ab initio gene models, which were functionally annotated based on homology to Arabidopsis thaliana genes and Pfam domains. Gene expression in each tissue was visualized via principal component analysis and hierarchical clustering, and gene co-expression analysis identified 20 modules of tissue-specific transcriptional networks. Some of these modules highlight genetic differences between chasmogamous and cleistogamous flowers and may provide insight into the mixed breeding system of V. pubescens. Orthologous clustering with the proteomes of A. thaliana and Populus trichocarpa revealed 8,531 sequences unique to V. pubescens, including 81 novel cyclotide precursor sequences. Cyclotides are plant peptides characterized by a stable cyclic cystine knot motif, which makes them strong candidates for drug scaffolding and protein engineering. Analysis of the RNA-Seq data for these cyclotide transcripts revealed diverse expression patterns, both between transcripts and between tissues. The diversity of these cyclotides was also highlighted in a maximum likelihood protein cladogram containing the V. pubescens cyclotides and published cyclotide sequences from other Violaceae and Rubiaceae species.
Collectively, this work provides the most comprehensive sequence resource for Viola, offers valuable transcriptomic insight into V. pubescens, and will facilitate future functional genomics research in Viola and other diverse plant groups.
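The assembly statistics above report an N50 of 3,455 bp for 161,038 contigs. N50 is the contig length L such that contigs of length L or longer together cover at least half of the total assembled length. The following is a minimal sketch of that calculation, assuming contig lengths are read from FASTA-formatted text; it is not the authors' pipeline, and the helper names `fasta_lengths` and `n50` are hypothetical.

```python
def fasta_lengths(lines):
    """Return the sequence length of each FASTA record, in file order.

    Hypothetical helper: assumes plain FASTA (header lines start with '>',
    sequence may be wrapped over several lines, blank lines are ignored).
    """
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Walk the contigs from longest to shortest and return the length at which
    the running total first reaches half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids float rounding
            return length


if __name__ == "__main__":
    example = ">c1\nACGTAC\n>c2\nACGTACGTAC\n>c3\nA\n"
    print(n50(fasta_lengths(example.splitlines())))  # 10
```

In the toy example the contigs are 6, 10, and 1 bp long (17 bp in total); the single longest contig already covers more than half of the assembly, so the N50 is 10. Note that N50 describes contiguity, not correctness: a fragmented short-read assembly such as this one can be highly accurate at the base level while still having a modest N50.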