Data_Sheet_1_Diatom DNA Metabarcoding for Biomonitoring: Strategies to Avoid Major Taxonomical and Bioinformatical Biases Limiting Molecular Indices Capacities.CSV
Recent years have seen rapid progress in the implementation of molecular techniques across a wide variety of research fields in ecology. Biomonitoring and bioassessment can greatly benefit from DNA metabarcoding and High-Throughput Sequencing (HTS) methods, which can potentially provide reliable, standardized data of high quantity and quality in a cost- and time-efficient way. However, DNA metabarcoding has its drawbacks: it introduces biases at every step of the process, particularly during the bioinformatics treatments used to prepare HTS data for ecological analyses. The high diversity of bioinformatics methods (e.g., OTU clustering, chimera detection, taxonomic assignment) and parameters (e.g., the percentage similarity threshold used to define OTUs) makes inter-study comparison difficult, limiting the development of standardized and easily accessible bioassessment procedures for routine freshwater monitoring. To study and overcome these drawbacks, we constructed four de novo indices to assess river ecological status, based on the same biological samples of diatoms analyzed with morphological and molecular methods. The biological inventories produced are (i) morphospecies identified by microscopy, (ii) OTUs obtained via metabarcoding and hierarchical clustering of sequences using a 95% similarity threshold, (iii) individual sequence units (ISUs) obtained via metabarcoding with only minimal bioinformatic quality filtering, and (iv) exact sequence variants (ESVs) obtained using the DADA2 denoising algorithm. The indices based on molecular data operated directly on ecological values estimated for OTUs/ISUs/ESVs. Because our approach bypasses taxonomic assignment, the bias related to unclassified sequences missing from reference libraries can be handled and no information on the ecology of sequences is lost.
Additionally, we showed that the indices based on ISUs and ESVs were equivalent and outperformed the OTU-based one in predictive power and accuracy, because they reveal the hidden ecological information of sequences that are otherwise clustered in the same OTU (intra-species/intra-population variability). Furthermore, the ISU, ESV, and morphospecies indices provided similar estimates of site ecological status, validating that ISUs with limited bioinformatics treatment may be used for DNA freshwater monitoring. Our study is a proof of concept presenting a taxonomy- and clustering-free approach, which we believe is a step toward standardized and comparable DNA bioassessment, complementary to morphological methods.
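To illustrate what it can mean for an index to operate directly on ecological values attached to sequence units rather than to taxa, the following is a minimal sketch. It assumes an abundance-weighted average, a form common in diatom indices; the description above does not state the formula actually used. The function name, the unit labels, and all numbers are hypothetical and chosen only for illustration.

```python
from typing import Dict


def weighted_average_index(
    abundances: Dict[str, float], eco_values: Dict[str, float]
) -> float:
    """Abundance-weighted mean of per-unit ecological values for one sample.

    ASSUMPTION: a weighted-average form is used here for illustration only;
    it is not taken from the dataset description.
    """
    # Only units that are present and have an ecological value contribute.
    # No taxonomic name is needed at any point.
    shared = [u for u, n in abundances.items() if n > 0 and u in eco_values]
    total = sum(abundances[u] for u in shared)
    if total == 0:
        raise ValueError("sample has no units with an ecological value")
    return sum(abundances[u] * eco_values[u] for u in shared) / total


# Hypothetical read counts per sequence unit (e.g. ESVs) in one sample.
sample = {"ESV_1": 120, "ESV_2": 30, "ESV_3": 50}
# Hypothetical ecological values estimated per sequence unit.
values = {"ESV_1": 4.0, "ESV_2": 2.0, "ESV_3": 1.0}

score = weighted_average_index(sample, values)
print(score)
```

Because the lookup is keyed on the sequence unit itself, a unit with no match in a reference library still contributes as long as an ecological value has been estimated for it.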