10.3389/fmicb.2019.01560.s001
Wang Xi
Wang
Xi
Yan Gao
Yan
Gao
Zhangyu Cheng
Zhangyu
Cheng
Chaoyun Chen
Chaoyun
Chen
Maozhen Han
Maozhen
Han
Pengshuo Yang
Pengshuo
Yang
Guangzhou Xiong
Guangzhou
Xiong
Kang Ning
Kang
Ning
Data_Sheet_1_Using QC-Blind for Quality Control and Contamination Screening of Bacteria DNA Sequencing Data Without Reference Genome.zip
Frontiers
2019
quality control
contamination screening
metagenome
next generation sequencing (NGS)
novel pipeline
2019-07-09 04:50:22
Dataset
https://frontiersin.figshare.com/articles/dataset/Data_Sheet_1_Using_QC-Blind_for_Quality_Control_and_Contamination_Screening_of_Bacteria_DNA_Sequencing_Data_Without_Reference_Genome_zip/8831153
<p>Quality control for next generation sequencing (NGS) has become increasingly important as omics studies rely ever more heavily on sequencing data. Tools have been developed for filtering possible contaminants from species with known reference genomes. Unfortunately, these tools require reference genomes for all the species involved, including the contaminants. This excludes many real-life samples for which no complete genome of the target species is available and which are contaminated with unknown microbial species. In this work we propose QC-Blind, a novel quality control pipeline for removing contaminants without any use of reference genomes. The pipeline requires only information about a few marker genes of the target species. It consists of unsupervised read assembly, contig binning, read clustering, and marker gene assignment. When evaluated on in silico, ab initio and in vivo datasets, QC-Blind proved effective in removing unknown contaminants with high specificity and accuracy, while preserving most of the genomic information of the target bacterial species. Therefore, QC-Blind could serve well in situations where limited information is available for both target and contaminating species.</p>