posted on 2018-02-06, 11:38 authored by Xiaobing Li, Xuejuan Shen, Xiao Chen, Dan Xiang, Robert W. Murphy, Yongyi Shen

Fishes are, by far, the most diverse group of vertebrates. Their classification relies heavily on morphology. In practice, correct morphological identification of a species often depends on personal experience, because many species vary in body shape, color and other external characters. Identification is therefore prone to error. Owing to the rapid development of molecular biology, the number of fish sequences deposited in GenBank has grown explosively. These published data likely contain errors arising from invalid or incorrectly identified species, and such errors can cause downstream problems. It is therefore critical that they be identified and corrected. A strategy based on DNA barcoding can detect potentially erroneous data, especially when intraspecific Kimura two-parameter (K2P) variation exceeds interspecific K2P divergence. Analyses of the most widely used DNA marker for fishes (mitochondrial Cytb) show that intraspecific differences are generally less than 1%, while interspecific differences are generally higher than 10%. Using this yardstick, our analyses identify 1,303 potentially problematic Cytb sequences of fishes in GenBank and point to taxonomic problems, errors in identification, genetic introgression and other concerns. Care must be taken to avoid perpetuating errors when using these available data.
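The screening idea in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `k2p_distance` and `flag_suspect_pairs`, the record layout, and the rule of flagging both directions (same-species pairs above 1%, different-species pairs below 10%) are assumptions made for illustration. The distance itself uses the standard K2P formula, d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of transitions and transversions among compared sites. Sequences are assumed to be pre-aligned and of equal length.

```python
import math
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VALID_BASES = PURINES | PYRIMIDINES


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two pre-aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    Raises ValueError if no sites can be compared or if the pair is too
    divergent for the logarithm to be defined (saturation).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in VALID_BASES or b not in VALID_BASES:
            continue
        compared += 1
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine is a transition.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # abs() only normalises -0.0 to 0.0 for identical sequences.
    return abs(-0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q)))


def flag_suspect_pairs(records, intra_max=0.01, inter_min=0.10):
    """Flag pairs that contradict the ~1% / ~10% rule of thumb.

    `records` is a list of (species_name, aligned_sequence) tuples.
    The thresholds are illustrative defaults taken from the abstract's
    stated ranges, not a validated cutoff.
    """
    flagged = []
    for (i, (sp_i, seq_i)), (j, (sp_j, seq_j)) in combinations(enumerate(records), 2):
        d = k2p_distance(seq_i, seq_j)
        if sp_i == sp_j and d > intra_max:
            flagged.append((i, j, "intraspecific distance too high", d))
        elif sp_i != sp_j and d < inter_min:
            flagged.append((i, j, "interspecific distance too low", d))
    return flagged
```

A flagged pair is only a candidate for review. As the abstract notes, an anomalous distance may reflect a taxonomic problem, a misidentification, or genuine introgression, so the thresholds should be treated as a screening aid rather than a verdict.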


