Data_Sheet_2_Pan-Genomic Study of Mycobacterium tuberculosis Reflecting the Primary/Secondary Genes, Generality/Individuality, and the Interconversion Through Copy Number Variations.PDF

Tuberculosis (TB) has surpassed HIV as the leading infectious disease killer worldwide since 2014. The main pathogen, Mycobacterium tuberculosis (Mtb), contains ~4,000 genes that account for ~90% of the genome. However, it is still unclear which of these genes are primary/secondary, which are responsible for generality/individuality, and which interconvert during evolution. Here we utilized a pan-genomic analysis of 36 Mtb genomes to address these questions. We identified 3,679 Mtb core (i.e., primary) genes, determining their phenotypic generality (e.g., virulence, slow growth, dormancy). We also observed 1,122 dispensable and 964 strain-specific secondary genes, reflecting partially shared and lineage-/strain-specific individualities. Among which, five L2 lineage-specific genes might be related to the increased virulence of the L2 lineage. Notably, we discovered 28 Mtb “Super Core Genes” (SCGs: more than a copy in at least 90% strains), which might be of increased importance, and reflected the “super phenotype generality.” Most SCGs encode PE/PPE, virulence factors, antigens, and transposases, and have been verified as playing crucial roles in Mtb pathogenicity. Further investigation of the 28 SCGs demonstrated the interconversion among SCGs, single-copy core, dispensable, and strain-specific genes through copy number variations (CNVs) during evolution; different mutations on different copies highlight the delicate adaptive-evolution regulation amongst Mtb lineages. This reflects that the importance of genes varied through CNVs, which might be driven by selective pressure from environment/host-adaptation. In addition, compared with Mycobacterium bovis (Mbo), Mtb possesses 48 specific single core genes that partially reflect the differences between Mtb and Mbo individuality.