10.3389/fmicb.2019.01446.s002 Bryan Naidenov Alexander Lim Karyn Willyerd Nathanial J. Torres William L. Johnson Hong Jin Hwang Peter Hoyt John E. Gustafson Charles Chen Image_2_Pan-Genomic and Polymorphic Driven Prediction of Antibiotic Resistance in Elizabethkingia.tif Frontiers 2019 nanopore sequencing Elizabethkingia antimicrobial resistance machine learning AMR prediction 2019-07-04 14:40:29 Figure https://frontiersin.figshare.com/articles/figure/Image_2_Pan-Genomic_and_Polymorphic_Driven_Prediction_of_Antibiotic_Resistance_in_Elizabethkingia_tif/8684237 <p>The Elizabethkingia are a genetically diverse genus of emerging pathogens that exhibit multidrug resistance to a range of common antibiotics. Two representative species, Elizabethkingia bruuniana and E. meningoseptica, were phenotypically tested to determine minimum inhibitory concentrations (MICs) for five antibiotics. Ultra-long read sequencing with Oxford Nanopore Technologies (ONT) and subsequent de novo assembly produced complete, gapless circular genomes for each strain. Alignment-based annotation with Prokka identified 5,480 features in E. bruuniana and 5,203 features in E. meningoseptica, but none of the identified genes or gene combinations corresponded to the observed phenotypic resistance values. Pan-genomic analysis, performed with an additional 19 Elizabethkingia strains, identified a core genome of 2,658,537 bp, 32 uniquely identifiable intrinsic chromosomal antibiotic resistance core-genes, and 77 antibiotic resistance pan-genes. Using core-SNPs and pan-genes in combination with six machine learning (ML) algorithms, binary classification of clindamycin and vancomycin resistance achieved F1 scores of 0.94 and 0.84, respectively.
The more challenging multiclass problem for fusidic acid, rifampin, and ciprofloxacin yielded F1 scores of 0.70, 0.75, and 0.54, respectively. By producing two sets of quality biological predictors, pan-genome genes and core-genome SNPs, from long-read sequence data and applying an ensemble of ML techniques, we demonstrate that accurate phenotypic inference can be achieved at multiple AMR resolutions.</p>