The French National Reference Center for Staphylococci currently uses DNA arrays and spa typing for the initial epidemiological characterization of Staphylococcus aureus strains. Here we describe the use of whole-genome sequencing (WGS) to retrospectively investigate four distinct and virulent S. aureus lineages [clonal complexes (CCs): CC1, CC5, CC8, CC30] involved in hospital and community outbreaks or sporadic infections in France. We used a WGS bioinformatics pipeline based on de novo assembly (a reference-free approach), single nucleotide polymorphism analysis, and the inclusion of epidemiological markers. We examined the phylogeographic diversity of the dominant French hospital-acquired CC8-MRSA (methicillin-resistant S. aureus) Lyon clone through WGS analysis, which revealed no evidence of large-scale geographic clustering. We analyzed sporadic cases along with two outbreaks of a CC1-MSSA (methicillin-susceptible S. aureus) clone containing the Panton–Valentine leukocidin (PVL), and the results showed that two sporadic cases were closely related. We investigated an outbreak of PVL-positive CC30-MSSA in a school environment and were able to reconstruct the transmission history between eight families. We explored different outbreaks among newborns due to the CC5-MRSA Geraldine clone and found evidence of an unsuspected link between two otherwise distinct outbreaks. Here, WGS provides the resolving power to disprove transmission events indicated by conventional methods (same sequence type, spa type, toxin profile, and antibiotic resistance profile) and, most importantly, can reveal unsuspected transmission events. WGS therefore allows outbreaks and the (inter-)national dissemination of S. aureus lineages to be better described and understood. Our findings underscore the importance of adding WGS to (inter-)national surveillance of infections caused by virulent clones of S. aureus, but also substantiate the fact that technological optimization at the bioinformatics level is still urgently needed for routine use. However, the greatest limitations of WGS analysis are the completeness and correctness of the reference database being used and the conversion of floods of data into actionable results. The WGS bioinformatics pipeline (EpiSeq™) we used here can easily generate a uniform database and associated metadata for epidemiological applications.