Datasheet1_Exploring how space, time, and sampling impact our ability to measure genetic structure across Plasmodium falciparum populations.pdf
A primary use of malaria parasite genomics is identifying highly related infections to quantify epidemiological, spatial, or temporal factors associated with patterns of transmission. For example, spatial clustering of highly related parasites can indicate foci of transmission, and temporal differences in relatedness can serve as evidence for changes in transmission over time. However, for infections in settings of moderate to high endemicity, understanding patterns of relatedness is compromised by complex infections, overall high forces of infection, and a highly diverse parasite population. It is not clear how much these factors limit the utility of genomic data for understanding transmission in these settings. In particular, further investigation is required to determine which patterns of relatedness we expect to see with high-quality, densely sampled genomic data in a high-transmission setting, and how these observations change under different study designs, missingness, and biases in sample collection. Here we investigate two identity-by-state measures of relatedness and apply them to amplicon deep sequencing data collected as part of a longitudinal cohort in Western Kenya that has previously been analysed to identify individual-level factors associated with sharing parasites with infected mosquitoes. With these data we use permutation tests to evaluate several hypotheses about spatiotemporal patterns of relatedness compared to a null distribution. We observe evidence of temporal structure, but not of fine-scale spatial structure, in the cohort data. To explore factors associated with the lack of spatial structure in these data, we construct a series of simplified simulation scenarios using an agent-based model calibrated to entomological, epidemiological, and genomic data from this cohort study. We use these scenarios to investigate whether the lack of spatial structure observed in the cohort could be due to inherent power limitations of this analytical method.
We further investigate how our hypothesis testing behaves under different sampling schemes, levels of completely random and systematic missingness, and different transmission intensities.
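
As an illustration of the kind of permutation test described above, the following is a minimal Python sketch and not the analysis code used in the study. It assumes one simple identity-by-state (IBS) definition, the fraction of loci at which two infections share at least one allele, which may differ from the two measures investigated here. The function names (`ibs`, `mean_within_group_ibs`, `permutation_test`), the household labels, and the toy genotypes are all hypothetical.

```python
import random
from itertools import combinations


def ibs(a, b):
    """Fraction of loci at which two infections share at least one allele.

    Each infection is a list of allele sets, one set per locus, so that
    complex (multi-clone) infections can carry several alleles at a locus.
    This definition is an assumption made for illustration.
    """
    shared = sum(1 for x, y in zip(a, b) if set(x) & set(y))
    return shared / len(a)


def mean_within_group_ibs(genotypes, groups):
    """Mean pairwise IBS over all sample pairs carrying the same group label."""
    values = [
        ibs(genotypes[i], genotypes[j])
        for i, j in combinations(range(len(genotypes)), 2)
        if groups[i] == groups[j]
    ]
    if not values:
        raise ValueError("no within-group pairs to compare")
    return sum(values) / len(values)


def permutation_test(genotypes, groups, n_perm=1000, seed=0):
    """One-sided test: are within-group pairs more related than expected?

    The null distribution is built by shuffling group labels across samples,
    which keeps group sizes fixed while breaking any link to genotype.
    """
    rng = random.Random(seed)
    observed = mean_within_group_ibs(genotypes, groups)
    shuffled = list(groups)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mean_within_group_ibs(genotypes, shuffled) >= observed:
            exceed += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value


# Toy data (hypothetical): four infections typed at three loci,
# grouped into two households.
genotypes = [
    [{"A"}, {"C"}, {"G"}],
    [{"A"}, {"C"}, {"G", "T"}],
    [{"T"}, {"G"}, {"A"}],
    [{"T"}, {"G"}, {"A"}],
]
groups = ["h1", "h1", "h2", "h2"]

observed, p_value = permutation_test(genotypes, groups)
print(observed, p_value)
```

The same structure extends to temporal hypotheses by replacing the household labels with sampling-period labels. With only four samples the null distribution has very few distinct values, so a realistic analysis needs far more infections than this toy example.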