Data_Sheet_1_Genetic Diversity and Low Stratification of the Population of the United Arab Emirates.docx

Given the high consanguinity rates on the Arabian Peninsula, it would not have been surprising if the population of the United Arab Emirates (UAE) had proved relatively homogeneous. However, this study of 1,000 UAE nationals provides a contrasting perspective, one of a relatively heterogeneous population. Because the UAE lies at the crossroads of Europe, Asia, and Africa, the observed diversity could be explained by the many migrations that have taken place since the first Out-of-Africa movement. A strategy to explore the extent of genetic variation in the population of the UAE is presented. The first step involved a comprehensive population stratification study that informed the subsequent whole genome sequencing (WGS) of suitable representatives (described elsewhere). When these UAE data were compared with previous, smaller studies from the region, the findings were consistent with a diverse and admixed population. However, rather than sharp and distinctive clusters, cluster analysis revealed low levels of stratification throughout the population. The emirates of the UAE exhibit high ratios of within-Emirate distance to among-Emirate distance. Supervised admixture analysis showed a continuous gradient of ancestral populations, suggesting that admixture on the southeastern tip of the Arabian Peninsula occurred gradually. Visualization with a technique that combines admixture ratios and principal component analysis (PCA) revealed previously unappreciated diversity while mitigating the projection bias of conventional PCA. We observed low population stratification in the UAE in terms of homozygosity versus separation cluster coefficients. This holds both for the UAE in a global context and for isolated cluster analysis of the Emirati birthplaces. Nevertheless, the subtle clustering observed in the Emirates reflects geographic proximity and historic migration events.
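The ratio of within-Emirate to among-Emirate distance mentioned above can be illustrated with a minimal sketch. The abstract does not specify the distance measure or the coordinates used, so the Euclidean distance on two-dimensional points (for example, the first two principal components), the function name `within_among_ratio`, and the toy data are all assumptions made for illustration only.

```python
from itertools import combinations
from math import dist
from statistics import mean


def within_among_ratio(coords, labels):
    """Mean pairwise distance within groups divided by the mean
    pairwise distance among groups (Euclidean distance is an assumption)."""
    within, among = [], []
    for (a, label_a), (b, label_b) in combinations(zip(coords, labels), 2):
        # Pairs sharing a label contribute to the within-group distances,
        # all other pairs to the among-group distances.
        (within if label_a == label_b else among).append(dist(a, b))
    return mean(within) / mean(among)


# Toy coordinates (hypothetical): two tight point pairs far apart.
coords = [(0, 0), (0, 1), (10, 0), (10, 1)]

# Labels that follow the spatial clusters: strong stratification, low ratio.
stratified = within_among_ratio(coords, ["A", "A", "B", "B"])

# Labels that cut across the clusters: weak stratification, high ratio.
mixed = within_among_ratio(coords, ["A", "B", "A", "B"])

print(f"stratified: {stratified:.2f}, mixed: {mixed:.2f}")
```

In this sketch a high ratio means that individuals from the same group are about as far apart as individuals from different groups, which is the pattern the abstract describes as low stratification.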
The analytical strategy used here highlights the complementary nature of genotype array and WGS data for anthropological studies. Specifically, the genotype array data informed the selection of representative subjects for WGS. Furthermore, among the 2.3 million allele frequencies obtained from genotype arrays, we identified 46,481 loci whose allele frequencies differed significantly from those of other world populations. This comparison of allele frequencies facilitates variant prioritization in common diseases. In addition, these loci have considerable potential as biomarkers in anthropological and forensic studies.
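As an illustration of how loci with significantly different allele frequencies might be flagged, the following minimal sketch compares allele counts between two populations locus by locus. The abstract does not state which statistical test or multiple-testing correction was used; the Pearson chi-square test on a 2x2 table, the Bonferroni threshold, the function names, and the toy counts are all assumptions.

```python
from math import erfc, sqrt


def allele_count_pvalue(alt_a, total_a, alt_b, total_b):
    """P-value of a Pearson chi-square test (1 degree of freedom) on a
    2x2 table of alternate/reference allele counts in two populations."""
    table = [[alt_a, total_a - alt_a], [alt_b, total_b - alt_b]]
    rows = [total_a, total_b]
    n = total_a + total_b
    cols = [alt_a + alt_b, n - alt_a - alt_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            if expected == 0:
                return 1.0  # locus is monomorphic in both populations
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 df.
    return erfc(sqrt(chi2 / 2))


def differentiated_loci(loci, alpha=0.05):
    """Names of loci whose p-value passes a Bonferroni-corrected threshold."""
    threshold = alpha / len(loci)
    return [name for name, counts in loci.items()
            if allele_count_pvalue(*counts) < threshold]


# Hypothetical counts: (alt alleles in A, total alleles in A,
#                       alt alleles in B, total alleles in B).
loci = {
    "locus_1": (600, 2000, 300, 2000),  # frequencies 0.30 vs 0.15
    "locus_2": (500, 2000, 510, 2000),  # frequencies 0.250 vs 0.255
}
flagged = differentiated_loci(loci)
print(flagged)
```

With millions of loci, a Bonferroni threshold is conservative; a false discovery rate procedure is a common alternative, and the choice here is only for simplicity.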