Data_Sheet_1_Phylogenetic Typology.pdf (300.74 kB)

dataset

posted on 2021-07-19, 05:07 authored by Gerhard Jäger, Johannes Wahle

In this article we propose a novel method to estimate the frequency distribution of linguistic variables while controlling for statistical non-independence due to shared ancestry. Unlike previous approaches, our technique uses all available data, from language families large and small as well as from isolates, while controlling for different degrees of relatedness on a continuous scale estimated from the data. Our approach involves three steps: First, distributions of phylogenies are inferred from lexical data. Second, these phylogenies are used as part of a statistical model to estimate transition rates between parameter states. Finally, the long-term equilibrium of the resulting Markov process is computed. As a case study, we investigate a series of potential word-order correlations across the languages of the world.
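The final step described above, computing the long-term equilibrium of a Markov process from its transition rates, can be illustrated with a minimal sketch. It assumes a continuous-time Markov chain over a small set of parameter states with a known rate matrix Q (rows sum to zero) and solves πQ = 0 subject to the entries of π summing to one. The function name `stationary_distribution` and the rate values are hypothetical and chosen only for illustration; they are not taken from the article, whose rates are estimated from phylogenies rather than fixed by hand.

```python
from fractions import Fraction


def stationary_distribution(Q):
    """Return pi with pi Q = 0 and sum(pi) = 1 for a rate matrix Q.

    Q[i][j] is the transition rate from state i to state j (i != j),
    and each diagonal entry makes its row sum to zero. The chain is
    assumed to be irreducible, so the solution is unique.
    """
    n = len(Q)
    # Transpose Q so that the unknown pi is a column vector: Q^T pi = 0.
    A = [[Fraction(Q[j][i]) for j in range(n)] for i in range(n)]
    b = [Fraction(0)] * n
    # One balance equation is redundant; replace the last one with
    # the normalisation constraint sum(pi) = 1.
    A[n - 1] = [Fraction(1)] * n
    b[n - 1] = Fraction(1)

    # Gauss-Jordan elimination with exact rational arithmetic.
    for col in range(n):
        # Find a row with a non-zero entry in this column
        # (fails for a reducible chain, where no unique solution exists).
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col] / A[col][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
                b[r] -= factor * b[col]

    return [b[i] / A[i][i] for i in range(n)]


# Hypothetical two-state example: state 0 changes to state 1 at rate 1,
# state 1 changes back to state 0 at rate 3.
Q_example = [[-1, 1],
             [3, -3]]
pi_example = stationary_distribution(Q_example)
print([str(p) for p in pi_example])
```

With these illustrative rates the chain spends more time in state 0, because it leaves that state more slowly than it returns to it. Exact fractions are used so the result is free of rounding error; a numerical linear-algebra library would be the more usual choice for larger state spaces.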