Quantitative Historical Linguistics



The project aims to uncover and clarify phylogenetic relationships among the native languages of South America using quantitative methods. Its two main objectives are to digitize the lexical resources on these languages and to develop new computer-assisted methods for analyzing this information quantitatively.

Most data available on native South American languages is in printed form. To investigate the linguistic relationships among these languages quantitatively, we are digitizing as many dictionaries and translated texts as possible, starting with the larger, more widely accepted families (e.g., Arawakan, Cariban, Tupían) and working our way through to the many smaller families and language isolates. By the end of the project we expect to have a large body of digitized data, which should continue to stimulate historical-comparative linguistic research after the project has ended. The list of the languages that we are currently working on can be found here.

The second, more general objective of the project is to transform historical-comparative linguistics from a primarily handcrafted scholarly endeavor, performed by individual researchers, into a quantitative and collaborative field of research. To achieve this, we are developing several algorithmic approaches that assist the historical-comparative method in reconstructing the phylogeny of languages. The algorithms developed during the project can also be used for quantitative language comparison in a more general sense, from both synchronic and diachronic perspectives. They are implemented as a suite of open-source Python modules that can be used to analyze and compare languages at the phonetic/phonological, lexical and semantic levels.
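To give a rough idea of what quantitative comparison at the lexical level can look like, the following is a minimal sketch in plain Python. It is not taken from the project's modules: the function names, the choice of normalized Levenshtein (edit) distance as the measure, and the toy word lists are all assumptions made for illustration, and the word forms are invented placeholders rather than data from any real language.

```python
def levenshtein(a, b):
    """Edit distance between two sequences of segments (here: characters)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a, b):
    """Edit distance divided by the length of the longer form (0.0 to 1.0)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def mean_lexical_distance(wordlist_a, wordlist_b):
    """Average normalized distance over the concepts both word lists share."""
    shared = sorted(set(wordlist_a) & set(wordlist_b))
    if not shared:
        raise ValueError("the word lists share no concepts")
    total = sum(normalized_distance(wordlist_a[c], wordlist_b[c]) for c in shared)
    return total / len(shared)


if __name__ == "__main__":
    # Hypothetical word lists: concept -> invented placeholder form.
    toy_a = {"water": "pata", "fire": "kuru"}
    toy_b = {"water": "pada", "fire": "kuru", "stone": "tepu"}
    print(mean_lexical_distance(toy_a, toy_b))
```

A measure of this kind only captures surface similarity between word forms. It does not by itself distinguish inherited cognates from loanwords or chance resemblances, which is why such scores are best seen as an aid to, not a replacement for, the historical-comparative method.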

QuantHistLing is a five-year project (2010-2015) funded by the European Research Council under a Starting Researcher Grant.