The CPHL project, funded in part by the National Science Foundation, is a collaborative effort involving linguistics, computer science, and statistics, with four main goals:
  1. Producing and maintaining real linguistic datasets, in particular of Indo-European languages.
  2. Formulating statistical models that capture the evolution of historical linguistic data.
  3. Designing simulation tools for generating synthetic data, and accuracy measures for studying the performance of reconstruction methods on that data.
  4. Developing and implementing statistically based as well as combinatorial methods for reconstructing language phylogenies, including phylogenetic networks.
An article in the New York Times described the earlier work done in this project.

NSF support for this project was provided through grants 0312911 and 0312830.

Mathematical and Computational Approaches to Linguistic Phylogeny
Banff International Research Station
May 27 - June 3, 2006

Mathematical Modeling and Analysis of Language Diversification
Harvard University
March 21, 2005




Copyright Notice: The documents accessible through these links are included by the author as a means to ensure convenient electronic dissemination of technical work on a non-commercial basis. Copyright and all rights therein are maintained by the copyright holders (the authors or the publishers), notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's and publisher's copyright. In particular, these works may not be re-posted without permission of the copyright holders.

1 F. Barbancon, T. Warnow, D. Ringe, S. Evans, and L. Nakhleh, "An experimental study comparing linguistic phylogenetic reconstruction methods." Proceedings of the conference Languages and Genes, held at UC Santa Barbara (Cambridge University Press). To appear, 2009.
2 D. Ringe and T. Warnow, "Linguistic history and computational cladistics." In: Origin and Evolution of Languages: Approaches, Models, Paradigms, B. Laks (ed.), Equinox Publishing, March 2008.
3 L. Nakhleh, T. Warnow, D. Ringe, and S.N. Evans, "A Comparison of Phylogenetic Reconstruction Methods on an IE Dataset." Transactions of the Philological Society, 103(2):171-192, 2005. (The full version of the paper includes more details about running the methods, the datasets, etc.)
4 L. Nakhleh, D. Ringe, and T. Warnow, "Perfect Phylogenetic Networks: A New Methodology for Reconstructing the Evolutionary History of Natural Languages." LANGUAGE, Journal of the Linguistic Society of America, 81(2):382-420, 2005.
5 T. Warnow, S.N. Evans, D. Ringe, and L. Nakhleh, "A Stochastic model of language evolution that incorporates homoplasy and borrowing." Phylogenetic Methods and the Prehistory of Languages. Cambridge, UK, July 2004.
6 T. Warnow, S.N. Evans, D. Ringe, and L. Nakhleh, "Stochastic models of language evolution and an application to the Indo-European family of languages." Technical report, Department of Statistics, The University of California, Berkeley, 2004.
7 S.N. Evans, D. Ringe, and T. Warnow, "Inference of divergence times as a statistical inverse problem." Phylogenetic Methods and the Prehistory of Languages. Cambridge, UK, July 2004.
8 S.N. Evans and T. Warnow, "Unidentifiable divergence times in rates-across-sites models." IEEE/ACM Transactions on Computational Biology and Bioinformatics, 1(3):130-134, 2004.
9 E. Erdem, V. Lifschitz, L. Nakhleh, and D. Ringe, "Reconstructing the evolutionary history of Indo-European languages using answer set programming." Proceedings of the 5th International Symposium on Practical Aspects of Declarative Languages (PADL 03), 2003.
10 D. Ringe, T. Warnow, and A. Taylor, "Indo-European and Computational Cladistics." Transactions of the Philological Society, 100(1):59-129, 2002.
11 M. Bonet, C.A. Phillips, T. Warnow, and S. Yooseph, "Constructing evolutionary trees in the presence of polymorphic characters." SIAM J. Computing, 29(1):103-131.
12 D. Ringe, "Tocharian class II presents and subjunctives and the reconstruction of the Proto-Indo-European verb." Tocharian and Indo-European Studies 9:121-142, 2000.
13 D. Ringe, T. Warnow, and A. Taylor, "Computational cladistics and the position of Tocharian." In: The Bronze Age and early Iron Age peoples of eastern Central Asia (1998, ed. Victor Mair; JIES Monograph 26), pp. 391-414.
14 T. Warnow, "Mathematical approaches to comparative linguistics." Proceedings of the National Academy of Sciences, Vol. 94, pp. 6585-6590, 1997.
15 T. Warnow, D. Ringe, and A. Taylor, "Reconstructing the evolutionary history of natural languages." Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, 1996, pp. 314-322.
16 T. Warnow, D. Ringe, and A. Taylor, "Reconstructing the evolutionary history of natural languages." IRCS Report 95-16. Philadelphia (1995): Institute for Research in Cognitive Science, University of Pennsylvania. Technical report, 18 pp.



Linguists Don Ringe and Ann Taylor have produced two datasets, one unscreened and one screened, each covering 24 languages that represent the 12 major subgroups of the Indo-European (IE) family. The screened dataset is derived from the unscreened dataset by removing all characters that clearly exhibit parallel evolution and/or back-mutation (the two phenomena are usually referred to together as "homoplasy").
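
As a rough illustration of how the two datasets relate, screening can be pictured as dropping columns from a character matrix. The sketch below is not the project's own tooling or data format: the language labels, character names, state codes, and the `screen` function are all hypothetical, and the set of flagged characters stands in for a linguist's judgment, which the code does not attempt to reproduce.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: each language maps to a list of character
# states, one entry per character (column). Two languages share a state
# for a character when their entries in that column are equal.
CharacterMatrix = Dict[str, List[int]]


def screen(
    matrix: CharacterMatrix, names: List[str], flagged: Set[str]
) -> Tuple[CharacterMatrix, List[str]]:
    """Return a copy of the matrix without the characters named in `flagged`."""
    keep = [i for i, name in enumerate(names) if name not in flagged]
    screened = {
        lang: [states[i] for i in keep] for lang, states in matrix.items()
    }
    return screened, [names[i] for i in keep]


# Toy data, invented for illustration only.
unscreened = {
    "lang_A": [1, 1, 2],
    "lang_B": [1, 2, 2],
    "lang_C": [2, 1, 3],
}
names = ["char_1", "char_2", "char_3"]

# Assumption: a linguist has judged char_2 to be clearly homoplastic.
flagged = {"char_2"}

screened, kept = screen(unscreened, names, flagged)
print(kept)
print(screened)
```

The screened matrix keeps every language and loses only the flagged columns, which matches the description above: the language sample is the same in both datasets and only the character set differs.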


Software tools that we are in the process of making public include: