A computational approach to the study of semantic change in Latin: the case of Christian Latin vocabulary


Abstract:
This paper employs computational methods (specifically, word embeddings) to study semantic change and other lexical phenomena resulting from the spread of Christianity among speakers of Latin.

The study of semantic change is notoriously fraught with difficulties. While linguists have been able to identify and classify the various types of semantic change, the phenomenon is often deemed too chaotic to be investigated systematically. Because some catalysts of semantic change, such as sociocultural and historical events, are unpredictable, some scholars insist that semantic change itself is unpredictable. This view has been challenged in the past few decades, in part by computationally oriented scholars who are taking advantage of the increasing availability of digital corpora to gather information about trends in semantic change (Tahmasebi et al. 2021: 2). Such corpora make it possible to conduct large-scale analyses of semantic change (i.e. across several hundred texts or more), which until recently would have been extremely time-consuming. Among the methods used to run these analyses are word embeddings, a powerful tool which I adopt in this paper to scale up my own research on Christian Latin vocabulary.

This talk develops my presentation at LVLT 14, where I aimed to quantify the role of Christianity in the development of Late Latin and Romance vocabulary. The textbook examples (Paradebeispiele) of the influence of Christianity on Latin (Löfstedt 1959: 81) are parabola and parabolare, which came to mean ‘word’ and ‘to speak’ respectively in some Romance languages, replacing relatively basic Classical Latin items (verbum and loquor). These words can therefore be seen as an indicator of the intensity of the influence of Christian culture on Latin. My initial attempt to quantify this influence consisted of (i) a close reading of a notable Christian text, the Itinerarium Egeriae; (ii) the collection of lexemes that I identified as having changed meaning, or as having replaced another lexeme, under the influence of Christianity; and (iii) a cross-check of these items against their Romance descendants to see whether the Christian-led change or replacement had survived. But this approach is open to challenge: who can guarantee how stable the meaning of those words was between the Itinerarium Egeriae and their modern Romance descendants? The only way to verify this would be to read texts containing those lexemes across the intervening centuries.

The realisation that a manual analysis would only ever allow me to examine a small number of texts led me to seek an answer in computational methods. This talk therefore presents the results of a study that starts from the lexemes identified in my previous study but uses word embeddings to answer my original question. Embeddings (also known as word vectors) are obtained by feeding a large amount of text into a tool that converts each word into a vector representing its meaning on the basis of the contexts in which the word occurs. The data I am working with so far consist of a large selection of texts from the LatinISE corpus (McGillivray and Kilgarriff 2013) dating from 300 BCE to 600 CE, although I plan to expand this with corpora of early Romance languages. Because embeddings are points in a vector space, a word's vector can be compared (i) with the vectors of other words, to see which words it is similar to, and (ii) with vectors representing the same word in different periods, to see how that word has changed over time. The latter is achieved by splitting the corpus into chronological subcorpora; the cut-off I chose coincides with the first attestations of Christian Latin texts (ca. 200 CE). The analysis is then refined by comparing the embeddings obtained from an exclusively Christian post-200 CE subcorpus with those obtained from a contemporary non-Christian one. This makes it possible to see whether a new Christian meaning is confined to specialised texts or affects the language as a whole. Finally, I will compare the results of the computational methods qualitatively with those of my philological analysis.
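The abstract does not name the embedding tool or the measure used to compare periods, so the following is only a minimal Python sketch of the steps described above, under stated assumptions. It splits documents at ca. 200 CE into a pre-cut-off subcorpus and Christian and non-Christian post-cut-off subcorpora, finds a word's nearest neighbours by cosine similarity (comparison (i)), and compares the same word across two period-specific models (comparison (ii)). The document format `(year, is_christian, text)`, the function names, and the choice of nearest-neighbour overlap (Jaccard similarity of the top-k neighbours) as the diachronic measure are all hypothetical. The vectors are invented toy numbers chosen only to show the mechanics; they are not results from LatinISE. In practice each subcorpus would be fed to an embedding tool to produce its own vectors.

```python
import math

# Hypothetical document format: (year, is_christian, text), with BCE years
# written as negative numbers. The 200 CE cut-off follows the abstract.
def split_subcorpora(docs, cutoff=200):
    """Partition documents into pre-cut-off, Christian and non-Christian subcorpora."""
    parts = {"pre": [], "post_christian": [], "post_non_christian": []}
    for year, is_christian, text in docs:
        if year < cutoff:
            parts["pre"].append(text)
        elif is_christian:
            parts["post_christian"].append(text)
        else:
            parts["post_non_christian"].append(text)
    return parts


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbours(word, vectors, k=3):
    """Comparison (i): the k words most similar to `word` within one model."""
    target = vectors[word]
    scored = [(cosine(target, vec), other)
              for other, vec in vectors.items() if other != word]
    scored.sort(reverse=True)  # highest similarity first
    return [other for _, other in scored[:k]]


def neighbour_overlap(word, model_a, model_b, k=3):
    """Comparison (ii): Jaccard overlap of `word`'s top-k neighbours in two models.

    Models trained separately on different subcorpora do not share a coordinate
    system, so their neighbourhoods are compared instead of the raw vectors.
    1.0 means identical neighbour sets; 0.0 means no shared neighbours.
    """
    a = set(nearest_neighbours(word, model_a, k))
    b = set(nearest_neighbours(word, model_b, k))
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy 2-D vectors invented for illustration only; NOT derived from LatinISE.
early = {
    "parabola": [1.0, 0.1], "similitudo": [0.9, 0.2], "comparatio": [0.95, 0.0],
    "fabula": [0.8, 0.3], "verbum": [0.1, 1.0], "sermo": [0.2, 0.9],
}
late = {
    "parabola": [0.1, 1.0], "similitudo": [0.9, 0.2], "comparatio": [0.95, 0.0],
    "verbum": [0.15, 1.0], "sermo": [0.2, 0.9], "dictum": [0.3, 0.8],
}

print(nearest_neighbours("parabola", early))       # ['comparatio', 'similitudo', 'fabula']
print(nearest_neighbours("parabola", late))        # ['verbum', 'sermo', 'dictum']
print(neighbour_overlap("parabola", early, late))  # 0.0
```

Comparing neighbourhoods sidesteps the fact that separately trained vector spaces are not directly comparable; a common alternative is to align the spaces first (for example with orthogonal Procrustes) and then take the cosine similarity of the word's two vectors.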

References:
Löfstedt, Einar. 1959. Late Latin. Oslo: Almqvist & Wiksell.
McGillivray, Barbara, and Adam Kilgarriff. 2013. “Tools for historical corpus research, and a corpus of Latin.” In New Methods in Historical Corpora, edited by Paul Bennett, Martin Durrell, Silke Scheible, and Richard J. Whitt, 247–256. Tübingen: Narr.
Tahmasebi, Nina, Lars Borin, and Adam Jatowt. 2021. “Survey of Computational Approaches to Diachronic Conceptual Change.” In Computational Approaches to Semantic Change, edited by Nina Tahmasebi, Lars Borin, Adam Jatowt, Yang Xu, and Simon Hengchen, 1–91. Berlin: Language Science Press.
