Comparing Sanskrit Texts for Critical Editions: The Sequences Move Problem

Authors: Nicolas Béchet and Marc Csernel

Polibits, Vol. 45, pp. 27-35, 2012.

Abstract: A critical edition takes into account various versions of the same text in order to show the differences between two distinct versions, in terms of words that are missing, changed, omitted or displaced. Traditionally, Sanskrit is written without spaces between words, and the word order can be changed without altering the meaning of a sentence. This paper describes the characteristics that make Sanskrit text comparison a specific problem. It presents two different methods for comparing Sanskrit texts, which can be used to develop a computer-assisted critical edition. The first method uses the longest common subsequence (L.C.S.), while the second uses the global alignment algorithm. Comparing them, we see that the second method provides better results, but that neither method can detect when a word or a sentence fragment has been moved. We then present a method based on N-grams that can detect such a move when the fragment is not too far from its original location. We show how the method behaves on several examples.
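To illustrate why an L.C.S.-based comparison cannot see a moved word, here is a minimal Python sketch. It is not the authors' implementation: the function names (`lcs_table`, `diff`, `moved_candidates`) and the sample tokens are assumptions made for illustration, the input is assumed to be already split into words, and the last step is a naive stand-in for move detection, not the N-gram method the paper presents.

```python
def lcs_table(a, b):
    """Fill the classic dynamic-programming table: t[i][j] is the
    L.C.S. length of the suffixes a[i:] and b[j:]."""
    m, n = len(a), len(b)
    t = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                t[i][j] = t[i + 1][j + 1] + 1
            else:
                t[i][j] = max(t[i + 1][j], t[i][j + 1])
    return t


def diff(a, b):
    """Walk the table and emit keep/delete/insert operations."""
    t = lcs_table(a, b)
    i = j = 0
    ops = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            ops.append(("keep", a[i]))
            i += 1
            j += 1
        elif t[i + 1][j] >= t[i][j + 1]:
            ops.append(("delete", a[i]))
            i += 1
        else:
            ops.append(("insert", b[j]))
            j += 1
    ops.extend(("delete", w) for w in a[i:])
    ops.extend(("insert", w) for w in b[j:])
    return ops


def moved_candidates(ops):
    """Naive post-processing (an assumption, not the paper's method):
    a word reported both as deleted and as inserted may have moved."""
    deleted = [w for op, w in ops if op == "delete"]
    inserted = [w for op, w in ops if op == "insert"]
    return [w for w in deleted if w in inserted]


# Hypothetical, pre-segmented example: the same words in a different order.
version_a = ["ramah", "vanam", "gacchati"]
version_b = ["vanam", "ramah", "gacchati"]
ops = diff(version_a, version_b)
moved = moved_candidates(ops)
```

On this input the L.C.S. comparison reports the displaced word as one deletion plus one unrelated insertion, which is the limitation the abstract describes; pairing the two afterwards is only a rough heuristic.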

Keywords: Sanskrit, text alignment

PDF: Comparing Sanskrit Texts for Critical Editions: The Sequences Move Problem