Title
Sanskrit Segmentation Revisited
Authors
Abstract
Computational analysis of Sanskrit texts requires proper segmentation as an initial step. Various tools have been developed for Sanskrit text segmentation. Among them, Gérard Huet's Reader in the Sanskrit Heritage Engine analyzes the input text and segments it on the basis of word parameters, namely phases such as iic, ifc, Pr and Subst, and the sandhi (or transition) that takes place between the end of one word and the beginning of the next. It lists all possible solutions, distinguishing them by their phases. The phases and their analyses are useful for sentential parsers. In segmentation, however, they are used only to decide whether the words formed with the phases are morphologically valid. This paper modifies the above segmenter to ignore phase details (except in a few cases) and proposes a probability function that ranks the list of solutions so that the most valid ones appear at the top.
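The abstract does not specify the probability function. To illustrate the general idea only (enumerate every segmentation licensed by a lexicon and sandhi rules, then rank the candidates by a score), here is a minimal Python sketch. Everything in it is a hypothetical assumption rather than the authors' method or the Sanskrit Heritage Engine's implementation: the toy lexicon and its made-up frequencies, the two vowel-sandhi rules (a + i → e, a + u → o), the diacritic-free transliteration, and the unigram scoring. Phases are not modelled at all.

```python
import math
from functools import lru_cache

# HYPOTHETICAL toy lexicon: words with made-up relative frequencies (not corpus counts).
LEXICON = {
    "deva": 0.05,
    "indra": 0.04,
    "devendra": 0.001,
    "surya": 0.03,
    "udaya": 0.02,
}

# HYPOTHETICAL tiny subset of vowel sandhi, as (surface, end of left word, start of right word):
# a + i -> e and a + u -> o. Real sandhi tables are far larger.
SANDHI_RULES = [("e", "a", "i"), ("o", "a", "u")]


@lru_cache(maxsize=None)
def segment(text):
    """Return every segmentation of `text` as a tuple of word tuples."""
    if text == "":
        return ((),)
    results = []
    for i in range(1, len(text) + 1):
        prefix, rest = text[:i], text[i:]
        # Plain juncture: the word boundary falls exactly between prefix and rest.
        if prefix in LEXICON:
            for tail in segment(rest):
                results.append((prefix,) + tail)
        # Sandhi juncture: the surface sound after prefix fuses the final sound of the
        # left word with the initial sound of the right word, so undo the fusion.
        for surface, left_end, right_start in SANDHI_RULES:
            if rest.startswith(surface) and prefix + left_end in LEXICON:
                for tail in segment(right_start + rest[len(surface):]):
                    results.append((prefix + left_end,) + tail)
    return tuple(results)


def log_score(words):
    """Unigram score: sum of log probabilities, i.e. the log of their product."""
    return sum(math.log(LEXICON[w]) for w in words)


def ranked_segmentations(text):
    """All candidate segmentations, most probable first."""
    return sorted(segment(text), key=log_score, reverse=True)


print(ranked_segmentations("devendra"))  # [('deva', 'indra'), ('devendra',)]
```

Here "devendra" yields two candidates, the compound read as a single lexicon entry and the split deva + indra recovered through the a + i → e rule; with these invented frequencies the split scores higher (0.05 × 0.04 = 0.002 > 0.001) and is listed first. Summing logarithms instead of multiplying raw probabilities is a common choice to avoid numeric underflow on long inputs.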