Paper title

HELFI: a Hebrew-Greek-Finnish Parallel Bible Corpus with Cross-Lingual Morpheme Alignment

Authors

Yli-Jyrä, Anssi; Purhonen, Josi; Liljeqvist, Matti; Antturi, Arto; Nieminen, Pekka; Räntilä, Kari M.; Luoto, Valtter

Abstract

Twenty-five years ago, morphologically aligned Hebrew-Finnish and Greek-Finnish bitexts (texts accompanied by a translation) were constructed manually in order to create an analytical concordance (Luoto et al., 1997) for a Finnish Bible translation. The creators of the bitexts recently secured the publisher's permission to release their fine-grained alignment, but the alignment still depended on proprietary third-party resources, such as a copyrighted text edition and proprietary morphological analyses of the source texts. In this paper, we describe a nontrivial editorial process that starts with the creation of the original single-purpose database and ends with its reconstruction using only freely available text editions and annotations. This process produced an openly available dataset that contains (i) the source texts and their translations, (ii) the morphological analyses, and (iii) the cross-lingual morpheme alignments.
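To make the idea of a cross-lingual morpheme alignment more concrete, here is a minimal Python sketch. It rests on stated assumptions. The record layout (`Morpheme`, `build_index`, `aligned_forms`), the identifiers and the ISO 639-3 language codes are hypothetical and do not describe the actual file format of the released dataset. The Genesis 1:1 fragment is a hand-made illustration, not data taken from the corpus. It pairs Hebrew be- + reshit ("in" + "beginning") with Finnish alu + -ssa (stem + inessive case, as in "Alussa").

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Morpheme:
    mid: str   # hypothetical morpheme identifier
    form: str  # surface form or ASCII transliteration
    lang: str  # ISO 639-3 code: "hbo" (Biblical Hebrew), "grc", "fin"


def build_index(pairs):
    """Group (source_id, target_id) alignment links by source morpheme.

    Targets are sorted so the result does not depend on set iteration order.
    """
    index = defaultdict(list)
    for src, tgt in pairs:
        index[src].append(tgt)
    return {src: sorted(tgts) for src, tgts in index.items()}


def aligned_forms(src_id, index, targets):
    """Return the surface forms of the target morphemes linked to src_id."""
    by_id = {m.mid: m for m in targets}
    return [by_id[t].form for t in index.get(src_id, [])]


# Illustrative toy data (not from the dataset): Hebrew "in-beginning"
# aligned with Finnish "Alussa", split into stem and case ending.
hebrew = [Morpheme("h1", "be", "hbo"), Morpheme("h2", "reshit", "hbo")]
finnish = [Morpheme("f1", "alu", "fin"), Morpheme("f2", "ssa", "fin")]
alignment = {("h1", "f2"), ("h2", "f1")}  # preposition <-> case ending

index = build_index(alignment)
for m in hebrew:
    print(m.form, "->", aligned_forms(m.mid, index, finnish))
# be -> ['ssa']
# reshit -> ['alu']
```

This sketch stores the alignment as a set of (source, target) identifier pairs rather than a one-to-one mapping. That choice lets a single source morpheme link to several target morphemes, or to none. This matters here because a Hebrew prefix may correspond to a Finnish case suffix rather than to a separate word.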
