Paper Title

The Open corpus of the Veps and Karelian languages: overview and applications

Authors

Boyko, Tatyana; Zaitseva, Nina; Krizhanovskaya, Natalia; Krizhanovsky, Andrew; Novak, Irina; Pellinen, Nataliya; Rodionova, Aleksandra

Abstract

Corpus linguistics methods and tools have become a growing priority in the study of the Baltic-Finnic languages of the Republic of Karelia. Since 2016, linguists, mathematicians, and programmers at the Karelian Research Centre have been working on the Open Corpus of the Veps and Karelian Languages (VepKar), an extension of the Veps Corpus created in 2009. VepKar comprises texts in Karelian and Veps, multifunctional dictionaries linked to them, and software with an advanced search system that supports various text criteria (language, genre, etc.) and numerous linguistic categories; lexical and grammatical search in texts was made possible by the word-form generator that we created earlier. A corpus of 3000 texts was compiled, the texts were uploaded and marked up, a system for classifying texts by language, dialect, type, and genre was introduced, and the word-form generator was created. Future plans include developing a speech module for working with audio recordings and a syntactic tagging module that uses morphological analysis outputs. Thanks to continuous functional improvements in the corpus manager and the ongoing enrichment of VepKar with new material and text markup, users can carry out a wide range of scientific and applied tasks. In building the universal national VepKar corpus, its developers and managers strive to preserve and present as fully as possible the state of the Veps and Karelian languages in the 19th to 21st centuries.
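
Lexical and grammatical search built on a word-form generator can be sketched roughly as follows. This is a minimal toy sketch, assuming a generator that expands each lemma into word forms tagged with a grammatical tag set; the generated forms are inverted into an index so that text tokens can be matched by lemma and, optionally, by tag set, with texts filtered by metadata such as language or genre. All names (`generate_forms`, `build_index`, `search`, `TOY_ENDINGS`, the `language`/`genre` fields) and the suffix table are hypothetical and invented for illustration. This is not the VepKar implementation, and the endings are not real Veps or Karelian morphology.

```python
from collections import defaultdict

# Hypothetical suffix table: grammatical tag set -> ending.
# Invented for illustration; NOT the real Veps or Karelian paradigm.
TOY_ENDINGS = {
    "nom.sg": "",
    "gen.sg": "n",
    "nom.pl": "t",
}


def generate_forms(lemma, stem, endings=TOY_ENDINGS):
    """Toy generator: return {tag set: word form} for one lemma."""
    return {gram: stem + ending for gram, ending in endings.items()}


def build_index(lexicon):
    """Invert generated paradigms: word form -> set of (lemma, tag set).

    A set is used because one form may belong to several analyses (homonymy).
    """
    index = defaultdict(set)
    for lemma, stem in lexicon.items():
        for gram, form in generate_forms(lemma, stem).items():
            index[form].add((lemma, gram))
    return index


def search(texts, index, lemma, gram=None, **metadata):
    """Return (text id, token position) pairs whose token is a form of
    `lemma` (optionally restricted to tag set `gram`), looking only in
    texts whose metadata fields equal the given keyword filters."""
    hits = []
    for text in texts:
        # Skip texts that fail any metadata filter (e.g. language="veps").
        if any(text.get(key) != value for key, value in metadata.items()):
            continue
        for pos, token in enumerate(text["tokens"]):
            for entry_lemma, entry_gram in index.get(token.lower(), ()):
                if entry_lemma == lemma and (gram is None or entry_gram == gram):
                    hits.append((text["id"], pos))
                    break  # count each token position once
    return hits


# Usage with invented toy data (hypothetical lemma "toy", hypothetical texts).
lexicon = {"toy": "toy"}
index = build_index(lexicon)
texts = [
    {"id": 1, "language": "veps", "genre": "tale", "tokens": ["toyn", "x", "toy"]},
    {"id": 2, "language": "karelian", "genre": "song", "tokens": ["toyt"]},
]
print(search(texts, index, "toy", language="veps"))  # [(1, 0), (1, 2)]
print(search(texts, index, "toy", gram="nom.pl"))    # [(2, 0)]
```

Precomputing all forms and inverting them trades memory for simple lookups at search time: a query by lemma and grammatical category becomes a dictionary lookup per token rather than an on-the-fly morphological analysis.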