Paper title

Standardizing linguistic data: method and tools for annotating (pre-orthographic) French

Authors

Gabay, Simon; Clérice, Thibault; Camps, Jean-Baptiste; Tanguy, Jean-Baptiste; Gille-Levenson, Matthias

Abstract

With the development of big corpora of various periods, it becomes crucial to standardise linguistic annotation (e.g. lemmas, POS tags, morphological annotation) to increase the interoperability of the data produced, despite diachronic variations. In the present paper, we describe both methodologically (by proposing annotation principles) and technically (by creating the required training data and the relevant models) the production of a linguistic tagger for (early) modern French (16-18th c.), taking as much as possible into account already existing standards for contemporary and, especially, medieval French.
