Paper title
Treebanking User-Generated Content: a UD Based Overview of Guidelines, Corpora and Unified Recommendations
Authors
Abstract
This article presents a discussion on the main linguistic phenomena which cause difficulties in the analysis of user-generated texts found on the web and in social media, and proposes a set of annotation guidelines for their treatment within the Universal Dependencies (UD) framework of syntactic analysis. Given on the one hand the increasing number of treebanks featuring user-generated content, and its somewhat inconsistent treatment in these resources on the other, the aim of this article is twofold: (1) to provide a condensed, though comprehensive, overview of such treebanks -- based on available literature -- along with their main features and a comparative analysis of their annotation criteria, and (2) to propose a set of tentative UD-based annotation guidelines, to promote consistent treatment of the particular phenomena found in these types of texts. The overarching goal of this article is to provide a common framework for researchers interested in developing similar resources in UD, thus promoting cross-linguistic consistency, which is a principle that has always been central to the spirit of UD.