Paper Title
Automatic Compilation of Resources for Academic Writing and Evaluating with Informal Word Identification and Paraphrasing System
Paper Authors
Paper Abstract
We present the first approach to automatically building resources for academic writing. The aim is to build a writing aid system that automatically edits a text so that it better adheres to the academic style of writing. In addition to using existing academic resources, such as the Corpus of Contemporary American English (COCA) Academic Word List, the New Academic Word List, and the Academic Collocation List, we explore how to build such resources dynamically so that they can be used to automatically identify informal or non-academic words and phrases. The resources are compiled using several generic approaches that can be extended to other domains and languages. We evaluate the resources through a system implementation. The system consists of three components: informal word identification (IWI), academic candidate paraphrase generation, and paraphrase ranking. To generate candidates and rank them in context, we use the PPDB and WordNet paraphrase resources. We use the Concepts in Context (CoInCO) "All-Words" lexical substitution dataset for both the informal word identification and the paraphrase generation experiments. Our informal word identification component achieves an F-1 score of 82%, significantly outperforming a stratified classifier baseline. The main contribution of this work is a domain-independent methodology for building targeted resources for writing aids.
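The three-stage system described in the abstract (informal word identification, candidate paraphrase generation, paraphrase ranking) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the word lists, the `PARAPHRASES` table standing in for a PPDB/WordNet lookup, and all function names are hypothetical, and the ranking step here ignores context, whereas the paper ranks candidates in context.

```python
# Toy stand-ins. Every entry below is an illustrative assumption,
# not data taken from the paper or from PPDB/WordNet.
INFORMAL_WORDS = {"get", "lots", "show", "stuff"}
ACADEMIC_WORDS = {"obtain", "demonstrate", "numerous", "investigate"}
PARAPHRASES = {
    "get": ["obtain", "grab", "receive"],
    "show": ["demonstrate", "display"],
    "lots": ["numerous", "heaps"],
}


def identify_informal(tokens):
    """Stage 1 (IWI): return the indices of tokens flagged as informal."""
    return [i for i, tok in enumerate(tokens) if tok.lower() in INFORMAL_WORDS]


def generate_candidates(word):
    """Stage 2: look up candidate paraphrases for a flagged word."""
    return PARAPHRASES.get(word.lower(), [])


def rank_candidates(candidates):
    """Stage 3: put academic words first; sorted() is stable, so the
    original order is kept within each group. Context is ignored here."""
    return sorted(candidates, key=lambda c: c not in ACADEMIC_WORDS)


def suggest(sentence):
    """Run the three stages and map each informal word to ranked candidates."""
    tokens = sentence.split()
    suggestions = {}
    for i in identify_informal(tokens):
        ranked = rank_candidates(generate_candidates(tokens[i]))
        if ranked:
            suggestions[tokens[i]] = ranked
    return suggestions


if __name__ == "__main__":
    print(suggest("We get lots of results"))
```

In a fuller version, the hard-coded sets would be replaced by the automatically compiled resources, and the ranking key by a context-aware score.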