Paper Title

Parsing Early Modern English for Linguistic Search

Authors

Seth Kulick, Neville Ryant

Abstract

We investigate whether advances in NLP over the last few years make it possible to vastly increase the amount of data usable for research in historical syntax. This brings together several of the usual NLP tools (word embeddings, tagging, and parsing) in the service of linguistic queries over automatically annotated corpora. We train a part-of-speech (POS) tagger and a parser on a corpus of historical English, using ELMo embeddings trained on a billion words of similar text. The evaluation is based on the standard tagging and parsing metrics, as well as on the accuracy of query searches over the parsed data.
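The abstract names two kinds of evaluation but does not spell them out. The Python sketch below illustrates both under stated assumptions; it is not the authors' evaluation code. It assumes Penn-style bracketed trees in which every "(" is followed by a label, and computes a simplified evalb-style labeled bracketing precision/recall/F1. Unlike the real evalb program, it does no punctuation removal, no label equivalences, and no stripping of function tags such as -SBJ. It also shows query-level precision/recall, where a "query" is modeled as a Python predicate on a tree and hits on parsed trees are compared against hits on gold trees. All function names (`parse_tree`, `labeled_spans`, `bracket_prf`, `has_label`, `query_prf`) and the toy trees are hypothetical.

```python
from collections import Counter


def parse_tree(s):
    """Parse a bracketed tree like "(IP (NP-SBJ (PRO he)) (VBD came))" into
    nested (label, children) tuples; leaves are plain word strings.
    Assumption: every "(" is immediately followed by a label."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def helper():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(helper())
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return helper()


def labeled_spans(tree):
    """Counter of (label, start, end) for every phrasal node.
    Preterminals (a POS tag over words only) are skipped, as in evalb-style scoring."""
    spans = Counter()

    def walk(node, start):
        label, children = node
        if all(isinstance(c, str) for c in children):
            return start + len(children)  # preterminal: advance over its words
        end = start
        for child in children:
            end = walk(child, end) if isinstance(child, tuple) else end + 1
        spans[(label, start, end)] += 1
        return end

    walk(tree, 0)
    return spans


def bracket_prf(gold, pred):
    """Labeled bracketing precision, recall and F1 for one sentence."""
    g, p = labeled_spans(gold), labeled_spans(pred)
    match = sum((g & p).values())  # multiset intersection of spans
    precision = match / max(sum(p.values()), 1)
    recall = match / max(sum(g.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def has_label(tree, target):
    """Toy query: does any node in the tree carry the label `target`?"""
    label, children = tree
    return label == target or any(
        isinstance(c, tuple) and has_label(c, target) for c in children
    )


def query_prf(gold_trees, pred_trees, query):
    """Compare which sentences a query hits on parsed vs. gold trees."""
    gold_hits = {i for i, t in enumerate(gold_trees) if query(t)}
    pred_hits = {i for i, t in enumerate(pred_trees) if query(t)}
    tp = len(gold_hits & pred_hits)
    precision = tp / len(pred_hits) if pred_hits else 0.0
    recall = tp / len(gold_hits) if gold_hits else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical example: the parser misses the PP over "to town".
    gold = parse_tree("(IP (NP-SBJ (PRO he)) (VBD came) (PP (P to) (NP (N town))))")
    pred = parse_tree("(IP (NP-SBJ (PRO he)) (VBD came) (P to) (NP (N town)))")
    p, r, f = bracket_prf(gold, pred)
    print(p, r, round(f, 3))  # -> 1.0 0.75 0.857
```

Keeping the query as an arbitrary predicate makes the point that a single missing bracket can cost little in F1 (one of four spans) yet turn a query hit into a miss for that sentence, which is why query accuracy is reported separately from the standard metrics.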
