Paper Title

Named Entity Extraction with Finite State Transducers

Authors

Villalba, Diego Alexander Huérfano; Guzmán, Elizabeth León

Abstract

We describe a named entity tagging system that requires minimal linguistic knowledge and can be applied to additional target languages without substantial changes. The system is based on the ideas behind Brill's tagger, which makes it very simple. Using supervised machine learning, we construct a series of automata (or transducers) to tag a given text. The final model is composed entirely of automata and tags text in linear time. It was tested on the Spanish data set provided in CoNLL-2002, attaining an overall $F_{\beta=1}$ measure of $60\%$. We also present an algorithm for constructing the final transducer used to encode all the learned contextual rules.
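
To make the Brill-style idea in the abstract concrete, here is a minimal Python sketch of transformation-based tagging: each token first receives a tag from a lexicon, then an ordered list of contextual rules rewrites tags based on the preceding tag. The names `Rule`, `initial_tags`, `apply_rules`, and `tag`, the toy lexicon, and the single rule are all hypothetical illustrations and are not taken from the paper. The sketch also applies rules one pass at a time rather than compiling them into a single transducer, so it does not reproduce the linear-time construction the authors describe.

```python
from typing import Dict, List, NamedTuple


class Rule(NamedTuple):
    """Hypothetical contextual rule: change from_tag to to_tag
    when the previous token carries prev_tag."""
    from_tag: str
    to_tag: str
    prev_tag: str


def initial_tags(tokens: List[str], lexicon: Dict[str, str],
                 default: str = "O") -> List[str]:
    # Initial-state annotation: look each token up in a lexicon,
    # falling back to the "outside any entity" tag.
    return [lexicon.get(tok, default) for tok in tokens]


def apply_rules(tags: List[str], rules: List[Rule]) -> List[str]:
    # Apply each rule in learned order, scanning left to right.
    # A change takes effect immediately for later positions.
    tags = list(tags)
    for rule in rules:
        for i in range(1, len(tags)):
            if tags[i] == rule.from_tag and tags[i - 1] == rule.prev_tag:
                tags[i] = rule.to_tag
    return tags


def tag(tokens: List[str], lexicon: Dict[str, str],
        rules: List[Rule]) -> List[str]:
    return apply_rules(initial_tags(tokens, lexicon), rules)


# Toy data (assumed for illustration only).
tokens = ["Elizabeth", "León", "vive", "en", "Bogotá"]
lexicon = {"Elizabeth": "B-PER", "León": "B-LOC", "Bogotá": "B-LOC"}
rules = [Rule(from_tag="B-LOC", to_tag="I-PER", prev_tag="B-PER")]

result = tag(tokens, lexicon, rules)
print(result)
```

In this toy run, "León" starts as a location and is retagged as the continuation of a person name because it follows a person tag. The cost here grows with the number of rules times the number of tokens; encoding all rules in one transducer, as the abstract proposes, is what removes the dependence on the rule count at tagging time.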
