Paper Title

Improving Named Entity Recognition in Tor Darknet with Local Distance Neighbor Feature

Authors

Al-Nabki, Mhd Wesam; Jañez-Martino, Francisco; Vasco-Carofilis, Roberto A.; Fidalgo, Eduardo; Velasco-Mata, Javier

Abstract

Named entity recognition in noisy user-generated texts is a difficult task usually enhanced by incorporating an external resource of information, such as gazetteers. However, gazetteers are task-specific, and they are expensive to build and maintain. This paper adopts and improves the approach of Aguilar et al. by presenting a novel feature, called Local Distance Neighbor, which substitutes gazetteers. We tested the new approach on the W-NUT-2017 dataset, obtaining state-of-the-art results for the Group, Person and Product categories of Named Entities. Next, we added 851 manually labeled samples to the W-NUT-2017 dataset to account for named entities in the Tor Darknet related to weapons and drug selling. Finally, our proposal achieved entity and surface F1 scores of 52.96% and 50.57%, respectively, on this extended dataset, demonstrating its usefulness for Law Enforcement Agencies to detect named entities in the Tor hidden services.
