Paper Title

BERT(s) to Detect Multiword Expressions

Paper Authors

Damith Premasiri, Tharindu Ranasinghe

Paper Abstract

Multiword expressions (MWEs) present groups of words in which the meaning of the whole is not derived from the meaning of its parts. The task of processing MWEs is crucial in many natural language processing (NLP) applications, including machine translation and terminology extraction. Therefore, detecting MWEs is a popular research theme. In this paper, we explore state-of-the-art neural transformers in the task of detecting MWEs. We empirically evaluate several transformer models on the dataset for SemEval-2016 Task 10: Detecting Minimal Semantic Units and their Meanings (DiMSUM). We show that transformer models outperform the previous neural models based on long short-term memory (LSTM). The code and pre-trained model will be made freely available to the community.
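
To make the setup concrete, below is a minimal sketch of how MWE detection can be framed as token-level sequence labelling with a BERT encoder, using the Hugging Face transformers library. The BIO-style label set, model checkpoint, and example sentence are illustrative assumptions rather than the authors' exact configuration, and the classification head here is randomly initialised; it would need fine-tuning on the DiMSUM training data before its predictions are meaningful.

# Minimal sketch: MWE detection framed as token-level sequence labelling
# with a BERT encoder. The BIO-style tag set and model checkpoint below
# are illustrative assumptions, not the authors' exact configuration.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

LABELS = ["O", "B-MWE", "I-MWE"]  # assumed tag scheme for MWE spans

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased", num_labels=len(LABELS)
)  # classification head is untrained; fine-tune on DiMSUM before use

words = "He kicked the bucket yesterday .".split()

# Tokenize pre-split words so subword pieces can be mapped back to words.
encoding = tokenizer(words, is_split_into_words=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**encoding).logits  # shape: (1, num_subwords, num_labels)

predictions = logits.argmax(dim=-1)[0].tolist()
word_ids = encoding.word_ids(batch_index=0)

# Report one label per word (first subword piece), skipping special tokens.
seen = set()
for idx, word_id in enumerate(word_ids):
    if word_id is None or word_id in seen:
        continue
    seen.add(word_id)
    print(f"{words[word_id]:>10} -> {LABELS[predictions[idx]]}")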
