Paper Title

Sarcasm Detection using Context Separators in Online Discourse

Paper Authors

Kartikey Pant, Tanvi Dadu

Paper Abstract

Sarcasm is an intricate form of speech, where meaning is conveyed implicitly. Being a convoluted form of expression, detecting sarcasm is an assiduous problem. The difficulty in recognition of sarcasm has many pitfalls, including misunderstandings in everyday communications, which leads us to an increasing focus on automated sarcasm detection. In the second edition of the Figurative Language Processing (FigLang 2020) workshop, the shared task of sarcasm detection released two datasets, containing responses along with their context sampled from Twitter and Reddit. In this work, we use RoBERTa_large to detect sarcasm in both the datasets. We further assert the importance of context in improving the performance of contextual word embedding based models by using three different types of inputs - Response-only, Context-Response, and Context-Response (Separated). We show that our proposed architecture performs competitively for both the datasets. We also show that the addition of a separation token between context and target response results in an improvement of 5.13% in the F1-score in the Reddit dataset.
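The three input formats described in the abstract can be illustrated with a small sketch. This is not the authors' code: the function name and example strings are hypothetical, and only the use of `</s>` (RoBERTa's separator token) to mark the context/response boundary follows the abstract's description.

```python
SEP = "</s>"  # RoBERTa's separator token, used to divide context from response

def build_input(context: str, response: str, mode: str) -> str:
    """Build one classifier input string in the given format (hypothetical helper)."""
    if mode == "response_only":
        # The target response alone, with no contextual information.
        return response
    if mode == "context_response":
        # Context and response concatenated without an explicit boundary marker.
        return f"{context} {response}"
    if mode == "context_response_separated":
        # A separation token marks where the context ends and the response begins.
        return f"{context} {SEP} {response}"
    raise ValueError(f"unknown mode: {mode}")

context = "I just love waiting in line for hours."
response = "Sounds like the best day ever."

for mode in ("response_only", "context_response", "context_response_separated"):
    print(f"{mode}: {build_input(context, response, mode)}")
```

The last variant corresponds to the Context-Response (Separated) input that yields the reported 5.13% F1 improvement on the Reddit dataset, presumably because the explicit boundary lets the model attend to the response and its context as distinct segments.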
