"Did you really mean what you said?": Sarcasm Detection in Hindi-English Code-Mixed Data using Bilingual Word Embeddings

Paper title

"Did you really mean what you said?": Sarcasm Detection in Hindi-English Code-Mixed Data using Bilingual Word Embeddings

Authors

Akshita Aggarwal, Anshul Wadhawan, Anshima Chaudhary, Kavita Maurya

Abstract

With the increased use of social media platforms across the world, many interesting new NLP problems have emerged. One of them is the detection of sarcasm in social media text. We present a corpus of tweets for training custom word embeddings and a Hinglish dataset labelled for sarcasm detection. We propose a deep-learning-based approach to sarcasm detection in Hindi-English code-mixed tweets, using bilingual word embeddings derived from FastText and Word2Vec. We experimented with various deep learning models, including CNNs, LSTMs, and bidirectional LSTMs (with and without attention). Our deep learning models outperform all previously reported state-of-the-art results, with the attention-based bidirectional LSTM performing best at an accuracy of 78.49%.
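The abstract does not say which form of attention is placed on top of the bidirectional LSTM. As a rough illustration only, and not the authors' implementation, the sketch below shows one common variant: dot-product attention pooling, in which each per-token hidden state is scored against a context vector, the scores are normalised with a softmax, and the weighted sum becomes the sentence representation passed to a classifier. The names `attention_pool`, `hidden_states`, and `context` are hypothetical, and the numbers are toy values rather than data from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, context):
    """Pool per-token hidden states into a single vector.

    hidden_states: list of equal-length float lists, one per token
                   (assumed here to stand in for BiLSTM outputs).
    context:       float list of the same length (assumed to be a
                   learned context vector; fixed by hand in this sketch).
    Returns (pooled_vector, attention_weights).
    """
    # Score each token by its dot product with the context vector.
    scores = [sum(h_d * c_d for h_d, c_d in zip(h, context))
              for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # Weighted sum of the hidden states, one dimension at a time.
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states))
              for d in range(dim)]
    return pooled, weights


# Toy example: three tokens with two-dimensional hidden states.
hidden_states = [[0.1, 0.3], [0.9, -0.2], [0.4, 0.4]]
context = [1.0, 0.5]
pooled, weights = attention_pool(hidden_states, context)
```

In a trained model the context vector would be learned together with the LSTM parameters, so that tokens carrying sarcasm cues receive larger weights; here it is fixed by hand purely to show the mechanics.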


Paper title