Paper Title

XED: A Multilingual Dataset for Sentiment Analysis and Emotion Detection

Authors

Emily Öhman, Marc Pàmies, Kaisla Kajava, Jörg Tiedemann

Abstract

We introduce XED, a multilingual fine-grained emotion dataset. The dataset consists of human-annotated Finnish (25k) and English sentences (30k), as well as projected annotations for 30 additional languages, providing new resources for many low-resource languages. We use Plutchik's core emotions to annotate the dataset with the addition of neutral to create a multilabel multiclass dataset. The dataset is carefully evaluated using language-specific BERT models and SVMs to show that XED performs on par with other similar datasets and is therefore a useful tool for sentiment analysis and emotion detection.
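
To make the "multilabel multiclass" setup in the abstract concrete, here is a minimal Python sketch of how one sentence's annotations could be turned into a multi-hot target vector over Plutchik's eight core emotions plus neutral. The label ordering, the helper names `to_multi_hot` and `from_multi_hot`, and the example annotation are illustrative assumptions; they do not reflect XED's actual file format or label indexing.

```python
# Plutchik's eight core emotions plus the neutral class mentioned in the abstract.
# ASSUMPTION: this ordering is illustrative, not XED's official label indexing.
LABELS = [
    "anger", "anticipation", "disgust", "fear",
    "joy", "sadness", "surprise", "trust", "neutral",
]


def to_multi_hot(emotions):
    """Encode a collection of emotion names as a 0/1 vector aligned with LABELS."""
    unknown = set(emotions) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if label in emotions else 0 for label in LABELS]


def from_multi_hot(vector):
    """Decode a 0/1 vector back into emotion names, in LABELS order."""
    if len(vector) != len(LABELS):
        raise ValueError("vector length does not match the number of labels")
    return [label for label, flag in zip(LABELS, vector) if flag]


# Hypothetical annotation: one sentence tagged with two emotions at once.
target = to_multi_hot(["joy", "trust"])
print(target)
print(from_multi_hot(target))
```

Because a sentence can carry several emotions at once, a classifier trained on such vectors typically makes an independent yes/no decision per label rather than choosing a single class.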
