XED: A Multilingual Dataset for Sentiment Analysis and Emotion Detection

Paper Title

XED: A Multilingual Dataset for Sentiment Analysis and Emotion Detection

Authors

Emily Öhman, Marc Pàmies, Kaisla Kajava, Jörg Tiedemann

Abstract

We introduce XED, a multilingual fine-grained emotion dataset. The dataset consists of human-annotated Finnish (25k) and English sentences (30k), as well as projected annotations for 30 additional languages, providing new resources for many low-resource languages. We use Plutchik's core emotions to annotate the dataset with the addition of neutral to create a multilabel multiclass dataset. The dataset is carefully evaluated using language-specific BERT models and SVMs to show that XED performs on par with other similar datasets and is therefore a useful tool for sentiment analysis and emotion detection.
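To make the "multilabel multiclass" setup concrete, here is a minimal sketch of how a sentence carrying several of Plutchik's eight core emotions (or the neutral class) could be encoded as a multi-hot vector. The label order, the function names `encode` and `decode`, and the example labels are assumptions for illustration only; they are not taken from the XED release or its file format.

```python
# Plutchik's eight core emotions plus the neutral class named in the abstract.
# The ordering of LABELS is an illustrative assumption, not XED's own encoding.
LABELS = [
    "anger", "anticipation", "disgust", "fear",
    "joy", "sadness", "surprise", "trust", "neutral",
]


def encode(emotions):
    """Return a multi-hot vector with a 1 for every emotion attached to a sentence."""
    unknown = set(emotions) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if label in emotions else 0 for label in LABELS]


def decode(vector):
    """Map a multi-hot vector back to the list of emotion names it marks."""
    return [label for label, flag in zip(LABELS, vector) if flag]


# Hypothetical annotation: one sentence labeled with two emotions at once.
vector = encode(["joy", "trust"])
print(vector)
print(decode(vector))
```

Because a sentence may carry more than one label, a classifier trained on such vectors would typically make one independent yes/no decision per emotion rather than pick a single class.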
