Paper Title

Multi-Label Sentiment Analysis on 100 Languages with Dynamic Weighting for Label Imbalance

Paper Authors

Yilmaz, Selim F., Kaynak, E. Batuhan, Koç, Aykut, Dibeklioğlu, Hamdi, Kozat, Suleyman S.

Paper Abstract

We investigate cross-lingual sentiment analysis, which has attracted significant attention due to its applications in various areas including market research, politics, and the social sciences. In particular, we introduce a sentiment analysis framework in a multi-label setting, as it obeys Plutchik's wheel of emotions. We introduce a novel dynamic weighting method that balances the contribution of each class during training, unlike previous static weighting methods that assign non-changing weights based on class frequency. Moreover, we adapt the focal loss, which favors harder instances, from the single-label object recognition literature to our multi-label setting. Furthermore, we derive a method to choose optimal class-specific thresholds that maximize the macro-F1 score in linear time complexity. Through an extensive set of experiments, we show that our method obtains state-of-the-art performance in 7 of 9 metrics across 3 different languages using a single model, compared to common baselines and the best-performing methods in the SemEval competition. We publicly share the code for our model, which can perform sentiment analysis in 100 languages, to facilitate further research.
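
The abstract's last technical claim, choosing class-specific thresholds that maximize macro-F1, is not spelled out here, so the following is only a minimal NumPy sketch of the underlying idea: since macro-F1 is the unweighted average of per-class F1 scores, each class's threshold can be swept independently over that class's sorted validation scores. The function name `per_class_thresholds` and the sort-and-sweep formulation are illustrative assumptions rather than the authors' released code, and this sweep costs O(n log n) per class due to sorting, unlike the linear-time derivation claimed in the paper.

```python
import numpy as np

def per_class_thresholds(y_true, y_prob):
    """Choose one decision threshold per class that maximizes that
    class's F1 score on a validation set.

    Macro-F1 is the unweighted mean of per-class F1 scores, so each
    class's threshold can be tuned independently of the others.

    y_true: (n_samples, n_classes) binary ground-truth label matrix
    y_prob: (n_samples, n_classes) predicted probabilities
    """
    n_samples, n_classes = y_true.shape
    thresholds = np.zeros(n_classes)

    for c in range(n_classes):
        # Sort samples by predicted score for class c, highest first.
        order = np.argsort(-y_prob[:, c])
        labels = y_true[order, c]
        scores = y_prob[order, c]

        total_pos = labels.sum()
        # If the top-k samples are predicted positive, then
        # TP = cumsum(labels)[k-1], FP = k - TP, FN = total_pos - TP,
        # so F1 = 2*TP / (2*TP + FP + FN) = 2*TP / (k + total_pos).
        tp = np.cumsum(labels)
        k = np.arange(1, n_samples + 1)
        f1 = 2.0 * tp / (k + total_pos + 1e-12)

        best = int(np.argmax(f1))
        thresholds[c] = scores[best]  # predict positive when prob >= this value
    return thresholds

# Example usage on a held-out validation split (y_val, p_val are hypothetical):
# th = per_class_thresholds(y_val, p_val)
# y_pred = (p_test >= th).astype(int)
```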
