Paper Title
CL-XABSA: Contrastive Learning for Cross-lingual Aspect-based Sentiment Analysis
Paper Authors
Paper Abstract
As an extensively studied task in the field of natural language processing (NLP), aspect-based sentiment analysis (ABSA) predicts the sentiment expressed in a text towards a given aspect. Unfortunately, most languages lack sufficient annotated resources, so more and more researchers have turned to cross-lingual aspect-based sentiment analysis (XABSA). However, most recent studies concentrate only on cross-lingual data alignment rather than model alignment. To this end, we propose a novel framework, CL-XABSA: Contrastive Learning for Cross-lingual Aspect-Based Sentiment Analysis. Based on contrastive learning, we reduce the distance between samples with the same label across different semantic spaces, thereby driving the semantic spaces of different languages to converge. Specifically, we design two contrastive strategies, token-level contrastive learning of token embeddings (TL-CTE) and sentiment-level contrastive learning of token embeddings (SL-CTE), to regularize the semantic spaces of the source and target languages to be more uniform. Since our framework can receive datasets in multiple languages during training, it can be adapted not only to the XABSA task but also to multilingual aspect-based sentiment analysis (MABSA). To further improve the performance of our model, we apply knowledge distillation, leveraging unlabeled data in the target language. For the distillation XABSA task, we further explore the comparative effectiveness of different training data (the source dataset, a translated dataset, and a code-switched dataset). The results demonstrate that the proposed method yields improvements on all three tasks: XABSA, distillation XABSA, and MABSA. For reproducibility, the code for this paper is available at https://github.com/GKLMIP/CL-XABSA.
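To make the core idea concrete, below is a minimal sketch of label-based contrastive learning over token embeddings, in the spirit of the TL-CTE objective described in the abstract: token embeddings sharing the same label (across source- and target-language batches) are pulled together, while others are pushed apart. The function name, tensor shapes, and temperature value are illustrative assumptions, not the authors' actual implementation.

    # Hypothetical sketch of token-level contrastive learning (TL-CTE-style).
    # Assumes embeddings from the source and target languages are concatenated
    # into one batch and that each token carries a label (e.g., a BIO sentiment tag).
    import torch
    import torch.nn.functional as F

    def token_contrastive_loss(embeddings, labels, temperature=0.1):
        """embeddings: (N, d) token embeddings; labels: (N,) per-token labels."""
        z = F.normalize(embeddings, dim=-1)            # work in cosine-similarity space
        sim = z @ z.t() / temperature                  # (N, N) pairwise similarity logits
        n = z.size(0)
        self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
        sim.masked_fill_(self_mask, float('-inf'))     # exclude self-pairs from the softmax

        pos_mask = labels.unsqueeze(0) == labels.unsqueeze(1)
        pos_mask &= ~self_mask                         # positives: same label, not self

        log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
        pos_count = pos_mask.sum(dim=1).clamp(min=1)   # avoid division by zero
        loss = -(log_prob * pos_mask).sum(dim=1) / pos_count
        return loss[pos_mask.any(dim=1)].mean()        # average over anchors with positives

Under this reading, the sentiment-level variant (SL-CTE) would differ only in how `labels` is defined, grouping tokens by sentiment polarity rather than by the full token tag, so that same-polarity tokens converge across languages.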