Title
YOSM: A New Yoruba Sentiment Corpus for Movie Reviews
Authors
Abstract
A movie that one person thoroughly enjoys and recommends may be hated by another. One characteristic of humans is the ability to have feelings, which can be positive or negative. To classify and study human feelings automatically, sentiment analysis and opinion mining, an area of natural language processing, were designed to understand how people feel about issues that could affect a product, a social media platform, a government, societal discussions, or even movies. Much work on sentiment analysis has been done for high-resource languages, while low-resource languages like Yoruba have been sidelined. Because datasets and linguistic architectures suited to low-resource languages are scarce, African languages, which are low-resource languages, have been ignored and not fully explored. For this reason, we focus on Yoruba to explore sentiment analysis of reviews of Nigerian movies. The data comprise 1,500 movie reviews sourced from IMDB, Rotten Tomatoes, Letterboxd, Cinemapointer, and Nollyrated. We develop sentiment classification models using state-of-the-art pre-trained language models such as mBERT and AfriBERTa to classify the movie reviews.