Paper Title
Semantic Enrichment of Nigerian Pidgin English for Contextual Sentiment Classification
Paper Authors
Paper Abstract
Nigerian Pidgin, an adaptation of English, has evolved over the years through multi-language code-switching, code-mixing, and linguistic adaptation. While Pidgin preserves many words from the standard English vocabulary, both in spelling and pronunciation, the fundamental meanings of these words have changed significantly. For example, 'ginger' is not a plant but an expression of motivation, and 'tank' is not a container but an expression of gratitude. The implication is that the current approach of applying English sentiment analysis directly to social media text from Nigeria is sub-optimal, as it cannot capture the semantic variation and contextual evolution in the contemporary meanings of these words. In practice, while many words in Nigerian Pidgin are the same as in standard English, sentiment analysis models built entirely on English are not designed to capture the full intent of Nigerian Pidgin, whether it is used alone or code-mixed. By augmenting scarce human-labelled code-changed text with ample synthetic code-reformatted text and meaning, we achieve significant improvements in sentiment scoring. Our research explores how to understand sentiment in an intrasentential code-mixing and code-switching context where there has been significant word localization. This work presents 300 VADER-lexicon-compatible Nigerian Pidgin sentiment tokens with their scores, and 14,000 gold-standard Nigerian Pidgin tweets with their sentiment labels.
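To illustrate why an English-only lexicon can miss Pidgin sentiment, the following is a minimal sketch of lexicon enrichment in the VADER style, where each token maps to a valence score. It is not the authors' implementation. The tokens 'ginger' and 'tank' come from the abstract, but every score, the toy base lexicon, and the helper names `score` and `enrich` are hypothetical, chosen only for illustration.

```python
# Toy English lexicon in the VADER style: token -> valence score.
# All scores are hypothetical placeholders, not values from the paper.
BASE_LEXICON = {"good": 1.9, "bad": -2.5, "love": 3.2}

# Pidgin senses named in the abstract: 'ginger' (motivation) and
# 'tank' (gratitude). The scores themselves are invented for this sketch.
PIDGIN_LEXICON = {"ginger": 2.0, "tank": 1.8}


def enrich(base, extra):
    """Return a new lexicon in which `extra` entries override `base`."""
    merged = dict(base)
    merged.update(extra)
    return merged


def score(text, lexicon):
    """Sum the valence of every known token; unknown tokens count as 0."""
    tokens = (tok.strip(".,!?") for tok in text.lower().split())
    return sum(lexicon.get(tok, 0.0) for tok in tokens)


ENRICHED_LEXICON = enrich(BASE_LEXICON, PIDGIN_LEXICON)
tweet = "I tank you, your words ginger me"

print("English-only lexicon:", score(tweet, BASE_LEXICON))
print("Enriched lexicon:", score(tweet, ENRICHED_LEXICON))
```

With the English-only lexicon the example tweet contributes no sentiment because none of its tokens are known, while the enriched lexicon registers it as positive. A full VADER-compatible pipeline would also handle negation, intensifiers, and score normalization, which this sketch omits.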