Title
Multilingual Abusiveness Identification on Code-Mixed Social Media Text
Authors
Abstract
Social media platforms have seen steady growth in adoption and usage over time. This growth accelerated further during the past year's lockdowns, when people's interactions, conversations, and expression were physically limited. Keeping these platforms free of abusive content is therefore increasingly important for a good user experience. Much work has been done on English social media content, but text analysis of non-English social media remains relatively underexplored. Non-English social media content poses the additional challenges of code-mixing, transliteration, and the use of different scripts within the same sentence. In this work, we propose an approach for abusiveness identification on the multilingual Moj dataset, which comprises Indic languages. Our approach tackles the common challenges of non-English social media content and can be extended to other languages as well.
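The abstract names mixed scripts within a single sentence as one of the challenges. As an illustration only (this is not the authors' method, which the abstract does not describe), the Python sketch below tags each letter by script using the first word of its Unicode character name, an assumed coarse heuristic. The helper names `char_script`, `script_profile`, `is_mixed_script`, and `token_scripts` are hypothetical.

```python
import unicodedata
from collections import Counter
from typing import List, Optional, Tuple


def char_script(ch: str) -> Optional[str]:
    """Return a coarse script label for a letter, or None for non-letters.

    Assumed heuristic: Unicode character names begin with the script name,
    e.g. "DEVANAGARI LETTER KA" or "LATIN SMALL LETTER A". Combining vowel
    signs (categories Mn/Mc) are not letters per str.isalpha() and are skipped.
    """
    if not ch.isalpha():
        return None
    name = unicodedata.name(ch, "")
    return name.split(" ", 1)[0] if name else None


def script_profile(text: str) -> Counter:
    """Count letters per script in the text."""
    return Counter(s for s in map(char_script, text) if s is not None)


def is_mixed_script(text: str) -> bool:
    """True if the text contains letters from more than one script."""
    return len(script_profile(text)) > 1


def token_scripts(text: str) -> List[Tuple[str, List[str]]]:
    """Tag each whitespace-separated token with the sorted scripts it uses."""
    return [(tok, sorted(script_profile(tok))) for tok in text.split()]


if __name__ == "__main__":
    samples = [
        "यह video बहुत अच्छा है",      # Devanagari Hindi with an English word
        "yeh video bahut accha hai",  # the same sentence, romanized
    ]
    for s in samples:
        print(s, is_mixed_script(s), token_scripts(s))
```

The romanized sentence is reported as single-script LATIN, so script detection alone does not flag transliteration. That challenge needs signals beyond the Unicode script of each character.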