Paper Title

Optimize_Prime@DravidianLangTech-ACL2022: Abusive Comment Detection in Tamil

Paper Authors

Shantanu Patankar, Omkar Gokhale, Onkar Litake, Aditya Mandke, Dipali Kadam

Paper Abstract

This paper addresses the problem of abusive comment detection in low-resource Indic languages. Abusive comments are statements that are offensive to a person or a group of people. These comments are targeted toward individuals belonging to specific ethnicities, genders, castes, races, sexualities, etc. Abusive comment detection is a significant problem, especially with the recent rise in social media users. This paper presents the approach used by our team, Optimize_Prime, in the ACL 2022 shared task "Abusive Comment Detection in Tamil." The task is to detect and classify YouTube comments in Tamil and in Tamil-English code-mixed format into multiple categories. We used three methods to optimize our results: ensemble models, recurrent neural networks, and Transformers. On the Tamil data, MuRIL and XLM-RoBERTa were our best-performing models, with a macro-averaged F1 score of 0.43. Furthermore, on the code-mixed data, MuRIL and M-BERT provided sublime results, with a macro-averaged F1 score of 0.45.
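
To make the setup described in the abstract concrete, below is a minimal sketch of how a multilingual transformer such as MuRIL can be applied to multi-class comment classification and scored with a macro-averaged F1, using the Hugging Face transformers and scikit-learn APIs. This is not the authors' implementation: the checkpoint name, label count, and example comments are assumptions for illustration, and the classification head here is untrained until fine-tuned on the shared-task data.

```python
# Minimal sketch (not the paper's actual pipeline): classify comments with a
# multilingual transformer and score predictions with macro-averaged F1.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from sklearn.metrics import f1_score

MODEL_NAME = "google/muril-base-cased"  # assumed checkpoint; XLM-R or mBERT fit the same pattern
NUM_LABELS = 8                          # placeholder: depends on the shared task's label set

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# Classification head is randomly initialized here; in practice it is fine-tuned
# on the Tamil / code-mixed training comments before evaluation.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=NUM_LABELS)

# Hypothetical mini-batch (the real data is Tamil and Tamil-English YouTube comments).
texts = ["example comment 1", "example comment 2"]
gold = [0, 3]                           # placeholder gold label ids

inputs = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits     # shape: (batch_size, NUM_LABELS)
preds = logits.argmax(dim=-1).tolist()

# Macro-averaged F1: F1 is computed per class and then averaged with equal weight,
# so rare classes count as much as frequent ones.
print("macro F1:", f1_score(gold, preds, average="macro", labels=list(range(NUM_LABELS))))
```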
