Paper Title

Optimize_Prime@DravidianLangTech-ACL2022: Abusive Comment Detection in Tamil

Paper Authors

Shantanu Patankar, Omkar Gokhale, Onkar Litake, Aditya Mandke, Dipali Kadam

Paper Abstract

This paper addresses the problem of abusive comment detection in low-resource Indic languages. Abusive comments are statements that are offensive to a person or a group of people. These comments are targeted toward individuals belonging to specific ethnicities, genders, castes, races, sexualities, etc. Abusive comment detection is a significant problem, especially with the recent rise in social media users. This paper presents the approach used by our team, Optimize_Prime, in the ACL 2022 shared task "Abusive Comment Detection in Tamil." The task is to detect and classify YouTube comments in Tamil and in Tamil-English code-mixed format into multiple categories. We used three methods to optimize our results: ensemble models, recurrent neural networks, and Transformers. On the Tamil data, MuRIL and XLM-RoBERTa were our best-performing models, with a macro-averaged F1 score of 0.43. Furthermore, on the code-mixed data, MuRIL and M-BERT provided sublime results, with a macro-averaged F1 score of 0.45.
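
To make the setup described in the abstract concrete, below is a minimal sketch of how a multilingual transformer such as MuRIL can be applied to multi-class comment classification and scored with a macro-averaged F1, using the Hugging Face transformers and scikit-learn APIs. This is not the authors' implementation: the checkpoint name, label count, and example comments are assumptions for illustration, and the classification head here is untrained until fine-tuned on the shared-task data.

```python
# Minimal sketch (not the paper's actual pipeline): classify comments with a
# multilingual transformer and score predictions with macro-averaged F1.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from sklearn.metrics import f1_score

MODEL_NAME = "google/muril-base-cased"  # assumed checkpoint; XLM-R or mBERT fit the same pattern
NUM_LABELS = 8                          # placeholder: depends on the shared task's label set

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# Classification head is randomly initialized here; in practice it is fine-tuned
# on the Tamil / code-mixed training comments before evaluation.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=NUM_LABELS)

# Hypothetical mini-batch (the real data is Tamil and Tamil-English YouTube comments).
texts = ["example comment 1", "example comment 2"]
gold = [0, 3]                           # placeholder gold label ids

inputs = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits     # shape: (batch_size, NUM_LABELS)
preds = logits.argmax(dim=-1).tolist()

# Macro-averaged F1: F1 is computed per class and then averaged with equal weight,
# so rare classes count as much as frequent ones.
print("macro F1:", f1_score(gold, preds, average="macro", labels=list(range(NUM_LABELS))))
```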
