Paper Title
ANDES at SemEval-2020 Task 12: A jointly-trained BERT multilingual model for offensive language detection
Authors
Abstract
This paper describes our participation in SemEval-2020 Task 12: Multilingual Offensive Language Detection. We jointly trained a single model by fine-tuning Multilingual BERT to tackle the task across all the proposed languages: English, Danish, Turkish, Greek, and Arabic. Our single model achieved competitive results, with performance close to top-performing systems, despite sharing the same parameters across all languages. Zero-shot and few-shot experiments were also conducted to analyze transfer performance among these languages. We make our code public for further research.