Paper Title


Voice@SRIB at SemEval-2020 Task 9 and 12: Stacked Ensembling method for Sentiment and Offensiveness detection in Social Media

Paper Authors

Abhishek Singh, Surya Pratap Singh Parmar

Paper Abstract


On social-media platforms such as Twitter, Facebook, and Reddit, people prefer to use code-mixed languages such as Spanish-English and Hindi-English to express their opinions. In this paper, we describe the different models we used: training embeddings on an external dataset and ensembling methods for the Sentimix and OffensEval tasks. The use of pre-trained embeddings usually helps in multiple tasks such as sentence classification and machine translation. In this experiment, we have applied our trained code-mixed embeddings and Twitter pre-trained embeddings to the SemEval tasks. We evaluate our models on macro F1-score, precision, accuracy, and recall on the datasets. We intend to show that hyper-parameter tuning and data pre-processing steps help considerably in improving the scores. In our experiments, we achieve 0.886 F1-macro on the OffensEval Greek-language subtask post-evaluation, whereas the highest was 0.852 during the evaluation period. We placed third in the Spanglish competition with our best F1-score of 0.756. Codalab username is asking28.
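The abstract reports results as macro F1, which averages per-class F1 scores with equal weight per class, so minority classes count as much as majority ones. As a minimal illustration (not the authors' evaluation code; `macro_f1` is a hypothetical helper, equivalent in spirit to scikit-learn's `f1_score(..., average="macro")`):

```python
def macro_f1(y_true, y_pred, labels):
    """Macro-averaged F1: compute F1 per class, then take the unweighted mean."""
    f1_scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)

# Toy 3-class sentiment example (illustrative labels, not task data):
score = macro_f1(["pos", "pos", "neg", "neu"],
                 ["pos", "neg", "neg", "neu"],
                 labels=["pos", "neg", "neu"])
```

Because every class contributes equally to the average, a classifier that ignores a rare class is penalized more under macro F1 than under accuracy, which is why it is the ranking metric for imbalanced tasks like these.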
