Title

Hate Speech Classification Using SVM and Naive Bayes

Authors

Asogwa, D. C.; Chukwuneke, C. I.; Ngene, C. C.; Anigbogu, G. N.

Abstract

The spread of hatred, formerly limited to verbal communication, has rapidly moved onto the Internet. Social media and community forums that allow people to discuss and express their opinions are becoming platforms for spreading hate messages. Many countries have enacted laws against online hate speech, holding the companies that run social media platforms responsible for failing to eliminate it. As online content continues to grow, so does the spread of hate speech. However, manual analysis of hate speech on online platforms is infeasible: the volume of data is huge, and manual review is expensive and time-consuming. It is therefore important to process online user content automatically in order to detect and remove hate speech. Many recent approaches suffer from an interpretability problem: it can be difficult to understand why the systems make the decisions they do. This work proposes an approach to the automatic detection of hate messages using Support Vector Machine (SVM) and Naïve Bayes (NB) algorithms. The approach achieves near state-of-the-art performance while being simpler and producing more easily interpretable decisions than other methods. Empirical evaluation yields classification accuracies of approximately 99% for SVM and 50% for NB on the test set.

Keywords: classification; hate speech; feature extraction; algorithm; supervised learning
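The abstract names the two classifiers but not the feature extraction or training details. As an illustration only, here is a minimal pure-Python sketch, assuming bag-of-words count features, a multinomial Naive Bayes with Laplace smoothing, and a linear SVM trained by Pegasos-style subgradient descent on the hinge loss (a swapped-in training method, not necessarily the authors'). The names `tokenize`, `build_vocab`, `bow`, `NaiveBayes`, `LinearSVM` and `accuracy`, and the toy sentences and labels, are all hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep runs of letters/apostrophes (a simplifying assumption).
    return re.findall(r"[a-z']+", text.lower())

def build_vocab(docs):
    # Map each distinct training token to a feature index.
    vocab = {}
    for doc in docs:
        for tok in tokenize(doc):
            vocab.setdefault(tok, len(vocab))
    return vocab

def bow(doc, vocab):
    # Sparse bag-of-words vector {feature_index: count}; unseen tokens are dropped.
    counts = Counter(t for t in tokenize(doc) if t in vocab)
    return {vocab[t]: n for t, n in counts.items()}

class NaiveBayes:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""
    def __init__(self, n_features, alpha=1.0):
        self.n_features = n_features
        self.alpha = alpha

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            totals = Counter()
            for x in rows:
                totals.update(x)  # add this document's word counts
            denom = sum(totals.values()) + self.alpha * self.n_features
            self.log_lik[c] = [math.log((totals[j] + self.alpha) / denom)
                               for j in range(self.n_features)]
        return self

    def predict(self, x):
        # argmax over classes of log P(c) + sum_j count_j * log P(word_j | c)
        def score(c):
            return self.log_prior[c] + sum(n * self.log_lik[c][j] for j, n in x.items())
        return max(self.classes, key=score)

class LinearSVM:
    """Binary linear SVM (labels 0/1), Pegasos-style subgradient descent on the
    L2-regularized hinge loss. No bias term, for brevity."""
    def __init__(self, n_features, lam=0.01, epochs=50):
        self.w = [0.0] * n_features
        self.lam = lam
        self.epochs = epochs

    def margin(self, x):
        return sum(self.w[j] * v for j, v in x.items())

    def fit(self, X, y):
        t = 0
        for _ in range(self.epochs):
            for x, label in zip(X, y):
                t += 1
                eta = 1.0 / (self.lam * t)       # decaying step size
                s = 1.0 if label == 1 else -1.0  # map {0, 1} -> {-1, +1}
                violated = s * self.margin(x) < 1.0
                self.w = [(1.0 - eta * self.lam) * wj for wj in self.w]  # L2 shrinkage
                if violated:  # hinge-loss subgradient step
                    for j, v in x.items():
                        self.w[j] += eta * s * v
        return self

    def predict(self, x):
        return 1 if self.margin(x) > 0 else 0

def accuracy(model, X, y):
    return sum(model.predict(x) == label for x, label in zip(X, y)) / len(y)

# Hypothetical toy data (1 = hate, 0 = not hate); not the paper's dataset.
train_docs = ["I hate you, you are worthless",
              "those people are vermin and should leave",
              "have a great day friend",
              "thanks for sharing this lovely photo"]
train_labels = [1, 1, 0, 0]
test_docs = ["you are worthless vermin", "what a lovely day"]
test_labels = [1, 0]

vocab = build_vocab(train_docs)
X_train = [bow(d, vocab) for d in train_docs]
X_test = [bow(d, vocab) for d in test_docs]

nb = NaiveBayes(len(vocab)).fit(X_train, train_labels)
svm = LinearSVM(len(vocab)).fit(X_train, train_labels)
print("NB accuracy:", accuracy(nb, X_test, test_labels))
print("SVM accuracy:", accuracy(svm, X_test, test_labels))
```

In practice a library implementation such as scikit-learn's `MultinomialNB` and `LinearSVC` would normally replace these hand-written classes. The standard-library version keeps the sketch dependency-free. Both models expose their decisions through per-word log-likelihoods or weights, which is one sense in which such classifiers are easier to interpret. The toy data is far too small to say anything about the 99% and 50% accuracies reported in the abstract.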