Paper title

4chan & 8chan embeddings

Authors

Pierre Voué, Tom De Smedt, Guy De Pauw

Abstract

We have collected over 30M messages from the publicly available /pol/ message boards on 4chan and 8chan, and compiled them into a model of toxic language use. The trained word embeddings (0.4GB) are released for free and may be useful for further study on toxic discourse or to boost hate speech detection systems: https://textgain.com/8chan.
