Paper Title
Fortifying Toxic Speech Detectors Against Veiled Toxicity
Paper Authors
Paper Abstract
Modern toxic speech detectors are incompetent in recognizing disguised offensive language, such as adversarial attacks that deliberately avoid known toxic lexicons, or manifestations of implicit bias. Building a large annotated dataset for such veiled toxicity can be very expensive. In this work, we propose a framework aimed at fortifying existing toxic speech detectors without a large labeled corpus of veiled toxicity. Just a handful of probing examples are used to surface orders of magnitude more disguised offenses. We augment the toxic speech detector's training data with these discovered offensive examples, thereby making it more robust to veiled toxicity while preserving its utility in detecting overt toxicity.
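The abstract does not spell out how probing examples surface additional disguised offenses, so the following is only a minimal illustrative sketch of the general probe-and-augment idea, not the paper's actual method. It assumes a simple bag-of-words cosine similarity as the scoring function; the names `surface_offenses`, `bow`, and the toy probe/pool strings are all hypothetical.

```python
# Illustrative sketch (NOT the paper's method): use a handful of labeled
# "probe" examples to rank an unlabeled pool, then take the top-scoring
# candidates as new training examples of veiled toxicity.
import math
from collections import Counter

def bow(text):
    """Bag-of-words term counts for a whitespace-tokenized string."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two Counter term-count vectors."""
    num = sum(a[t] * b[t] for t in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def surface_offenses(probes, unlabeled_pool, top_k=1):
    """Score each pool example by its max similarity to any probe and
    return the top_k candidates to add to the detector's training data."""
    scored = [(max(cosine(bow(x), bow(p)) for p in probes), x)
              for x in unlabeled_pool]
    scored.sort(reverse=True)
    return [x for _, x in scored[:top_k]]

# Toy usage: one probe surfaces the lexically closest pool example.
probes = ["you people always ruin everything"]
pool = [
    "you people ruin everything around here",
    "the weather is lovely today",
    "what a great movie",
]
augmented = surface_offenses(probes, pool, top_k=1)
```

In the paper's setting the scorer would be far stronger than lexical overlap (veiled toxicity deliberately avoids known toxic lexicons), but the pipeline shape is the same: score, rank, and fold the top candidates back into the detector's training set.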