Paper Title
Text Classification with Novelty Detection
Paper Authors
Paper Abstract
This paper studies the problem of detecting novel or unexpected instances in text classification. In traditional text classification, every class that appears in testing must have been seen in training. In many applications, however, this assumption does not hold: at test time we may encounter unexpected instances that belong to none of the training classes. In this paper, we propose an approach, significantly more effective than existing ones, that converts the original problem into a pair-wise matching problem and outputs the probability that two instances belong to the same class. Under this approach, we present two models. The more effective model feeds the two embedding matrices of a pair of instances into a CNN as its two input channels. The output probabilities of such pairs are then used to judge whether a test instance comes from a seen class or is novel/unexpected. Experimental results show that the proposed method substantially outperforms the state-of-the-art baselines.
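The abstract only outlines the approach, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It pairs a test instance with training instances, stacks the two embedding matrices as the two input channels of a small CNN that outputs a same-class probability, and uses those pair probabilities to choose between a seen class and "novel". The filter width, number of filters, max-over-time pooling, per-class averaging, the 0.5 threshold, and all names (`pair_channels`, `TwoChannelCNN`, `classify_or_reject`) are hypothetical choices for illustration. The weights are random and training is omitted, so the probabilities carry no meaning until the model is trained on same-class and different-class pairs.

```python
import math
import random
from typing import Dict, List, Tuple

Matrix = List[List[float]]  # one row per token, one column per embedding dimension


def pad(m: Matrix, length: int, dim: int) -> Matrix:
    """Truncate or zero-pad an embedding matrix to exactly `length` rows."""
    rows = [list(r) for r in m[:length]]
    rows += [[0.0] * dim for _ in range(length - len(rows))]
    return rows


def pair_channels(a: Matrix, b: Matrix, length: int, dim: int) -> List[Matrix]:
    """Use the two instances' embedding matrices as the CNN's 2 input channels."""
    return [pad(a, length, dim), pad(b, length, dim)]


class TwoChannelCNN:
    """Forward pass only (hypothetical sizes, random weights, no training):
    conv over both channels -> ReLU -> max-over-time pooling -> sigmoid."""

    def __init__(self, dim: int, width: int = 2, n_filters: int = 4, seed: int = 0):
        rng = random.Random(seed)
        self.dim, self.width = dim, width
        # filters[f][channel][offset][d]: each filter spans both channels,
        # `width` consecutive tokens, and the full embedding dimension.
        self.filters = [
            [[[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(width)]
             for _ in range(2)]
            for _ in range(n_filters)
        ]
        self.out_w = [rng.gauss(0.0, 0.1) for _ in range(n_filters)]
        self.out_b = 0.0

    def prob_same_class(self, x: List[Matrix]) -> float:
        """Return P(the two stacked instances belong to the same class)."""
        length = len(x[0])
        pooled = []
        for f in self.filters:
            best = 0.0  # max over positions of ReLU(s) == max(0, max s)
            for t in range(length - self.width + 1):
                s = sum(
                    f[c][k][d] * x[c][t + k][d]
                    for c in range(2)
                    for k in range(self.width)
                    for d in range(self.dim)
                )
                best = max(best, s)
            pooled.append(best)
        z = sum(w * p for w, p in zip(self.out_w, pooled)) + self.out_b
        return 1.0 / (1.0 + math.exp(-z))


def classify_or_reject(
    test: Matrix,
    train_by_class: Dict[str, List[Matrix]],
    model: TwoChannelCNN,
    length: int,
    dim: int,
    threshold: float = 0.5,
) -> Tuple[str, Dict[str, float]]:
    """Pair the test instance with training instances of every seen class.
    Assumed rule: average the pair probabilities per class; if even the best
    class falls below `threshold`, report the instance as novel."""
    scores = {}
    for label, examples in train_by_class.items():
        probs = [model.prob_same_class(pair_channels(test, ex, length, dim))
                 for ex in examples]
        scores[label] = sum(probs) / len(probs)
    best = max(scores, key=scores.get)
    return (best if scores[best] >= threshold else "novel"), scores


if __name__ == "__main__":
    dim, length = 3, 4
    rng = random.Random(1)

    def rand_matrix(n: int) -> Matrix:
        return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]

    train = {"sports": [rand_matrix(3), rand_matrix(4)], "politics": [rand_matrix(2)]}
    model = TwoChannelCNN(dim)
    label, scores = classify_or_reject(rand_matrix(4), train, model, length, dim)
    print(label, scores)
```

In this sketch the matcher scores pairs rather than emitting one score per class, so the novelty decision reduces to thresholding the best per-class pair score.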