Paper Title
Meta Pattern Concern Score: A Novel Evaluation Measure with Human Values for Multi-classifiers
Authors
Abstract
While advanced classifiers have been increasingly used in real-world safety-critical applications, how to properly evaluate black-box models given specific human values remains a concern in the community. Such human values include punishing error cases of different severity to varying degrees and making compromises in general performance to reduce specific dangerous cases. In this paper, we propose a novel evaluation measure named Meta Pattern Concern Score, based on an abstract representation of probabilistic predictions and an adjustable threshold for concessions in prediction confidence, to introduce human values into multi-classifiers. Technically, we learn from the advantages and disadvantages of two kinds of common metrics, namely confusion matrix-based evaluation measures and loss values, so that our measure is as effective as these metrics even on general tasks, and the cross entropy loss becomes a special case of our measure in the limit. Moreover, our measure can be used to refine model training by dynamically adjusting the learning rate. Experiments on four kinds of models and six datasets confirm the effectiveness and efficiency of our measure. A case study shows that it can not only find the ideal model, which reduces dangerous cases by 0.53% while sacrificing only 0.04% of training accuracy, but also refine the learning rate to train a new model that outperforms the original one on average, with a 1.62% lower value of the measure itself and 0.36% fewer dangerous cases.
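To make the two baseline families concrete, here is a minimal sketch of a generic severity-weighted confusion-matrix cost alongside the standard cross entropy loss. It is not the Meta Pattern Concern Score, whose exact definition is not given in the abstract. The names `weighted_confusion_cost` and `cross_entropy`, the cost matrix, and the toy data are all hypothetical, chosen only to show the trade-off: the first measure can encode error severity but sees only argmax labels, while the second uses the full probability vector but treats every error alike.

```python
import math
from typing import List, Sequence


def weighted_confusion_cost(
    y_true: Sequence[int], y_pred: Sequence[int], cost: List[List[float]]
) -> float:
    """Mean human-assigned penalty: cost[t][p] is the (hypothetical) penalty
    for predicting class p when the true class is t (zeros on the diagonal)."""
    return sum(cost[t][p] for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_entropy(y_true: Sequence[int], probs: Sequence[Sequence[float]]) -> float:
    """Mean negative log-probability assigned to the true class."""
    return -sum(math.log(p[t]) for t, p in zip(y_true, probs)) / len(y_true)


# Hypothetical 3-class example in which class 2 is the "dangerous" class:
# predicting anything else for a true class-2 sample costs 5 instead of 1.
cost = [
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 1.0],
    [5.0, 5.0, 0.0],
]
y_true = [0, 1, 2, 2]
probs = [
    [0.7, 0.2, 0.1],
    [0.3, 0.6, 0.1],
    [0.5, 0.1, 0.4],  # a missed class-2 case
    [0.1, 0.2, 0.7],
]
# Hard predictions via argmax: this step discards the confidence information.
y_pred = [max(range(len(p)), key=p.__getitem__) for p in probs]

print(weighted_confusion_cost(y_true, y_pred, cost))  # 1.25
print(round(cross_entropy(y_true, probs), 4))  # 0.5351
```

Here the cost-weighted score charges the missed class-2 sample five times as much as an ordinary error, but it would give the same score whether that sample's true-class confidence were 0.4 or 0.01. Cross entropy would register that difference, but it has no notion that missing class 2 is worse than missing class 0. According to the abstract, the proposed measure is designed to draw on both kinds of metrics, with cross entropy recovered as a special case in the limit.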