Concise and interpretable multi-label rule sets

Paper title

Concise and interpretable multi-label rule sets

Authors

Martino Ciaperoni, Han Xiao, Aristides Gionis

Abstract

Multi-label classification is becoming increasingly ubiquitous, but not much attention has been paid to interpretability. In this paper, we develop a multi-label classifier that can be represented as a concise set of simple "if-then" rules, and thus offers better interpretability than black-box models. Notably, our method is able to find a small set of relevant patterns that lead to accurate multi-label classification, while existing rule-based classifiers are myopic and wasteful in searching for rules, requiring a large number of rules to achieve high accuracy. In particular, we formulate the problem of choosing multi-label rules to maximize a target function, which considers not only discrimination ability with respect to labels, but also diversity. Accounting for diversity helps to avoid redundancy and thus to control the number of rules in the solution set. To tackle this maximization problem, we propose a 2-approximation algorithm, which relies on a novel technique to sample high-quality rules. In addition to our theoretical analysis, we provide a thorough experimental evaluation, which indicates that our approach offers a trade-off between predictive performance and interpretability that is unmatched in previous work.
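
To illustrate what a multi-label classifier expressed as "if-then" rules might look like, here is a minimal Python sketch. The rule format (a set of required features mapped to a set of labels), the example rules, and the `predict` function are hypothetical and are not taken from the paper. The sketch shows only how such a rule set could be applied, not how the rules are selected or sampled.

```python
from typing import FrozenSet, List, Set, Tuple

# A rule is (conditions, labels): IF all conditions hold THEN assign the labels.
Rule = Tuple[FrozenSet[str], FrozenSet[str]]

# Hypothetical rule set, for illustration only (not from the paper).
RULES: List[Rule] = [
    (frozenset({"guitar", "drums"}), frozenset({"rock"})),
    (frozenset({"piano"}), frozenset({"classical", "calm"})),
]


def predict(features: Set[str], rules: List[Rule]) -> Set[str]:
    """Return the union of the labels of every rule whose conditions all hold."""
    labels: Set[str] = set()
    for conditions, rule_labels in rules:
        if conditions <= features:  # subset test: every condition is present
            labels |= rule_labels
    return labels
```

Taking the union of the labels from all matching rules is an assumed prediction scheme, chosen for simplicity. Each predicted label can be traced back to the rule that produced it, which is the sense in which a small rule set is easier to inspect than a black-box model.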
