Paper Title
Sequential Attention for Feature Selection
Authors
Abstract
Feature selection is the problem of selecting a subset of features for a machine learning model that maximizes model quality subject to a budget constraint. For neural networks, prior methods, including those based on $\ell_1$ regularization, attention, and other techniques, typically select the entire feature subset in one evaluation round, ignoring the residual value of features during selection, i.e., the marginal contribution of a feature given that other features have already been selected. We propose a feature selection algorithm called Sequential Attention that achieves state-of-the-art empirical results for neural networks. This algorithm is based on an efficient one-pass implementation of greedy forward selection and uses attention weights at each step as a proxy for feature importance. We give theoretical insights into our algorithm for linear regression by showing that an adaptation to this setting is equivalent to the classical Orthogonal Matching Pursuit (OMP) algorithm, and thus inherits all of its provable guarantees. Our theoretical and empirical analyses offer new explanations for the effectiveness of attention and its connections to overparameterization, which may be of independent interest.
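To make the described procedure concrete, below is a minimal sketch of how attention-guided greedy forward selection could look for a linear model: a softmax attention mask over the not-yet-selected features is trained jointly with the model weights, and at each step the candidate with the largest attention weight is added. This is an illustrative assumption-laden toy, not the authors' implementation; the function names, the plain gradient-descent training loop, and all hyperparameters are hypothetical and untuned.

```python
# Sketch of attention-guided greedy forward selection (illustrative only).
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def sequential_attention(X, y, k, epochs=1000, lr=0.1):
    """Greedily pick k features: at each step, jointly train linear model
    weights and softmax attention logits over the remaining candidates,
    then select the candidate with the largest attention weight."""
    n, d = X.shape
    selected = []
    for _ in range(k):
        candidates = [j for j in range(d) if j not in selected]
        w = np.zeros(d)                # model weights
        a = np.zeros(len(candidates))  # attention logits (uniform at start)
        for _ in range(epochs):
            s = softmax(a)
            mask = np.ones(d)          # selected features pass through unmasked
            mask[candidates] = s       # candidates are scaled by their attention
            Xm = X * mask
            resid = Xm @ w - y
            grad_w = Xm.T @ resid / n
            # Gradient w.r.t. the logits, via the mask and the softmax Jacobian.
            g = (X[:, candidates] * w[candidates]).T @ resid / n
            grad_a = s * (g - (s * g).sum())
            w -= lr * grad_w
            a -= lr * grad_a
        selected.append(candidates[int(np.argmax(a))])
    return selected

# Toy usage: a noisy linear target built from 3 of 10 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 2 * X[:, 1] - 3 * X[:, 4] + X[:, 7] + 0.1 * rng.normal(size=200)
print(sequential_attention(X, y, k=3))
```

In this linear setting, the abstract's OMP equivalence is intuitive: Orthogonal Matching Pursuit likewise adds, at each step, the feature most aligned with the current residual, and here the attention weights play the role of that residual-correlation score.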