Paper Title
Where to Pay Attention in Sparse Training for Feature Selection?
Paper Authors
Paper Abstract
A new line of research for feature selection based on neural networks has recently emerged. Despite its superiority to classical methods, it requires many training iterations to converge and detect informative features. The computational time becomes prohibitively long for datasets with a large number of samples or a very high-dimensional feature space. In this paper, we present a new efficient unsupervised method for feature selection based on sparse autoencoders. In particular, we propose a new sparse training algorithm that optimizes a model's sparse topology during training to pay attention to informative features quickly. The attention-based adaptation of the sparse topology enables fast detection of informative features after only a few training iterations. We performed extensive experiments on 10 datasets of different types, including image, speech, text, artificial, and biological data. They cover a wide range of characteristics, such as low- and high-dimensional feature spaces and small to large numbers of training samples. Our proposed approach outperforms the state-of-the-art methods in terms of selecting informative features while substantially reducing training iterations and computational costs. Moreover, the experiments show the robustness of our method in extremely noisy environments.
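The abstract does not spell out the algorithm, but the general family it belongs to (sparse autoencoders trained with periodic prune-and-regrow, then ranking input features by the strength of their surviving connections) can be illustrated with a minimal sketch. Everything below is an assumption-based toy: the function name `rank_features_by_sparse_ae`, the magnitude-based pruning criterion, and the random regrowth are generic sparse-training choices, not the paper's attention-based topology adaptation, which guides growth toward informative features.

```python
import numpy as np

def rank_features_by_sparse_ae(X, hidden=32, density=0.2, epochs=50,
                               lr=0.01, prune_frac=0.3, seed=0):
    """Toy sketch (NOT the paper's method): train a one-hidden-layer sparse
    autoencoder with periodic prune-and-regrow, then rank input features by
    the total magnitude of their remaining encoder connections."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Sparse encoder: random weights gated by a random binary mask.
    W = rng.normal(0.0, 0.1, size=(d, hidden))
    mask = rng.random((d, hidden)) < density
    W *= mask
    V = rng.normal(0.0, 0.1, size=(hidden, d))  # dense decoder for simplicity
    for epoch in range(epochs):
        # Forward pass and mean-squared reconstruction error.
        H = np.tanh(X @ W)
        err = (H @ V) - X
        # Backpropagation for the two layers.
        gV = H.T @ err / n
        gW = X.T @ ((err @ V.T) * (1.0 - H**2)) / n
        W -= lr * gW * mask          # only active connections are updated
        V -= lr * gV
        # Drop-and-grow: every 10 epochs, remove the weakest active
        # connections and regrow the same number at random positions
        # (the paper instead grows them attentively; assumption here).
        if epoch % 10 == 9:
            active = np.flatnonzero(mask)
            k = int(prune_frac * active.size)
            weakest = active[np.argsort(np.abs(W.ravel()[active]))[:k]]
            mask.ravel()[weakest] = False
            W.ravel()[weakest] = 0.0
            inactive = np.flatnonzero(~mask.ravel())
            grown = rng.choice(inactive, size=k, replace=False)
            mask.ravel()[grown] = True
            W.ravel()[grown] = rng.normal(0.0, 0.1, size=k)
    # Feature importance: summed magnitude of each input's connections.
    importance = np.abs(W).sum(axis=1)
    return np.argsort(importance)[::-1]  # most informative feature first
```

The key idea this sketch conveys is that after sparse training, the surviving connectivity of each input neuron acts as an importance score, so informative features can be read off without a separate selection stage.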