Paper title
A study of local optima for learning feature interactions using neural networks
Paper authors
Paper abstract
In many fields such as bioinformatics, high energy physics, and power distribution, it is desirable to learn non-linear models in which a small number of variables are selected and the interactions between them are explicitly modeled to predict the response. In principle, neural networks (NNs) could accomplish this task, since they can model non-linear feature interactions very well. However, NNs require large amounts of training data to generalize well. In this paper we study the data-starved regime, where an NN is trained on a relatively small amount of training data. For that purpose we study feature selection for NNs, which is known to improve generalization for linear models. As an extreme case of data requiring both feature selection and feature interactions, we study XOR-like data with irrelevant variables. We experimentally observe that the cross-entropy loss function on XOR-like data has many non-equivalent local optima, and that the number of local optima grows exponentially with the number of irrelevant variables. To deal with the local minima and to perform feature selection, we propose a node pruning and feature selection algorithm that improves the capability of NNs to find better local minima even when there are irrelevant variables. Finally, we show that the performance of an NN on real datasets can be improved using pruning, obtaining compact networks on a small number of features, with good prediction and interpretability.
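The abstract's extreme test case is XOR-like data with irrelevant variables: the label depends only on the interaction between a few relevant features, while the remaining features are pure noise. Below is a minimal sketch of one such construction, assuming a two-feature XOR-of-signs rule; the paper's exact data-generating process and dimensions are not given here, so `make_xor_like` and its parameters are illustrative only.

```python
import numpy as np

def make_xor_like(n_samples=200, n_irrelevant=8, seed=0):
    """Sketch of XOR-like classification data with irrelevant variables.

    Only the interaction of the first two features determines the label
    (XOR of their signs); the remaining n_irrelevant features are noise,
    so a good learner must both select features and model interactions.
    Hypothetical construction -- the paper's exact setup may differ.
    """
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, size=(n_samples, 2 + n_irrelevant))
    y = (X[:, 0] * X[:, 1] > 0).astype(int)  # XOR of the signs of x1, x2
    return X, y

X, y = make_xor_like()
```

No single feature is predictive on its own here, which is why linear models (and feature selection methods built on them) fail, while an NN must navigate a loss surface whose local optima multiply with each added noise column.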