Paper Title

Learning from Incomplete Features by Simultaneous Training of Neural Networks and Sparse Coding

Authors

Caiafa, Cesar F., Wang, Ziyao, Solé-Casals, Jordi, Zhao, Qibin

Abstract

In this paper, the problem of training a classifier on a dataset with incomplete features is addressed. We assume that different subsets of features (random or structured) are available at each data instance. This situation typically occurs in the applications when not all the features are collected for every data sample. A new supervised learning method is developed to train a general classifier, such as a logistic regression or a deep neural network, using only a subset of features per sample, while assuming sparse representations of data vectors on an unknown dictionary. Sufficient conditions are identified, such that, if it is possible to train a classifier on incomplete observations so that their reconstructions are well separated by a hyperplane, then the same classifier also correctly separates the original (unobserved) data samples. Extensive simulation results on synthetic and well-known datasets are presented that validate our theoretical findings and demonstrate the effectiveness of the proposed method compared to traditional data imputation approaches and one state-of-the-art algorithm.
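To make the idea described in the abstract more concrete, below is a minimal conceptual sketch, not the authors' implementation: it jointly fits a dictionary, per-sample sparse codes, and a linear classifier from partially observed features. The toy data, all variable names (D, S, w, mask, lam_sparse, lam_cls), and the plain Adam optimization of a masked reconstruction loss plus an L1 sparsity penalty and a logistic loss are illustrative assumptions; the paper's actual algorithm and its theoretical conditions are not reproduced here.

```python
# Sketch only (assumed setup, not the paper's code): simultaneously learn a
# dictionary D, sparse codes S, and a linear classifier (w, b) from data whose
# features are only partially observed, as indicated by a binary mask.
import torch

torch.manual_seed(0)

# Synthetic problem: n samples, d features, each sample observes a random subset.
n, d, k = 200, 30, 10                                   # samples, feature dim, dictionary atoms
X_true = torch.randn(n, d)                              # ground-truth data (never seen in full)
y = (X_true[:, 0] + 0.5 * X_true[:, 1] > 0).float()     # toy binary labels
mask = (torch.rand(n, d) < 0.6).float()                 # 1 = feature observed
X_obs = X_true * mask                                   # incomplete feature vectors

# Learnable pieces: dictionary D, per-sample sparse codes S, classifier (w, b).
D = torch.randn(d, k, requires_grad=True)
S = torch.zeros(n, k, requires_grad=True)
w = torch.zeros(d, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

opt = torch.optim.Adam([D, S, w, b], lr=1e-2)
lam_sparse, lam_cls = 0.1, 1.0                          # assumed penalty weights

for step in range(2000):
    opt.zero_grad()
    X_hat = S @ D.T                                     # reconstructions D s_i
    rec = ((mask * (X_hat - X_obs)) ** 2).sum() / n     # fit only the observed entries
    sparse = lam_sparse * S.abs().mean()                # encourage sparse codes
    logits = X_hat @ w + b                              # classify the reconstruction
    cls = torch.nn.functional.binary_cross_entropy_with_logits(logits, y)
    loss = rec + sparse + lam_cls * cls                 # joint objective
    loss.backward()
    opt.step()

with torch.no_grad():
    acc = (((S @ D.T) @ w + b > 0).float() == y).float().mean().item()
    print(f"train accuracy on reconstructions: {acc:.3f}")
```

The sketch uses a linear classifier for brevity; the paper's framework allows swapping it for a deep neural network acting on the same reconstructions.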
