Paper title
Imputation of missing values in multi-view data
Paper authors
Paper abstract
Data in which a set of objects is described by multiple distinct feature sets (called views) are known as multi-view data. When missing values occur in multi-view data, all features in a view are likely to be missing simultaneously. This can lead to very large quantities of missing data which, especially when combined with high dimensionality, can make the application of conditional imputation methods computationally infeasible. However, the multi-view structure can be leveraged to reduce the complexity and computational load of imputation. We introduce a new imputation method based on the existing stacked penalized logistic regression (StaPLR) algorithm for multi-view learning. It performs imputation in a dimension-reduced space to address the computational challenges inherent to the multi-view context. We compare the performance of the new imputation method with that of several existing imputation algorithms in simulated data sets and a real data application. The results show that the new imputation method achieves competitive results at a much lower computational cost, and makes it possible to use advanced imputation algorithms such as missForest and predictive mean matching in settings where they would otherwise be computationally infeasible.
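The idea of imputing in a dimension-reduced space can be illustrated with a minimal sketch. It is not the authors' implementation, and everything in it beyond what the abstract states is an assumption made for illustration: each view is reduced to a single score per object by a small L2-penalized logistic regression, objects with a missing view get a missing score, and the missing scores are filled with column means as a placeholder for an advanced imputer such as missForest or predictive mean matching. Stacking methods typically use cross-validated predictions at this step; the sketch uses in-sample predictions for brevity. All function names and the toy data are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Clamp the input so math.exp cannot overflow.
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))

def predict(row, w, b):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b)

def fit_logistic(X, y, lr=0.1, epochs=200, l2=0.01):
    """L2-penalized logistic regression fitted by batch gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, target in zip(X, y):
            err = predict(row, w, b) - target
            grad_b += err
            for j in range(p):
                grad_w[j] += err * row[j]
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def reduce_views(views, y):
    """Turn each view into one score per object; missing views stay None."""
    n = len(y)
    Z = [[None] * len(views) for _ in range(n)]
    for v, view in enumerate(views):
        observed = [i for i in range(n) if view[i] is not None]
        w, b = fit_logistic([view[i] for i in observed], [y[i] for i in observed])
        for i in observed:
            Z[i][v] = predict(view[i], w, b)
    return Z

def impute_column_means(Z):
    """Fill each missing score with its column mean (placeholder imputer)."""
    filled = [row[:] for row in Z]
    for v in range(len(Z[0])):
        seen = [row[v] for row in Z if row[v] is not None]
        mean = sum(seen) / len(seen)
        for row in filled:
            if row[v] is None:
                row[v] = mean
    return filled

# Toy data (assumption): 40 objects, 3 views of 20 features each.
# A view is either fully observed or entirely missing for an object.
random.seed(0)
n, n_views, n_features = 40, 3, 20
y = [i % 2 for i in range(n)]
views = [
    [None if random.random() < 0.2 else
     [random.gauss(y[i], 1.0) for _ in range(n_features)]
     for i in range(n)]
    for _ in range(n_views)
]

Z = reduce_views(views, y)                     # n x n_views, with gaps
missing_scores = sum(cell is None for row in Z for cell in row)
Z_imputed = impute_column_means(Z)             # imputation in the reduced space
meta_w, meta_b = fit_logistic(Z_imputed, y)    # meta-learner on imputed scores

print("cells to impute in the raw feature space:", missing_scores * n_features)
print("cells to impute in the reduced space:", missing_scores)
```

In this toy setup the imputer only sees an `n x n_views` matrix, so each missing view costs one imputed cell rather than `n_features` cells. That gap grows with the dimensionality of the views, which is the computational saving the abstract refers to.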