Title
Bayesian Semi-supervised Multi-category Classification under Nonparanormality
Authors
Abstract
Semi-supervised learning is a model training method that uses both labeled and unlabeled data. This paper proposes a fully Bayesian semi-supervised learning algorithm that can be applied to any multi-category classification problem. When unlabeled data are used in the semi-supervised setting, we assume that the labels are missing at random. Suppose the data contain $K$ classes. We assume that, after some common unknown transformation is applied to each component of the observation vector, an observation with true class label $k$ follows the $k$-th of $K$ multivariate normal distributions. The transformation is expanded in a B-spline series, and a prior is placed on the coefficients. We consider a normal prior on the coefficients and constrain their values to meet the normality and identifiability requirements. The precision matrices of the Gaussian distributions are given a conjugate Wishart prior, while the means are given an improper uniform prior. The resulting posterior is still conditionally conjugate, so a Gibbs sampler aided by a data-augmentation technique can be used. An extensive simulation study compares the proposed method with several other available methods. The proposed method is also applied to real datasets for diagnosing breast cancer and for classifying signals. We conclude that the proposed method achieves better prediction accuracy in various cases.
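As a rough illustration of two ingredients named in the abstract, the B-spline expansion of the transformation and the data-augmentation step that imputes missing labels inside a Gibbs sampler, here is a minimal NumPy sketch. It is not the paper's implementation. The clamped cubic knot vector, the strictly increasing coefficients, the class weights `weights`, and all function names are hypothetical choices made for illustration. The sketch omits the prior draws of the spline coefficients, the means, and the Wishart-distributed precision matrices.

```python
import numpy as np


def bspline_basis(x, knots, degree):
    """Evaluate every B-spline basis function of `degree` at the points `x`
    via the Cox-de Boor recursion. Returns shape (len(x), len(knots) - degree - 1)."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    # Degree 0: indicator of each half-open knot interval [t_i, t_{i+1}).
    B = ((x[:, None] >= t[:-1]) & (x[:, None] < t[1:])).astype(float)
    # Close the last non-empty interval so that x == t[-1] is covered too.
    last = np.nonzero(t[:-1] < t[1:])[0][-1]
    B[x == t[-1], last] = 1.0
    for p in range(1, degree + 1):
        n = len(t) - p - 1
        left_den = t[p:p + n] - t[:n]
        right_den = t[p + 1:] - t[1:n + 1]
        with np.errstate(divide="ignore", invalid="ignore"):
            # Terms with a zero denominator are defined to be 0.
            left = np.where(left_den > 0, (x[:, None] - t[:n]) / left_den, 0.0)
            right = np.where(right_den > 0, (t[p + 1:] - x[:, None]) / right_den, 0.0)
        B = left * B[:, :n] + right * B[:, 1:n + 1]
    return B


def transform(x, knots, theta, degree=3):
    """Hypothetical transformation f(x) = sum_j theta_j * B_j(x)."""
    return bspline_basis(x, knots, degree) @ np.asarray(theta, dtype=float)


def sample_missing_labels(Z, means, precisions, weights, rng):
    """Data-augmentation step: for each unlabeled, already-transformed row z of Z,
    draw a label k with probability proportional to
    weights[k] * N(z; means[k], inverse(precisions[k])).
    `weights` are hypothetical class probabilities, not taken from the paper."""
    Z = np.asarray(Z, dtype=float)
    K = len(means)
    logp = np.empty((Z.shape[0], K))
    for k in range(K):
        diff = Z - means[k]
        _, logdet = np.linalg.slogdet(precisions[k])
        quad = np.einsum("ij,jk,ik->i", diff, precisions[k], diff)
        # The -d/2 * log(2*pi) constant is shared by all classes and dropped.
        logp[:, k] = np.log(weights[k]) + 0.5 * logdet - 0.5 * quad
    logp -= logp.max(axis=1, keepdims=True)  # stabilise before exponentiating
    prob = np.exp(logp)
    prob /= prob.sum(axis=1, keepdims=True)
    # Inverse-CDF draw from each row's categorical distribution.
    u = rng.random(Z.shape[0])
    labels = np.minimum((prob.cumsum(axis=1) < u[:, None]).sum(axis=1), K - 1)
    return labels, prob


# Toy usage with illustrative values only.
rng = np.random.default_rng(0)
knots = [0, 0, 0, 0, 0.5, 1, 1, 1, 1]  # clamped cubic knots on [0, 1]
x = np.linspace(0, 1, 11)
f = transform(x, knots, theta=[-2.0, -1.0, 0.0, 1.0, 2.0])
Z = np.array([[0.0, 0.0], [5.0, 5.0]])
labels, prob = sample_missing_labels(
    Z,
    means=[np.zeros(2), np.full(2, 5.0)],
    precisions=[np.eye(2), np.eye(2)],
    weights=[0.5, 0.5],
    rng=rng,
)
```

A B-spline whose coefficients are non-decreasing is itself non-decreasing, which is why the toy coefficients above are sorted. In a full sampler, a label-imputation step like this would alternate with draws of the spline coefficients, the means, and the precision matrices from their conditional posteriors.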