Paper title
Radio Galaxy Zoo: Using semi-supervised learning to leverage large unlabelled data-sets for radio galaxy classification under data-set shift
Paper authors
Paper abstract
In this work we examine the classification accuracy and robustness of a state-of-the-art semi-supervised learning (SSL) algorithm applied to the morphological classification of radio galaxies. We test whether SSL with fewer labels can achieve test accuracies comparable to the supervised state-of-the-art, and whether this holds when incorporating previously unseen data. We find that for the radio galaxy classification problem considered, SSL provides additional regularisation and outperforms the baseline test accuracy. However, in contrast to model performance metrics reported on computer science benchmarking data-sets, we find that the improvement is limited to a narrow range of label volumes, with performance falling off rapidly at low label volumes. Additionally, we show that SSL does not improve model calibration, regardless of whether classification is improved. Moreover, we find that when different underlying catalogues drawn from the same radio survey are used to provide the labelled and unlabelled data-sets required for SSL, a significant drop in classification performance is observed, highlighting the difficulty of applying SSL techniques under data-set shift. We show that a class-imbalanced unlabelled data pool negatively affects performance through prior probability shift, which we suggest may explain this performance drop. We also show that using the Fréchet distance between labelled and unlabelled data-sets as a measure of data-set shift can provide a prediction of model performance, but that for typical radio galaxy data-sets with labelled sample volumes of O(1000), the sample variance associated with this technique is high and the technique is in general not sufficiently robust to replace a train-test cycle.
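To illustrate the Fréchet distance mentioned in the abstract, here is a minimal sketch of one common way to compute it between two feature sets. It assumes the Gaussian form familiar from the Fréchet Inception Distance, which the abstract does not itself specify. The function name `frechet_distance` and the random arrays standing in for labelled and unlabelled feature vectors are hypothetical and are not taken from the paper.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Squared Frechet distance between Gaussians fitted to two feature sets.

    Each input is an array of shape (n_samples, n_features). Fitting a
    Gaussian to each set is an assumption of this sketch.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # positive semi-definite covariances (up to numerical noise).
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)


# Hypothetical stand-ins for features of labelled and unlabelled samples.
rng = np.random.default_rng(0)
labelled = rng.normal(size=(1000, 8))
unlabelled = labelled + 0.5  # same spread, shifted mean

print(frechet_distance(labelled, labelled))
print(frechet_distance(labelled, unlabelled))
```

Repeating the calculation on different random subsamples of about 1000 objects would give a rough sense of the sample variance that the abstract warns about.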