Paper Title

Towards Galaxy Foundation Models with Hybrid Contrastive Learning

Paper Authors

Mike Walmsley, Inigo Val Slijepcevic, Micah Bowles, Anna M. M. Scaife

Paper Abstract

New astronomical tasks are often related to earlier tasks for which labels have already been collected. We adapt the contrastive framework BYOL to leverage those labels as a pretraining task while also enforcing augmentation invariance. For large-scale pretraining, we introduce GZ-Evo v0.1, a set of 96.5M volunteer responses for 552k galaxy images plus a further 1.34M comparable unlabelled galaxies. Most of the 206 GZ-Evo answers are unknown for any given galaxy, and so our pretraining task uses a Dirichlet loss that naturally handles unknown answers. GZ-Evo pretraining, with or without hybrid learning, improves on direct training even with plentiful downstream labels (+4% accuracy with 44k labels). Our hybrid pretraining/contrastive method further improves downstream accuracy vs. pretraining or contrastive learning, especially in the low-label transfer regime (+6% accuracy with 750 labels).
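To make the "Dirichlet loss that naturally handles unknown answers" concrete, below is a minimal sketch (not the authors' code; the function name, tensor shapes, and toy values are illustrative assumptions) of a Dirichlet-Multinomial negative log-likelihood over volunteer vote counts for a single decision-tree question. The key property is that when a galaxy received no votes for a question, every term cancels and the log-likelihood is exactly zero, so unknown answers drop out of the loss without any explicit masking.

```python
import torch


def dirichlet_multinomial_nll(votes: torch.Tensor, alpha: torch.Tensor) -> torch.Tensor:
    """Negative log-likelihood of vote counts under a Dirichlet-Multinomial.

    votes: (batch, n_answers) volunteer vote counts for one question (float).
    alpha: (batch, n_answers) positive concentrations predicted by the network.
    """
    total = votes.sum(dim=-1)        # N: total votes per galaxy for this question
    alpha_sum = alpha.sum(dim=-1)    # sum of predicted concentrations
    log_lik = (
        torch.lgamma(total + 1.0)
        + torch.lgamma(alpha_sum)
        - torch.lgamma(total + alpha_sum)
        + (
            torch.lgamma(votes + alpha)
            - torch.lgamma(alpha)
            - torch.lgamma(votes + 1.0)
        ).sum(dim=-1)
    )
    # If a question was never asked for a galaxy, all its vote counts are zero
    # and log_lik is exactly 0: unknown answers contribute nothing to the loss.
    return -log_lik


# Toy usage: 2 galaxies, 3 answers; the second galaxy has no votes recorded.
votes = torch.tensor([[4.0, 1.0, 0.0], [0.0, 0.0, 0.0]])
alpha = torch.ones(2, 3) * 2.0
print(dirichlet_multinomial_nll(votes, alpha))  # second entry is exactly 0
```

In the hybrid setup the abstract describes, a supervised term of this kind would simply be added to BYOL's self-supervised augmentation-invariance loss, e.g. `loss = byol_loss + dirichlet_multinomial_nll(votes, alpha).mean()` (the combination shown here is an assumption about the general recipe, not the paper's exact weighting).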
