Paper title
Active learning with RESSPECT: Resource allocation for extragalactic astronomical transients
Paper authors
Paper abstract
The recent increase in volume and complexity of available astronomical data has led to the widespread use of supervised machine learning techniques. Active learning strategies have been proposed as an alternative to optimize the distribution of scarce labeling resources. However, due to the specific conditions in which labels can be acquired, fundamental assumptions, such as sample representativeness and labeling cost stability, cannot be fulfilled. The Recommendation System for Spectroscopic follow-up (RESSPECT) project aims to enable the construction of optimized training samples for the Rubin Observatory Legacy Survey of Space and Time (LSST), taking into account a realistic description of the astronomical data environment. In this work, we test the robustness of active learning techniques in a realistic simulated astronomical data scenario. Our experiment takes into account the evolution of training and pool samples, different costs per object, and two different sources of budget. Results show that traditional active learning strategies significantly outperform random sampling. Nevertheless, more complex batch strategies do not significantly outperform simple uncertainty sampling techniques. Our findings illustrate three important points: 1) active learning strategies are a powerful tool to optimize the label-acquisition task in astronomy, 2) for upcoming large surveys like LSST, such techniques allow us to tailor the construction of the training sample for the first day of the survey, and 3) the peculiar data environment related to the detection of astronomical transients is a fertile ground that calls for the development of tailored machine learning algorithms.
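To make the idea of uncertainty sampling under per-object costs and a fixed budget concrete, the following is a minimal sketch in plain Python. It is not the RESSPECT implementation: the function names (`uncertainty`, `select_queries`), the binary-classification setup, the greedy fill rule, and the toy probabilities, costs, and budget are all assumptions made for illustration only.

```python
def uncertainty(p):
    """Binary uncertainty score: 1 at p = 0.5, 0 at p = 0 or p = 1."""
    return 1.0 - abs(2.0 * p - 1.0)


def select_queries(probs, costs, budget):
    """Greedy uncertainty sampling with per-object costs (illustrative only).

    probs  : dict mapping object id -> predicted probability of the positive class
    costs  : dict mapping object id -> labeling cost (hypothetical units)
    budget : total cost available for this round
    Returns (chosen ids in the order they were picked, total cost spent).
    """
    # Rank pool objects from most to least uncertain.
    ranked = sorted(probs, key=lambda k: uncertainty(probs[k]), reverse=True)
    chosen, spent = [], 0.0
    for obj in ranked:
        # Skip objects that would exceed the remaining budget.
        if spent + costs[obj] <= budget:
            chosen.append(obj)
            spent += costs[obj]
    return chosen, spent


# Toy pool: made-up classifier outputs and made-up labeling costs.
probs = {"a": 0.52, "b": 0.95, "c": 0.40, "d": 0.10}
costs = {"a": 30, "b": 10, "c": 50, "d": 20}

chosen, spent = select_queries(probs, costs, budget=60)
print(chosen, spent)
```

In this toy pool the second most uncertain object, "c", is skipped because its cost does not fit in the remaining budget. This is one simple way that per-object costs can change which labels are acquired compared with a cost-free ranking.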