Paper Title
Critical Learning Periods for Multisensory Integration in Deep Networks
Paper Authors
Paper Abstract
We show that the ability of a neural network to integrate information from diverse sources hinges critically on being exposed to properly correlated signals during the early phases of training. Interfering with the learning process during this initial stage can permanently impair the development of a skill, in both artificial and biological systems, where the phenomenon is known as a critical learning period. We show that critical periods arise from complex and unstable early transient dynamics, which are decisive for the final performance of the trained system and the representations it learns. This evidence challenges the view, engendered by analyses of wide and shallow networks, that the early learning dynamics of neural networks are simple and akin to those of a linear model. Indeed, we show that even deep linear networks exhibit critical learning periods for multi-source integration, while shallow networks do not. To better understand how internal representations change under disturbances or sensory deficits, we introduce a new measure of source sensitivity, which allows us to track the inhibition and integration of sources during training. Our analysis of inhibition suggests cross-source reconstruction as a natural auxiliary training objective, and indeed we show that architectures trained with cross-sensor reconstruction objectives are remarkably more resilient to critical periods. Our findings suggest that the recent success of self-supervised multi-modal training, compared to previous supervised efforts, may be due in part to more robust learning dynamics and not solely to better architectures and/or more data.