Paper title
Data augmentation and image understanding
Paper authors
Abstract
Interdisciplinary research is often at the core of scientific progress. This dissertation explores some advantageous synergies between machine learning, cognitive science and neuroscience. In particular, this thesis focuses on vision and images. The human visual system has been widely studied from both behavioural and neuroscientific points of view, as vision is the dominant sense for most people. In turn, machine vision has also been an active area of research, currently dominated by the use of artificial neural networks. This work focuses on learning representations that are more aligned with visual perception and biological vision. For that purpose, I have studied tools and aspects from cognitive science and computational neuroscience, and attempted to incorporate them into machine learning models of vision. A central subject of this dissertation is data augmentation, a commonly used technique for training artificial neural networks that increases the size of data sets through transformations of the images. Although often overlooked, data augmentation implements transformations that are perceptually plausible, since they correspond to the transformations we see in our visual world, for instance changes in viewpoint or illumination. Furthermore, neuroscientists have found that the brain invariantly represents objects under these transformations. Throughout this dissertation, I use these insights to analyse data augmentation as a particularly useful inductive bias, as a more effective regularisation method for artificial neural networks, and as a framework to analyse and improve the invariance of vision models to perceptually plausible transformations. Overall, this work aims to shed more light on the properties of data augmentation and demonstrate the potential of interdisciplinary research.