Paper Title
On Correctness of Automatic Differentiation for Non-Differentiable Functions
Paper Authors
Paper Abstract
Differentiation lies at the core of many machine-learning algorithms, and is well-supported by popular autodiff systems, such as TensorFlow and PyTorch. These systems were originally developed to compute derivatives of differentiable functions, but in practice they are commonly applied to functions with non-differentiabilities. For instance, neural networks using ReLU define non-differentiable functions in general, yet the gradients of losses involving those functions are routinely computed with autodiff systems. This status quo raises a natural question: are autodiff systems correct in any formal sense when they are applied to such non-differentiable functions? In this paper, we provide a positive answer to this question. Using counterexamples, we first point out flaws in often-used informal arguments, such as: non-differentiabilities arising in deep learning do not cause any issues because they form a measure-zero set. We then investigate a class of functions, called PAP functions, that includes nearly all (possibly non-differentiable) functions in deep learning nowadays. For these PAP functions, we propose a new type of derivatives, called intensional derivatives, and prove that these derivatives always exist and coincide with standard derivatives for almost all inputs. We also show that these intensional derivatives are essentially what most autodiff systems compute, or try to compute. In this way, we formally establish the correctness of autodiff systems applied to non-differentiable functions.
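As a concrete illustration of the phenomenon the abstract describes (a minimal sketch, not taken from the paper): the function relu(x) - relu(-x) equals the identity function x everywhere, so its true derivative at 0 is 1, yet an autodiff system that applies the chain rule with a fixed convention such as relu'(0) = 0 reports 0 at that point.

```python
import torch

# relu(x) - relu(-x) equals x for every input, so its mathematical derivative is 1 everywhere.
x = torch.tensor(0.0, requires_grad=True)
y = torch.relu(x) - torch.relu(-x)
y.backward()

# PyTorch uses the convention relu'(0) = 0, so the chain rule yields 0 - 0 = 0 here,
# even though the derivative of the composite function at 0 is 1.
print(x.grad)  # tensor(0.)
```

Per the abstract, for the PAP functions studied in the paper such disagreements are confined to a measure-zero set of inputs: the intensional derivatives that autodiff systems essentially compute coincide with the standard derivatives for almost all inputs.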