Paper Title

Dropout Inference with Non-Uniform Weight Scaling

Paper Authors

Zhaoyuan Yang, Arpit Jain

Paper Abstract

Dropout as regularization has been used extensively to prevent overfitting in training neural networks. During training, units and their connections are randomly dropped, which can be viewed as sampling many different submodels from the original model. At test time, weight scaling and Monte Carlo approximation are two widely applied approaches to approximating the outputs. Both approaches work well in practice when all submodels are low-bias complex learners. However, in this work we demonstrate scenarios where some submodels behave more like high-bias models and non-uniform weight scaling is a better approximation for inference.
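
To make the inference strategies in the abstract concrete, here is a minimal NumPy sketch of the three approximations for a toy one-hidden-layer ReLU network. The network weights, the keep probability, and in particular the per-unit scaling factors are hypothetical placeholders; the paper derives its own non-uniform factors, which are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-hidden-layer ReLU network with dropout on the hidden units.
# All weights here are random placeholders, not taken from the paper.
W1 = rng.normal(size=(8, 4))   # input -> hidden
W2 = rng.normal(size=(1, 8))   # hidden -> output
p_keep = 0.5                   # keep probability used during training

def forward(x, mask):
    """Forward pass with a (possibly fractional) mask on the hidden layer."""
    h = np.maximum(0.0, W1 @ x) * mask
    return W2 @ h

x = rng.normal(size=4)

# 1) Uniform weight scaling: keep every unit and scale by p_keep.
y_scaled = forward(x, np.full(8, p_keep))

# 2) Monte Carlo approximation: average outputs over sampled dropout masks.
masks = rng.random((10_000, 8)) < p_keep
y_mc = np.mean([forward(x, m.astype(float)) for m in masks])

# 3) Non-uniform weight scaling: a per-unit scale instead of the single
#    constant p_keep. The factors below are illustrative perturbations only.
per_unit = np.clip(p_keep + 0.1 * rng.normal(size=8), 0.0, 1.0)
y_nonuniform = forward(x, per_unit)

print(y_scaled, y_mc, y_nonuniform)
```

When all submodels are low-bias learners, the uniform-scaling and Monte Carlo estimates tend to agree; the scenario the paper describes is one where some submodels act as high-bias models, in which case per-unit factors can approximate the test-time output better than the single constant p_keep.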
