Paper title
Heatmap Regression via Randomized Rounding
Paper authors
Paper abstract
Heatmap regression has become the mainstream methodology for deep learning-based semantic landmark localization, including facial landmark localization and human pose estimation. Though heatmap regression is robust to large variations in pose, illumination, and occlusion in unconstrained settings, it usually suffers from a sub-pixel localization problem. Specifically, because the indices of activation points in a heatmap are always integers, quantization error arises when heatmaps are used to represent numerical coordinates. Previous methods to overcome the sub-pixel localization problem usually rely on high-resolution heatmaps. As a result, there is always a trade-off between localization accuracy and computational cost, since the computational complexity of heatmap regression grows quadratically with the heatmap resolution. In this paper, we formally analyze the quantization error of vanilla heatmap regression and propose a simple yet effective quantization system to address the sub-pixel localization problem. The proposed quantization system, induced by the randomized rounding operation, 1) encodes the fractional part of numerical coordinates into the ground truth heatmap using a probabilistic approach during training; and 2) decodes the predicted numerical coordinates from a set of activation points during testing. We prove that the proposed quantization system for heatmap regression is unbiased and lossless. Experimental results on popular facial landmark localization datasets (WFLW, 300W, COFW, and AFLW) and human pose estimation datasets (MPII and COCO) demonstrate the effectiveness of the proposed method for efficient and accurate semantic landmark localization. Code is available at http://github.com/baoshengyu/H3R.
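The encode/decode idea in the abstract can be illustrated with a small sketch. This is not the authors' H3R code. The function names (`randomized_round`, `encode`, `expected_heatmap`, `decode`) and the parameter `k` are hypothetical, and the decoder assumes that a weighted average over the strongest activation points is an acceptable stand-in for the paper's decoding step. The sketch only shows why rounding up with probability equal to the fractional part is unbiased, and why the fractional part can be recovered from the resulting activation probabilities.

```python
import math
import random


def randomized_round(x, rng):
    """Round x down or up at random so that the expected result equals x."""
    lower = math.floor(x)
    frac = x - lower
    # Round up with probability equal to the fractional part (assumed scheme).
    return lower + 1 if rng.random() < frac else lower


def encode(point, rng):
    """Quantize a 2D point to an integer heatmap cell, one axis at a time."""
    x, y = point
    return randomized_round(x, rng), randomized_round(y, rng)


def expected_heatmap(point):
    """Activation probabilities of the (up to) four cells around the point.

    This is the heatmap that independent per-axis randomized rounding
    produces on average; it stands in for an ideal network prediction.
    """
    x, y = point
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0
    cells = {
        (x0, y0): (1 - fx) * (1 - fy),
        (x0 + 1, y0): fx * (1 - fy),
        (x0, y0 + 1): (1 - fx) * fy,
        (x0 + 1, y0 + 1): fx * fy,
    }
    # Drop cells that can never be activated.
    return {cell: p for cell, p in cells.items() if p > 0}


def decode(heatmap, k=4):
    """Weighted average of the k strongest activation points (assumed decoder)."""
    top = sorted(heatmap.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(weight for _, weight in top)
    x = sum(cx * weight for (cx, _), weight in top) / total
    y = sum(cy * weight for (_, cy), weight in top) / total
    return x, y


if __name__ == "__main__":
    rng = random.Random(0)
    point = (10.3, 4.75)
    samples = [encode(point, rng) for _ in range(20000)]
    mean_x = sum(s[0] for s in samples) / len(samples)
    mean_y = sum(s[1] for s in samples) / len(samples)
    print("empirical mean of encoded cells:", mean_x, mean_y)
    print("decoded from expected heatmap:", decode(expected_heatmap(point)))
```

Under these assumptions, the average of many encoded cells approaches the original sub-pixel coordinate, and decoding the expected heatmap returns that coordinate up to floating-point error, without increasing the heatmap resolution.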