Paper Title
Anchors Based Method for Fingertips Position Estimation from a Monocular RGB Image using Deep Neural Network
Paper Authors
Paper Abstract
In virtual, augmented, and mixed reality, hand gestures are increasingly popular as a way to reduce the gap between the virtual and the real world. Precise fingertip localization is crucial for a seamless experience. Much prior research estimates fingertip positions from depth information, while most work using RGB images is limited to detecting a single fingertip. Detecting multiple fingertips from a single RGB image is very challenging due to various factors. In this paper, we propose a deep neural network (DNN) based method to estimate fingertip positions, which we call Anchor Based Fingertips Position Estimation (ABFPE); it is a two-step process. Fingertip locations are estimated by regressing the offset of each fingertip from its nearest anchor point. The proposed framework performs the best while having only limited dependence on hand detection results. In our experiments on the SCUT-Ego-Gesture dataset, we achieve a fingertip detection error of 2.3552 pixels on video frames with a resolution of $640 \times 480$, and about $92.98\%$ of the test images have an average pixel error of no more than five pixels.
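The core idea stated in the abstract, regressing each fingertip's offset from its nearest anchor point rather than its absolute image coordinates, can be illustrated with a minimal sketch. The uniform anchor grid, the stride value, and the function names below are illustrative assumptions; the paper defines its own anchor layout and network architecture.

```python
# Minimal sketch of anchor-based fingertip position encoding/decoding.
# The uniform grid and stride are assumptions for illustration, not the
# paper's exact ABFPE configuration.
import numpy as np

def make_anchor_grid(width=640, height=480, stride=32):
    """Place anchor points at the centers of stride x stride cells."""
    xs = np.arange(stride // 2, width, stride)
    ys = np.arange(stride // 2, height, stride)
    gx, gy = np.meshgrid(xs, ys)
    return np.stack([gx.ravel(), gy.ravel()], axis=1).astype(np.float32)

def encode(fingertips, anchors):
    """Regression targets: offset of each fingertip from its nearest anchor."""
    # Pairwise distances between fingertips (N, 2) and anchors (M, 2).
    d = np.linalg.norm(fingertips[:, None, :] - anchors[None, :, :], axis=2)
    nearest = d.argmin(axis=1)               # index of nearest anchor per fingertip
    offsets = fingertips - anchors[nearest]  # small residuals are easier to regress
    return nearest, offsets

def decode(nearest, offsets, anchors):
    """Recover absolute fingertip positions from anchor indices and offsets."""
    return anchors[nearest] + offsets

if __name__ == "__main__":
    anchors = make_anchor_grid()
    tips = np.array([[123.4, 210.9], [305.0, 77.5]], dtype=np.float32)  # dummy fingertips
    idx, off = encode(tips, anchors)
    assert np.allclose(decode(idx, off, anchors), tips)
```

Regressing small offsets from a nearby anchor, instead of absolute coordinates over the full $640 \times 480$ frame, keeps the regression targets bounded, which is the usual motivation for anchor-based localization schemes.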