Paper Title
Hand-Priming in Object Localization for Assistive Egocentric Vision
Paper Authors
Paper Abstract
Egocentric vision holds great promise for increasing access to visual information and improving the quality of life for people with visual impairments, with object recognition being one of the daily challenges for this population. While we strive to improve recognition performance, it remains difficult to identify which object is of interest to the user; the object may not even be included in the frame due to challenges in aiming the camera without visual feedback. Also, gaze information, commonly used to infer the area of interest in egocentric vision, is often not dependable. However, blind users often tend to include their hand in the frame, either interacting with the object they wish to recognize or simply placing it in proximity for better camera aiming. We propose localization models that leverage the presence of the hand as contextual information for priming the center area of the object of interest. In our approach, hand segmentation is fed either to the entire localization network or to its last convolutional layers. Using egocentric datasets from sighted and blind individuals, we show that hand-priming achieves higher precision than other approaches that also encode hand-object interactions in localization, such as fine-tuning, multi-class, and multi-task learning.
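To make the two priming variants described in the abstract concrete, below is a minimal PyTorch sketch of feeding a hand segmentation mask either to the entire network (concatenated as an extra input channel) or to its last convolutional layers (concatenated to the feature maps). The abstract does not specify the architecture, so the backbone, head, tensor shapes, and all names here are hypothetical stand-ins, not the paper's actual model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HandPrimedLocalizer(nn.Module):
    """Sketch of hand-priming: the hand segmentation mask is concatenated
    either to the RGB input ("early", priming the whole network) or to the
    last convolutional feature maps ("late") before the localization head.
    The backbone/head below are toy placeholders, not the paper's network."""

    def __init__(self, mode="early"):
        super().__init__()
        self.mode = mode
        in_ch = 4 if mode == "early" else 3  # RGB, plus mask channel if early
        self.backbone = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        head_ch = 64 + (1 if mode == "late" else 0)
        # Head regresses a single box (x, y, w, h) for the object of interest.
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(head_ch, 4)
        )

    def forward(self, rgb, hand_mask):
        # rgb: (B, 3, H, W); hand_mask: (B, 1, H, W) binary hand segmentation
        if self.mode == "early":
            feats = self.backbone(torch.cat([rgb, hand_mask], dim=1))
        else:
            feats = self.backbone(rgb)
            # Downsample the mask to the feature resolution, then concatenate.
            mask = F.interpolate(hand_mask, size=feats.shape[-2:], mode="nearest")
            feats = torch.cat([feats, mask], dim=1)
        return self.head(feats)

model = HandPrimedLocalizer(mode="late")
boxes = model(torch.randn(2, 3, 224, 224), torch.zeros(2, 1, 224, 224))
print(boxes.shape)  # torch.Size([2, 4])
```

The design intuition matches the abstract: the mask channel biases the network toward the image region where the hand is interacting with or pointing at the object, either throughout the network (early) or only in the final localization stage (late).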