Paper Title

Egocentric Prediction of Action Target in 3D

Authors

Yiming Li, Ziang Cao, Andrew Liang, Benjamin Liang, Luoyao Chen, Hang Zhao, Chen Feng

Abstract

We are interested in anticipating as early as possible the target location of a person's object manipulation action in a 3D workspace from egocentric vision. It is important in fields like human-robot collaboration, but has not yet received enough attention from vision and learning communities. To stimulate more research on this challenging egocentric vision task, we propose a large multimodality dataset of more than 1 million frames of RGB-D and IMU streams, and provide evaluation metrics based on our high-quality 2D and 3D labels from semi-automatic annotation. Meanwhile, we design baseline methods using recurrent neural networks and conduct various ablation studies to validate their effectiveness. Our results demonstrate that this new task is worthy of further study by researchers in robotics, vision, and learning communities.
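The abstract mentions recurrent baselines over RGB-D and IMU streams but gives no architectural details. As a rough illustration only, a per-frame 3D target regressor built on a GRU might look like the sketch below; the class name, feature dimension, and hidden size are all hypothetical and not taken from the paper:

```python
import torch
import torch.nn as nn

class TargetPredictorRNN(nn.Module):
    """Hypothetical sketch: a recurrent model that consumes per-frame
    features (e.g. encoded RGB-D + IMU) and regresses the 3D target
    location of the ongoing manipulation action at every time step."""

    def __init__(self, feat_dim: int = 128, hidden_dim: int = 64):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 3)  # (x, y, z) in the workspace

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, time, feat_dim) -- one feature vector per frame
        h, _ = self.rnn(feats)
        # A prediction at every time step supports early anticipation:
        # the estimate can be read out before the action completes.
        return self.head(h)  # (batch, time, 3)

# Toy usage: 2 sequences of 10 frames with random features
model = TargetPredictorRNN()
pred = model(torch.randn(2, 10, 128))
print(pred.shape)  # torch.Size([2, 10, 3])
```

Emitting a prediction per frame (rather than only at the sequence end) matches the task's emphasis on anticipating the target "as early as possible," since accuracy can then be evaluated as a function of observation time.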
