Paper Title
Swoosh! Rattle! Thump! -- Actions that Sound
Paper Authors
Paper Abstract
Truly intelligent agents need to capture the interplay of all their senses to build a rich physical understanding of their world. In robotics, we have seen tremendous progress in using visual and tactile perception; however, we have often ignored a key sense: sound. This is primarily due to the lack of data that captures the interplay of action and sound. In this work, we perform the first large-scale study of the interactions between sound and robotic action. To do this, we create the largest available sound-action-vision dataset with 15,000 interactions on 60 objects using our robotic platform Tilt-Bot. By tilting objects and allowing them to crash into the walls of a robotic tray, we collect rich four-channel audio information. Using this data, we explore the synergies between sound and action and present three key insights. First, sound is indicative of fine-grained object class information, e.g., sound can differentiate a metal screwdriver from a metal wrench. Second, sound also contains information about the causal effects of an action, i.e., given the sound produced, we can predict what action was applied to the object. Finally, object representations derived from audio embeddings are indicative of implicit physical properties. We demonstrate that on previously unseen objects, audio embeddings generated through interactions can predict forward models 24% better than passive visual embeddings. Project videos and data are at https://dhiraj100892.github.io/swoosh/