Paper title
Improving Object-centric Learning with Query Optimization
Paper authors
Paper abstract
The ability to decompose complex natural scenes into meaningful object-centric abstractions lies at the core of human perception and reasoning. In the recent culmination of unsupervised object-centric learning, the Slot-Attention module has played an important role with its simple yet effective design and has fostered many powerful variants. These methods, however, have been exceedingly difficult to train without supervision and are ambiguous in their notion of an object, especially for complex natural scenes. In this paper, we propose to address these issues by investigating the potential of learnable queries as initializations for Slot-Attention learning, combining this with existing efforts to improve Slot-Attention learning through bi-level optimization. With simple code adjustments to Slot-Attention, our model, Bi-level Optimized Query Slot Attention, achieves state-of-the-art results on 3 challenging synthetic and 7 complex real-world datasets in unsupervised image segmentation and reconstruction, outperforming previous baselines by a large margin. We provide thorough ablation studies to validate the necessity and effectiveness of our design. Additionally, our model exhibits great potential for concept binding and zero-shot learning. Our work is made publicly available at https://bo-qsa.github.io.
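To make the idea of "learnable queries as initializations" more concrete, the following is a minimal NumPy sketch of a Slot-Attention-style forward pass in which the slots start from a fixed query matrix instead of being drawn at random. It is an illustration under stated assumptions and not the authors' implementation: the function name `slot_attention_forward`, the projection matrices `w_q`, `w_k`, `w_v`, and all sizes are hypothetical, the recurrent and feed-forward slot update of the original module is replaced by a plain weighted mean, and the bi-level optimization aspect (which concerns how gradients are handled during training) is not covered.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def slot_attention_forward(inputs, init_queries, w_q, w_k, w_v,
                           num_iters=3, eps=1e-8):
    """Simplified slot-attention-style forward pass (hypothetical helper).

    inputs:       (N, D) array of input features.
    init_queries: (K, D) array used to initialize the K slots; in training
                  this would be a learned parameter (assumption of this sketch).
    Returns the final slots (K, D) and the last attention map (N, K).
    """
    slots = init_queries.copy()
    k = inputs @ w_k                      # keys,   (N, D)
    v = inputs @ w_v                      # values, (N, D)
    scale = k.shape[-1] ** -0.5
    for _ in range(num_iters):
        q = slots @ w_q                   # queries, (K, D)
        logits = scale * (k @ q.T)        # (N, K)
        # Softmax over the slot axis: slots compete for each input feature.
        attn = softmax(logits, axis=1)
        # Normalize over inputs so each slot takes a weighted mean of values.
        weights = attn / (attn.sum(axis=0, keepdims=True) + eps)
        # Simplification: direct assignment instead of a learned update rule.
        slots = weights.T @ v             # (K, D)
    return slots, attn


# Usage with random stand-ins for learned parameters.
rng = np.random.default_rng(0)
N, K, D = 16, 4, 8
inputs = rng.normal(size=(N, D))
init_queries = rng.normal(size=(K, D))
w_q, w_k, w_v = (rng.normal(size=(D, D)) / np.sqrt(D) for _ in range(3))
slots, attn = slot_attention_forward(inputs, init_queries, w_q, w_k, w_v)
```

In this sketch the only difference from a randomly initialized variant is where `slots` comes from at the start of the loop; every input feature still distributes a total attention of 1 across the slots at each iteration.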