Bongard-LOGO: A New Benchmark for Human-Level Concept Learning and Reasoning

Paper Title

Bongard-LOGO: A New Benchmark for Human-Level Concept Learning and Reasoning

Authors

Weili Nie, Zhiding Yu, Lei Mao, Ankit B. Patel, Yuke Zhu, Animashree Anandkumar

Abstract

Humans have an inherent ability to learn novel concepts from only a few samples and generalize these concepts to different situations. Even though today's machine learning models excel with a plethora of training data on standard recognition tasks, a considerable gap exists between machine-level pattern recognition and human-level concept learning. To narrow this gap, the Bongard problems (BPs) were introduced as an inspirational challenge for visual cognition in intelligent systems. Despite new advances in representation learning and learning to learn, BPs remain a daunting challenge for modern AI. Inspired by the original one hundred BPs, we propose a new benchmark Bongard-LOGO for human-level concept learning and reasoning. We develop a program-guided generation technique to produce a large set of human-interpretable visual cognition problems in action-oriented LOGO language. Our benchmark captures three core properties of human cognition: 1) context-dependent perception, in which the same object may have disparate interpretations given different contexts; 2) analogy-making perception, in which some meaningful concepts are traded off for other meaningful concepts; and 3) perception with a few samples but infinite vocabulary. In experiments, we show that the state-of-the-art deep learning methods perform substantially worse than human subjects, implying that they fail to capture core human cognition properties. Finally, we discuss research directions towards a general architecture for visual reasoning to tackle this benchmark.
