Title
Contrastive Multimodal Learning for Emergence of Graphical Sensory-Motor Communication
Authors
Abstract
In this paper, we investigate whether artificial agents can develop a shared language in an ecological setting where communication relies on a sensory-motor channel. To this end, we introduce the Graphical Referential Game (GREG), in which a speaker must produce a graphical utterance to name a visual referent object, while a listener must select the corresponding object among distractor referents given the delivered message. The utterances are drawings produced using dynamical motor primitives combined with a sketching library. To tackle GREG, we present CURVES: a multimodal contrastive deep learning mechanism that represents the energy (alignment) between named referents and utterances generated through gradient ascent on the learned energy landscape. We demonstrate that CURVES not only succeeds at solving GREG but also enables agents to self-organize a language that generalizes to feature compositions never seen during training. In addition to evaluating the communication performance of our approach, we also explore the structure of the emerging language. Specifically, we show that the resulting language forms a coherent lexicon shared between agents, and that basic compositional rules on the graphical productions could not explain the compositional generalization.
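
The two ideas named in the abstract, a contrastive energy between referents and utterances and utterance generation by gradient ascent on that energy, can be illustrated with a toy sketch. This is not the CURVES implementation. It assumes a bilinear energy with a fixed matrix `W` standing in for the learned encoders, and it treats an utterance as a plain unit vector rather than a drawing produced by motor primitives. The function names `energy`, `contrastive_loss`, `speak`, and `listen` are hypothetical.

```python
import numpy as np

def energy(referents, utterances, W):
    """Bilinear energy E[i, j] = referents[i] @ W @ utterances[j] (assumed form)."""
    return referents @ W @ utterances.T

def contrastive_loss(referents, utterances, W):
    """InfoNCE-style loss: each matched pair (i, i) should score highest in row i."""
    E = energy(referents, utterances, W)
    E = E - E.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    log_softmax = E - np.log(np.exp(E).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_softmax))

def speak(referent, W, steps=100, lr=0.2, seed=0):
    """Generate an utterance by projected gradient ascent on the energy."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=W.shape[1])
    u /= np.linalg.norm(u)
    for _ in range(steps):
        u = u + lr * (W.T @ referent)  # dE/du for the bilinear energy
        u /= np.linalg.norm(u)         # project back onto the unit sphere
    return u

def listen(utterance, candidates, W):
    """Select the candidate referent with the highest energy for the utterance."""
    return int(np.argmax(candidates @ W @ utterance))

if __name__ == "__main__":
    W = np.eye(4)           # stand-in for learned encoders
    candidates = np.eye(4)  # four toy referents (target plus distractors)
    message = speak(candidates[2], W)
    print("listener picks:", listen(message, candidates, W))
```

In this toy setting the speaker's vector rotates toward the direction that maximizes the energy with its referent, and the listener recovers the referent by an argmax over the candidates. In the setting described in the abstract, the speaker and listener would instead share a learned energy function trained contrastively, and the ascent would act on the parameters of the drawing.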