Paper title
On (Emergent) Systematic Generalisation and Compositionality in Visual Referential Games with Straight-Through Gumbel-Softmax Estimator
Authors
Abstract
The drivers of compositionality in artificial languages that emerge when two (or more) agents play a non-visual referential game have previously been investigated using approaches based on the REINFORCE algorithm and the (Neural) Iterated Learning Model. Following the more recent introduction of the \textit{Straight-Through Gumbel-Softmax} (ST-GS) approach, this paper investigates to what extent the drivers of compositionality identified so far in the field apply in the ST-GS context, and to what extent they translate into (emergent) systematic generalisation abilities when playing a visual referential game. Compositionality and the generalisation abilities of the emergent languages are assessed using topographic similarity and zero-shot compositional tests. Firstly, we provide evidence that the test-train split strategy significantly impacts the zero-shot compositional tests when dealing with visual stimuli, whilst it does not when dealing with symbolic ones. Secondly, empirical evidence shows that using the ST-GS approach with small batch sizes and an overcomplete communication channel improves compositionality in the emerging languages. Nevertheless, while it is shown to be robust with symbolic stimuli, the effect of the batch size is not so clear-cut when dealing with visual stimuli. Our results also show that not all overcomplete communication channels are created equal. Indeed, while increasing the maximum sentence length is found to benefit both compositionality and generalisation abilities, increasing the vocabulary size is found to be detrimental. Finally, a lack of correlation between the language compositionality at training time and the agents' generalisation abilities is observed in the context of discriminative referential games with visual stimuli. This is similar to previous observations in the field using the generative variant with symbolic stimuli.
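The abstract names topographic similarity as its compositionality measure without spelling it out. As a rough illustration only, it is commonly computed as the Spearman rank correlation between pairwise distances in meaning space and pairwise distances in message space. The sketch below assumes Hamming distance over attribute tuples and Levenshtein distance over messages; these distance choices, and all function names, are hypothetical and not taken from the paper.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a, b):
    """Levenshtein distance between two sequences of symbols."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (x != y)))    # substitution
        prev = curr
    return prev[-1]


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def topographic_similarity(meanings, messages):
    """Spearman correlation between meaning-space and message-space distances.

    Assumption: Hamming distance on meanings, Levenshtein on messages.
    """
    pairs = list(combinations(range(len(meanings)), 2))
    d_meaning = [hamming(meanings[i], meanings[j]) for i, j in pairs]
    d_message = [edit_distance(messages[i], messages[j]) for i, j in pairs]
    # Spearman correlation is the Pearson correlation of the ranks.
    return pearson(ranks(d_meaning), ranks(d_message))


# Toy data (made up for illustration): two binary attributes per stimulus.
meanings = [(0, 0), (0, 1), (1, 0), (1, 1)]
compositional = ["aa", "ab", "ba", "bb"]   # one symbol per attribute
scrambled = ["aa", "bb", "ab", "ba"]       # same messages, shuffled mapping
score_compositional = topographic_similarity(meanings, compositional)
score_scrambled = topographic_similarity(meanings, scrambled)
```

On this toy data the one-symbol-per-attribute language should score 1 (up to floating-point error), while the scrambled mapping scores lower. A library routine such as `scipy.stats.spearmanr` could replace the hand-written rank correlation; the standard-library version is used here only to keep the sketch dependency-free.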