Paper Title
Mutual Exclusivity Training and Primitive Augmentation to Induce Compositionality
Paper Authors
Paper Abstract
Recent datasets expose the lack of the systematic generalization ability in standard sequence-to-sequence models. In this work, we analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias (i.e., a source sequence already mapped to a target sequence is less likely to be mapped to other target sequences), and the tendency to memorize whole examples rather than separating structures from contents. We propose two techniques to address these two issues respectively: Mutual Exclusivity Training that prevents the model from producing seen generations when facing novel, unseen examples via an unlikelihood-based loss; and prim2primX data augmentation that automatically diversifies the arguments of every syntactic function to prevent memorizing and provide a compositional inductive bias without exposing test-set data. Combining these two techniques, we show substantial empirical improvements using standard sequence-to-sequence models (LSTMs and Transformers) on two widely-used compositionality datasets: SCAN and COGS. Finally, we provide analysis characterizing the improvements as well as the remaining challenges, and provide detailed ablations of our method. Our code is available at https://github.com/owenzx/met-primaug
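The abstract describes Mutual Exclusivity Training as an unlikelihood-based loss that discourages the model from reproducing already-seen outputs for novel inputs. Below is a minimal PyTorch sketch of an unlikelihood-style loss term in that spirit; the tensor names, the token-level formulation, and the weighting coefficient are illustrative assumptions, not the paper's exact implementation (see the linked repository for the authors' code).

```python
# Minimal sketch (PyTorch) of an unlikelihood-style loss in the spirit of
# Mutual Exclusivity Training: for a novel source sequence, penalize the
# probability mass the decoder assigns to an already-seen target sequence.
import torch
import torch.nn.functional as F

def met_unlikelihood_loss(logits, seen_target, pad_id=0, eps=1e-6):
    """logits: (batch, tgt_len, vocab) decoder outputs for a *novel* input.
    seen_target: (batch, tgt_len) a target sequence already seen in training.
    Returns the mean of -log(1 - p(seen token)) over non-pad positions,
    pushing probability mass away from the memorized output."""
    log_probs = F.log_softmax(logits, dim=-1)
    token_logp = log_probs.gather(-1, seen_target.unsqueeze(-1)).squeeze(-1)
    one_minus_p = (1.0 - token_logp.exp()).clamp_min(eps)
    mask = (seen_target != pad_id).float()
    return -(one_minus_p.log() * mask).sum() / mask.sum().clamp_min(1.0)

def total_loss(ce_logits, gold_target, novel_logits, seen_target, lam=1.0, pad_id=0):
    """Standard cross-entropy on seen pairs plus the unlikelihood term,
    weighted by a hypothetical coefficient lam."""
    ce = F.cross_entropy(
        ce_logits.reshape(-1, ce_logits.size(-1)),
        gold_target.reshape(-1),
        ignore_index=pad_id,
    )
    return ce + lam * met_unlikelihood_loss(novel_logits, seen_target, pad_id)
```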
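The abstract also describes prim2primX data augmentation as diversifying the arguments of every syntactic function without exposing test-set data. The following is a minimal sketch of one way such an augmentation could look on SCAN-like data, where each known primitive is cloned into synthetic variants in both the source and target sequences; the primitive table, the "_k" naming scheme, and the number of variants are assumptions for illustration, not the paper's exact procedure.

```python
# Minimal sketch of a prim2primX-style augmentation on SCAN-like data:
# clone primitives (e.g., "jump" -> "jump_1" / "I_JUMP" -> "I_JUMP_1") so
# every syntactic function is observed with more diverse arguments.
import random

PRIMITIVES = {"jump": "I_JUMP", "walk": "I_WALK", "run": "I_RUN", "look": "I_LOOK"}

def prim2primx_augment(examples, num_variants=2, seed=0):
    """examples: list of (source, target) strings. Returns extra examples
    with primitives replaced by synthetic variants in both sequences."""
    rng = random.Random(seed)
    augmented = []
    for src, tgt in examples:
        for prim, action in PRIMITIVES.items():
            if prim not in src.split():
                continue
            k = rng.randint(1, num_variants)
            new_src = " ".join(f"{prim}_{k}" if tok == prim else tok for tok in src.split())
            new_tgt = " ".join(f"{action}_{k}" if tok == action else tok for tok in tgt.split())
            augmented.append((new_src, new_tgt))
    return augmented

# Example: ("jump twice", "I_JUMP I_JUMP") may yield ("jump_1 twice", "I_JUMP_1 I_JUMP_1").
```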