Paper title
ALFWorld: Aligning Text and Embodied Environments for Interactive Learning
Paper authors
Paper abstract
Given a simple request like "Put a washed apple in the kitchen fridge", humans can reason in purely abstract terms by imagining action sequences and scoring their likelihood of success, prototypicality, and efficiency, all without moving a muscle. Once we see the kitchen in question, we can update our abstract plans to fit the scene. Embodied agents require the same abilities, but existing work does not yet provide the infrastructure necessary for both reasoning abstractly and executing concretely. We address this limitation by introducing ALFWorld, a simulator that enables agents to learn abstract, text-based policies in TextWorld (Côté et al., 2018) and then execute goals from the ALFRED benchmark (Shridhar et al., 2020) in a rich visual environment. ALFWorld enables the creation of a new BUTLER agent whose abstract knowledge, learned in TextWorld, corresponds directly to concrete, visually grounded actions. In turn, as we demonstrate empirically, this fosters better agent generalization than training only in the visually grounded environment. BUTLER's simple, modular design factors the problem to allow researchers to focus on models for improving every piece of the pipeline (language understanding, planning, navigation, and visual scene understanding).