Paper Title
Mind's Eye: Grounded Language Model Reasoning through Simulation
Paper Authors
Paper Abstract
Successful and effective communication between humans and AI relies on a shared experience of the world. By training solely on written text, current language models (LMs) miss the grounded experience of humans in the real world -- their failure to relate language to the physical world causes knowledge to be misrepresented and obvious mistakes in their reasoning. We present Mind's Eye, a paradigm to ground language model reasoning in the physical world. Given a physical reasoning question, we use a computational physics engine (DeepMind's MuJoCo) to simulate the possible outcomes, and then use the simulation results as part of the input, which enables language models to perform reasoning. Experiments on 39 tasks in a physics alignment benchmark demonstrate that Mind's Eye can improve reasoning ability by a large margin (27.9% zero-shot, and 46.0% few-shot absolute accuracy improvement on average). Smaller language models armed with Mind's Eye can obtain similar performance to models that are 100x larger. Finally, we confirm the robustness of Mind's Eye through ablation studies.
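The abstract describes a simulate-then-prompt pipeline: a physical reasoning question is first run through a physics engine, and the simulation outcome is injected into the language model's input before answering. The sketch below illustrates that flow with DeepMind's MuJoCo Python bindings under stated assumptions; the toy scene XML, the helper names (`simulate_drop`, `grounded_prompt`), and the prompt wording are illustrative only and do not reproduce the paper's benchmark tasks or its exact prompt format.

```python
# Minimal sketch of a Mind's Eye-style pipeline (assumptions: toy scene, made-up prompt template).
import mujoco

# Toy scene: a single free-falling sphere, not one of the paper's 39 benchmark tasks.
SCENE_XML = """
<mujoco>
  <option gravity="0 0 -9.81"/>
  <worldbody>
    <body name="ball" pos="0 0 1.0">
      <freejoint/>
      <geom type="sphere" size="0.05" mass="1.0"/>
    </body>
  </worldbody>
</mujoco>
"""

def simulate_drop(duration_s: float = 0.3) -> dict:
    """Step the physics engine and return outcomes the LM can condition on."""
    model = mujoco.MjModel.from_xml_string(SCENE_XML)
    data = mujoco.MjData(model)
    steps = int(duration_s / model.opt.timestep)
    for _ in range(steps):
        mujoco.mj_step(model, data)
    # Free joint: qpos[2] is the z position, qvel[2] the z velocity (negative while falling).
    return {"height_m": float(data.qpos[2]), "speed_m_per_s": float(-data.qvel[2])}

def grounded_prompt(question: str, sim: dict) -> str:
    """Prepend the simulation result to the question before querying the LM (assumed format)."""
    return (
        f"Simulation result: after 0.3 s the ball is at height {sim['height_m']:.2f} m, "
        f"falling at {sim['speed_m_per_s']:.2f} m/s.\n"
        f"Question: {question}\nAnswer:"
    )

if __name__ == "__main__":
    sim = simulate_drop()
    print(grounded_prompt("Does the ball speed up or slow down as it falls?", sim))
```

The design point the abstract emphasizes is that the language model itself is unchanged; grounding comes entirely from prepending simulation evidence to the prompt, which is why the reported gains appear in both zero-shot and few-shot settings.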