Paper title

ScienceWorld: Is your Agent Smarter than a 5th Grader?

Authors

Ruoyao Wang, Peter Jansen, Marc-Alexandre Côté, Prithviraj Ammanabrolu

Abstract

We present ScienceWorld, a benchmark to test agents' scientific reasoning abilities in a new interactive text environment at the level of a standard elementary school science curriculum. Despite the transformer-based progress seen in question-answering and scientific text processing, we find that current models cannot reason about or explain learned science concepts in novel contexts. For instance, models can easily answer what the conductivity of a known material is but struggle when asked how they would conduct an experiment in a grounded environment to find the conductivity of an unknown material. This raises the question of whether current models are simply retrieving answers by way of seeing a large number of similar examples or if they have learned to reason about concepts in a reusable manner. We hypothesize that agents need to be grounded in interactive environments to achieve such reasoning capabilities. Our experiments provide empirical evidence supporting this hypothesis, showing that a 1.5 million parameter agent trained interactively for 100k steps outperforms an 11 billion parameter model statically trained for scientific question-answering and reasoning from millions of expert demonstrations.
