Paper title

Should I tear down this wall? Optimizing social metrics by evaluating novel actions

Authors

János Kramár, Neil Rabinowitz, Tom Eccles, Andrea Tacchetti

Abstract

One of the fundamental challenges of governance is deciding when and how to intervene in multi-agent systems in order to impact group-wide metrics of success. This is particularly challenging when proposed interventions are novel and expensive. For example, one may wish to modify a building's layout to improve the efficiency of its escape route. Evaluating such interventions would generally require access to an elaborate simulator, which must be constructed ad-hoc for each environment, and can be prohibitively costly or inaccurate. Here we examine a simple alternative: Optimize By Observational Extrapolation (OBOE). The idea is to use observed behavioural trajectories, without any interventions, to learn predictive models mapping environment states to individual agent outcomes, and then use these to evaluate and select changes. We evaluate OBOE in socially complex gridworld environments and consider novel physical interventions that our models were not trained on. We show that neural network models trained to predict agent returns on baseline environments are effective at selecting among the interventions. Thus, OBOE can provide guidance for challenging questions like: "which wall should I tear down in order to minimize the Gini index of this group?"
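The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names (`gini`, `select_intervention`, `toy_predict_returns`), the toy state encoding, and the candidate names are all hypothetical, and the hand-written predictor stands in for the neural network that the paper trains on baseline trajectories. The sketch assumes non-negative agent returns, since the Gini index is not well defined otherwise.

```python
from typing import Callable, Dict, List, Sequence


def gini(returns: Sequence[float]) -> float:
    """Gini index of non-negative returns (0 = perfectly equal)."""
    n = len(returns)
    total = sum(returns)
    if n == 0 or total <= 0:
        # Degenerate case: treat an empty or all-zero group as equal.
        return 0.0
    # Sum of absolute differences over all ordered pairs of agents.
    abs_diff_sum = sum(abs(a - b) for a in returns for b in returns)
    # Equivalent to abs_diff_sum / (2 * n**2 * mean).
    return abs_diff_sum / (2 * n * total)


def select_intervention(
    candidates: Dict[str, List[int]],
    predict_returns: Callable[[List[int]], List[float]],
) -> str:
    """Return the candidate whose predicted returns have the lowest Gini index."""
    return min(candidates, key=lambda name: gini(predict_returns(candidates[name])))


def toy_predict_returns(state: List[int]) -> List[float]:
    """Hypothetical stand-in for a learned model.

    Here a 'state' is each agent's distance to its nearest resource,
    and the predicted return decays with that distance.
    """
    return [1.0 / (1.0 + d) for d in state]


# Hypothetical post-intervention states, one entry per agent.
candidates = {
    "baseline": [0, 4, 4],
    "remove_wall_A": [0, 1, 4],
    "remove_wall_B": [1, 1, 1],
}

best = select_intervention(candidates, toy_predict_returns)
print(best)
```

In this toy setup the predictor is only ever applied to states, never to simulated rollouts. This mirrors the idea of evaluating interventions by extrapolating from a model fitted to observational data.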
