Paper Title


Meta-trained agents implement Bayes-optimal agents

Paper Authors

Vladimir Mikulik, Grégoire Delétang, Tom McGrath, Tim Genewein, Miljan Martic, Shane Legg, Pedro A. Ortega

Paper Abstract


Memory-based meta-learning is a powerful technique to build agents that adapt fast to any task within a target distribution. A previous theoretical study has argued that this remarkable performance is because the meta-training protocol incentivises agents to behave Bayes-optimally. We empirically investigate this claim on a number of prediction and bandit tasks. Inspired by ideas from theoretical computer science, we show that meta-learned and Bayes-optimal agents not only behave alike, but they even share a similar computational structure, in the sense that one agent system can approximately simulate the other. Furthermore, we show that Bayes-optimal agents are fixed points of the meta-learning dynamics. Our results suggest that memory-based meta-learning might serve as a general technique for numerically approximating Bayes-optimal agents - that is, even for task distributions for which we currently don't possess tractable models.
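To make the abstract's central object concrete: for the simplest prediction setting of the kind it mentions, the Bayes-optimal agent that a meta-trained memory agent would approximate has a closed form. The sketch below assumes a Beta-Bernoulli task distribution (each task is a coin with bias drawn from a Beta prior); this particular task family is an illustrative assumption, not a detail taken from the paper.

```python
import random

def bayes_optimal_prediction(observations, alpha=1.0, beta=1.0):
    """Posterior predictive P(x_{t+1} = 1 | x_1..x_t) for Bernoulli
    prediction tasks under a Beta(alpha, beta) prior over the bias.
    This is the Bayes-optimal predictor a meta-trained memory agent
    would be incentivised to approximate in this toy setting."""
    ones = sum(observations)
    t = len(observations)
    return (ones + alpha) / (t + alpha + beta)

# Simulate one task from the prior: draw a bias, then observe flips.
random.seed(0)
bias = random.betavariate(1.0, 1.0)
obs = [1 if random.random() < bias else 0 for _ in range(50)]
print(bayes_optimal_prediction(obs))
```

With a uniform prior (alpha = beta = 1) this reduces to Laplace's rule of succession; the paper's claim is that a recurrent agent meta-trained across such tasks not only matches this predictor's outputs but comes to implement a computationally similar solution in its memory state.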
