Paper Title

Meta-Learning Hypothesis Spaces for Sequential Decision-making

Paper Authors

Parnian Kassraie, Jonas Rothfuss, Andreas Krause

Paper Abstract

Obtaining reliable, adaptive confidence sets for prediction functions (hypotheses) is a central challenge in sequential decision-making tasks, such as bandits and model-based reinforcement learning. These confidence sets typically rely on prior assumptions on the hypothesis space, e.g., the known kernel of a Reproducing Kernel Hilbert Space (RKHS). Hand-designing such kernels is error-prone, and misspecification may lead to poor or unsafe performance. In this work, we propose to meta-learn a kernel from offline data (Meta-KeL). For the case where the unknown kernel is a combination of known base kernels, we develop an estimator based on structured sparsity. Under mild conditions, we guarantee that our estimated RKHS yields valid confidence sets that, with increasing amounts of offline data, become as tight as those given the true unknown kernel. We demonstrate our approach on the kernelized bandit problem (a.k.a. Bayesian optimization), where we establish regret bounds competitive with those given the true kernel. We also empirically evaluate the effectiveness of our approach on a Bayesian optimization task.
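
To make the setting concrete, below is a minimal NumPy sketch of the setup the abstract describes: the unknown kernel is a non-negative combination of known base kernels, and sparse combination weights are estimated from offline data. This is an illustrative assumption-laden sketch, not the paper's Meta-KeL algorithm: the base kernels, lengthscales, the L1-penalized kernel-ridge objective, and all names (rbf, BASE_KERNELS, meta_learn_weights) are hypothetical stand-ins for the structured-sparsity estimator and its guarantees.

```python
import numpy as np

def rbf(ls):
    # Squared-exponential base kernel with lengthscale ls (illustrative choice).
    return lambda X, Y: np.exp(
        -0.5 * ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1) / ls**2
    )

# Known base kernels; the true kernel is assumed to be a sparse
# non-negative combination of these.
BASE_KERNELS = [rbf(0.1), rbf(0.5), rbf(2.0)]

def combined_kernel(w, X, Y):
    """Kernel induced by non-negative weights w over the base kernels."""
    return sum(wj * k(X, Y) for wj, k in zip(w, BASE_KERNELS))

def meta_learn_weights(tasks, lam=0.1, lr=0.01, steps=300, noise=1e-2):
    """Estimate sparse kernel weights from offline tasks [(X, y), ...] by
    projected subgradient descent on a kernel-ridge data-fit term plus an
    L1 penalty -- a simple stand-in for a structured-sparsity estimator."""
    w = np.full(len(BASE_KERNELS), 1.0 / len(BASE_KERNELS))
    for _ in range(steps):
        grad = np.zeros_like(w)
        for X, y in tasks:
            K = combined_kernel(w, X, X) + noise * np.eye(len(X))
            alpha = np.linalg.solve(K, y)  # alpha = K^{-1} y
            for j, k in enumerate(BASE_KERNELS):
                # d/dw_j of 0.5 * y^T K^{-1} y  =  -0.5 * alpha^T K_j alpha
                grad[j] -= 0.5 * alpha @ k(X, X) @ alpha
        # L1 subgradient (w >= 0, so it is +lam) and projection onto w >= 0.
        w = np.maximum(w - lr * (grad / len(tasks) + lam), 0.0)
    return w

# Usage: offline tasks drawn from a "true" kernel that is one base kernel.
rng = np.random.default_rng(0)
tasks = []
for _ in range(5):
    X = rng.uniform(0, 1, size=(20, 1))
    K_true = BASE_KERNELS[1](X, X)
    y = rng.multivariate_normal(np.zeros(20), K_true + 1e-4 * np.eye(20))
    tasks.append((X, y))
print(meta_learn_weights(tasks))  # should concentrate on the second kernel
```

In the bandit setting sketched in the abstract, the learned combination would then serve as the kernel of a GP-UCB-style algorithm, with the paper's analysis guaranteeing valid confidence sets and competitive regret as offline data grows.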
