Paper Title

Improving performance in multi-objective decision-making in Bottles environments with soft maximin approaches

Paper Authors

Benjamin J. Smith, Robert Klassert, Roland Pihlakas

Paper Abstract

Balancing multiple competing and conflicting objectives is an essential task for any artificial intelligence charged with satisfying human values or preferences. Conflict arises both from misalignment between individuals with competing values and from conflicting value systems held by a single human. Starting from the principle of loss aversion, we designed a set of soft maximin function approaches to multi-objective decision-making. Benchmarking these functions in a set of previously developed environments, we found that one new approach in particular, 'split-function exp-log loss aversion' (SFELLA), learns faster than the state-of-the-art thresholded alignment objective method (Vamplew et al., 2021) on three of the four tasks it was tested on, and achieves the same optimal performance after learning. SFELLA also showed relative robustness improvements against changes in objective scale, which may highlight an advantage in dealing with distribution shifts in environment dynamics. Due to publishing rules, further work could not be presented in the preprint, but in the final published version we will further compare SFELLA to the multi-objective reward exponentials (MORE) approach (Rolf, 2020), demonstrating that SFELLA performs similarly to MORE in a simple, previously described foraging task, but that in a modified foraging environment with a new resource that is not depleted as the agent works, SFELLA collects more of the new resource at very little cost in terms of the old resource. Overall, we found SFELLA useful for avoiding problems that sometimes arise with a thresholded approach, and more reward-responsive than MORE, while retaining MORE's conservative, loss-averse incentive structure.
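
The abstract names SFELLA's mechanism only in passing. As a rough illustration of how a split exp-log transform can yield loss-averse, soft-maximin aggregation, here is a minimal sketch. The exact functional form is an assumption for illustration and may differ from the paper's; `split_exp_log` and `soft_maximin_utility` are hypothetical helper names, not the authors' implementation.

```python
import numpy as np

def split_exp_log(x: np.ndarray) -> np.ndarray:
    """Split transform: logarithmic for gains, exponential for losses.

    Illustrative assumption: f(x) = ln(1 + x) for x >= 0 and
    f(x) = 1 - exp(-x) for x < 0. The two pieces meet smoothly at 0,
    and the exponential branch penalises losses far more steeply than
    the log branch rewards equivalent gains (loss aversion).
    """
    gains = np.log1p(np.maximum(x, 0.0))        # concave reward for gains
    losses = 1.0 - np.exp(-np.minimum(x, 0.0))  # steep penalty for losses
    return np.where(x >= 0.0, gains, losses)

def soft_maximin_utility(objective_rewards) -> float:
    """Aggregate per-objective rewards by summing their transforms.

    Because the transform is concave everywhere, the worst-off objective
    dominates the aggregate's gradient, softly approximating min()
    without its hard discontinuity (hence 'soft maximin').
    """
    x = np.asarray(objective_rewards, dtype=float)
    return float(np.sum(split_exp_log(x)))

# A lopsided outcome (big gain on one objective, small loss on another)
# scores worse than a modest but balanced outcome:
print(soft_maximin_utility([10.0, -1.0]))  # ~0.68: the loss dominates
print(soft_maximin_utility([1.0, 1.0]))    # ~1.39: balance is preferred
```

Under this sketch, an agent maximising the aggregated utility is pushed to keep every objective above water rather than to trade deep losses on one objective for large gains on another, which matches the conservative, loss-averse incentive structure the abstract describes.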
