HCMD-zero: Learning Value Aligned Mechanisms from Data

Paper Title

HCMD-zero: Learning Value Aligned Mechanisms from Data

Authors

Jan Balaguer, Raphael Koster, Ari Weinstein, Lucy Campbell-Gillingham, Christopher Summerfield, Matthew Botvinick, Andrea Tacchetti

Abstract

Artificial learning agents are mediating a larger and larger number of interactions among humans, firms, and organizations, and the intersection between mechanism design and machine learning has been heavily investigated in recent years. However, mechanism design methods often make strong assumptions about how participants behave (e.g. rationality), about the kind of knowledge designers have access to a priori (e.g. access to strong baseline mechanisms), or about what the goal of the mechanism should be (e.g. total welfare). Here we introduce HCMD-zero, a general purpose method to construct mechanisms making none of these three assumptions. HCMD-zero learns to mediate interactions among participants and adjusts the mechanism parameters to make itself more likely to be preferred by participants. It does so by remaining engaged in an electoral contest with copies of itself, thereby accessing direct feedback from participants. We test our method on a stylized resource allocation game that highlights the tension between productivity, equality and the temptation to free ride. HCMD-zero produces a mechanism that is preferred by human participants over a strong baseline. It does so automatically, without requiring prior knowledge, and uses human behavioral trajectories sparingly and effectively. Our analysis shows that HCMD-zero consistently makes the mechanism policy more and more likely to be preferred by human participants over the course of training, and that it results in a mechanism with an interpretable and intuitive policy.
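
The "electoral contest with copies of itself" can be made concrete with a toy sketch. The code below is not the paper's algorithm. It is a minimal illustration under several assumptions of our own: the mechanism has a single hypothetical parameter `w` that blends an equal split of a public pot (`w = 0`) with a split proportional to contributions (`w = 1`); the participants are simulated by an invented behavior rule rather than by humans; and the incumbent mechanism is replaced whenever a perturbed copy wins a majority vote. The function names, the pot multiplier, and the endowment values are all illustrative.

```python
import random


def payoffs(contributions, endowments, w, multiplier=1.6):
    """Payoffs in a toy public-goods game under redistribution parameter w.

    w = 0 splits the multiplied pot equally; w = 1 splits it in
    proportion to each player's contribution (assumed parameterisation).
    """
    total = sum(contributions)
    n = len(contributions)
    pot = multiplier * total
    result = []
    for c, e in zip(contributions, endowments):
        proportional = c / total if total > 0 else 1.0 / n
        share = (1.0 - w) / n + w * proportional
        result.append(e - c + share * pot)
    return result


def simulated_contributions(endowments, w, rng):
    """Hypothetical behavior model: players contribute a larger fraction of
    their endowment when contributions are rewarded more (higher w)."""
    contributions = []
    for e in endowments:
        fraction = 0.2 + 0.6 * w + rng.uniform(-0.1, 0.1)
        fraction = min(1.0, max(0.0, fraction))
        contributions.append(fraction * e)
    return contributions


def vote_share(w_a, w_b, endowments, rng):
    """Fraction of players who earn strictly more under mechanism B than A."""
    pay_a = payoffs(simulated_contributions(endowments, w_a, rng), endowments, w_a)
    pay_b = payoffs(simulated_contributions(endowments, w_b, rng), endowments, w_b)
    votes_for_b = sum(1 for a, b in zip(pay_a, pay_b) if b > a)
    return votes_for_b / len(endowments)


def train(steps=200, seed=0):
    """Hill-climb on w: a perturbed copy challenges the incumbent each step
    and takes over only if it wins a strict majority of votes."""
    rng = random.Random(seed)
    w = 0.5
    for _ in range(steps):
        challenger = min(1.0, max(0.0, w + rng.gauss(0.0, 0.1)))
        endowments = [rng.choice([2.0, 4.0, 10.0]) for _ in range(4)]
        if vote_share(w, challenger, endowments, rng) > 0.5:
            w = challenger
    return w


if __name__ == "__main__":
    print("learned w:", round(train(), 3))
```

The only training signal in this sketch is the vote itself. No welfare objective is specified and no baseline mechanism is consulted, which mirrors, in a very reduced form, the three assumptions the abstract says HCMD-zero avoids.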
