Paper Title
Increasing Students' Engagement to Reminder Emails Through Multi-Armed Bandits
Paper Authors
Paper Abstract
Conducting randomized experiments in education settings raises the question of how we can use machine learning techniques to improve educational interventions. Using Multi-Armed Bandit (MAB) algorithms such as Thompson Sampling (TS) in adaptive experiments can increase students' chances of obtaining better outcomes by raising the probability of assignment to the optimal condition (arm), even before an intervention completes. This is an advantage over traditional A/B testing, which may allocate an equal number of students to both optimal and non-optimal conditions. The challenge is the exploration-exploitation trade-off: although adaptive policies aim to collect enough information to reliably allocate more students to better arms, past work shows that this may not provide enough exploration to draw reliable conclusions about whether the arms differ. Hence, it is of interest to provide additional uniform random (UR) exploration throughout the experiment. This paper presents a real-world adaptive experiment on how students engage with instructors' weekly email reminders intended to build their time management habits. Our metric of interest is the email open rate, which tracks the arms, represented by different subject lines. These are delivered under different allocation algorithms: UR, TS, and what we denote TS†, which combines both TS and UR rewards to update its priors. We highlight problems with these adaptive algorithms, such as possible exploitation of an arm when there is no significant difference, and discuss their causes and consequences. Future directions include studying situations where the early choice of the optimal arm is not ideal and how adaptive algorithms can address them.
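As a concrete illustration of the allocation schemes named in the abstract, the sketch below implements Beta-Bernoulli Thompson Sampling for subject-line arms with binary open/no-open rewards, a UR baseline, and one plausible reading of TS†, in which rewards from both the TS-allocated and UR-allocated students update the shared posteriors. The number of arms, the 50/50 group split, and the true open rates are hypothetical values chosen for illustration, not details of the reported experiment.

```python
import numpy as np

# Minimal sketch of Beta-Bernoulli Thompson Sampling for email subject-line arms.
# Rewards are binary: 1 = email opened, 0 = not opened.
# Arm count, group split, and true open rates are illustrative assumptions.

rng = np.random.default_rng(0)
n_arms = 3                       # e.g., three candidate subject lines
alpha = np.ones(n_arms)          # Beta posterior successes (opens)
beta = np.ones(n_arms)           # Beta posterior failures (non-opens)

def choose_arm_ts():
    """Sample an open-rate estimate per arm from its Beta posterior and
    pick the arm with the largest sample (Thompson Sampling)."""
    samples = rng.beta(alpha, beta)
    return int(np.argmax(samples))

def choose_arm_ur():
    """Uniform random allocation (the UR baseline)."""
    return int(rng.integers(n_arms))

def update(arm, opened):
    """Beta-Bernoulli posterior update from one observed open / non-open."""
    if opened:
        alpha[arm] += 1
    else:
        beta[arm] += 1

# TS† as read from the abstract: some students are allocated by TS and some by UR,
# but rewards from *both* groups feed the same posteriors (assumed 50/50 split).
true_open_rates = [0.30, 0.35, 0.40]          # hypothetical ground truth
for student in range(1000):
    in_ur_group = rng.random() < 0.5
    arm = choose_arm_ur() if in_ur_group else choose_arm_ts()
    opened = rng.random() < true_open_rates[arm]
    update(arm, opened)                        # TS† pools TS and UR rewards
```

Under this reading, the UR group supplies the additional exploration the abstract calls for, while the pooled updates let the TS posteriors benefit from every observed reward.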