Machine Learning for Experimental Design: Methods for Improved Blocking

Paper Title

Machine Learning for Experimental Design: Methods for Improved Blocking

Authors

Brian Quistorff, Gentry Johnson

Abstract

Restricting randomization in the design of experiments (e.g., using blocking/stratification, pair-wise matching, or rerandomization) can improve the treatment-control balance on important covariates and therefore improve the estimation of the treatment effect, particularly for small- and medium-sized experiments. Existing guidance on how to identify these variables and implement the restrictions is incomplete and conflicting. We identify that the differences are mainly due to the fact that what is important in the pre-treatment data may not translate to the post-treatment data. We highlight settings where there is sufficient data to provide clear guidance and outline improved methods to mostly automate the process using modern machine learning (ML) techniques. We show, in simulations using real-world data, that these methods reduce both the mean squared error of the estimate (14%-34%) and the size of the standard error (6%-16%).
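
As an illustration of the pair-wise matching mentioned in the abstract, the following is a minimal sketch and not necessarily the authors' procedure. It assumes each unit already has a numeric score, for example an outcome predicted by a model fit on pre-treatment data; the function name `matched_pair_assignment` and the sample scores are hypothetical. Units are sorted by score, adjacent units are paired, and one unit in each pair is randomly assigned to treatment.

```python
import random


def matched_pair_assignment(scores, seed=0):
    """Return a list of 0/1 treatment flags, one per unit.

    `scores` is assumed to be a prognostic score per unit (hypothetical
    input, e.g. a predicted outcome from pre-treatment covariates).
    """
    rng = random.Random(seed)
    # Order unit indices by score so that neighbours are similar.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    assignment = [0] * len(scores)
    # Walk through consecutive pairs and treat exactly one unit per pair.
    for k in range(0, len(order) - 1, 2):
        a, b = order[k], order[k + 1]
        treated = a if rng.random() < 0.5 else b
        assignment[treated] = 1
    # With an odd number of units, the leftover unit gets a coin flip.
    if len(order) % 2 == 1:
        assignment[order[-1]] = rng.randint(0, 1)
    return assignment


# Illustrative scores only; not data from the paper.
scores = [3.1, 0.4, 2.9, 0.5, 1.7, 1.8]
assignment = matched_pair_assignment(scores, seed=42)
```

Because each pair contributes exactly one treated unit, the treatment and control groups are balanced on the score by construction, which is the kind of restriction on randomization the abstract describes.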
