Paper Title
Close Enough? A Large-Scale Exploration of Non-Experimental Approaches to Advertising Measurement
Authors
Abstract
Despite their popularity, randomized controlled trials (RCTs) are not always available for the purposes of advertising measurement. Non-experimental data is thus required. However, Facebook and other ad platforms use complex and evolving processes to select ads for users. Therefore, successful non-experimental approaches need to "undo" this selection. We analyze 663 large-scale experiments at Facebook to investigate whether this is possible with the data typically logged at large ad platforms. With access to over 5,000 user-level features, these data are richer than what most advertisers or their measurement partners can access. We investigate how accurately two non-experimental methods -- double/debiased machine learning (DML) and stratified propensity score matching (SPSM) -- can recover the experimental effects. Although DML performs better than SPSM, neither method performs well, even using flexible deep learning models to implement the propensity and outcome models. The median RCT lifts are 29%, 18%, and 5% for the upper, middle, and lower funnel outcomes, respectively. Using DML (SPSM), the median lift by funnel is 83% (173%), 58% (176%), and 24% (64%), respectively, indicating significant relative measurement errors. We further characterize the circumstances under which each method performs comparatively better. Overall, despite having access to large-scale experiments and rich user-level data, we are unable to reliably estimate an ad campaign's causal effect.
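The idea of "undoing" ad selection by stratifying on a propensity score can be illustrated with a toy simulation. The sketch below is not the authors' implementation: the data-generating process, the function names (`simulate`, `naive_diff`, `stratified_att`), the ten strata, and the 0.05 true effect are all assumptions chosen for illustration, and the propensity score is taken as known rather than estimated from thousands of features. In this toy setting, selection depends only on one observed confounder, so stratification recovers the effect; the abstract's finding is that real platform data do not behave this favorably.

```python
import random
from statistics import mean


def simulate(n=20000, true_effect=0.05, seed=0):
    """Generate (propensity, treated, converted) rows with confounded exposure."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()                 # hypothetical confounder, e.g. user activity
        e = 0.1 + 0.8 * x                # more active users are more likely to see the ad
        t = 1 if rng.random() < e else 0
        p = 0.05 + 0.30 * x + true_effect * t   # conversion probability
        y = 1 if rng.random() < p else 0
        rows.append((e, t, y))
    return rows


def naive_diff(rows):
    """Exposed-minus-unexposed conversion rate, ignoring selection."""
    treated = [y for _, t, y in rows if t]
    control = [y for _, t, y in rows if not t]
    return mean(treated) - mean(control)


def stratified_att(rows, n_strata=10):
    """Stratify on the propensity score and average within-stratum differences,
    weighting each stratum by its number of treated users (an ATT-style weight)."""
    ranked = sorted(rows, key=lambda r: r[0])
    size = len(ranked) // n_strata
    total, weight = 0.0, 0
    for k in range(n_strata):
        end = (k + 1) * size if k < n_strata - 1 else len(ranked)
        chunk = ranked[k * size:end]
        treated = [y for _, t, y in chunk if t]
        control = [y for _, t, y in chunk if not t]
        if not treated or not control:
            continue                     # skip strata without overlap
        total += len(treated) * (mean(treated) - mean(control))
        weight += len(treated)
    return total / weight


if __name__ == "__main__":
    data = simulate()
    print("naive difference:   ", round(naive_diff(data), 4))
    print("stratified estimate:", round(stratified_att(data), 4))
    print("true effect:         0.05")
```

The naive comparison overstates the effect because exposed users were more likely to convert anyway. Stratification helps here only because the single confounder is fully observed; with unobserved drivers of ad delivery, the same estimator can remain badly biased.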