Paper Title
Exploring Transformer Backbones for Heterogeneous Treatment Effect Estimation
Paper Authors
Paper Abstract
Previous works on Treatment Effect Estimation (TEE) are not in widespread use because they are predominantly theoretical, relying on strong parametric assumptions that are intractable in practical applications. Recent work uses multilayer perceptrons (MLPs) for modeling causal relationships; however, MLPs lag far behind recent advances in ML methodology, which limits their applicability and generalizability. To extend beyond the single-domain formulation and towards more realistic learning scenarios, we explore model design spaces beyond MLPs, i.e., transformer backbones, which provide flexibility: attention layers govern interactions between treatments and covariates, exploiting structural similarities of potential outcomes for confounding control. Through careful model design, we propose Transformers as Treatment Effect Estimators (TransTEE). We show empirically that TransTEE can: (1) serve as a general-purpose treatment effect estimator that significantly outperforms competitive baselines on a variety of challenging TEE problems (e.g., discrete, continuous, structured, or dosage-associated treatments) and is applicable both when covariates are tabular and when they consist of structured data (e.g., texts, graphs); (2) yield multiple advantages: compatibility with propensity score modeling, parameter efficiency, robustness to distribution shifts in continuous treatment values, explainability in covariate adjustment, and real-world utility in auditing pre-trained language models.
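The abstract states the core architectural idea only at a high level: attention layers mediate the interactions between treatments and covariates. Below is a minimal, hedged sketch of what such an estimator could look like in PyTorch; the class name TransTEESketch, all layer sizes, and the specific cross-attention layout are illustrative assumptions, not the authors' published architecture.

```python
# Hedged sketch of a transformer-based treatment effect estimator.
# Assumptions (not from the paper): per-covariate tokenization, a 2-layer
# encoder, and a treatment token that cross-attends over covariate tokens.
import torch
import torch.nn as nn

class TransTEESketch(nn.Module):
    def __init__(self, n_covariates: int, d_model: int = 32, n_heads: int = 4):
        super().__init__()
        # Embed each scalar covariate as a token so attention can model
        # covariate-covariate and treatment-covariate interactions.
        self.cov_embed = nn.Linear(1, d_model)
        self.treat_embed = nn.Linear(1, d_model)  # continuous treatment/dosage
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            num_layers=2,
        )
        # Cross-attention: the treatment token queries the encoded covariates,
        # pooling the adjusted representation used to predict the outcome.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.outcome_head = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(), nn.Linear(d_model, 1)
        )

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_covariates) covariates; t: (batch,) treatment values.
        cov_tokens = self.cov_embed(x.unsqueeze(-1))   # (B, n_cov, d_model)
        cov_tokens = self.encoder(cov_tokens)
        t_token = self.treat_embed(t.view(-1, 1, 1))   # (B, 1, d_model)
        pooled, _ = self.cross_attn(t_token, cov_tokens, cov_tokens)
        return self.outcome_head(pooled.squeeze(1)).squeeze(-1)  # mu(x, t)

# Usage: contrast predicted outcomes under two treatment values per unit.
model = TransTEESketch(n_covariates=10)
x = torch.randn(8, 10)
effect = model(x, torch.ones(8)) - model(x, torch.zeros(8))
```

In this sketch, self-attention over covariate tokens handles covariate adjustment, while the treatment token's cross-attention over the encoded covariates loosely mirrors the abstract's claim that attention layers govern treatment-covariate interactions.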