Title
Maxway CRT: Improving the Robustness of the Model-X Inference
Authors
Abstract
The model-X conditional randomization test (CRT) is a flexible and powerful testing procedure for the conditional independence hypothesis: X is independent of Y conditional on Z. Despite its many attractive properties, the model-X CRT relies on the model-X assumption that we have perfect knowledge of the distribution of X | Z. If there is an error in modeling the distribution of X | Z, this approach may lose its validity. This problem is even more severe when the adjustment covariates Z are high-dimensional, in which case precise modeling of X against Z can be hard. In response, we propose the Maxway (Model and Adjust X With the Assistance of Y) CRT, which learns the distribution of Y | Z and uses it to calibrate the resampling distribution of X, gaining robustness to errors in modeling X. We prove that the type-I error inflation of the Maxway CRT can be controlled by the learning error for the low-dimensional adjusting model plus the product of the learning errors for X | Z and Y | Z, which can be interpreted as an "almost doubly robust" property. Based on this, we develop algorithms implementing the Maxway CRT in practical scenarios, including (surrogate-assisted) semi-supervised learning and transfer learning, where valid information about Y | Z can potentially be provided by auxiliary or external data. Through extensive simulation studies under different scenarios, we demonstrate that the Maxway CRT achieves significantly better type-I error control than existing model-X inference approaches while preserving similar power. Finally, we apply our methodology to two real examples: (1) studying the obesity paradox with electronic health record (EHR) data assisted by surrogate variables; and (2) inferring the side effects of statins in an ethnic minority group by transferring knowledge from the majority group.
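To make the contrast between resampling from an X | Z model alone and resampling calibrated with a model of Y | Z more concrete, here is a minimal sketch under simplifying linear-Gaussian assumptions. It is not the authors' algorithm: the function names `gaussian_crt_pvalue` and `ols_fitted`, the working predictions `h` (for E[X | Z]) and `g` (for E[Y | Z]), the Gaussian resampling law, and the residual inner-product statistic are all illustrative assumptions. The "Maxway-style" branch fits a low-dimensional linear adjusting model of X on (h, g) and resamples X from it, in the spirit of the abstract; in practice `g` might be learned from auxiliary or external data.

```python
import numpy as np


def ols_fitted(design, target):
    """Return in-sample OLS fitted values of `target` on `design` plus an intercept."""
    d1 = np.column_stack([np.ones(len(target)), design])
    coef, *_ = np.linalg.lstsq(d1, target, rcond=None)
    return d1 @ coef


def gaussian_crt_pvalue(x, y, h, g, maxway=True, n_resamples=200, seed=0):
    """Hypothetical CRT p-value for H0: X independent of Y given Z (scalar X, Y).

    h: working predictions of E[X | Z] (possibly misspecified)
    g: working predictions of E[Y | Z] (e.g. learned from auxiliary data)
    maxway=False: resample X ~ N(h, s^2), i.e. trust the X | Z model as given.
    maxway=True:  resample X ~ N(m, s^2), where m is a low-dimensional linear
                  fit of X on (h, g), so g can help correct errors in h.
    The Gaussian working models are an illustrative assumption.
    """
    rng = np.random.default_rng(seed)
    mean_x = ols_fitted(np.column_stack([h, g]), x) if maxway else h
    sd_x = np.std(x - mean_x)
    y_resid = y - g

    def stat(x_val):
        # Absolute inner product between X residuals and Y residuals.
        return abs(np.dot(x_val - mean_x, y_resid))

    t_obs = stat(x)
    t_null = np.array([
        stat(mean_x + sd_x * rng.standard_normal(len(x)))
        for _ in range(n_resamples)
    ])
    # Standard CRT p-value; valid only if the resampling law is correct.
    return (1 + np.sum(t_null >= t_obs)) / (1 + n_resamples)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 500
    z = rng.standard_normal((n, 5))
    x = z[:, 0] + z[:, 1] + rng.standard_normal(n)
    y_null = 2 * z[:, 1] + rng.standard_normal(n)  # H0 holds
    h = z[:, 0]        # X | Z model that omits Z_1 (deliberate error)
    g = 1.5 * z[:, 1]  # imperfect but informative Y | Z model
    print("vanilla CRT p:", gaussian_crt_pvalue(x, y_null, h, g, maxway=False))
    print("Maxway-style p:", gaussian_crt_pvalue(x, y_null, h, g, maxway=True))
```

In this toy setup the X | Z model omits Z_1 while Y depends on Z_1, and `g` only partially captures that dependence. The vanilla branch therefore tends to reject even though H0 holds, because the leftover Z_1 signal appears in both residuals. Regressing X on (h, g) lets the imperfect `g` restore the missing Z_1 direction, which loosely mirrors the idea that the Maxway CRT only needs the product of the two learning errors to be small.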