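Adversarially-regularized mixed effects deep learning (ARMED) models for improved interpretability, performance, and generalization on clustered data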

Paper title

Adversarially-regularized mixed effects deep learning (ARMED) models for improved interpretability, performance, and generalization on clustered data

Authors

Nguyen, Kevin P.; Montillo, Albert

Abstract

Natural science datasets frequently violate assumptions of independence. Samples may be clustered (e.g. by study site, subject, or experimental batch), leading to spurious associations, poor model fitting, and confounded analyses. While largely unaddressed in deep learning, this problem has been handled in the statistics community through mixed effects models, which separate cluster-invariant fixed effects from cluster-specific random effects. We propose a general-purpose framework for Adversarially-Regularized Mixed Effects Deep learning (ARMED) models through non-intrusive additions to existing neural networks: 1) an adversarial classifier constraining the original model to learn only cluster-invariant features, 2) a random effects subnetwork capturing cluster-specific features, and 3) an approach to apply random effects to clusters unseen during training. We apply ARMED to dense, convolutional, and autoencoder neural networks on 4 applications including simulated nonlinear data, dementia prognosis and diagnosis, and live-cell image analysis. Compared to prior techniques, ARMED models better distinguish confounded from true associations in simulations and learn more biologically plausible features in clinical applications. They can also quantify inter-cluster variance and visualize cluster effects in data. Finally, ARMED improves accuracy on data from clusters seen during training (up to 28% vs. conventional models) and generalization to unseen clusters (up to 9% vs. conventional models).
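
As a rough illustration of the ideas named in the abstract, the following is a minimal linear sketch, not the authors' implementation. The function names, the use of per-cluster slope vectors, and the weight `lam` are all hypothetical. The abstract does not say how random effects are applied to unseen clusters, so this sketch simply assumes a zero random effect for any cluster absent from training.

```python
import numpy as np

def mixed_effects_predict(x, cluster_ids, beta, random_slopes):
    """Fixed-effect output plus a cluster-specific offset (toy linear version).

    x             : (n_samples, n_features) array
    cluster_ids   : sequence of n_samples hashable cluster labels
    beta          : (n_features,) cluster-invariant fixed-effect weights
    random_slopes : dict mapping cluster label -> (n_features,) deviation
    Clusters missing from random_slopes get a zero random effect, which is
    an assumption of this sketch rather than the paper's approach.
    """
    fixed = x @ beta
    zero = np.zeros_like(beta)
    rand = np.array([xi @ random_slopes.get(c, zero)
                     for xi, c in zip(x, cluster_ids)])
    return fixed + rand

def fixed_effects_objective(task_loss, adversary_loss, lam=1.0):
    """Loss minimized by the fixed-effects model in an adversarial setup.

    The model is rewarded for fitting the task and for making the
    adversary's cluster prediction worse (a higher adversary loss),
    which pushes it toward cluster-invariant features.
    """
    return task_loss - lam * adversary_loss

x = np.array([[1.0, 2.0], [3.0, 4.0]])
beta = np.array([0.5, -1.0])
random_slopes = {"siteA": np.array([0.1, 0.0])}  # "siteZ" is unseen
predictions = mixed_effects_predict(x, ["siteA", "siteZ"], beta, random_slopes)
```

In this toy setting the second sample, which comes from a cluster not present in `random_slopes`, is predicted from the fixed effects alone.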

Paper title