Paper Title
Treatment Learning Causal Transformer for Noisy Image Classification
Paper Authors
Abstract
Current top-notch deep learning (DL) based vision models are primarily based on exploring and exploiting the inherent correlations between training data samples and their associated labels. However, a known practical challenge is their degraded performance on "noisy" data induced by different circumstances such as spurious correlations, irrelevant contexts, domain shift, and adversarial attacks. In this work, we incorporate the binary information of "existence of noise" as a treatment into image classification tasks to improve prediction accuracy by jointly estimating its treatment effect. Motivated by causal variational inference, we propose a transformer-based architecture, Treatment Learning Causal Transformer (TLT), that uses a latent generative model to estimate robust feature representations from the current observational input for noisy image classification. Depending on the estimated noise level (modeled as a binary treatment factor), TLT assigns the corresponding inference network, trained with the designed causal loss, for prediction. We also create new noisy image datasets incorporating a wide range of noise factors (e.g., object masking, style transfer, and adversarial perturbation) for performance benchmarking. The superior performance of TLT in noisy image classification is further validated by several refutation evaluation metrics. As a by-product, TLT also improves visual saliency methods for perceiving noisy images.
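The routing idea described above (a binary treatment factor selecting an inference network) can be sketched in a minimal, hypothetical form. The linear treatment probe, the two classification heads, and all shapes below are illustrative assumptions for exposition, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

D, C = 16, 3  # illustrative feature dimension and number of classes

# Hypothetical components: a linear probe that estimates the binary
# "noise present" treatment t, and one inference head per treatment arm.
W_t = rng.normal(size=D)            # treatment estimator
W_clean = rng.normal(size=(D, C))   # head used when t = 0 (no noise)
W_noisy = rng.normal(size=(D, C))   # head used when t = 1 (noise detected)

def predict(z):
    """Route feature z through the head chosen by the estimated treatment."""
    t = int(z @ W_t > 0)            # binary treatment: 1 = noise detected
    logits = z @ (W_noisy if t else W_clean)
    return t, int(np.argmax(logits))

z = rng.normal(size=D)              # stand-in for a learned representation
t, label = predict(z)
print(t, label)
```

In the full model the representation would come from the transformer's latent generative model and both heads would be trained jointly with the causal loss; this sketch only shows the treatment-conditioned dispatch.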