Paper Title
DIME: Fine-grained Interpretations of Multimodal Models via Disentangled Local Explanations
Paper Authors
Paper Abstract
The ability of humans to understand an Artificial Intelligence (AI) model's decision-making process is critical for enabling stakeholders to visualize model behavior, perform model debugging, promote trust in AI models, and support collaborative human-AI decision-making. As a result, the research fields of interpretable and explainable AI have gained traction within AI communities as well as among interdisciplinary scientists seeking to apply AI in their own subject areas. In this paper, we focus on advancing the state of the art in interpreting multimodal models, a class of machine learning methods that tackle the core challenges of representing and capturing interactions between heterogeneous data sources such as images, text, audio, and time-series data. Multimodal models have proliferated across numerous real-world applications in healthcare, robotics, multimedia, affective computing, and human-computer interaction. By disentangling a model into unimodal contributions (UC) and multimodal interactions (MI), our proposed approach, DIME, enables accurate and fine-grained analysis of multimodal models while remaining general across arbitrary modalities, model architectures, and tasks. Through a comprehensive suite of experiments on both synthetic and real-world multimodal tasks, we show that DIME generates accurate disentangled explanations, helps users of multimodal models gain a deeper understanding of model behavior, and represents a step towards debugging and improving these models for real-world deployment. Code for our experiments can be found at https://github.com/lvyiwei1/DIME.
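The core idea of separating a model's prediction into unimodal contributions (UC) and multimodal interactions (MI) can be illustrated with a small sketch. The following Python snippet is our own hedged approximation of an additive disentanglement, computed by averaging predictions over shuffled modality pairings; the function name, sampling scheme, and two-modality restriction are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def disentangle(model, X1, X2):
    """Illustrative additive disentanglement of a two-modality model.

    model(x1, x2) -> scalar prediction; X1 and X2 are paired modality samples.
    Returns per-sample unimodal contributions (uc1, uc2) and the
    multimodal-interaction residual (mi) on the true pairs.
    NOTE: this is a sketch, not the authors' DIME implementation.
    """
    n = len(X1)
    # Prediction matrix over all cross-pairings of the two modalities.
    preds = np.array([[model(X1[i], X2[j]) for j in range(n)] for i in range(n)])
    mean_all = preds.mean()
    uc1 = preds.mean(axis=1) - mean_all  # effect of modality 1 alone
    uc2 = preds.mean(axis=0) - mean_all  # effect of modality 2 alone
    paired = np.diag(preds)              # predictions on the true pairs
    mi = paired - (mean_all + uc1 + uc2)  # what the additive parts cannot explain
    return uc1, uc2, mi
```

For a purely additive model the interaction residual is zero, while a model that multiplies its inputs leaves a nonzero residual, which is the signal a UC/MI decomposition is designed to expose.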