Paper Title

Domino: Discovering Systematic Errors with Cross-Modal Embeddings

Authors

Sabri Eyuboglu, Maya Varma, Khaled Saab, Jean-Benoit Delbrouck, Christopher Lee-Messer, Jared Dunnmon, James Zou, Christopher Ré

Abstract

Machine learning models that achieve high overall accuracy often make systematic errors on important subsets (or slices) of data. Identifying underperforming slices is particularly challenging when working with high-dimensional inputs (e.g., images, audio), where important slices are often unlabeled. To address this issue, recent studies have proposed automated slice discovery methods (SDMs), which leverage learned model representations to mine input data for slices on which a model performs poorly. To be useful to a practitioner, these methods must identify slices that are both underperforming and coherent (i.e., united by a human-understandable concept). However, no quantitative evaluation framework currently exists for rigorously assessing SDMs with respect to these criteria. Additionally, prior qualitative evaluations have shown that SDMs often identify slices that are incoherent. In this work, we address these challenges by first designing a principled evaluation framework that enables a quantitative comparison of SDMs across 1,235 slice discovery settings in three input domains (natural images, medical images, and time-series data). Then, motivated by the recent development of powerful cross-modal representation learning approaches, we present Domino, an SDM that leverages cross-modal embeddings and a novel error-aware mixture model to discover and describe coherent slices. We find that Domino accurately identifies 36% of the 1,235 slices in our framework, a 12 percentage point improvement over prior methods. Further, Domino is the first SDM that can provide natural language descriptions of identified slices, correctly generating the exact name of the slice in 35% of settings.
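To make the notion of slice discovery more concrete, the following is a minimal sketch of the general idea described in the abstract: group examples by their embeddings, then rank the groups by how often the model is wrong on them. It is not Domino's implementation. It substitutes plain k-means for the error-aware mixture model, uses synthetic 2-D points in place of cross-modal embeddings, and does not attempt to generate natural language descriptions. All names (`kmeans`, `rank_slices`, the toy data) are hypothetical and chosen only for illustration.

```python
import random
from math import dist


def kmeans(points, k, iters=20):
    """Plain k-means with farthest-point initialization (deterministic)."""
    centers = [points[0]]
    while len(centers) < k:
        # Pick the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        assign = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        # Move each center to the mean of its members (leave it if empty).
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assign


def rank_slices(embeddings, labels, preds, k):
    """Cluster the embeddings and rank clusters by error rate, worst first."""
    assign = kmeans(embeddings, k)
    slices = []
    for c in range(k):
        idx = [i for i, a in enumerate(assign) if a == c]
        if not idx:
            continue
        errors = sum(labels[i] != preds[i] for i in idx)
        slices.append({"cluster": c, "size": len(idx),
                       "error_rate": errors / len(idx), "indices": idx})
    return sorted(slices, key=lambda s: s["error_rate"], reverse=True)


# Toy data (assumption): three well-separated blobs of 30 points each.
# The hypothetical model is wrong on every example in the middle blob.
rng = random.Random(0)
blob_centers = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
embeddings = [(cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3))
              for cx, cy in blob_centers for _ in range(30)]
labels = [1] * 90
preds = [1] * 30 + [0] * 30 + [1] * 30

ranked = rank_slices(embeddings, labels, preds, k=3)
worst = ranked[0]
print(worst["size"], worst["error_rate"])
```

In this toy setting the worst-ranked cluster coincides with the blob on which the model fails. Real slices are rarely this cleanly separated, which is why the abstract stresses coherence as a separate criterion alongside underperformance.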
