Paper title
Discovering Distribution Shifts using Latent Space Representations
Paper authors
Paper abstract
Rapid progress in representation learning has led to a proliferation of embedding models, and to associated challenges of model selection and practical application. It is non-trivial to assess a model's generalizability to new candidate datasets, and failure to generalize may lead to poor performance on downstream tasks. Distribution shifts are one cause of reduced generalizability, and are often difficult to detect in practice. In this paper, we use the embedding space geometry to propose a non-parametric framework for detecting distribution shifts, and specify two tests. The first test detects shifts by establishing a robustness boundary, determined by an intelligible performance criterion, for comparing reference and candidate datasets. The second test detects shifts by featurizing and classifying multiple subsamples of two datasets as in-distribution and out-of-distribution. In evaluation, both tests detect model-impacting distribution shifts, in various shift scenarios, for both synthetic and real-world datasets.
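The abstract does not say which features or classifier the second test uses, so the following is only a minimal sketch of the general idea of featurizing and classifying subsamples, not the authors' method. Everything named here is an assumption made for illustration: the two features (mean and standard deviation of distance to the reference centroid), the nearest-centroid classifier, and the function names `featurize`, `subsample_features`, and `shift_score`.

```python
import numpy as np


def featurize(subsample, ref_center):
    """Hypothetical features: mean and std of distance to the reference centroid."""
    dist = np.linalg.norm(subsample - ref_center, axis=1)
    return np.array([dist.mean(), dist.std()])


def subsample_features(embeddings, ref_center, n_sub, size, rng):
    """Draw n_sub random subsamples (without replacement) and featurize each one."""
    feats = []
    for _ in range(n_sub):
        idx = rng.choice(len(embeddings), size=size, replace=False)
        feats.append(featurize(embeddings[idx], ref_center))
    return np.stack(feats)


def shift_score(reference, candidate, n_sub=200, size=50, seed=0):
    """Balanced held-out accuracy of a nearest-centroid classifier that tries to
    tell reference subsamples from candidate subsamples.
    Values near 0.5 suggest no detectable shift; values near 1.0 suggest a shift.
    """
    rng = np.random.default_rng(seed)
    center = reference.mean(axis=0)
    f_ref = subsample_features(reference, center, n_sub, size, rng)
    f_cand = subsample_features(candidate, center, n_sub, size, rng)

    # First half of the subsamples fits the classifier, second half evaluates it.
    half = n_sub // 2
    mu_ref = f_ref[:half].mean(axis=0)
    mu_cand = f_cand[:half].mean(axis=0)
    scale = np.concatenate([f_ref[:half], f_cand[:half]]).std(axis=0) + 1e-12

    def predict_candidate(feats):
        # True where a subsample lies closer to the candidate centroid.
        d_ref = np.linalg.norm((feats - mu_ref) / scale, axis=1)
        d_cand = np.linalg.norm((feats - mu_cand) / scale, axis=1)
        return d_cand < d_ref

    acc_ref = (~predict_candidate(f_ref[half:])).mean()
    acc_cand = predict_candidate(f_cand[half:]).mean()
    return float(0.5 * (acc_ref + acc_cand))
```

In this sketch, `reference` and `candidate` are arrays of embeddings with one row per example, each holding at least `size` rows. Because subsamples are drawn from two finite datasets, the score can sit somewhat above 0.5 even when both come from the same distribution, so a decision threshold would need to be calibrated, for example against scores obtained from two halves of the reference set.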