The Vendi Score: A Diversity Evaluation Metric for Machine Learning

Paper Title

The Vendi Score: A Diversity Evaluation Metric for Machine Learning

Authors

Dan Friedman, Adji Bousso Dieng

Abstract

Diversity is an important criterion for many areas of machine learning (ML), including generative modeling and dataset curation. However, existing metrics for measuring diversity are often domain-specific and limited in flexibility. In this paper, we address the diversity evaluation problem by proposing the Vendi Score, which connects and extends ideas from ecology and quantum statistical mechanics to ML. The Vendi Score is defined as the exponential of the Shannon entropy of the eigenvalues of a similarity matrix. This matrix is induced by a user-defined similarity function applied to the sample to be evaluated for diversity. In taking a similarity function as input, the Vendi Score enables its user to specify any desired form of diversity. Importantly, unlike many existing metrics in ML, the Vendi Score does not require a reference dataset or distribution over samples or labels; it is therefore general and applicable to any generative model, decoding algorithm, and dataset from any domain where similarity can be defined. We showcase the Vendi Score on molecular generative modeling, where we found that it addresses shortcomings of the current diversity metric of choice in that domain. We also applied the Vendi Score to generative models of images and decoding algorithms of text, where we found that it confirms known results about diversity in those domains. Furthermore, we used the Vendi Score to measure mode collapse, a known shortcoming of generative adversarial networks (GANs). In particular, the Vendi Score revealed that even GANs that capture all the modes of a labeled dataset can be less diverse than the original dataset. Finally, the interpretability of the Vendi Score allowed us to diagnose several benchmark ML datasets for diversity, opening the door for diversity-informed data augmentation.
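
The definition in the abstract (the exponential of the Shannon entropy of the eigenvalues of a similarity matrix) can be illustrated with a minimal sketch. It is not the authors' reference implementation. It assumes the similarity matrix K is symmetric and positive semidefinite with ones on the diagonal, and that the eigenvalues are taken from K/n so that they sum to 1; the abstract does not state this normalization. The names `vendi_score` and `exact_match` are hypothetical and chosen for illustration only.

```python
import numpy as np


def vendi_score(samples, similarity):
    """Sketch: exp of the Shannon entropy of the eigenvalues of K / n.

    Assumes `similarity` is symmetric, positive semidefinite, and returns 1.0
    for identical items (an assumption, not stated in the abstract).
    """
    n = len(samples)
    # Pairwise similarity matrix induced by the user-supplied function.
    K = np.array([[similarity(a, b) for b in samples] for a in samples], dtype=float)
    # Eigenvalues of the normalized symmetric matrix; they sum to 1 when diag(K) = 1.
    eigvals = np.linalg.eigvalsh(K / n)
    # Drop zero (or numerically tiny / negative) eigenvalues, using 0 * log 0 = 0.
    eigvals = eigvals[eigvals > 1e-12]
    entropy = -np.sum(eigvals * np.log(eigvals))
    return float(np.exp(entropy))


def exact_match(a, b):
    """Hypothetical similarity: 1.0 for equal items, 0.0 otherwise."""
    return 1.0 if a == b else 0.0


identical = vendi_score(["a", "a", "a"], exact_match)
distinct = vendi_score(["a", "b", "c"], exact_match)
print(identical, distinct)
```

Under these assumptions the score behaves like an effective count of distinct items: three identical samples score close to 1, and three mutually dissimilar samples score close to 3. Swapping `exact_match` for another similarity function changes the notion of diversity being measured.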
