Paper Title

A Note on Data Biases in Generative Models

Authors

Patrick Esser, Robin Rombach, Björn Ommer

Abstract

It is tempting to think that machines are less prone to unfairness and prejudice. However, machine learning approaches compute their outputs based on data. While biases can enter at any stage of the development pipeline, models are particularly receptive to mirroring the biases of the datasets they are trained on and therefore do not necessarily reflect truths about the world but, primarily, truths about the data. To raise awareness about the relationship between modern algorithms and the data that shape them, we use a conditional invertible neural network to disentangle the dataset-specific information from the information that is shared across different datasets. In this way, we can project the same image onto different datasets, thereby revealing their inherent biases. We use this methodology to (i) investigate the impact of dataset quality on the performance of generative models, (ii) show how societal biases of datasets are replicated by generative models, and (iii) present creative applications through unpaired transfer between diverse datasets such as photographs, oil portraits, and anime. Our code and an interactive demonstration are available at https://github.com/CompVis/net2net.
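
The abstract describes projecting an image onto another dataset with a conditional invertible neural network. Below is a minimal sketch of that idea, assuming PyTorch; it is not the authors' net2net implementation (see the linked repository for that). The class ConditionalCoupling, its subnet architecture, and the one-hot dataset conditions are hypothetical, chosen only to show how a single affine coupling layer can be evaluated forward under a source-dataset condition and inverted exactly under a target-dataset condition.

```python
# Minimal sketch of one conditional affine coupling block, the building block
# of a conditional invertible neural network (cINN). Hypothetical illustration,
# not the net2net code: names, sizes, and architecture are assumptions.
import torch
import torch.nn as nn


class ConditionalCoupling(nn.Module):
    """Split the input in half; transform one half with a scale and shift
    predicted from the other half concatenated with a condition vector."""

    def __init__(self, dim: int, cond_dim: int, hidden: int = 256):
        super().__init__()
        self.half = dim // 2
        # Subnet mapping (x_a, condition) -> (log_scale, shift) for x_b.
        self.net = nn.Sequential(
            nn.Linear(self.half + cond_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, x, cond, reverse=False):
        xa, xb = x[:, :self.half], x[:, self.half:]
        log_s, t = self.net(torch.cat([xa, cond], dim=1)).chunk(2, dim=1)
        log_s = torch.tanh(log_s)  # bound the scale for numerical stability
        if not reverse:
            yb = xb * torch.exp(log_s) + t     # forward affine transform
        else:
            yb = (xb - t) * torch.exp(-log_s)  # its exact inverse
        return torch.cat([xa, yb], dim=1)


if __name__ == "__main__":
    dim, cond_dim = 128, 10
    block = ConditionalCoupling(dim, cond_dim)
    z = torch.randn(4, dim)                          # some latent codes
    src = torch.zeros(4, cond_dim); src[:, 0] = 1.0  # e.g. "photograph"
    tgt = torch.zeros(4, cond_dim); tgt[:, 1] = 1.0  # e.g. "oil portrait"
    # Forward pass under the source condition, inverse pass under the target
    # condition: since scale and shift depend on the condition, the round
    # trip re-expresses the code relative to the target dataset.
    h = block(z, src)
    z_translated = block(h, tgt, reverse=True)
    print(z_translated.shape)  # torch.Size([4, 128])
```

The key property is exact invertibility: with the same condition on both passes the input is recovered up to floating-point error, so any change in the output is attributable solely to swapping the condition. This is what makes the dataset-to-dataset projection described in the abstract possible without paired training data.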
