Paper Title

DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models

Authors

Jaemin Cho, Abhay Zala, Mohit Bansal

Abstract

Recently, DALL-E, a multimodal transformer language model, and its variants, including diffusion models, have shown high-quality text-to-image generation capabilities. However, despite the realistic image generation results, there has not been a detailed analysis of how to evaluate such models. In this work, we investigate the visual reasoning capabilities and social biases of different text-to-image models, covering both multimodal transformer language models and diffusion models. First, we measure three visual reasoning skills: object recognition, object counting, and spatial relation understanding. For this, we propose PaintSkills, a compositional diagnostic evaluation dataset that measures these skills. Despite the high-fidelity image generation capability, a large gap exists between the performance of recent models and the upper bound accuracy in object counting and spatial relation understanding skills. Second, we assess the gender and skin tone biases by measuring the gender/skin tone distribution of generated images across various professions and attributes. We demonstrate that recent text-to-image generation models learn specific biases about gender and skin tone from web image-text pairs. We hope our work will help guide future progress in improving text-to-image generation models on visual reasoning skills and learning socially unbiased representations. Code and data: https://github.com/j-min/DallEval
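The abstract describes two kinds of measurement: a per-skill accuracy (does the generated image contain the right objects, count, or spatial relation?) and a bias score (how skewed is the detected gender or skin tone distribution for a given prompt?). The following is a minimal sketch of how such scores could be computed, assuming an external detector or classifier has already assigned one label to each generated image; that detection step is out of scope here. The function names are hypothetical and not taken from the DallEval repository. Using mean absolute deviation from a uniform distribution as the bias score is an illustrative assumption and may not match the paper's exact metric.

```python
from collections import Counter


def skill_accuracy(expected, detected):
    """Fraction of generated images whose detected label matches the prompt.

    `expected` and `detected` are parallel lists of hashable labels, e.g. an
    object count (counting skill) or a (subject, relation, object) triple
    (spatial relation skill). Hypothetical helper for illustration only.
    """
    if len(expected) != len(detected):
        raise ValueError("expected and detected must have the same length")
    if not expected:
        return 0.0
    correct = sum(e == d for e, d in zip(expected, detected))
    return correct / len(expected)


def mean_absolute_deviation_from_uniform(labels, categories):
    """Distance between the detected attribute distribution and a uniform one.

    Returns 0.0 when images are spread evenly across `categories`; larger
    values indicate stronger skew. Labels outside `categories` are ignored.
    Assumed metric, chosen for illustration.
    """
    counts = Counter(labels)
    total = sum(counts[c] for c in categories)
    if total == 0:
        return 0.0
    uniform = 1 / len(categories)
    deviations = [abs(counts[c] / total - uniform) for c in categories]
    return sum(deviations) / len(categories)


if __name__ == "__main__":
    # Counting skill: 2 of 3 images contain the requested number of objects.
    print(skill_accuracy([2, 3, 1], [2, 2, 1]))
    # Bias: 3 of 4 images for one prompt were classified as "male".
    print(mean_absolute_deviation_from_uniform(
        ["male", "male", "male", "female"], ["male", "female"]))
```

With a 3:1 split over two categories, each category deviates from the uniform share of 0.5 by 0.25, so the score is 0.25. A perfectly balanced split would score 0.0.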