Paper Title

DiffusionDB: A Large-scale Prompt Gallery Dataset for Text-to-Image Generative Models

Authors

Wang, Zijie J., Montoya, Evan, Munechika, David, Yang, Haoyang, Hoover, Benjamin, Chau, Duen Horng

Abstract

With recent advancements in diffusion models, users can generate high-quality images by writing text prompts in natural language. However, generating images with desired details requires proper prompts, and it is often unclear how a model reacts to different prompts or what the best prompts are. To help researchers tackle these critical challenges, we introduce DiffusionDB, the first large-scale text-to-image prompt dataset totaling 6.5TB, containing 14 million images generated by Stable Diffusion, 1.8 million unique prompts, and hyperparameters specified by real users. We analyze the syntactic and semantic characteristics of prompts. We pinpoint specific hyperparameter values and prompt styles that can lead to model errors and present evidence of potentially harmful model usage, such as the generation of misinformation. The unprecedented scale and diversity of this human-actuated dataset provide exciting research opportunities in understanding the interplay between prompts and generative models, detecting deepfakes, and designing human-AI interaction tools to help users more easily use these models. DiffusionDB is publicly available at: https://poloclub.github.io/diffusiondb.
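As a rough illustration of the kind of prompt and hyperparameter analysis the abstract describes, the sketch below summarizes a few records. The toy records and the field names (`prompt`, `cfg`, `step`) are assumptions made for this example, not the dataset's confirmed schema, and the `summarize` helper is hypothetical.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy records. The field names ("prompt", "cfg", "step") are
# assumptions for illustration, not a confirmed DiffusionDB schema.
records = [
    {"prompt": "a cat in a spacesuit, digital art", "cfg": 7.0, "step": 50},
    {"prompt": "a cat in a spacesuit, digital art", "cfg": 12.0, "step": 50},
    {"prompt": "portrait of an old sailor, oil painting", "cfg": 7.0, "step": 30},
]


def summarize(records):
    """Return simple counts over prompts and one hyperparameter."""
    prompts = [r["prompt"] for r in records]
    unique_prompts = set(prompts)
    return {
        # One record corresponds to one generated image.
        "num_images": len(records),
        # Several images can share the same prompt text.
        "num_unique_prompts": len(unique_prompts),
        # Whitespace-separated word count, averaged over unique prompts.
        "mean_prompt_words": mean(len(p.split()) for p in unique_prompts),
        # How often each guidance-scale value was chosen.
        "cfg_counts": Counter(r["cfg"] for r in records),
    }


summary = summarize(records)
print(summary)
```

The same counting approach could be pointed at real metadata once its actual column names are known. Whitespace tokenization is only a crude stand-in for the syntactic analysis mentioned in the abstract.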
