GLAMI-1M: A Multilingual Image-Text Fashion Dataset

Paper title

GLAMI-1M: A Multilingual Image-Text Fashion Dataset

Authors

Vaclav Kosar, Antonín Hoskovec, Milan Šulc, Radek Bartyzal

Abstract

We introduce GLAMI-1M: the largest multilingual image-text classification dataset and benchmark. The dataset contains images of fashion products with item descriptions, each in 1 of 13 languages. Categorization into 191 classes has high-quality annotations: all 100k images in the test set and 75% of the 1M training set were human-labeled. The paper presents baselines for image-text classification showing that the dataset presents a challenging fine-grained classification problem: the best-scoring EmbraceNet model, using both visual and textual features, achieves 69.7% accuracy. Experiments with a modified Imagen model show the dataset is also suitable for image generation conditioned on text. The dataset, source code, and model checkpoints are published at https://github.com/glami/glami-1m
