Paper title
General Image Descriptors for Open World Image Retrieval using ViT CLIP
Authors
Abstract
The Google Universal Image Embedding (GUIE) Challenge is one of the first competitions on multi-domain image representation in the wild, covering a wide distribution of objects: landmarks, artwork, food, etc. This is a fundamental computer vision problem with notable applications in image retrieval, search engines and e-commerce. In this work, we explain our 4th-place solution to the GUIE Challenge, along with our "bag of tricks" for fine-tuning zero-shot Vision Transformers (ViTs) pre-trained using CLIP.
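To make the idea of a general image descriptor for retrieval concrete, here is a minimal sketch of how such descriptors are commonly used once an encoder has produced them. It assumes each image has already been mapped to a fixed-length vector by some encoder (for example, a fine-tuned CLIP ViT, which is not shown here), and it ranks gallery images by cosine similarity to a query. The function names `l2_normalize` and `retrieve_top_k` are hypothetical, and this brute-force cosine search is an illustration only, not necessarily the scoring used by the challenge or by the authors.

```python
import math
from typing import List, Sequence, Tuple


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a descriptor to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def retrieve_top_k(
    query: Sequence[float],
    index: Sequence[Sequence[float]],
    k: int = 5,
) -> List[Tuple[int, float]]:
    """Return (gallery position, cosine similarity) for the k most similar descriptors.

    Hypothetical helper: exhaustive (brute-force) search over every gallery vector.
    """
    q = l2_normalize(query)
    scores: List[Tuple[int, float]] = []
    for i, descriptor in enumerate(index):
        d = l2_normalize(descriptor)
        scores.append((i, sum(a * b for a, b in zip(q, d))))
    # Highest similarity first; keep only the top k results.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


if __name__ == "__main__":
    # Toy 3-dimensional descriptors standing in for real encoder outputs.
    gallery = [
        [1.0, 0.0, 0.0],  # e.g. a photo of a landmark
        [0.0, 1.0, 0.0],  # e.g. a painting
        [0.9, 0.1, 0.0],  # e.g. another view of the same landmark
    ]
    query = [1.0, 0.0, 0.0]
    for position, similarity in retrieve_top_k(query, gallery, k=2):
        print(position, round(similarity, 4))
```

Normalizing descriptors makes the ranking depend only on direction, not magnitude, which is why cosine similarity is a usual choice for embedding retrieval. The exhaustive loop is fine for a toy gallery; at search-engine or e-commerce scale one would typically swap in an approximate nearest-neighbor index instead.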