Paper Title
On Learning Semantic Representations for Million-Scale Free-Hand Sketches
Authors
Abstract
In this paper, we study learning semantic representations for million-scale free-hand sketches. This is highly challenging due to the domain-unique traits of sketches, e.g., their diversity, sparsity, abstraction, and noise. We propose a dual-branch CNN-RNN network architecture to represent sketches, which simultaneously encodes both the static and temporal patterns of sketch strokes. Based on this architecture, we further explore learning sketch-oriented semantic representations in two challenging yet practical settings, i.e., hashing retrieval and zero-shot recognition on million-scale sketches. Specifically, we use our dual-branch architecture as a universal representation framework to design two sketch-specific deep models: (i) We propose a deep hashing model for sketch retrieval, where a novel hashing loss is specifically designed to accommodate both the abstract and messy traits of sketches. (ii) We propose a deep embedding model for sketch zero-shot recognition, by collecting a large-scale edge-map dataset and extracting a set of semantic vectors from edge-maps as the semantic knowledge for sketch zero-shot domain alignment. Both deep models are evaluated by comprehensive experiments on million-scale sketches and outperform the state-of-the-art competitors.
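To make the hashing-retrieval setting concrete, here is a minimal pure-Python sketch of the general idea: features from a static (CNN-style) branch and a temporal (RNN-style) branch are fused, binarized into a hash code, and a gallery is ranked by Hamming distance. This is not the authors' model. The function names (`fuse_branches`, `binarize`, `retrieve`), the fusion by concatenation, the zero threshold, and the toy feature values are all assumptions made for illustration; the paper's networks and its hashing loss are not reproduced here.

```python
from typing import Dict, List, Sequence, Tuple


def fuse_branches(static_feat: Sequence[float],
                  temporal_feat: Sequence[float]) -> List[float]:
    """Fuse the two branch outputs by concatenation (an assumed fusion rule)."""
    return list(static_feat) + list(temporal_feat)


def binarize(feature: Sequence[float]) -> Tuple[int, ...]:
    """Turn a real-valued feature into a binary code by thresholding at zero."""
    return tuple(1 if x > 0 else 0 for x in feature)


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length binary codes differ."""
    if len(a) != len(b):
        raise ValueError("hash codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def retrieve(query_code: Sequence[int],
             gallery: Dict[str, Tuple[int, ...]],
             top_k: int = 3) -> List[str]:
    """Return the ids of the top_k gallery items closest to the query.

    Ties in Hamming distance are broken by id so the ranking is deterministic.
    """
    ranked = sorted(gallery.items(),
                    key=lambda item: (hamming(query_code, item[1]), item[0]))
    return [name for name, _ in ranked[:top_k]]


# Toy (hypothetical) branch outputs: (static feature, temporal feature).
raw_gallery = {
    "cat_1": ([0.9, -0.2], [0.4, -0.7]),
    "cat_2": ([0.5, -0.1], [-0.3, -0.6]),
    "car_1": ([-0.8, 0.6], [-0.1, 0.9]),
}
gallery_codes = {
    name: binarize(fuse_branches(static, temporal))
    for name, (static, temporal) in raw_gallery.items()
}

query_code = binarize(fuse_branches([0.7, -0.4], [0.2, -0.5]))
print(retrieve(query_code, gallery_codes, top_k=2))  # ['cat_1', 'cat_2']
```

Binary codes keep the comparison cheap at million scale, since ranking needs only bitwise differences rather than floating-point distances. In a learned model the thresholding step would be driven by a trained hashing loss rather than by a fixed sign rule.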