Annotation-free Learning of Deep Representations for Word Spotting using Synthetic Data and Self Labeling

Paper title

Annotation-free Learning of Deep Representations for Word Spotting using Synthetic Data and Self Labeling

Authors

Fabian Wolf, Gernot A. Fink

Abstract

Word spotting is a popular tool for supporting the first exploration of historic, handwritten document collections. Today, the best-performing methods rely on machine learning techniques, which require a large amount of annotated training material. As training data is usually not available in the application scenario, annotation-free methods aim at solving the retrieval task without representative training samples. In this work, we present an annotation-free method that still employs machine learning techniques and therefore outperforms other learning-free approaches. The weakly supervised training scheme relies on a lexicon that does not need to precisely fit the dataset. In combination with a confidence-based selection of pseudo-labeled training samples, we achieve state-of-the-art query-by-example performance. Furthermore, our method allows query-by-string retrieval, which is usually not possible with other annotation-free methods.
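
The abstract does not say how the confidence of a pseudo-label is computed, so the following is only a minimal sketch of the general idea of confidence-based selection, not the authors' procedure. It assumes that each word image is mapped to a vector by the current model, that each lexicon word has a target vector, and that cosine similarity to the nearest lexicon entry serves as the confidence. The names `select_pseudo_labels` and `keep_fraction`, and the toy vectors, are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_pseudo_labels(predictions, lexicon, keep_fraction=0.5):
    """Label each sample with its nearest lexicon word, then keep the most confident.

    predictions: list of vectors predicted by the current model, one per word image.
    lexicon: dict mapping a word string to its (assumed) target vector.
    Returns a list of (sample_index, word, confidence), most confident first.
    """
    labeled = []
    for idx, vec in enumerate(predictions):
        # Pseudo-label = lexicon word with the highest similarity to the prediction.
        word, conf = max(
            ((w, cosine(vec, target)) for w, target in lexicon.items()),
            key=lambda pair: pair[1],
        )
        labeled.append((idx, word, conf))
    # Keep only the top fraction by confidence for the next training round.
    labeled.sort(key=lambda item: item[2], reverse=True)
    n_keep = max(1, int(len(labeled) * keep_fraction))
    return labeled[:n_keep]


# Toy data: two lexicon words and three predicted vectors (illustrative only).
lexicon = {"the": [1.0, 0.0, 0.0], "and": [0.0, 1.0, 0.0]}
predictions = [[0.9, 0.1, 0.0], [0.5, 0.5, 0.1], [0.0, 0.8, 0.1]]
selected = select_pseudo_labels(predictions, lexicon, keep_fraction=0.67)
```

In this toy run the ambiguous second sample, which lies equally close to both lexicon words, is the one discarded. In a self-labeling loop the retained samples would be used to retrain the model before the next round of labeling.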
