Paper title
Annotation-free Learning of Deep Representations for Word Spotting using Synthetic Data and Self Labeling
Authors
Abstract
Word spotting is a popular tool for supporting the first exploration of historic, handwritten document collections. Today, the best-performing methods rely on machine learning techniques, which require large amounts of annotated training material. As training data is usually not available in the application scenario, annotation-free methods aim to solve the retrieval task without representative training samples. In this work, we present an annotation-free method that still employs machine learning techniques and therefore outperforms other learning-free approaches. The weakly supervised training scheme relies on a lexicon that does not need to precisely fit the dataset. In combination with a confidence-based selection of pseudo-labeled training samples, we achieve state-of-the-art query-by-example performance. Furthermore, our method supports query-by-string retrieval, which is usually not the case for other annotation-free methods.
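The abstract mentions a confidence-based selection of pseudo-labeled training samples but does not state the selection rule. The following is a minimal sketch of one plausible variant, assuming each unlabeled word image has already been assigned a lexicon word and a confidence score by the current model. The function name `select_pseudo_labels`, the `keep_fraction` parameter, and the sample data are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def select_pseudo_labels(
    scored: List[Tuple[str, str, float]],
    keep_fraction: float = 0.5,
) -> List[Tuple[str, str]]:
    """Keep the most confident (sample_id, pseudo_label) pairs.

    scored: (sample_id, predicted_lexicon_word, confidence) triples.
    keep_fraction: assumed hyperparameter, the share of samples to keep.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    # Rank samples from most to least confident.
    ranked = sorted(scored, key=lambda item: item[2], reverse=True)
    # Keep at least one sample whenever any are available.
    n_keep = max(1, int(len(ranked) * keep_fraction)) if ranked else 0
    return [(sample_id, label) for sample_id, label, _ in ranked[:n_keep]]


# Illustrative data only: image ids, predicted words, and confidences.
example = [
    ("img_0", "the", 0.91),
    ("img_1", "and", 0.35),
    ("img_2", "letter", 0.78),
    ("img_3", "of", 0.12),
]
selected = select_pseudo_labels(example, keep_fraction=0.5)
print(selected)
```

In a self-labeling loop, the selected pairs would typically be added to the training set and the model retrained before the next round of pseudo-labeling; whether the paper uses a fixed fraction, a threshold, or another criterion is not specified in the abstract.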