Improving Word Recognition using Multiple Hypotheses and Deep Embeddings

Paper title

Improving Word Recognition using Multiple Hypotheses and Deep Embeddings

Authors

Siddhant Bansal, Praveen Krishnan, C. V. Jawahar

Abstract

We propose a novel scheme for improving word recognition accuracy using word image embeddings. We use a trained text recognizer that can predict multiple text hypotheses for a given word image. Our fusion scheme improves the recognition process by utilizing the word image and text embeddings obtained from a trained word image embedding network. We propose EmbedNet, which is trained using a triplet loss to learn an embedding space in which the embedding of a word image lies closer to the embedding of its corresponding text transcription. The updated embedding space thus helps in choosing the correct prediction with higher confidence. To further improve the accuracy, we propose a plug-and-play module called Confidence based Accuracy Booster (CAB). The CAB module takes the confidence scores obtained from the text recognizer and the Euclidean distances between the embeddings and generates an updated distance vector, which has lower distance values for the correct words and higher distance values for the incorrect words. We systematically evaluate the proposed method on a collection of books in the Hindi language. Our method achieves an absolute improvement of around 10 percent in word recognition accuracy.
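
The two ideas the abstract names explicitly, a triplet loss over embeddings and choosing among recognizer hypotheses by Euclidean distance, can be illustrated with a minimal sketch. This is not the authors' implementation: the 2-D vectors, the margin values, the placeholder hypothesis strings, and the function names are all assumptions made for illustration, and the CAB module is not modelled because the abstract does not say how it combines confidences with distances.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss (the margin value is an assumption).

    It is zero once the anchor is at least `margin` closer to the
    positive than to the negative.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def pick_hypothesis(image_embedding, hypotheses):
    """Return the hypothesis text whose embedding is nearest to the image embedding.

    `hypotheses` is a list of (text, text_embedding) pairs.
    """
    return min(hypotheses, key=lambda h: euclidean(image_embedding, h[1]))[0]


# Toy, hand-written embeddings (hypothetical values, not produced by EmbedNet).
image_embedding = [0.9, 0.1]
hypotheses = [
    ("hypothesis_a", [0.2, 0.8]),
    ("hypothesis_b", [0.8, 0.2]),
    ("hypothesis_c", [0.5, 0.5]),
]

best = pick_hypothesis(image_embedding, hypotheses)
loss_wide = triplet_loss(image_embedding, [0.8, 0.2], [0.2, 0.8], margin=1.0)
loss_narrow = triplet_loss(image_embedding, [0.8, 0.2], [0.2, 0.8], margin=0.5)
print(best, round(loss_wide, 4), loss_narrow)
```

In this toy setup the image embedding is nearest to the second hypothesis, so that one is selected. The loss is still positive with a margin of 1.0 and drops to zero with a margin of 0.5, which shows how the margin controls how far apart matching and non-matching pairs must be pushed.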

Paper title