End-to-end Document Recognition and Understanding with Dessurt

Paper Title

End-to-end Document Recognition and Understanding with Dessurt

Authors

Brian Davis, Bryan Morse, Bryan Price, Chris Tensmeyer, Curtis Wigington, Vlad Morariu

Abstract

We introduce Dessurt, a relatively simple document understanding transformer capable of being fine-tuned on a greater variety of document tasks than prior methods. It receives a document image and task string as input and generates arbitrary text autoregressively as output. Because Dessurt is an end-to-end architecture that performs text recognition in addition to the document understanding, it does not require an external recognition model as prior methods do. Dessurt is a more flexible model than prior methods and is able to handle a variety of document domains and tasks. We show that this model is effective at 9 different dataset-task combinations.
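The abstract describes the model's interface: a document image and a task string go in, and text comes out one token at a time, with each step conditioned on everything generated so far. The minimal sketch below illustrates only that input/output contract through greedy autoregressive decoding. It is not Dessurt's implementation. The `toy_model` function is a hypothetical stub that "reads" a fake image (a dict holding its text), tokens are single characters purely for simplicity, and the task strings `read>` and `count_words>` and the `EOS` marker are invented for illustration, not the paper's actual task tokens.

```python
from typing import Callable, List

EOS = "<eos>"  # hypothetical end-of-sequence marker

# A "model" maps (image, task string, tokens generated so far) to the next token.
NextTokenFn = Callable[[dict, str, List[str]], str]


def toy_model(image: dict, task: str, prefix: List[str]) -> str:
    """Hypothetical stand-in for a trained network.

    The 'image' is faked as a dict holding its text, so this stub can only
    demonstrate the input/output interface, not actual text recognition.
    """
    if task == "read>":  # hypothetical task: transcribe the page
        target = image["text"]
    elif task == "count_words>":  # hypothetical task: answer with a word count
        target = str(len(image["text"].split()))
    else:
        target = ""  # unknown task: emit nothing
    i = len(prefix)
    return target[i] if i < len(target) else EOS


def generate(model: NextTokenFn, image: dict, task: str, max_len: int = 256) -> str:
    """Greedy autoregressive decoding: each step sees all previous outputs."""
    tokens: List[str] = []
    for _ in range(max_len):
        nxt = model(image, task, tokens)
        if nxt == EOS:
            break
        tokens.append(nxt)
    return "".join(tokens)


page = {"text": "Invoice 42 total 17.50"}
print(generate(toy_model, page, "read>"))         # Invoice 42 total 17.50
print(generate(toy_model, page, "count_words>"))  # 4
```

Because the output is free-form text and the task is just another input string, the same decoding loop serves both transcription and question-style tasks; only the task string changes. This is the flexibility the abstract attributes to an end-to-end design that needs no external recognition model.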
