Paper Title

End-to-end Document Recognition and Understanding with Dessurt

Authors

Brian Davis, Bryan Morse, Bryan Price, Chris Tensmeyer, Curtis Wigington, Vlad Morariu

Abstract

We introduce Dessurt, a relatively simple document understanding transformer capable of being fine-tuned on a greater variety of document tasks than prior methods. It receives a document image and task string as input and generates arbitrary text autoregressively as output. Because Dessurt is an end-to-end architecture that performs text recognition in addition to the document understanding, it does not require an external recognition model as prior methods do. Dessurt is a more flexible model than prior methods and is able to handle a variety of document domains and tasks. We show that this model is effective at 9 different dataset-task combinations.
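The input/output contract described in the abstract (document image plus task string in, autoregressively generated text out) can be illustrated with a minimal greedy decoding loop. This is only a sketch of the general autoregressive pattern, not Dessurt's actual implementation: the names `greedy_decode`, `toy_model`, and the `<eos>` token are hypothetical, and the "model" is a canned stand-in rather than a trained network.

```python
from typing import Callable, List

EOS = "<eos>"  # hypothetical end-of-sequence marker, chosen for this sketch

# Hypothetical interface: (image, task string, tokens so far) -> next token.
NextTokenFn = Callable[[object, str, List[str]], str]


def greedy_decode(next_token: NextTokenFn, image: object, task: str,
                  max_len: int = 32) -> List[str]:
    """Generate tokens one at a time, feeding each output back as context."""
    tokens: List[str] = []
    for _ in range(max_len):
        tok = next_token(image, task, tokens)
        if tok == EOS:
            break  # the model signalled that the answer is complete
        tokens.append(tok)
    return tokens


def toy_model(image: object, task: str, tokens: List[str]) -> str:
    """Stand-in for a trained network: replays a canned answer per task."""
    canned = {
        "read": ["Invoice", "No.", "42"],  # recognition-style task
        "classify": ["invoice"],           # understanding-style task
    }
    answer = canned.get(task, [])
    # Emit the next canned token, or EOS once the answer is exhausted.
    return answer[len(tokens)] if len(tokens) < len(answer) else EOS


if __name__ == "__main__":
    dummy_image = [[0.0]]  # placeholder pixels; the toy model ignores them
    print(greedy_decode(toy_model, dummy_image, "read"))
    print(greedy_decode(toy_model, dummy_image, "classify"))
```

The point of the sketch is that a single decoding loop serves every task: only the task string changes, which is what lets one model cover both recognition and understanding without an external recognition model.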
