Paper Title

Progressive Transformers for End-to-End Sign Language Production

Paper Authors

Saunders, Ben, Camgoz, Necati Cihan, Bowden, Richard

Paper Abstract


The goal of automatic Sign Language Production (SLP) is to translate spoken language to a continuous stream of sign language video at a level comparable to a human translator. If this was achievable, then it would revolutionise Deaf hearing communications. Previous work on predominantly isolated SLP has shown the need for architectures that are better suited to the continuous domain of full sign sequences. In this paper, we propose Progressive Transformers, a novel architecture that can translate from discrete spoken language sentences to continuous 3D skeleton pose outputs representing sign language. We present two model configurations, an end-to-end network that produces sign direct from text and a stacked network that utilises a gloss intermediary. Our transformer network architecture introduces a counter that enables continuous sequence generation at training and inference. We also provide several data augmentation processes to overcome the problem of drift and improve the performance of SLP models. We propose a back translation evaluation mechanism for SLP, presenting benchmark quantitative results on the challenging RWTH-PHOENIX-Weather-2014T (PHOENIX14T) dataset and setting baselines for future research.
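The counter mentioned in the abstract augments each continuous pose frame with a progress value in [0, 1], which lets the model know how far through the sequence it is during training and gives a stopping criterion at inference. The following is a minimal sketch of that idea, not the authors' implementation; the function names `add_counter`, `decode_until_done`, and `step_fn` are hypothetical:

```python
# Hypothetical sketch of the counter mechanism from Progressive Transformers:
# each continuous pose frame is augmented with a normalised counter in [0, 1],
# and autoregressive decoding stops once the predicted counter reaches 1.0.

def add_counter(pose_frames):
    """Append a counter value in [0, 1] to each pose frame (training targets)."""
    n = len(pose_frames)
    return [
        frame + [i / (n - 1) if n > 1 else 1.0]
        for i, frame in enumerate(pose_frames)
    ]

def decode_until_done(step_fn, max_len=100, threshold=1.0):
    """Autoregressive inference loop: step_fn maps the frames generated so far
    to the next (pose, counter) pair; stop when the counter hits the threshold."""
    frames = []
    for _ in range(max_len):
        pose, counter = step_fn(frames)
        frames.append(pose)
        if counter >= threshold:
            break
    return frames
```

Replacing a discrete end-of-sequence token with this continuous counter is what allows the decoder to operate entirely in the continuous skeleton-pose domain, rather than mixing continuous outputs with a symbolic stop signal.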
