Modeling Intensification for Sign Language Generation: A Computational Approach

Paper Title

Modeling Intensification for Sign Language Generation: A Computational Approach

Authors

Mert İnan, Yang Zhong, Sabit Hassan, Lorna Quandt, Malihe Alikhani

Abstract

End-to-end sign language generation models do not accurately represent the prosody in sign language. A lack of temporal and spatial variations leads to poor-quality generated presentations that confuse human interpreters. In this paper, we aim to improve the prosody in generated sign languages by modeling intensification in a data-driven manner. We present different strategies grounded in the linguistics of sign language that inform how intensity modifiers can be represented in gloss annotations. To employ our strategies, we first annotate a subset of the benchmark PHOENIX-14T, a German Sign Language dataset, with different levels of intensification. We then use a supervised intensity tagger to extend the annotated dataset and obtain labels for the remaining portion of it. This enhanced dataset is then used to train state-of-the-art transformer models for sign language generation. We find that our efforts in intensification modeling yield better results when evaluated with automatic metrics. Human evaluation also indicates a higher preference for the videos generated using our model.
