Paper Title
Ham2Pose: Animating Sign Language Notation into Pose Sequences
Paper Authors
Paper Abstract
Translating spoken languages into Sign languages is necessary for open communication between the hearing and hearing-impaired communities. To achieve this goal, we propose the first method for animating a text written in HamNoSys, a lexical Sign language notation, into signed pose sequences. As HamNoSys is universal, our proposed method offers a generic solution invariant to the target Sign language. Our method gradually generates pose predictions using transformer encoders that create meaningful representations of the text and poses while considering their spatial and temporal information. We use weak supervision for the training process and show that our method succeeds in learning from partial and inaccurate data. Additionally, we offer a new distance measurement for pose sequences, normalized Dynamic Time Warping (nDTW), based on DTW over normalized keypoint trajectories, and validate its correctness using AUTSL, a large-scale Sign language dataset. We show that it measures the distance between pose sequences more accurately than existing measurements and use it to assess the quality of our generated pose sequences. Code for the data pre-processing, the model, and the distance measurement is publicly released for future research.
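To make the nDTW idea concrete, the sketch below computes a classic dynamic-programming DTW distance over normalized 2D keypoint trajectories. The normalization scheme (per-frame centering plus a global scale factor) and the per-frame Euclidean cost are illustrative assumptions; the paper's exact normalization and cost are not specified in this abstract.

```python
import numpy as np

def normalize_trajectories(poses):
    """Center each frame's keypoints and divide by a global scale.
    `poses`: array of shape (T, K, 2) -- T frames, K 2D keypoints.
    NOTE: this centering/scaling scheme is an illustrative assumption,
    not necessarily the normalization used by the Ham2Pose paper."""
    centered = poses - poses.mean(axis=1, keepdims=True)
    # Frobenius norm of each frame's centered keypoints, averaged over time.
    scale = np.linalg.norm(centered, axis=(1, 2)).mean()
    return centered / (scale + 1e-8)

def dtw_distance(seq_a, seq_b):
    """Standard DTW via dynamic programming over per-frame pose distances."""
    t_a, t_b = len(seq_a), len(seq_b)
    cost = np.full((t_a + 1, t_b + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, t_a + 1):
        for j in range(1, t_b + 1):
            d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[t_a, t_b]

def ndtw(poses_a, poses_b):
    """Illustrative nDTW: DTW over normalized keypoint trajectories."""
    return dtw_distance(normalize_trajectories(poses_a),
                        normalize_trajectories(poses_b))
```

Because the trajectories are centered and scale-normalized before warping, a pose sequence compares as nearly identical to a translated and uniformly scaled copy of itself, which is the property that makes the measure invariant to signer position and body size.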