Paper Title

Synthetic Expressions are Better Than Real for Learning to Detect Facial Actions

Authors

Koichiro Niinuma, Itir Onal Ertugrul, Jeffrey F. Cohn, László A. Jeni

Abstract

Critical obstacles in training classifiers to detect facial actions are the limited sizes of annotated video databases and the relatively low frequencies of occurrence of many actions. To address these problems, we propose an approach that makes use of facial expression generation. Our approach reconstructs the 3D shape of the face from each video frame, aligns the 3D mesh to a canonical view, and then trains a GAN-based network to synthesize novel images with facial action units of interest. To evaluate this approach, a deep neural network was trained on two separate datasets: One network was trained on video of synthesized facial expressions generated from FERA17; the other network was trained on unaltered video from the same database. Both networks used the same train and validation partitions and were tested on the test partition of actual video from FERA17. The network trained on synthesized facial expressions outperformed the one trained on actual facial expressions and surpassed current state-of-the-art approaches.
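The evaluation protocol described above compares two detectors, one trained on synthesized and one on real video, on the same real test partition. A minimal sketch of that comparison is below, using frame-level F1 (the standard FERA 2017 occurrence metric). All names and data here are illustrative placeholders, not values or code from the paper.

```python
# Hedged sketch: frame-level F1 for binary AU occurrence labels,
# used to compare a detector trained on synthetic video against one
# trained on real video, on the same held-out test labels.

def f1_binary(y_true, y_pred):
    """F1 score for binary AU occurrence labels (1 = AU present)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy per-frame labels for one AU (hypothetical, for illustration only).
test_labels = [1, 0, 1, 1, 0, 0, 1, 0]
pred_synth  = [1, 0, 1, 1, 0, 1, 1, 0]  # detector trained on synthetic video
pred_real   = [1, 0, 0, 1, 0, 1, 0, 0]  # detector trained on real video

print(f1_binary(test_labels, pred_synth))  # → 0.888...
print(f1_binary(test_labels, pred_real))   # → 0.571...
```

In the paper's protocol, this per-AU F1 would be averaged over all annotated action units; both detectors share the same train/validation split boundaries and are scored only on unaltered test video.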
