Less is More: Facial Landmarks can Recognize a Spontaneous Smile

Paper Title

Less is More: Facial Landmarks can Recognize a Spontaneous Smile

Authors

Faroque, Md. Tahrim; Yang, Yan; Hossain, Md Zakir; Naim, Sheikh Motahar; Mohammed, Nabeel; Rahman, Shafin

Abstract

Smile veracity classification is a task of interpreting social interactions. Broadly, it distinguishes between spontaneous and posed smiles. Previous approaches either used hand-engineered features from facial landmarks or processed raw smile videos in an end-to-end manner. Feature-based methods require human experts for feature engineering and involve heavy pre-processing steps. In contrast, end-to-end models fed with raw smile videos bring more automation to the process, at the cost of considering many redundant facial features (beyond landmark locations) that are mainly irrelevant to smile veracity classification. It remains unclear how to establish discriminative features from landmarks in an end-to-end manner. We present MeshSmileNet, a transformer-based framework, to address the above limitations. To eliminate redundant facial features, our landmark input is extracted from Attention Mesh, a pre-trained landmark detector. To discover discriminative features, we consider the relativity and the trajectory of the landmarks. For relativity, we aggregate the facial landmarks that conceptually form a curve at each frame to establish local spatial features. For trajectory, we estimate the movement of these landmark-composed features across time with a self-attention mechanism, which captures pairwise dependencies along the trajectory of the same landmark. This idea allows us to achieve state-of-the-art performance on the UVA-NEMO, BBC, MMI Facial Expression, and SPOS datasets.
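
The trajectory idea in the abstract (self-attention capturing pairwise dependencies between frames of the same landmark) can be illustrated with a minimal sketch. This is generic single-head scaled dot-product attention in NumPy, not the MeshSmileNet implementation; the function name, the feature dimension, the frame count, and the random projection matrices are all assumptions made for illustration.

```python
import numpy as np

def temporal_self_attention(traj, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over the time axis.

    traj: (T, D) array, features of ONE landmark across T frames.
    Returns the attended features and the (T, T) attention weights.
    """
    q, k, v = traj @ w_q, traj @ w_k, traj @ w_v
    # Pairwise frame-to-frame scores along the same landmark's trajectory.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Subtract the row maximum before exponentiating for numerical stability.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Hypothetical sizes: 16 frames, 8-dimensional feature per landmark.
rng = np.random.default_rng(0)
T, D = 16, 8
traj = rng.normal(size=(T, D))
# Random matrices stand in for learned query/key/value projections.
w_q, w_k, w_v = (rng.normal(size=(D, D)) for _ in range(3))

out, attn = temporal_self_attention(traj, w_q, w_k, w_v)
```

Each row of `attn` is a probability distribution over frames, so row `t` indicates how strongly frame `t` of the landmark attends to every other frame of the same trajectory.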
