End-to-end Silent Speech Recognition with Acoustic Sensing

Paper title

End-to-end Silent Speech Recognition with Acoustic Sensing

Authors

Luo, Jian; Wang, Jianzong; Cheng, Ning; Jiang, Guilin; Xiao, Jing

Abstract

Silent speech interfaces (SSI) have been an exciting area of recent interest. In this paper, we present a non-invasive silent speech interface that uses inaudible acoustic signals to capture people's lip movements when they speak. We exploit the speaker and microphone of a smartphone to emit signals and listen to their reflections, respectively. The phase features extracted from these reflections are fed into deep learning networks to recognize speech. We also propose an end-to-end recognition framework that combines a CNN with an attention-based encoder-decoder network. Evaluation results on a limited vocabulary (54 sentences) yield word error rates of 8.4% in speaker-independent and environment-independent settings, and 8.1% for unseen sentence testing.
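
The abstract reports its results as word error rates. As a minimal sketch of how that metric is commonly computed (word-level edit distance divided by the number of reference words), and not the authors' evaluation code, the following may help. The function name `word_error_rate` and the example sentences are hypothetical and do not come from the paper's 54-sentence vocabulary.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: the function name and whitespace tokenization are
    assumptions, not details taken from the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (one fewer than in the current row) and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


# Hypothetical example: one substituted word out of four reference words.
print(word_error_rate("turn on the light", "turn off the light"))
```

A corpus-level figure such as the 8.4% above is normally obtained by summing edit distances and reference lengths over all test utterances before dividing, rather than by averaging per-sentence rates; whether the paper does so is not stated in the abstract.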

Paper title