Paper Title

HMM vs. CTC for Automatic Speech Recognition: Comparison Based on Full-Sum Training from Scratch

Paper Authors

Tina Raissi, Wei Zhou, Simon Berger, Ralf Schlüter, Hermann Ney

Paper Abstract

In this work, we compare from-scratch sequence-level cross-entropy (full-sum) training of Hidden Markov Model (HMM) and Connectionist Temporal Classification (CTC) topologies for automatic speech recognition (ASR). Besides accuracy, we further analyze their capability for generating high-quality time alignments between the speech signal and the transcription, which can be crucial for many subsequent applications. Moreover, we propose several methods to improve convergence of from-scratch full-sum training by addressing the alignment modeling issue. A systematic comparison is conducted on both the Switchboard and LibriSpeech corpora across CTC, posterior HMM with and without transition probabilities, and standard hybrid HMM. We also provide a detailed analysis of both Viterbi forced alignment and Baum-Welch full-sum occupation probabilities.
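Since the abstract contrasts Viterbi forced alignment (max over alignment paths) with Baum-Welch full-sum occupation probabilities (sum over alignment paths), the following minimal NumPy sketch illustrates the distinction. It is not the authors' implementation: it uses a simplified left-to-right label topology without blank labels or transition probabilities, and all function names and toy values are illustrative only.

import numpy as np
from scipy.special import logsumexp

def forward_backward(log_post, labels):
    """log_post: (T, V) frame-wise label log posteriors.
    labels: (S,) target label sequence; each frame either stays on the
    current label or advances to the next one (left-to-right topology).
    Returns (full_sum_logp, viterbi_logp, occupation), where occupation[t, s]
    is the Baum-Welch posterior of being at label position s in frame t."""
    T, S = log_post.shape[0], len(labels)
    emit = log_post[:, labels]                      # (T, S) emission scores

    # Forward (alpha) and Viterbi recursions over the T x S alignment lattice.
    alpha = np.full((T, S), -np.inf); alpha[0, 0] = emit[0, 0]
    vit = np.full((T, S), -np.inf);   vit[0, 0] = emit[0, 0]
    for t in range(1, T):
        for s in range(S):
            prev = [alpha[t - 1, s]]
            prev_v = [vit[t - 1, s]]
            if s > 0:
                prev.append(alpha[t - 1, s - 1])
                prev_v.append(vit[t - 1, s - 1])
            alpha[t, s] = logsumexp(prev) + emit[t, s]    # sum over paths
            vit[t, s] = max(prev_v) + emit[t, s]          # best single path

    # Backward (beta) recursion for the occupation probabilities.
    beta = np.full((T, S), -np.inf); beta[T - 1, S - 1] = 0.0
    for t in range(T - 2, -1, -1):
        for s in range(S):
            nxt = [beta[t + 1, s] + emit[t + 1, s]]
            if s + 1 < S:
                nxt.append(beta[t + 1, s + 1] + emit[t + 1, s + 1])
            beta[t, s] = logsumexp(nxt)

    full_sum = alpha[T - 1, S - 1]
    occupation = np.exp(alpha + beta - full_sum)          # gamma_t(s)
    return full_sum, vit[T - 1, S - 1], occupation

# Toy example: 6 frames, vocabulary of 4 labels, target sequence of length 3.
rng = np.random.default_rng(0)
logits = rng.normal(size=(6, 4))
log_post = logits - logsumexp(logits, axis=1, keepdims=True)
full_sum, best_path, gamma = forward_backward(log_post, np.array([1, 3, 2]))
print(full_sum, best_path)   # full-sum log-likelihood >= Viterbi log-probability
print(gamma.sum(axis=1))     # occupation probabilities sum to 1 per frame

Full-sum training backpropagates through the soft occupation probabilities (gamma), whereas Viterbi training commits to the single best alignment path; this is the alignment modeling difference the paper's analysis revolves around.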
