Paper Title
Multi-channel Acoustic Modeling using Mixed Bitrate OPUS Compression
Paper Authors
Paper Abstract
Recent literature has shown that a learned front end with multi-channel audio input can outperform traditional beam-forming algorithms for automatic speech recognition (ASR). In this paper, we present our study on multi-channel acoustic modeling using OPUS compression with different bitrates for the different channels. We analyze the degradation in word error rate (WER) as a function of the audio encoding bitrate and show that the WER degrades by 12.6% relative at 16 kbps compared to uncompressed audio. We show that it is always preferable to have a multi-channel audio input over a single-channel audio input given limited bandwidth. Our results show that, for the best WER, when one of the two channels can be encoded with a bitrate higher than 32 kbps, it is optimal to encode the other channel with the highest bitrate possible; for bitrates lower than that, it is preferable to distribute the bitrate equally between the two channels. We further show that by training the acoustic model on mixed-bitrate input, up to 50% of the degradation can be recovered using a single model.
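The bitrate-allocation rule stated in the abstract can be read as a small decision procedure. The sketch below is one possible Python rendering of that rule for illustration only, not code from the paper: the 32 kbps threshold comes from the abstract, while the function name `allocate_two_channel_bitrates`, the per-channel cap `MAX_BITRATE_KBPS`, and the assumption of exactly two channels and a fixed total budget are assumptions made for this example.

```python
# Illustrative sketch (not from the paper): split a total bitrate budget
# across two OPUS-encoded channels following the rule described in the abstract.
# THRESHOLD_KBPS is taken from the abstract; MAX_BITRATE_KBPS and the function
# name are assumptions made for this example.

MAX_BITRATE_KBPS = 64.0   # assumed per-channel cap for illustration
THRESHOLD_KBPS = 32.0     # threshold reported in the abstract


def allocate_two_channel_bitrates(total_budget_kbps: float) -> tuple[float, float]:
    """Split a total bitrate budget between two channels.

    If, after giving one channel the highest possible bitrate, the remaining
    channel can still be encoded above 32 kbps, prefer that asymmetric split;
    otherwise split the budget equally between the two channels.
    """
    # Asymmetric candidate: give the first channel the maximum possible bitrate.
    ch0 = min(MAX_BITRATE_KBPS, total_budget_kbps)
    ch1 = total_budget_kbps - ch0
    if ch1 > THRESHOLD_KBPS:
        return ch0, ch1
    # Otherwise an equal split is preferable according to the abstract.
    half = total_budget_kbps / 2.0
    return half, half


if __name__ == "__main__":
    for budget in (32.0, 64.0, 100.0, 128.0):
        print(budget, allocate_two_channel_bitrates(budget))
```

Under these assumptions, a 128 kbps budget yields a 64/64 split, a 100 kbps budget yields a 64/36 split, and smaller budgets fall back to equal halves.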