Paper Title


Efficient And Scalable Neural Residual Waveform Coding With Collaborative Quantization

Paper Authors

Kai Zhen, Mi Suk Lee, Jongmo Sung, Seungkwon Beack, Minje Kim

Abstract


Scalability and efficiency are desired in neural speech codecs, which should support a wide range of bitrates for applications on various devices. We propose a collaborative quantization (CQ) scheme to jointly learn the codebook of LPC coefficients and the corresponding residuals. CQ does not simply shoehorn LPC into a neural network, but bridges the computational capacity of advanced neural network models with traditional, yet efficient and domain-specific, digital signal processing methods in an integrated manner. We demonstrate that CQ achieves much higher quality than its predecessor at 9 kbps with even lower model complexity. We also show that CQ can scale up to 24 kbps, where it outperforms AMR-WB and Opus. As a neural waveform codec, CQ models have fewer than 1 million parameters, significantly fewer than many other generative models.
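As background for the LPC residual coding that CQ builds on: an LPC analysis filter A(z) whitens a speech frame into a low-energy residual, and the matching synthesis filter 1/A(z) reconstructs the waveform from that residual. Below is a minimal numpy sketch of this classical analysis/synthesis pipeline (autocorrelation method with Levinson-Durbin recursion); it is illustrative background only, not the paper's implementation, and the synthetic AR test signal is an assumption standing in for a real speech frame.

```python
import numpy as np

def lpc(x, order):
    """LPC coefficients a = [1, a1, ..., ap] via the autocorrelation
    method and the Levinson-Durbin recursion."""
    n = len(x)
    r = np.array([np.dot(x[:n - i], x[i:]) for i in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        a_next = a.copy()
        a_next[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a_next[i] = k
        a = a_next
        err *= 1.0 - k * k                  # prediction error energy
    return a

def analysis(x, a):
    """Residual e[n] = sum_k a[k] * x[n-k]: FIR whitening filter A(z)."""
    return np.convolve(x, a)[:len(x)]

def synthesis(e, a):
    """Invert A(z): x[n] = e[n] - sum_{j>=1} a[j] * x[n-j] (all-pole IIR)."""
    p = len(a) - 1
    x = np.zeros_like(e)
    for n in range(len(e)):
        acc = e[n]
        for j in range(1, min(p, n) + 1):
            acc -= a[j] * x[n - j]
        x[n] = acc
    return x

# Demo on a synthetic, strongly correlated AR(2) frame (stand-in for speech).
rng = np.random.default_rng(0)
drive = rng.standard_normal(512)
x = np.zeros(512)
for n in range(512):
    x[n] = drive[n] + (1.6 * x[n - 1] - 0.8 * x[n - 2] if n >= 2 else 0.0)

a = lpc(x, order=16)
e = analysis(x, a)        # low-energy residual: this is what CQ quantizes
x_hat = synthesis(e, a)   # lossless reconstruction given the exact residual
```

In a codec like CQ, both the LPC coefficients `a` and the residual `e` must be quantized; the paper's contribution is learning those two codebooks jointly rather than fixing the LPC quantizer in advance.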
