TMGAN-PLC: Audio Packet Loss Concealment using Temporal Memory Generative Adversarial Network

Paper Title

TMGAN-PLC: Audio Packet Loss Concealment using Temporal Memory Generative Adversarial Network

Authors

Yuansheng Guan, Guochen Yu, Andong Li, Chengshi Zheng, Jie Wang

Abstract

Real-time communications in packet-switched networks have become widely used in daily communication, while they inevitably suffer from network delays and data losses in constrained real-time conditions. To solve these problems, audio packet loss concealment (PLC) algorithms have been developed to mitigate voice transmission failures by reconstructing the lost information. Limited by the transmission latency and device memory, it is still intractable for PLC to accomplish high-quality voice reconstruction using a relatively small packet buffer. In this paper, we propose a temporal memory generative adversarial network for audio PLC, dubbed TMGAN-PLC, which is comprised of a novel nested-UNet generator and the time-domain/frequency-domain discriminators. Specifically, a combination of the nested-UNet and temporal feature-wise linear modulation is elaborately devised in the generator to finely adjust the intra-frame information and establish inter-frame temporal dependencies. To complement the missing speech content caused by longer loss bursts, we employ multi-stage gated vector quantizers to capture the correct content and reconstruct the near-real smooth audio. Extensive experiments on the PLC Challenge dataset demonstrate that the proposed method yields promising performance in terms of speech quality, intelligibility, and PLCMOS.
