Deep Learning Enabled Semantic Communications with Speech Recognition and Synthesis

Paper Title

Deep Learning Enabled Semantic Communications with Speech Recognition and Synthesis

Authors

Zhenzi Weng, Zhijin Qin, Xiaoming Tao, Chengkang Pan, Guangyi Liu, Geoffrey Ye Li

Abstract

In this paper, we develop a deep learning based semantic communication system for speech transmission, named DeepSC-ST. We take speech recognition and speech synthesis as the transmission tasks of the communication system. First, the semantic features related to speech recognition are extracted for transmission by a joint semantic-channel encoder, and the text is recovered at the receiver from the received semantic features, which significantly reduces the amount of data that must be transmitted without degrading performance. Then, we perform speech synthesis at the receiver, which regenerates the speech signals by feeding the recognized text and the speaker information into a neural network module. To make DeepSC-ST adaptive to dynamic channel environments, we identify a robust model that copes with different channel conditions. According to the simulation results, the proposed DeepSC-ST significantly outperforms conventional communication systems and existing DL-enabled communication systems, especially in the low signal-to-noise ratio (SNR) regime. A software demonstration is further developed as a proof of concept of DeepSC-ST.
