Paper title
Disentangling speech from surroundings with neural embeddings
Paper authors
Paper abstract
We present a method to separate speech signals from noisy environments in the embedding space of a neural audio codec. We introduce a new training procedure that allows our model to produce structured encodings of audio waveforms in the form of embedding vectors, where one part of the embedding vector represents the speech signal and the rest represents the environment. We achieve this by partitioning the embeddings of different input waveforms and training the model to faithfully reconstruct audio from mixed partitions, thereby ensuring that each partition encodes a separate audio attribute. As use cases, we demonstrate the separation of speech from background noise or from reverberation characteristics. Our method also allows for targeted adjustments of the audio output characteristics.
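The partition mixing described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `swap_partitions`, the split index, the embedding shape, and the convention that the leading dimensions hold speech are all assumptions made for illustration, and random arrays stand in for the codec's encoder output.

```python
import numpy as np


def swap_partitions(z_a, z_b, split):
    """Join the first `split` dimensions of z_a with the remaining ones of z_b.

    Hypothetical convention: dimensions [:split] are treated as the speech
    partition and dimensions [split:] as the environment partition.
    """
    if z_a.shape != z_b.shape:
        raise ValueError("embeddings must have the same shape")
    if not 0 < split < z_a.shape[-1]:
        raise ValueError("split must fall strictly inside the embedding dimension")
    return np.concatenate([z_a[..., :split], z_b[..., split:]], axis=-1)


# Stand-ins for the embeddings of two different input waveforms
# (shape: frames x embedding dimension; values are random placeholders).
rng = np.random.default_rng(0)
frames, dim, split = 4, 8, 5
z_a = rng.normal(size=(frames, dim))
z_b = rng.normal(size=(frames, dim))

# Mixed embedding: speech partition from input A, environment from input B.
z_mixed = swap_partitions(z_a, z_b, split)
```

In a training loop of this kind, a decoder would presumably be asked to reconstruct from `z_mixed` the audio that combines the speech of the first input with the environment of the second. How such targets are constructed is not stated in the abstract, so that step is left out here.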