Paper Title

Streamable Neural Audio Synthesis With Non-Causal Convolutions

Authors

Antoine Caillon, Philippe Esling

Abstract

Deep learning models are mostly used in an offline inference fashion. However, this strongly limits their use in audio generation setups, as most creative workflows are based on real-time digital signal processing. Although approaches based on recurrent networks can be naturally adapted to this buffer-based computation, the use of convolutions still poses serious challenges. To tackle this issue, the use of causal streaming convolutions has been proposed. However, this requires a specific, more complex training procedure and can impact the resulting audio quality. In this paper, we introduce a new method for producing non-causal streaming models. This makes any convolutional model compatible with real-time buffer-based processing. As our method is based on a post-training reconfiguration of the model, we show that it can transform models trained without causal constraints into streaming models. We show how our method can be adapted to fit complex architectures with parallel branches. To evaluate our method, we apply it to the recent RAVE model, which provides high-quality real-time audio synthesis. We test our approach on multiple music and speech datasets and show that it is faster than overlap-add methods, while having no impact on the generation quality. Finally, we introduce two open-source implementations of our work: as Max/MSP and PureData externals, and as a VST audio plugin. This endows traditional digital audio workstations with real-time neural audio synthesis on a laptop CPU.
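To make the idea of buffer-based convolution concrete, here is a minimal NumPy sketch of one common way to stream a convolution: each call keeps the last few input samples in a cache and prepends them to the next buffer. This is an illustration of the general idea only, not the authors' implementation; the class name `StreamingConv1d`, the buffer size, and the random test signal are all hypothetical. The sketch checks that the streamed output matches an offline convolution with left zero-padding, and that it matches a centered (non-causal) convolution once a fixed delay is accounted for.

```python
import numpy as np


class StreamingConv1d:
    """Hypothetical buffer-by-buffer 1D convolution with an input cache."""

    def __init__(self, kernel):
        self.kernel = np.asarray(kernel, dtype=float)
        # The cache starts as zeros, playing the role of left zero-padding.
        self.cache = np.zeros(len(self.kernel) - 1)

    def process(self, buffer):
        # Prepend the samples remembered from the previous buffer.
        x = np.concatenate([self.cache, buffer])
        # Remember the last (kernel_size - 1) samples for the next call.
        self.cache = x[len(x) - (len(self.kernel) - 1):]
        # "valid" mode yields exactly len(buffer) output samples.
        return np.convolve(x, self.kernel, mode="valid")


rng = np.random.default_rng(0)
signal = rng.standard_normal(64)
kernel = rng.standard_normal(5)  # odd length, so the centered case is unambiguous

# Stream the signal in 8 buffers of 8 samples each.
conv = StreamingConv1d(kernel)
streamed = np.concatenate([conv.process(b) for b in np.split(signal, 8)])

# Offline reference 1: convolution with left zero-padding (causal).
offline_causal = np.convolve(signal, kernel, mode="full")[:len(signal)]

# Offline reference 2: centered ("same") convolution, as a model trained
# without causal constraints would use.
offline_centered = np.convolve(signal, kernel, mode="same")
delay = (len(kernel) - 1) // 2

matches_causal = np.allclose(streamed, offline_causal)
matches_centered_with_delay = np.allclose(
    streamed[delay:], offline_centered[:len(signal) - delay]
)
print(matches_causal, matches_centered_with_delay)
```

In this toy setting the streamed output reproduces the centered convolution shifted by `delay` samples, which suggests why a model trained without causal constraints can in principle be run buffer by buffer at the cost of added latency. How the paper handles strided layers, transposed convolutions, and parallel branches is not covered by this sketch.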
