Musika! Fast Infinite Waveform Music Generation

Paper Title

Musika! Fast Infinite Waveform Music Generation

Authors

Marco Pasini, Jan Schlüter

Abstract

Fast and user-controllable music generation could enable novel ways of composing or performing music. However, state-of-the-art music generation systems require large amounts of data and computational resources for training, and are slow at inference. This makes them impractical for real-time interactive use. In this work, we introduce Musika, a music generation system that can be trained on hundreds of hours of music using a single consumer GPU, and that allows for much faster than real-time generation of music of arbitrary length on a consumer CPU. We achieve this by first learning a compact invertible representation of spectrogram magnitudes and phases with adversarial autoencoders, then training a Generative Adversarial Network (GAN) on this representation for a particular music domain. A latent coordinate system enables generating arbitrarily long sequences of excerpts in parallel, while a global context vector allows the music to remain stylistically coherent through time. We perform quantitative evaluations to assess the quality of the generated samples and showcase options for user control in piano and techno music generation. We release the source code and pretrained autoencoder weights at github.com/marcoppasini/musika, such that a GAN can be trained on a new music domain with a single GPU in a matter of hours.
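
The latent coordinate system and global context vector mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the interpolation scheme, the function names (`latent_coordinates`, `toy_generator`, `generate_sequence`), and all dimensions are assumptions made for illustration, and the "generator" is a stand-in that only returns its conditioning input. The point is that each chunk depends only on its own coordinate and a shared style vector, so chunks could be produced independently and in any quantity.

```python
import random


def lerp(a, b, t):
    """Linearly interpolate between two equal-length vectors."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]


def latent_coordinates(num_chunks, dim, chunks_per_anchor=4, seed=0):
    """Build one coordinate vector per chunk by interpolating between
    randomly sampled anchor vectors (an assumed scheme, for illustration)."""
    rng = random.Random(seed)
    num_anchors = (num_chunks - 1) // chunks_per_anchor + 2
    anchors = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_anchors)]
    coords = []
    for i in range(num_chunks):
        k, r = divmod(i, chunks_per_anchor)
        coords.append(lerp(anchors[k], anchors[k + 1], r / chunks_per_anchor))
    return coords


def toy_generator(coord, style):
    """Hypothetical stand-in for a trained GAN generator: it simply
    returns the conditioning vector (coordinate followed by style)."""
    return coord + style


def generate_sequence(num_chunks, coord_dim=8, style_dim=4, seed=0):
    """Generate chunks that share a single global style vector."""
    rng = random.Random(seed + 1)
    style = [rng.gauss(0, 1) for _ in range(style_dim)]  # sampled once
    coords = latent_coordinates(num_chunks, coord_dim, seed=seed)
    # Each chunk depends only on (coord, style), so this loop could be
    # replaced by a single batched, parallel call to a real generator.
    return [toy_generator(c, style) for c in coords]
```

Calling `generate_sequence(10)` returns ten conditioning vectors whose last four entries (the shared style) are identical, while the leading coordinate entries drift smoothly from chunk to chunk. Asking for more chunks only samples more anchors, which is how a sequence of arbitrary length could be requested.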
