Title
Continuous Melody Generation via Disentangled Short-Term Representations and Structural Conditions
Authors
Abstract
Automatic music generation is an interdisciplinary research topic that combines computational creativity with semantic analysis of music to create automatic machine improvisations. An important property of such a system is that it allows the user to specify conditions and desired properties of the generated music. In this paper we design a model for composing melodies given a user-specified symbolic scenario combined with a previous music context. We add manually labeled vectors that denote external music quality in terms of chord function, which provides a low-dimensional representation of harmonic tension and resolution. Our model is capable of generating long melodies by treating 8-beat note sequences as basic units, sharing a consistent rhythm-pattern structure with another specified song. The model consists of two separately trained stages: the first stage adopts a Conditional Variational Autoencoder (C-VAE) to build a bijection between note sequences and their latent representations, and the second stage adopts a long short-term memory (LSTM) network with structural conditions to continue writing future melodies. We further exploit a disentanglement technique via the C-VAE to allow melody generation based on pitch-contour information separately from conditioning on rhythm patterns. Finally, we evaluate the proposed model using quantitative analysis of rhythm and a subjective listening study. Results show that the music generated by our model tends to have salient repetition structures, rich motives, and stable rhythm patterns. The ability to generate longer and more structured phrases from disentangled representations combined with semantic scenario-specification conditions suggests broad applicability of our model.
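The two-stage pipeline described above can be sketched at the interface level: stage one maps each 8-beat note sequence (plus a chord-function condition) to a latent vector split into a pitch-contour part and a rhythm part, and stage two predicts the latent of the next unit from the latent history and structural conditions. The sketch below is a minimal, assumption-laden illustration of that data flow only, not the paper's implementation: all dimensions (`SEQ_LEN`, `PITCH_DIM`, `Z_PITCH`, `Z_RHYTHM`, `COND_DIM`), the function names, and the single random linear layers standing in for the actual C-VAE and LSTM networks are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not taken from the paper).
SEQ_LEN = 16              # time steps per 8-beat unit (16th-note grid assumed)
PITCH_DIM = 130           # one-hot pitch/rest/hold vocabulary (assumption)
Z_PITCH, Z_RHYTHM = 64, 64  # disentangled latent halves (assumption)
COND_DIM = 7              # chord-function condition vector (assumption)

def encode(note_seq, cond):
    """Stage 1 (sketch): C-VAE encoder mapping an 8-beat note sequence
    plus a chord-function condition to a latent split into a
    pitch-contour part and a rhythm part."""
    x = np.concatenate([note_seq.ravel(), cond])
    h = np.tanh(W_enc @ x)            # toy stand-in for the encoder network
    return h[:Z_PITCH], h[Z_PITCH:]   # (z_pitch, z_rhythm)

def decode(z_pitch, z_rhythm, cond):
    """Stage 1 (sketch): decoder reconstructing one note token per step."""
    z = np.concatenate([z_pitch, z_rhythm, cond])
    logits = (W_dec @ z).reshape(SEQ_LEN, PITCH_DIM)
    return logits.argmax(axis=1)      # token indices, one per time step

def predict_next_latent(z_prev, struct_cond):
    """Stage 2 (sketch): the paper's LSTM consumes latent history and
    structural conditions; a single linear step stands in for it here."""
    x = np.concatenate([z_prev, struct_cond])
    h = np.tanh(W_lstm @ x)
    return h[:Z_PITCH], h[Z_PITCH:]

# Random "weights" only so the data flow is executable end to end.
W_enc = rng.normal(0, 0.01, (Z_PITCH + Z_RHYTHM, SEQ_LEN * PITCH_DIM + COND_DIM))
W_dec = rng.normal(0, 0.01, (SEQ_LEN * PITCH_DIM, Z_PITCH + Z_RHYTHM + COND_DIM))
W_lstm = rng.normal(0, 0.01, (Z_PITCH + Z_RHYTHM, Z_PITCH + Z_RHYTHM + COND_DIM))

seed = np.eye(PITCH_DIM)[rng.integers(0, PITCH_DIM, SEQ_LEN)]  # one-hot 8-beat unit
cond = np.eye(COND_DIM)[0]                                     # e.g. tonic function

z_p, z_r = encode(seed, cond)
melody = [decode(z_p, z_r, cond)]
for _ in range(3):                    # continue writing three more 8-beat units
    z_p, z_r = predict_next_latent(np.concatenate([z_p, z_r]), cond)
    melody.append(decode(z_p, z_r, cond))
```

Because the latent is split into `z_pitch` and `z_rhythm`, one half can be held fixed while the other is resampled, which is the mechanism that would let generation follow a given rhythm pattern while varying the pitch contour (or vice versa).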