Deep Performer: Score-to-Audio Music Performance Synthesis

Paper Title

Deep Performer: Score-to-Audio Music Performance Synthesis

Authors

Hao-Wen Dong, Cong Zhou, Taylor Berg-Kirkpatrick, Julian McAuley

Abstract

Music performance synthesis aims to synthesize a musical score into a natural performance. In this paper, we borrow recent advances in text-to-speech synthesis and present the Deep Performer, a novel system for score-to-audio music performance synthesis. Unlike speech, music often contains polyphony and long notes. Hence, we propose two new techniques for handling polyphonic inputs and providing fine-grained conditioning in a transformer encoder-decoder model. To train our proposed system, we present a new violin dataset consisting of paired recordings and scores along with estimated alignments between them. We show that our proposed model can synthesize music with clear polyphony and harmonic structures. In a listening test, we achieve competitive quality against the baseline model, a conditional generative audio model, in terms of pitch accuracy, timbre and noise level. Moreover, our proposed model significantly outperforms the baseline on an existing piano dataset in overall quality.
