Paper Title

MIDI Passage Retrieval Using Cell Phone Pictures of Sheet Music

Authors

Daniel Yang, Thitaree Tanprasert, Teerapat Jenrungrot, Mengyi Shan, TJ Tsai

Abstract

This paper investigates a cross-modal retrieval problem in which a user would like to retrieve a passage of music from a MIDI file by taking a cell phone picture of a physical page of sheet music. While audio-sheet music retrieval has been explored by a number of works, this scenario is novel in that the query is a cell phone picture rather than a digital scan. To solve this problem, we introduce a mid-level feature representation called a bootleg score, which explicitly encodes the rules of Western musical notation. We convert both the MIDI and the sheet music into bootleg scores using deterministic rules of music and classical computer vision techniques for detecting simple geometric shapes. Once the MIDI and cell phone image have been converted into bootleg scores, we estimate the alignment using dynamic programming. The most notable characteristic of our system is that it performs test-time adaptation and has no trainable weights at all, only a set of about 30 hyperparameters. On a dataset containing 1000 cell phone pictures taken of 100 scores of classical piano music, our system achieves an F-measure of 0.869 and outperforms baseline systems based on commercial optical music recognition software.
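The alignment step can be illustrated with a minimal sketch of subsequence dynamic time warping. The abstract does not specify the cost function or the allowed step sizes, so everything below is an assumption made for illustration: bootleg scores are taken to be binary matrices (rows as staff positions, columns as note events), the local cost is the Hamming distance between columns, and the hypothetical function `align_query` allows the three standard DTW steps.

```python
import numpy as np


def align_query(query, ref):
    """Find the passage of `ref` that best matches `query`.

    Both inputs are binary matrices of shape (positions, events).
    Returns (start_col, end_col, total_cost) of the matched passage.
    Assumed for illustration; not the authors' exact formulation.
    """
    q = np.asarray(query).astype(int)
    r = np.asarray(ref).astype(int)
    # Hamming distance between every query column and every ref column.
    cost = (q.T @ (1 - r) + (1 - q).T @ r).astype(float)
    nq, nr = cost.shape

    # Accumulated cost; the first row is free so a match may start anywhere.
    acc = np.full((nq, nr), np.inf)
    acc[0, :] = cost[0, :]
    for i in range(1, nq):
        acc[i, 0] = acc[i - 1, 0] + cost[i, 0]
        for j in range(1, nr):
            best_prev = min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
            acc[i, j] = cost[i, j] + best_prev

    # The match may end anywhere: pick the cheapest cell in the last row.
    end = int(np.argmin(acc[-1, :]))

    # Backtrack to the first query column to recover the start position.
    i, j = nq - 1, end
    while i > 0:
        if j == 0:
            i -= 1
            continue
        step = int(np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return j, end, float(acc[-1, end])
```

Because the first row of the accumulated-cost matrix is not penalized for its starting column, the query can match a passage anywhere in the reference, which is what distinguishes subsequence alignment from global alignment. A real system would likely need a cost that tolerates missed or spurious note detections in the photographed page.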
