Title
Deep Neural Network approaches for Analysing Videos of Music Performances
Authors
Abstract
This paper presents a framework to automate the labelling of gestures in musical performance videos using a 3D Convolutional Neural Network (CNN). While this idea was proposed in a previous study, this paper introduces several novelties: (i) it presents a novel method that overcomes the class-imbalance challenge and makes learning co-existent gestures possible, via a batch-balancing approach and spatial-temporal representations of gestures; (ii) it performs a detailed study on 7 and 18 categories of gestures produced during video-recorded guitar performances of musical pieces; (iii) it investigates the possibility of using audio features; (iv) it extends the analysis to multiple videos. The novel methods significantly improve gesture-identification performance by 12 percentage points over the previous work (51 % in this study versus 39 % previously). We successfully validate the proposed methods on the 7 super-classes (72 %), on an ensemble of the 18 gestures/classes, and on additional videos (75 %).
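As a rough illustration of the batch-balancing idea mentioned in (i), one common approach is to draw each mini-batch with an equal number of examples per class, oversampling minority classes with replacement. The sketch below is a hypothetical illustration of that general technique, not the authors' implementation; the function name and parameters are assumptions for the example.

```python
import random
from collections import defaultdict

def balanced_batch(labels, batch_size, seed=0):
    """Return one mini-batch of example indices with equal samples per class.

    Illustrative sketch only (not the paper's code). `labels` is a list of
    class ids, one per training example. Minority classes are oversampled
    by drawing with replacement, which counteracts class imbalance.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    classes = sorted(by_class)
    per_class = max(1, batch_size // len(classes))
    batch = []
    for c in classes:
        # Sampling with replacement lets rare classes fill their quota.
        batch.extend(rng.choices(by_class[c], k=per_class))
    rng.shuffle(batch)
    return batch

# Example: a heavily imbalanced 2-class dataset (90 vs 10 examples).
labels = [0] * 90 + [1] * 10
batch = balanced_batch(labels, batch_size=8)
counts = {c: sum(labels[i] == c for i in batch) for c in (0, 1)}
# Each class contributes batch_size // n_classes = 4 examples.
```

In practice, deep-learning frameworks provide built-in utilities for this kind of weighted or stratified sampling, but the core idea is the per-class quota shown above.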