Title
Deep Neural Network approaches for Analysing Videos of Music Performances
Authors
Abstract
This paper presents a framework to automate the labelling of gestures in musical performance videos using a 3D Convolutional Neural Network (CNN). While this idea was proposed in a previous study, this paper introduces several novelties: (i) it presents a novel method that overcomes the class-imbalance challenge and makes learning co-existent gestures possible, via a batch-balancing approach and spatial-temporal representations of gestures; (ii) it performs a detailed study on 7 and 18 categories of gestures produced during video-recorded guitar performances of musical pieces; (iii) it investigates the possibility of using audio features; (iv) it extends the analysis to multiple videos. The novel methods significantly improve gesture-identification performance by 12 percentage points over the previous work (51 % in this study versus 39 % previously). We successfully validate the proposed methods on the 7 super-classes (72 %), on an ensemble of the 18 gestures/classes, and on additional videos (75 %).
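As a rough illustration of the batch-balancing idea mentioned in (i), one common approach is to draw each mini-batch with an equal number of examples per class, oversampling minority classes with replacement. The sketch below is a hypothetical illustration of that general technique, not the authors' implementation; the function name and parameters are assumptions for the example.

```python
import random
from collections import defaultdict

def balanced_batch(labels, batch_size, seed=0):
    """Return one mini-batch of example indices with equal samples per class.

    Illustrative sketch only (not the paper's code). `labels` is a list of
    class ids, one per training example. Minority classes are oversampled
    by drawing with replacement, which counteracts class imbalance.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    classes = sorted(by_class)
    per_class = max(1, batch_size // len(classes))
    batch = []
    for c in classes:
        # Sampling with replacement lets rare classes fill their quota.
        batch.extend(rng.choices(by_class[c], k=per_class))
    rng.shuffle(batch)
    return batch

# Example: a heavily imbalanced 2-class dataset (90 vs 10 examples).
labels = [0] * 90 + [1] * 10
batch = balanced_batch(labels, batch_size=8)
counts = {c: sum(labels[i] == c for i in batch) for c in (0, 1)}
# Each class contributes batch_size // n_classes = 4 examples.
```

In practice, deep-learning frameworks provide built-in utilities for this kind of weighted or stratified sampling, but the core idea is the per-class quota shown above.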