Paper title
Conditional Generative Data Augmentation for Clinical Audio Datasets
Paper authors
Paper abstract
In this work, we propose a novel data augmentation method for clinical audio datasets based on a conditional Wasserstein Generative Adversarial Network with Gradient Penalty (cWGAN-GP), operating on log-mel spectrograms. To validate our method, we created a clinical audio dataset that was recorded in a real-world operating room during Total Hip Arthroplasty (THA) procedures and contains typical sounds that resemble the different phases of the intervention. We demonstrate that the proposed method can generate realistic class-conditioned samples from the dataset distribution and show that training with the generated augmented samples outperforms classical audio augmentation methods in terms of classification performance. Performance was evaluated using a ResNet-18 classifier, which shows a mean Macro F1-score improvement of 1.70% in a 5-fold cross-validation experiment when using the proposed augmentation method. Because clinical data is often expensive to acquire, the development of realistic and high-quality data augmentation methods is crucial to improving the robustness and generalization capabilities of learning-based algorithms, which is especially important for safety-critical medical applications. The proposed data augmentation method is therefore an important step towards alleviating the data bottleneck for clinical audio-based machine learning systems.
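For readers unfamiliar with the cWGAN-GP named in the abstract, the critic objective is commonly written as shown below. This is the standard WGAN-GP formulation with a class label added as a conditioning input, offered as a sketch rather than as the formula used in this paper, which the abstract does not state. All symbols are notation assumed here for illustration: D is the critic, G the generator, x a real log-mel spectrogram, y its class label, z a noise vector, and \lambda the gradient penalty weight.

```latex
\begin{aligned}
\tilde{x} &= G(z, y), \qquad
\hat{x} = \epsilon\, x + (1 - \epsilon)\, \tilde{x}, \qquad
\epsilon \sim \mathcal{U}[0, 1], \\
L_D &= \mathbb{E}\big[ D(\tilde{x}, y) \big]
     - \mathbb{E}\big[ D(x, y) \big]
     + \lambda\, \mathbb{E}\Big[ \big( \lVert \nabla_{\hat{x}} D(\hat{x}, y) \rVert_2 - 1 \big)^2 \Big], \\
L_G &= -\, \mathbb{E}\big[ D(G(z, y), y) \big].
\end{aligned}
```

The penalty term pushes the critic's gradient norm towards 1 at points interpolated between real and generated samples, which is how WGAN-GP softly enforces the Lipschitz constraint. Conditioning both networks on y is what allows samples to be generated for a chosen class.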