Word Embeddings for Automatic Equalization in Audio Mixing

Paper Title

Word Embeddings for Automatic Equalization in Audio Mixing

Authors

Satvik Venkatesh, David Moffat, Eduardo Reck Miranda

Abstract

In recent years, machine learning has been widely adopted to automate the audio mixing process. Automatic mixing systems have been applied to various audio effects such as gain adjustment, equalization, and reverberation. These systems can be controlled through visual interfaces, audio examples, knobs, and semantic descriptors. Using semantic descriptors or textual information to control these systems is an effective way for artists to communicate their creative goals. In this paper, we explore the novel idea of using word embeddings to represent semantic descriptors. Word embeddings are generally obtained by training neural networks on large corpora of written text. These embeddings serve as the input layer of the neural network to create a translation from words to EQ settings. Using this technique, the machine learning model can also generate EQ settings for semantic descriptors that it has not seen before. We compare the EQ settings of humans with the predictions of the neural network to evaluate the quality of predictions. The results show that the embedding layer enables the neural network to understand semantic descriptors. We observed that the models with embedding layers perform better than those without embedding layers, but their predictions are still not as good as human labels.
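
The idea of feeding word embeddings into a model that outputs EQ settings can be illustrated with a toy sketch. This is not the authors' implementation: the three-dimensional embeddings, the three-band EQ targets, and the helper names `predict` and `train` are all hypothetical, and a single linear layer stands in for the neural network. The sketch only shows why a descriptor absent from training can still receive a prediction: its embedding lies close to that of a descriptor the model has seen.

```python
# Hypothetical 3-dimensional embeddings. Real word embeddings would come from
# a model pretrained on a large text corpus and have far more dimensions.
EMBEDDINGS = {
    "warm":   [0.9, 0.1, 0.0],
    "bright": [0.0, 0.1, 0.9],
    "cosy":   [0.8, 0.2, 0.1],  # has an embedding but no EQ training example
}

# Made-up training targets: gain in dB for (low, mid, high) bands.
TRAINING_EQ = {
    "warm":   [3.0, 0.0, -2.0],
    "bright": [-2.0, 0.0, 4.0],
}


def predict(weights, word):
    """Look up the frozen embedding and apply one linear layer."""
    vec = EMBEDDINGS[word]
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def train(epochs=2000, lr=0.1):
    """Fit the linear layer with plain stochastic gradient descent."""
    n_bands, dim = 3, 3
    weights = [[0.0] * dim for _ in range(n_bands)]
    for _ in range(epochs):
        for word, target in TRAINING_EQ.items():
            vec = EMBEDDINGS[word]
            pred = predict(weights, word)
            for band in range(n_bands):
                err = pred[band] - target[band]
                for i in range(dim):
                    # Gradient of 0.5 * err**2 with respect to weights[band][i].
                    weights[band][i] -= lr * err * vec[i]
    return weights


if __name__ == "__main__":
    weights = train()
    for word in ("warm", "bright", "cosy"):
        print(word, [round(g, 2) for g in predict(weights, word)])
```

In this toy setup "cosy" never appears in the training data, yet because its embedding is close to that of "warm" the model predicts a low-band boost and a high-band cut for it. The embeddings themselves are kept fixed; only the mapping from embedding to EQ gains is learned.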
