Automated Audio Captioning with Epochal Difficult Captions for Curriculum Learning

Paper Title

Automated Audio Captioning with Epochal Difficult Captions for Curriculum Learning

Authors

Koh, Andrew; Tiwari, Soham; Siong, Chng Eng

Abstract

In this paper, we propose an algorithm, Epochal Difficult Captions, to supplement the training of any model for the Automated Audio Captioning (AAC) task. Epochal Difficult Captions is an elegant evolution of the keyword estimation task that previous work has used to train the encoder of the AAC model. Epochal Difficult Captions modifies the target captions based on a curriculum and a difficulty level determined as a function of the current epoch. Epochal Difficult Captions can be used with any model architecture and is a lightweight function that does not increase training time. We test our results on three systems and show that using Epochal Difficult Captions consistently improves performance.
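The abstract does not spell out the curriculum or the difficulty function, so the following is only a minimal sketch of the general idea under loudly stated assumptions: the stopword list, the `<easy>` placeholder token, the linear schedule, and the names `keep_probability` and `epochal_caption` are all hypothetical and are not taken from the paper. It shows one way a target caption could be rewritten per epoch, cheaply and independently of the model architecture.

```python
import random

# Hypothetical stopword list and placeholder token; neither comes from the paper.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "in", "on", "and", "with"}
PLACEHOLDER = "<easy>"


def keep_probability(epoch, max_epoch):
    """Assumed linear schedule: 0.0 at epoch 0, rising to 1.0 at max_epoch."""
    if max_epoch <= 0:
        return 1.0
    return min(1.0, max(0.0, epoch / max_epoch))


def epochal_caption(tokens, epoch, max_epoch, rng=random):
    """Return a copy of the target caption adjusted for the current epoch.

    Early epochs replace most stopwords with a placeholder, leaving mainly
    content words as targets; later epochs keep more of the original caption.
    """
    p = keep_probability(epoch, max_epoch)
    out = []
    for tok in tokens:
        # rng.random() is in [0, 1): always replaced when p == 0, never when p == 1.
        if tok.lower() in STOPWORDS and rng.random() >= p:
            out.append(PLACEHOLDER)
        else:
            out.append(tok)
    return out


caption = "a dog is barking in the distance".split()
easiest = epochal_caption(caption, epoch=0, max_epoch=10)
hardest = epochal_caption(caption, epoch=10, max_epoch=10)
```

Under these assumptions, the targets at epoch 0 contain only content words, which resembles a keyword estimation objective, and by the final epoch the model is trained on the full caption. The rewrite is a single pass over the tokens, so it would add negligible cost per training step.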
