Paper Title

Star Temporal Classification: Sequence Classification with Partially Labeled Data

Authors

Vineel Pratap, Awni Hannun, Gabriel Synnaeve, Ronan Collobert

Abstract

We develop an algorithm that can learn from partially labeled and unsegmented sequential data. Most sequential loss functions, such as Connectionist Temporal Classification (CTC), break down when many labels are missing. We address this problem with Star Temporal Classification (STC), which uses a special star token to allow alignments that include all possible tokens whenever a token could be missing. We express STC as the composition of weighted finite-state transducers (WFSTs) and use GTN (a framework for automatic differentiation with WFSTs) to compute gradients. We perform extensive experiments on automatic speech recognition. These experiments show that STC can recover most of the performance of a supervised baseline when up to 70% of the labels are missing. We also perform experiments in handwriting recognition to show that our method easily applies to other sequence classification tasks.
