Self-Expressing Autoencoders for Unsupervised Spoken Term Discovery

Paper Title

Self-Expressing Autoencoders for Unsupervised Spoken Term Discovery

Authors

Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Najim Dehak

Abstract

Unsupervised spoken term discovery consists of two tasks: finding the acoustic segment boundaries and labeling acoustically similar segments with the same labels. We perform segmentation based on the assumption that frame feature vectors are more similar within a segment than across segments. Therefore, for strong segmentation performance, it is crucial that the features represent the phonetic properties of a frame more than other factors of variability. We achieve this via a self-expressing autoencoder framework. It consists of a single encoder and two decoders with shared weights. The encoder projects the input features into a latent representation. One decoder tries to reconstruct the input from these latent representations, and the other tries to reconstruct it from their self-expressed version. We use the obtained features to segment and cluster the speech data. We evaluate the proposed method on the unit discovery task of the Zero Resource Speech Challenge 2020. The proposed system consistently outperforms the baseline, demonstrating the usefulness of the method for learning representations.
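The NumPy sketch below illustrates the encoder/two-decoder structure described in the abstract. It is not the authors' implementation. Several pieces are assumptions made only for illustration: the one-layer encoder and linear decoder, the self-expression rule, the equal weighting of the two reconstruction losses, the array sizes, and every function name (`self_expression_weights`, `sea_loss`, `detect_boundaries`). In the assumed self-expression rule, each latent frame is rebuilt as a softmax-over-cosine-similarity weighted sum of the *other* frames of the same utterance. `detect_boundaries` is a deliberately simple stand-in for the segmentation step. It places a boundary wherever adjacent frames are dissimilar, following the within-segment similarity assumption, and is not the segmentation algorithm used in the paper. Only the forward pass and loss are shown.

```python
import numpy as np


def encoder(x, w_enc, b_enc):
    # Hypothetical one-layer encoder: (T, D) input frames -> (T, H) latent frames.
    return np.tanh(x @ w_enc + b_enc)


def decoder(z, w_dec, b_dec):
    # Hypothetical linear decoder: (T, H) latents -> (T, D) reconstructed frames.
    return z @ w_dec + b_dec


def self_expression_weights(z):
    # Assumed self-expression: softmax over cosine similarities between latent
    # frames of one utterance, with each frame excluded from its own expansion.
    zn = z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-8)
    sim = zn @ zn.T
    np.fill_diagonal(sim, -np.inf)               # a frame may not express itself
    sim = sim - sim.max(axis=1, keepdims=True)   # numerical stability
    w = np.exp(sim)                              # exp(-inf) = 0 on the diagonal
    return w / w.sum(axis=1, keepdims=True)      # each row sums to 1


def sea_loss(x, p):
    # Forward pass: one encoder, then the SAME decoder weights applied twice.
    z = encoder(x, p["w_enc"], p["b_enc"])
    z_se = self_expression_weights(z) @ z             # self-expressed latents
    x_hat = decoder(z, p["w_dec"], p["b_dec"])        # decoder 1: from z
    x_hat_se = decoder(z_se, p["w_dec"], p["b_dec"])  # decoder 2: from z_se
    # Equal weighting of the two reconstruction terms is an assumption.
    loss = np.mean((x - x_hat) ** 2) + np.mean((x - x_hat_se) ** 2)
    return loss, z


def detect_boundaries(z, threshold=0.5):
    # Simple stand-in segmenter (not the paper's): put a boundary before frame
    # t+1 whenever frames t and t+1 have cosine similarity below `threshold`.
    zn = z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-8)
    adjacent = np.sum(zn[1:] * zn[:-1], axis=1)
    return [t + 1 for t, s in enumerate(adjacent) if s < threshold]


# Usage on random data; the sizes are illustrative, not taken from the paper.
rng = np.random.default_rng(0)
T, D, H = 50, 39, 16  # frames, input feature dim, latent dim
params = {
    "w_enc": 0.1 * rng.standard_normal((D, H)),
    "b_enc": np.zeros(H),
    "w_dec": 0.1 * rng.standard_normal((H, D)),
    "b_dec": np.zeros(D),
}
x = rng.standard_normal((T, D))
loss, z = sea_loss(x, params)
print(f"loss={loss:.3f}, latent shape={z.shape}")

# Toy check of the segmenter: three blocks of 10 identical frames each.
print(detect_boundaries(np.repeat(np.eye(3), 10, axis=0)))  # [10, 20]
```

The two "decoders" share weights because both calls to `decoder` use the same `w_dec` and `b_dec`. The self-expressed branch can only reconstruct a frame well if similar frames elsewhere in the utterance carry the same information, which is the property the segmentation step relies on. In practice the parameters would be trained by minimizing `loss` with an automatic-differentiation framework, which this forward-only sketch omits.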
