Sequential Routing Framework: Fully Capsule Network-based Speech Recognition

Paper title

Sequential Routing Framework: Fully Capsule Network-based Speech Recognition

Authors

Lee, Kyungmin, Joe, Hyunwhan, Lim, Hyeontaek, Kim, Kwangyoun, Kim, Sungsoo, Han, Chang Woo, Kim, Hong-Gee

Abstract

Capsule networks (CapsNets) have recently attracted attention as a novel neural architecture. This paper presents the sequential routing framework, which we believe is the first method to adapt a CapsNet-only structure to sequence-to-sequence recognition. Input sequences are capsulized and then sliced by a window size. Each slice is classified to a label at the corresponding time step through iterative routing mechanisms. Afterwards, losses are computed by connectionist temporal classification (CTC). During routing, learnable weights are shared across the slices, so the required number of parameters is controlled by the window size regardless of the sequence length. We additionally propose a sequential dynamic routing algorithm to replace traditional dynamic routing. The proposed technique can minimize the decoding speed degradation caused by the routing iterations, since it can operate in a non-iterative manner without dropping accuracy. On the Wall Street Journal corpus, the method achieves a word error rate of 16.9%, which is 1.1% lower than bidirectional long short-term memory-based CTC networks. On the TIMIT corpus, it attains a phone error rate of 17.5%, which is 0.7% lower than convolutional neural network-based CTC networks (Zhang et al., 2016).
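
The weight-sharing claim in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the routing step is replaced by a plain linear map, and the names `slice_windows` and `apply_shared_weights`, the stride of 1, the absence of padding, and all sizes are assumptions made for illustration. It only shows that when one weight tensor is reused for every slice, the parameter count depends on the window size and not on the sequence length.

```python
import numpy as np

def slice_windows(capsules, window, stride=1):
    """Cut a (T, D) sequence of capsule vectors into overlapping windows.

    Returns an array of shape (num_slices, window, D).
    Stride 1 and no padding are assumptions, not taken from the paper.
    """
    T = capsules.shape[0]
    starts = range(0, T - window + 1, stride)
    return np.stack([capsules[s:s + window] for s in starts])

def apply_shared_weights(slices, weights):
    """Score every slice with the same weight tensor.

    slices: (S, window, D); weights: (window, D, K); result: (S, K).
    A linear map stands in for the routing step here (assumption).
    """
    return np.einsum("swd,wdk->sk", slices, weights)

rng = np.random.default_rng(0)
window, dim, num_labels = 4, 8, 5  # hypothetical sizes
weights = rng.normal(size=(window, dim, num_labels))

for T in (10, 50):
    seq = rng.normal(size=(T, dim))
    scores = apply_shared_weights(slice_windows(seq, window), weights)
    # One score vector per slice; the weight count stays at window * dim * num_labels.
    print(T, scores.shape, weights.size)
```

In the framework described above, the per-slice outputs would then be fed to a CTC loss, which this sketch leaves out.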
