Paper Title
Kronecker CP Decomposition with Fast Multiplication for Compressing RNNs
Paper Authors
Paper Abstract
Recurrent neural networks (RNNs) are powerful in tasks oriented to sequential data, such as natural language processing and video recognition. However, since modern RNNs, including long short-term memory (LSTM) and gated recurrent unit (GRU) networks, have complex topologies and expensive space/computation complexity, compressing them has become a hot and promising topic in recent years. Among the many compression methods, tensor decomposition, e.g., tensor train (TT), block term (BT), tensor ring (TR), and hierarchical Tucker (HT), appears to be the most appealing approach, since a very high compression ratio can be obtained. Nevertheless, none of these tensor decomposition formats provides both space and computation efficiency. In this paper, we consider compressing RNNs based on a novel Kronecker CANDECOMP/PARAFAC (KCP) decomposition, which is derived from Kronecker tensor (KT) decomposition, by proposing two fast algorithms for multiplication between the input and the tensor-decomposed weight. According to our experiments on the UCF11, YouTube Celebrities Face, and UCF50 datasets, the proposed KCP-RNNs achieve accuracy comparable to that of networks in other tensor-decomposed formats, and a compression ratio of up to 278,219x can be obtained with low-rank KCP. More importantly, KCP-RNNs are efficient in both space and computation complexity compared with other tensor-decomposed networks under similar ranks. Furthermore, we find that KCP has the best potential for parallel computing to accelerate the calculations in neural networks.