Title
Tensor-Tensor Products for Optimal Representation and Compression
Authors
Abstract
In this era of big data, data analytics and machine learning, it is imperative to find ways to compress large data sets such that intrinsic features necessary for subsequent analysis are not lost. The traditional workhorse for data dimensionality reduction and feature extraction has been the matrix SVD, which presupposes that the data has been arranged in matrix format. Our main goal in this study is to show that high-dimensional data sets are more compressible when treated as tensors (aka multiway arrays) and compressed via tensor-SVDs under the tensor-tensor product structures in (Kilmer and Martin, 2011; Kernfeld et al., 2015). We begin by proving Eckart-Young optimality results for families of tensor-SVDs under two different truncation strategies. As such optimality properties can be proven in both matrix and tensor-based algebras, a fundamental question arises: does the tensor construct subsume the matrix construct in terms of representation efficiency? The answer is yes, as shown when we prove that a tensor-tensor representation of an equal dimensional spanning space can be superior to its matrix counterpart. We then investigate how the compressed representation provided by the truncated tensor-SVD is related, both theoretically and in compression performance, to its closest tensor-based analogue, the truncated HOSVD (De Lathauwer et al., 2000; De Lathauwer and Vandewalle, 2004), thereby showing the potential advantages of our tensor-based algorithms. Finally, we propose new tensor truncated SVD variants, namely multi-way tensor SVDs, which provide further representational efficiency, and discuss the conditions under which they are optimal. We conclude with a numerical study demonstrating the utility of the theory.
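As a concrete illustration of the compression scheme the abstract describes, the following is a minimal sketch of a truncated tensor-SVD under the t-product of (Kilmer and Martin, 2011), where the t-SVD of a third-order tensor is computed by taking matrix SVDs of the frontal slices in the Fourier domain along the tube dimension. The function name and the slice-wise rank-k truncation strategy are illustrative choices, not the paper's exact algorithm.

```python
import numpy as np

def truncated_tsvd(A, k):
    """Approximate a third-order tensor A by a tensor of tubal rank k
    under the t-product: FFT along the third mode, truncated matrix SVD
    of each frontal slice, then inverse FFT. Returns the approximation."""
    n1, n2, n3 = A.shape
    # Move to the Fourier domain along the third (tube) dimension;
    # the t-product becomes slice-wise matrix multiplication there.
    A_hat = np.fft.fft(A, axis=2)
    A_k_hat = np.zeros_like(A_hat)
    for i in range(n3):
        # Ordinary matrix SVD of each frontal slice, truncated to rank k.
        U, s, Vh = np.linalg.svd(A_hat[:, :, i], full_matrices=False)
        A_k_hat[:, :, i] = (U[:, :k] * s[:k]) @ Vh[:k, :]
    # Transform back; the result is real for real input A.
    return np.real(np.fft.ifft(A_k_hat, axis=2))
```

For real data, reconstruction is exact when k equals the full tubal rank, and the Frobenius-norm error decreases monotonically as k grows, which is the Eckart-Young-type behavior the paper establishes for these tensor factorizations.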