Paper title
Predicting Multidimensional Data via Tensor Learning
Paper authors
Paper abstract
The analysis of multidimensional data is becoming an increasingly relevant topic in statistical and machine learning research. Given their complexity, such data objects are usually reshaped into matrices or vectors before being analysed. However, this approach has several drawbacks. First, it destroys the intrinsic interconnections among data points in the multidimensional space; second, the number of parameters to be estimated in a model increases exponentially. We develop a model that overcomes these drawbacks. In particular, we propose a parsimonious tensor regression model that retains the intrinsic multidimensional structure of the dataset. A Tucker structure is employed to achieve parsimony, and a shrinkage penalization is introduced to deal with over-fitting and collinearity. To estimate the model parameters, an Alternating Least Squares algorithm is developed. To validate the model's performance and robustness, we conduct a simulation exercise. Moreover, we perform an empirical analysis that highlights the forecasting power of the model relative to benchmark models. This is achieved by implementing an autoregressive specification on the Foursquares spatio-temporal dataset and on a macroeconomic panel dataset. Overall, the proposed model outperforms benchmark models from the forecasting literature.
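The idea of alternating least squares with a shrinkage penalty can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a simplified bilinear special case, Y_t ≈ A X_t Bᵀ, in which the coefficient has a Kronecker structure rather than a general Tucker structure, and it assumes a ridge (squared Frobenius norm) penalty. The names `ridge_als`, `penalised_loss` and `lam` are hypothetical.

```python
import numpy as np


def penalised_loss(Y, X, A, B, lam):
    """Sum of squared residuals of Y[t] ~ A @ X[t] @ B.T plus a ridge penalty."""
    resid = Y - A @ X @ B.T
    return float(np.sum(resid ** 2) + lam * (np.sum(A ** 2) + np.sum(B ** 2)))


def ridge_als(Y, X, lam=0.1, n_iter=50, seed=0):
    """Alternating ridge-penalised least squares for Y[t] ~ A @ X[t] @ B.T.

    Y has shape (T, p, q) and X has shape (T, m, n); the result has
    A of shape (p, m) and B of shape (q, n). Illustrative sketch only.
    """
    rng = np.random.default_rng(seed)
    _, p, q = Y.shape
    _, m, n = X.shape
    A = rng.standard_normal((p, m))
    B = rng.standard_normal((q, n))
    history = []
    for _ in range(n_iter):
        # Update A with B fixed: Y[t] ~ A @ Z[t], where Z[t] = X[t] @ B.T.
        Z = X @ B.T                                    # (T, m, q)
        G = np.einsum("tiq,tjq->ij", Z, Z) + lam * np.eye(m)
        H = np.einsum("tpq,tmq->pm", Y, Z)             # sum_t Y[t] @ Z[t].T
        A = np.linalg.solve(G, H.T).T                  # A = H @ inv(G), G symmetric

        # Update B with A fixed: Y[t].T ~ B @ V[t].T, where V[t] = A @ X[t].
        V = A @ X                                      # (T, p, n)
        G = np.einsum("tpi,tpj->ij", V, V) + lam * np.eye(n)
        H = np.einsum("tpq,tpn->qn", Y, V)             # sum_t Y[t].T @ V[t]
        B = np.linalg.solve(G, H.T).T

        history.append(penalised_loss(Y, X, A, B, lam))
    return A, B, history
```

Each update solves its ridge subproblem exactly while the other factor is held fixed, so the penalised loss recorded in `history` should not increase from one sweep to the next. In an autoregressive use of this sketch, `X[t]` would hold the observation preceding `Y[t]`; that pairing is an assumption made here for illustration.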