Paper title
Determination of Latent Dimensionality in International Trade Flow
Paper authors
Paper abstract
High-dimensional data is now ubiquitous in data science, which calls for techniques to decompose and interpret such multidimensional (tensor) datasets. Finding a low-dimensional representation of the data, that is, its inherent structure, is one way to understand the dynamics of the low-dimensional latent features hidden in it. Nonnegative RESCAL is one such technique, particularly well suited to analyzing self-relational data such as the dynamic networks found in international trade flows. Nonnegative RESCAL computes a low-dimensional tensor representation by finding the latent space containing multiple modalities. Estimating the dimensionality of this latent space is crucial for extracting meaningful latent features. Here, to determine the dimensionality of the latent space with nonnegative RESCAL, we propose a latent dimension determination method based on clustering the solutions of multiple realizations of nonnegative RESCAL decompositions. We demonstrate the performance of our model selection method on synthetic data, then apply it to decompose a network of international trade flows using data from the International Monetary Fund, and validate the resulting features against empirical facts from the economic literature.
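To make the decomposition step concrete, the following is a minimal sketch of a nonnegative RESCAL factorization, in which each relation slice X_k is approximated as A R_k A^T with nonnegative A and R_k. The abstract does not specify the optimization scheme, so the multiplicative update rules used here are an assumption (a commonly used choice for this model), and the function name `nonneg_rescal` and its parameters are hypothetical.

```python
import numpy as np


def nonneg_rescal(X, rank, n_iter=300, eps=1e-9, seed=0):
    """Sketch of nonnegative RESCAL: X[k] ~= A @ R[k] @ A.T with A, R >= 0.

    X has shape (m, n, n): m relation slices over n entities.
    Returns A (n, rank), R (m, rank, rank) and the list of relative
    reconstruction errors (initial value first, then one per iteration).
    Multiplicative updates are an assumed choice, not taken from the abstract.
    """
    rng = np.random.default_rng(seed)
    m, n, _ = X.shape
    # Strictly positive random initialization keeps the updates well defined.
    A = rng.random((n, rank)) + 0.1
    R = rng.random((m, rank, rank)) + 0.1
    x_norm = np.linalg.norm(X)
    Xt = X.transpose(0, 2, 1)

    def rel_error():
        # A (n, r) @ R (m, r, r) @ A.T (r, n) broadcasts to shape (m, n, n).
        return np.linalg.norm(X - A @ R @ A.T) / x_norm

    errors = [rel_error()]
    for _ in range(n_iter):
        # Update A: ratio of the negative and positive parts of the gradient.
        Rt = R.transpose(0, 2, 1)
        AtA = A.T @ A
        num = ((X @ A) @ Rt + (Xt @ A) @ R).sum(axis=0)
        den = A @ (R @ AtA @ Rt + Rt @ AtA @ R).sum(axis=0)
        A *= num / (den + eps)

        # Update every slice R[k] with the refreshed A.
        AtA = A.T @ A
        num = A.T @ X @ A
        den = AtA @ R @ AtA
        R *= num / (den + eps)

        errors.append(rel_error())
    return A, R, errors
```

Calling `nonneg_rescal(X, rank)` on a nonnegative array `X` of shape `(m, n, n)` returns factors that stay nonnegative by construction. A rank-selection procedure of the kind described in the abstract would repeat such runs over several realizations and candidate ranks and then cluster the resulting columns of `A`; that step is not shown here.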