Paper title
Kernel-based analysis of massive data
Paper authors
Paper abstract
Dealing with massive data is a challenging task for machine learning. An important aspect of machine learning is function approximation. In the context of massive data, some of the commonly used tools for this purpose are sparsity, divide-and-conquer, and distributed learning. In this paper, we develop a very general theory of approximation by networks, which we have called eignets, to achieve local, stratified approximation. The very massive nature of the data allows us to use these eignets to solve inverse problems such as finding a good approximation to the probability law that governs the data, and finding the local smoothness of the target function near different points in the domain. In fact, we develop a wavelet-like representation using our eignets. Our theory is applicable to approximation on a general locally compact metric measure space. Special examples include approximation by periodic basis functions on the torus, zonal function networks on a Euclidean sphere (including smooth ReLU networks), Gaussian networks, and approximation on manifolds. We construct pre-fabricated networks so that no data-based training is required for the approximation.
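For readers new to the coined term, the following is a hedged sketch of the general form an eignet takes in Mhaskar's earlier work on the topic; the symbols (orthonormal eigenfunctions \(\phi_j\), eigenvalues \(\lambda_j\), centers \(y_k\), coefficients \(a_k\), and the profile \(b\)) are supplied here for illustration and are not quoted from this paper:

\[
\mathbb{G}(x) = \sum_{k=1}^{M} a_k \, G(x, y_k), \qquad G(x, y) = \sum_{j} b(\lambda_j)\, \phi_j(x)\, \phi_j(y),
\]

where \(\{\phi_j\}\) might be, for example, eigenfunctions of the Laplace-Beltrami operator on a manifold with eigenvalues \(\lambda_j\), and \(b\) is a decaying profile function; periodic basis functions on the torus, zonal functions on the sphere, and Gaussians all fit this template.

As a purely illustrative companion to the "no data-based training" claim, here is a minimal, runnable Python sketch of a pre-fabricated Gaussian network: its coefficients are written down directly from function samples (a simple Nadaraya-Watson normalization stands in for the paper's quadrature-based constructions), so no iterative training occurs. All names and parameter values below are assumptions made for the demo, not taken from the paper.

import numpy as np

def gaussian_network(x_train, y_train, scale):
    # Pre-fabricated network: G(x) = sum_k w_k(x) * y_k with Gaussian
    # weights; the coefficients come straight from the samples, so there
    # is no gradient-based training step.
    def G(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        d2 = (x[:, None] - x_train[None, :]) ** 2        # squared distances to centers
        w = np.exp(-d2 / (2.0 * scale ** 2))             # Gaussian kernel weights
        return (w @ y_train) / w.sum(axis=1)             # normalized weighted average
    return G

# Usage: approximate f(x) = sin(2*pi*x) from 200 random samples on [0, 1].
rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0.0, 1.0, 200))
y_train = np.sin(2.0 * np.pi * x_train)
G = gaussian_network(x_train, y_train, scale=0.02)
x_test = np.linspace(0.05, 0.95, 181)
print("max abs error:", np.max(np.abs(G(x_test) - np.sin(2.0 * np.pi * x_test))))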