Paper Title

Set Based Stochastic Subsampling

Authors

Bruno Andreis, Seanie Lee, A. Tuan Nguyen, Juho Lee, Eunho Yang, Sung Ju Hwang

Abstract

Deep models are designed to operate on huge volumes of high-dimensional data such as images. In order to reduce the volume of data these models must process, we propose a set-based two-stage end-to-end neural subsampling model that is jointly optimized with an arbitrary downstream task network (e.g., a classifier). In the first stage, we efficiently subsample candidate elements using conditionally independent Bernoulli random variables, capturing coarse-grained global information with set encoding functions; in the second stage, we perform conditionally dependent autoregressive subsampling of the candidate elements using Categorical random variables, modeling pairwise interactions with set attention networks. We apply our method to feature and instance selection and show that it outperforms the relevant baselines under low subsampling rates on a variety of tasks, including image classification, image reconstruction, function reconstruction, and few-shot classification. Additionally, for nonparametric models such as Neural Processes that require leveraging the whole training data at inference time, we show that our method enhances the scalability of these models.
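The abstract outlines a concrete two-stage pipeline, so a small code illustration may help. Below is a minimal PyTorch sketch of how such a subsampler could be wired up; all module and parameter names (CandidateSampler, AutoregressiveSelector, hidden, k) are illustrative assumptions rather than the authors' released implementation, and the discrete sampling steps are left unrelaxed for readability, whereas joint end-to-end training as described would require a continuous relaxation.

```python
# Minimal sketch of the two-stage subsampler described in the abstract.
# Assumptions: module names, layer sizes, and the pooling/attention choices
# below are illustrative, not the authors' released implementation.
import torch
import torch.nn as nn

class CandidateSampler(nn.Module):
    """Stage 1: conditionally independent Bernoulli subsampling.

    A DeepSets-style encoder pools coarse-grained global information over
    the set, and each element is kept with an independent probability.
    """
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.score = nn.Linear(dim + hidden, 1)

    def forward(self, x):                                    # x: (B, n, dim)
        ctx = self.phi(x).mean(dim=1, keepdim=True)          # set encoding
        ctx = ctx.expand(-1, x.size(1), -1)
        probs = torch.sigmoid(self.score(torch.cat([x, ctx], -1))).squeeze(-1)
        keep = torch.bernoulli(probs)                        # independent draws
        return keep, probs                                   # keep: (B, n) in {0, 1}

class AutoregressiveSelector(nn.Module):
    """Stage 2: conditionally dependent subsampling with Categorical draws.

    Self-attention models pairwise interactions among the candidates;
    elements are then drawn one at a time without replacement.
    """
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.score = nn.Linear(dim, 1)

    def forward(self, cand, k):                              # cand: (B, m, dim), k <= m
        h, _ = self.attn(cand, cand, cand)
        logits = self.score(h).squeeze(-1)                   # (B, m)
        picked, mask = [], torch.zeros_like(logits, dtype=torch.bool)
        for _ in range(k):                                   # autoregressive selection
            step = logits.masked_fill(mask, float("-inf"))
            idx = torch.distributions.Categorical(logits=step).sample()
            picked.append(idx)
            mask = mask.scatter(1, idx.unsqueeze(1), True)   # no replacement
        return torch.stack(picked, dim=1)                    # (B, k) selected indices

# Usage on a single 100-element set of 16-dim features:
x = torch.randn(1, 100, 16)
keep, _ = CandidateSampler(16)(x)
cand = x[0][keep[0].bool()].unsqueeze(0)                     # gather stage-1 candidates
indices = AutoregressiveSelector(16)(cand, k=5)              # assumes >= 5 candidates

# Note: torch.bernoulli and Categorical.sample are not differentiable, so
# end-to-end training would replace them with continuous relaxations
# (e.g. Gumbel-Softmax); that machinery is omitted here for clarity.
```

The two stages serve different costs: the Bernoulli stage is cheap and parallel, pruning the set with only pooled global context, while the Categorical stage spends attention compute only on the surviving candidates to capture element interactions.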
