Paper Title

WaveMix: A Resource-efficient Neural Network for Image Analysis

Authors

Pranav Jeevan, Kavitha Viswanathan, Anandu A S, Amit Sethi

Abstract

We propose a novel neural architecture for computer vision -- WaveMix -- that is resource-efficient and yet generalizable and scalable. While using fewer trainable parameters, GPU RAM, and computations, WaveMix networks achieve comparable or better accuracy than the state-of-the-art convolutional neural networks, vision transformers, and token mixers for several tasks. This efficiency can translate to savings in time, cost, and energy. To achieve these gains we used multi-level two-dimensional discrete wavelet transform (2D-DWT) in WaveMix blocks, which has the following advantages: (1) It reorganizes spatial information based on three strong image priors -- scale-invariance, shift-invariance, and sparseness of edges -- (2) in a lossless manner without adding parameters, (3) while also reducing the spatial sizes of feature maps, which reduces the memory and time required for forward and backward passes, and (4) expanding the receptive field faster than convolutions do. The whole architecture is a stack of self-similar and resolution-preserving WaveMix blocks, which allows architectural flexibility for various tasks and levels of resource availability. WaveMix establishes new benchmarks for segmentation on Cityscapes; and for classification on Galaxy 10 DECals, Places-365, five EMNIST datasets, and iNAT-mini and performs competitively on other benchmarks. Our code and trained models are publicly available.
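The abstract's key mechanism is that a 2D-DWT reorganizes spatial information losslessly and without parameters, while halving the spatial size of feature maps. A minimal NumPy sketch of one level of the Haar 2D-DWT illustrates this; the function names here are illustrative and not taken from the WaveMix codebase, which may use a different wavelet and implementation.

```python
import numpy as np

def haar_dwt2(x):
    """One level of 2D Haar DWT: four subbands, each half the input size."""
    # Row transform: pairwise average (lowpass) and difference (highpass).
    lo = (x[0::2, :] + x[1::2, :]) / 2.0
    hi = (x[0::2, :] - x[1::2, :]) / 2.0
    # Column transform on each result -> LL, LH, HL, HH subbands.
    LL = (lo[:, 0::2] + lo[:, 1::2]) / 2.0
    LH = (lo[:, 0::2] - lo[:, 1::2]) / 2.0
    HL = (hi[:, 0::2] + hi[:, 1::2]) / 2.0
    HH = (hi[:, 0::2] - hi[:, 1::2]) / 2.0
    return LL, LH, HL, HH

def haar_idwt2(LL, LH, HL, HH):
    """Exact inverse: a = (p+q)/2, d = (p-q)/2 implies p = a+d, q = a-d."""
    h, w = LL.shape
    lo = np.empty((h, 2 * w))
    hi = np.empty((h, 2 * w))
    lo[:, 0::2] = LL + LH
    lo[:, 1::2] = LL - LH
    hi[:, 0::2] = HL + HH
    hi[:, 1::2] = HL - HH
    x = np.empty((2 * h, 2 * w))
    x[0::2, :] = lo + hi
    x[1::2, :] = lo - hi
    return x

x = np.random.rand(64, 64)              # a single-channel "feature map"
LL, LH, HL, HH = haar_dwt2(x)
print(LL.shape)                          # (32, 32): spatial size halves
stacked = np.stack([LL, LH, HL, HH])     # (4, 32, 32): reorganized as channels
assert np.allclose(haar_idwt2(LL, LH, HL, HH), x)  # lossless, no parameters
```

Because each level quarters the number of spatial locations while keeping all information in the subband channels, stacking such blocks also enlarges the effective receptive field faster than same-sized convolutions, which is the efficiency argument the abstract makes.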
