Paper Title

GHN-Q: Parameter Prediction for Unseen Quantized Convolutional Architectures via Graph Hypernetworks

Paper Authors

Stone Yun, Alexander Wong

Paper Abstract

Deep convolutional neural network (CNN) training via iterative optimization has had incredible success in finding optimal parameters. However, modern CNN architectures often contain millions of parameters. Thus, any given model for a single architecture resides in a massive parameter space. Models with similar loss could have drastically different characteristics such as adversarial robustness, generalizability, and quantization robustness. For deep learning on the edge, quantization robustness is often crucial. Finding a model that is quantization-robust can sometimes require significant effort. Recent works using Graph Hypernetworks (GHN) have shown remarkable performance in predicting high-performing parameters for varying CNN architectures. Inspired by these successes, we wonder whether the graph representations of GHN-2 can also be leveraged to predict quantization-robust parameters, an approach we call GHN-Q. We conduct the first-ever study exploring the use of graph hypernetworks for predicting the parameters of unseen quantized CNN architectures. We focus on a reduced CNN search space and find that GHN-Q can in fact predict quantization-robust parameters for various 8-bit quantized CNNs. Decent quantized accuracies are observed even with 4-bit quantization, despite GHN-Q not being trained on it. Quantized finetuning of GHN-Q at lower bitwidths may bring further improvements and is currently being explored.
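To make the evaluation setting concrete, below is a minimal sketch of simulated uniform affine quantization (quantize-dequantize) applied to predicted weight tensors at a chosen bitwidth. The per-tensor affine scheme, the `quantize_dequantize` helper, and the `predicted_params` dictionary are illustrative assumptions rather than the paper's exact implementation; in the study, the parameters would come from a GHN-2-style forward pass over the unseen architecture's computation graph.

```python
import numpy as np

def quantize_dequantize(w: np.ndarray, num_bits: int = 8) -> np.ndarray:
    """Simulated per-tensor uniform affine quantization (quantize-dequantize).

    Assumption: the paper's exact scheme (per-tensor vs. per-channel,
    symmetric vs. affine) is not specified here, so this is a generic
    fake-quantization pass at the given bitwidth.
    """
    qmin, qmax = 0, (1 << num_bits) - 1
    w_min, w_max = float(w.min()), float(w.max())
    scale = (w_max - w_min) / (qmax - qmin)
    if scale == 0.0:  # constant tensor; any nonzero scale works
        scale = 1.0
    zero_point = int(round(qmin - w_min / scale))
    q = np.clip(np.round(w / scale) + zero_point, qmin, qmax)
    return ((q - zero_point) * scale).astype(w.dtype)

# Hypothetical usage: quantize GHN-predicted parameters before evaluating
# the target CNN. In the study, `predicted_params` would be produced by a
# graph hypernetwork conditioned on the unseen architecture's graph.
predicted_params = {"conv1.weight": np.random.randn(16, 3, 3, 3).astype(np.float32)}
quantized_params = {
    name: quantize_dequantize(w, num_bits=8)  # try num_bits=4 as well
    for name, w in predicted_params.items()
}
```

Running the same predicted parameters through this pass at `num_bits=8` and `num_bits=4` mirrors the two quantization regimes discussed in the abstract.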
