Paper Title
CR-LSO: Convex Neural Architecture Optimization in the Latent Space of Graph Variational Autoencoder with Input Convex Neural Networks
Paper Authors
Paper Abstract
In neural architecture search (NAS) methods based on latent space optimization (LSO), a deep generative model is trained to embed discrete neural architectures into a continuous latent space. Different optimization algorithms that operate in continuous spaces can then be applied to search for neural architectures. However, optimizing the latent variables is challenging for gradient-based LSO, since the mapping from the latent space to architecture performance is generally non-convex. To tackle this problem, this paper develops a convexity-regularized latent space optimization (CR-LSO) method, which regularizes the learning process of the latent space so as to obtain a convex architecture-performance mapping. Specifically, CR-LSO trains a graph variational autoencoder (G-VAE) to learn continuous representations of discrete architectures. Simultaneously, the learning of the latent space is regularized by the guaranteed convexity of input convex neural networks (ICNNs). In this way, the G-VAE is forced to learn a convex mapping from architecture representations to architecture performance. Thereafter, CR-LSO approximates the performance mapping with the ICNN and leverages the estimated gradients to optimize neural architecture representations. Experimental results on three popular NAS benchmarks show that CR-LSO achieves competitive results in terms of both computational complexity and architecture performance.
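To make the two mechanisms in the abstract concrete, below is a minimal PyTorch sketch, not taken from the paper: all layer sizes, the names `ICNN` and `surrogate`, and the lower-is-better objective (e.g., validation error) are illustrative assumptions. It shows (a) how an ICNN guarantees convexity of its output in its input, and (b) how the estimated gradient of such a convex surrogate could drive gradient-based optimization of a latent architecture representation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ICNN(nn.Module):
    """Minimal input convex neural network (Amos et al., 2017).

    The output is convex in the input y because the z-path weights are
    kept non-negative (via softplus) and the activation (softplus) is
    convex and non-decreasing. Layer sizes are illustrative only.
    """

    def __init__(self, in_dim: int, hidden: int = 64, depth: int = 3):
        super().__init__()
        # y-path: unconstrained affine maps applied to the raw input at every layer.
        self.Wy = nn.ModuleList(
            [nn.Linear(in_dim, hidden)]
            + [nn.Linear(in_dim, hidden) for _ in range(depth - 2)]
            + [nn.Linear(in_dim, 1)]
        )
        # z-path: weights constrained to be non-negative in forward().
        self.Wz = nn.ParameterList(
            [nn.Parameter(0.01 * torch.randn(hidden, hidden)) for _ in range(depth - 2)]
            + [nn.Parameter(0.01 * torch.randn(1, hidden))]
        )

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        z = F.softplus(self.Wy[0](y))  # first layer uses the y-path only
        for Wy, Wz in zip(self.Wy[1:-1], self.Wz[:-1]):
            # softplus(Wz) >= 0 preserves convexity of the composition.
            z = F.softplus(F.linear(z, F.softplus(Wz)) + Wy(y))
        return F.linear(z, F.softplus(self.Wz[-1])) + self.Wy[-1](y)


# Hypothetical gradient-based LSO step: z stands in for the G-VAE encoding of a
# seed architecture; the trained G-VAE decoder (not shown) would map the
# optimized latent vector back to a discrete architecture.
surrogate = ICNN(in_dim=32)  # assumed to predict a lower-is-better quantity
z = torch.randn(1, 32, requires_grad=True)
opt = torch.optim.Adam([z], lr=1e-2)
for _ in range(100):
    opt.zero_grad()
    pred_error = surrogate(z).sum()  # convex in z by construction
    pred_error.backward()
    opt.step()  # descend toward the global minimum of the convex surrogate
```

The point of the sketch is the design choice it illustrates: because the surrogate is convex in z, plain gradient descent on the latent variable has no spurious local minima, which is the property that CR-LSO's convexity regularization aims to impart to the learned latent space.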