Paper Title
AUTOMATA: Gradient Based Data Subset Selection for Compute-Efficient Hyper-parameter Tuning
Paper Authors
Paper Abstract
Deep neural networks have seen great success in recent years; however, training a deep model is often challenging, as its performance heavily depends on the hyper-parameters used. In addition, finding the optimal hyper-parameter configuration, even with state-of-the-art (SOTA) hyper-parameter optimization (HPO) algorithms, can be time-consuming, requiring multiple training runs over the entire dataset for different candidate sets of hyper-parameters. Our central insight is that using an informative subset of the dataset for the model training runs involved in hyper-parameter optimization allows us to find the optimal hyper-parameter configuration significantly faster. In this work, we propose AUTOMATA, a gradient-based subset selection framework for hyper-parameter tuning. We empirically evaluate the effectiveness of AUTOMATA in hyper-parameter tuning through several experiments on real-world datasets in the text, vision, and tabular domains. Our experiments show that using gradient-based data subsets for hyper-parameter tuning achieves significantly faster turnaround times, with speedups of 3$\times$-30$\times$, while finding hyper-parameters whose performance is comparable to those found using the entire dataset.
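To make the workflow the abstract describes concrete, below is a minimal, self-contained NumPy sketch: select a small subset whose averaged per-example gradients approximate the full-data gradient, run the hyper-parameter search on that subset, then retrain on the full data with the winning configuration. All names here (`per_example_grads`, `select_gradient_matching_subset`) and the greedy residual-matching selection rule are illustrative assumptions in the spirit of gradient-matching methods, not AUTOMATA's actual algorithm or API.

```python
# Illustrative sketch only: a gradient-matching subset heuristic plus a
# subset-based hyper-parameter search, NOT the paper's AUTOMATA algorithm.
import numpy as np

rng = np.random.default_rng(0)

def per_example_grads(w, X, y):
    # Logistic-regression loss gradients, one row per training example.
    z = np.clip(X @ w, -30, 30)          # clip to avoid exp overflow warnings
    p = 1.0 / (1.0 + np.exp(-z))
    return (p - y)[:, None] * X

def select_gradient_matching_subset(X, y, w, k):
    # Greedily pick k examples whose mean gradient best matches the
    # full-data mean gradient (simple residual-reduction heuristic).
    G = per_example_grads(w, X, y)
    target = G.mean(axis=0)
    chosen, residual = [], target.copy()
    for _ in range(k):
        scores = G @ residual
        scores[chosen] = -np.inf         # never pick the same example twice
        chosen.append(int(np.argmax(scores)))
        residual = target - G[chosen].mean(axis=0)
    return np.array(chosen)

def train(X, y, lr, steps=200):
    # Plain full-batch gradient descent on logistic loss.
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * per_example_grads(w, X, y).mean(axis=0)
    return w

def accuracy(w, X, y):
    return float((((X @ w) > 0) == y).mean())

# Toy data: a linearly separable-ish binary classification problem.
n, d = 2000, 20
X = rng.normal(size=(n, d))
y = (X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.normal(size=n) > 0).astype(float)
X_tr, y_tr = X[:1500], y[:1500]
X_val, y_val = X[1500:], y[1500:]

# Select a 10% subset once, using gradients at a short warm-start point.
w0 = train(X_tr, y_tr, lr=0.1, steps=20)
idx = select_gradient_matching_subset(X_tr, y_tr, w0, k=150)

# Hyper-parameter search (here, just the learning rate) on the subset only.
best_lr, best_acc = None, -1.0
for lr in [0.001, 0.01, 0.1, 1.0]:
    w = train(X_tr[idx], y_tr[idx], lr)
    acc = accuracy(w, X_val, y_val)
    if acc > best_acc:
        best_lr, best_acc = lr, acc

# Final model: retrain on the full data with the chosen hyper-parameter.
w_final = train(X_tr, y_tr, best_lr)
print(f"chosen lr={best_lr}; held-out accuracy after full-data retraining: "
      f"{accuracy(w_final, X_val, y_val):.3f}")
```

The speedup comes from every search trial training on 150 examples instead of 1,500; only the final retraining touches the full dataset. A full implementation would likely re-select the subset periodically as the model parameters change during training; this sketch selects once, at a warm-start point, purely to keep the example short.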