Paper Title
The BUTTER Zone: An Empirical Study of Training Dynamics in Fully Connected Neural Networks
Paper Authors
Paper Abstract
We present an empirical dataset surveying the deep learning phenomenon on fully-connected feed-forward multilayer perceptron neural networks. The dataset, which is now freely available online, records the per-epoch training and generalization performance of 483 thousand distinct hyperparameter choices of architectures, tasks, depths, network sizes (number of parameters), learning rates, batch sizes, and regularization penalties. Repeating each experiment an average of 24 times resulted in 11 million total training runs and 40 billion epochs recorded. Accumulating this 1.7 TB dataset utilized 11 thousand CPU core-years, 72.3 GPU-years, and 163 node-years. In surveying the dataset, we observe durable patterns persisting across tasks and topologies. We aim to spark scientific study of machine learning techniques as a catalyst for the theoretical discoveries needed to progress the field beyond energy-intensive and heuristic practices.