Paper Title

Flipped Classroom: Effective Teaching for Time Series Forecasting

Paper Authors

Philipp Teutsch, Patrick Mäder

Paper Abstract

Sequence-to-sequence models based on LSTM and GRU are among the most popular choices for forecasting time series data and reach state-of-the-art performance. Training such models can be delicate, though. The two most common training strategies in this context are teacher forcing (TF) and free running (FR). TF can help the model converge faster but may provoke an exposure bias issue due to the discrepancy between the training and inference phases. FR helps to avoid this but does not necessarily lead to better results, since it tends to make training slow and unstable instead. Scheduled sampling was the first approach to tackle these issues by taking the best of both worlds and combining them into a curriculum learning (CL) strategy. Although scheduled sampling seems to be a convincing alternative to FR and TF, we found that, even if parametrized carefully, it may lead to premature termination of training when applied to time series forecasting. To mitigate the problems of the above approaches, we formalize CL strategies along the training scale as well as the training iteration scale. We propose several new curricula and systematically evaluate their performance in two experimental sets. For our experiments, we utilize six datasets generated from prominent chaotic systems. We found that the newly proposed increasing training scale curricula combined with a probabilistic iteration scale curriculum consistently outperform previous training strategies, yielding an NRMSE improvement of up to 81% over FR or TF training. For some datasets, we additionally observe a reduced number of training iterations. We also observed that all models trained with the new curricula yield higher prediction stability, allowing for longer prediction horizons.
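To make the compared strategies concrete, below is a minimal PyTorch sketch (not the authors' code) of a seq2seq GRU forecaster whose decoder mixes TF and FR per step. The class and function names, the linear ramp in `tf_schedule`, and all hyperparameters are illustrative assumptions; the paper evaluates several such curricula.

```python
# Minimal sketch (assumed names/hyperparameters, not the authors' code) of
# per-step TF/FR mixing and an increasing training-scale curriculum.
import torch
import torch.nn as nn

class Seq2SeqForecaster(nn.Module):
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.encoder = nn.GRU(n_features, hidden, batch_first=True)
        self.decoder = nn.GRUCell(n_features, hidden)
        self.head = nn.Linear(hidden, n_features)

    def forward(self, context: torch.Tensor, target: torch.Tensor,
                tf_prob: float) -> torch.Tensor:
        # Encode the observed context window: (batch, steps, features).
        _, h = self.encoder(context)
        h = h.squeeze(0)                      # (batch, hidden)
        x = context[:, -1, :]                 # last observation seeds decoding
        outputs = []
        for t in range(target.size(1)):
            h = self.decoder(x, h)
            y = self.head(h)
            outputs.append(y)
            # Iteration-scale curriculum: with probability tf_prob feed the
            # ground truth back in (teacher forcing); otherwise feed the
            # model's own prediction (free running).
            if torch.rand(()).item() < tf_prob:
                x = target[:, t, :]
            else:
                x = y                         # keep gradients through rollout
        return torch.stack(outputs, dim=1)    # (batch, horizon, features)

def tf_schedule(epoch: int, n_epochs: int) -> float:
    # Training-scale curriculum: an increasing TF ratio over training, i.e.
    # the "flipped" direction relative to classic scheduled sampling
    # (a linear ramp here purely for illustration).
    return min(1.0, epoch / max(1, n_epochs - 1))
```

In each epoch one would call something like `model(context, target, tf_schedule(epoch, n_epochs))` and backpropagate a forecasting loss against `target`; with `tf_prob=1.0` this reduces to pure TF and with `tf_prob=0.0` to pure FR.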
