Paper Title
When Rigidity Hurts: Soft Consistency Regularization for Probabilistic Hierarchical Time Series Forecasting
Authors
Abstract
Probabilistic hierarchical time-series forecasting is an important variant of time-series forecasting, where the goal is to model and forecast multivariate time-series that have underlying hierarchical relations. Most methods focus on point predictions and do not provide well-calibrated probabilistic forecast distributions. Recent state-of-the-art probabilistic forecasting methods also impose hierarchical relations on point predictions and on samples of the distribution, which does not account for the coherency of forecast distributions. Previous works also silently assume that datasets are always consistent with the given hierarchical relations and do not adapt to real-world datasets that deviate from this assumption. We close both of these gaps and propose PROFHiT, a fully probabilistic hierarchical forecasting model that jointly models the forecast distribution of the entire hierarchy. PROFHiT uses a flexible probabilistic Bayesian approach and introduces a novel Distributional Coherency regularization to learn from hierarchical relations for the entire forecast distribution, which enables robust and calibrated forecasts and allows the model to adapt to datasets of varying hierarchical consistency. Evaluating PROFHiT over a wide range of datasets, we observed 41-88% better performance in accuracy and significantly better calibration. Because it models coherency over the full distribution, PROFHiT can robustly provide reliable forecasts even if up to 10% of the input time-series data is missing, whereas other methods' performance severely degrades by over 70%.
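To make the idea of a soft, distribution-level coherency penalty concrete, the following is a minimal sketch and not the PROFHiT loss itself, since the abstract does not give its form. It assumes univariate Gaussian forecasts per node, independent children, and a symmetrized KL divergence as a stand-in for the distance. The names `gaussian_kl`, `soft_coherency_penalty`, `regularized_loss`, and the weight `lam` are hypothetical.

```python
import math


def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(N(mu_p, var_p) || N(mu_q, var_q)) for univariate Gaussians."""
    return 0.5 * (math.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)


def soft_coherency_penalty(parent, children, weights=None):
    """Symmetrized KL between a parent's forecast and the aggregate of its children.

    parent:   (mean, variance) of the parent node's forecast
    children: list of (mean, variance) for the child nodes
    weights:  aggregation weights (assumption: all ones, i.e. parent = sum of children)
    """
    if weights is None:
        weights = [1.0] * len(children)
    # Assumption: children are independent, so variances add with squared weights.
    agg_mu = sum(w * m for w, (m, _) in zip(weights, children))
    agg_var = sum(w * w * v for w, (_, v) in zip(weights, children))
    mu, var = parent
    forward = gaussian_kl(mu, var, agg_mu, agg_var)
    backward = gaussian_kl(agg_mu, agg_var, mu, var)
    return 0.5 * (forward + backward)


def regularized_loss(data_loss, penalty, lam):
    """Soft constraint: the penalty is weighted by lam rather than enforced exactly."""
    return data_loss + lam * penalty


# A parent whose distribution matches the sum of its children incurs no penalty.
coherent = soft_coherency_penalty((5.0, 2.0), [(2.0, 1.0), (3.0, 1.0)])
# A parent whose mean is off by one is penalized, but the forecast is still allowed.
incoherent = soft_coherency_penalty((6.0, 2.0), [(2.0, 1.0), (3.0, 1.0)])
```

Because the penalty enters the objective with a finite weight, a model trained this way can trade coherency against fit to the data, which is one way to accommodate datasets that only approximately satisfy the hierarchy.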