Paper title

Prediction of adverse events in Afghanistan: regression analysis of time series data grouped not by geographic dependencies

Authors

Krzysztof Fiok, Waldemar Karwowski, Maciej Wilamowski

Abstract

The aim of this study was to approach a difficult regression task on highly unbalanced data regarding the active theater of war in Afghanistan. Our focus was on predicting the number of negative events, without distinguishing the precise nature of the events, given historical data on investment and negative events for each of 400 predefined Afghanistan districts. In contrast with previous research on the matter, we propose an approach to the analysis of time series data that benefits from non-conventional aggregation of these territorial entities. By carrying out initial exploratory data analysis, we demonstrate that dividing the data according to our proposal makes it possible to identify strong trend and seasonal components in the selected target variable. Using this approach, we also tried to estimate which investment data are most important for prediction performance. Based on our exploratory analysis and previous research, we prepared 5 sets of independent variables that were fed to 3 machine learning regression models. The results, expressed as mean absolute and mean squared errors, indicate that leveraging historical data on the target variable allows for reasonable performance; unfortunately, the other proposed independent variables do not seem to improve prediction quality.
