Paper Title
Linguistic Approach to Time Series Forecasting
Authors
Abstract
This paper proposes methods for forecasting dynamic time series (including non-stationary ones) based on a linguistic approach, namely the study of the occurrence and repetition of so-called N-grams. In computational linguistics, this approach is used to build statistical machine translation systems and to detect plagiarism and duplicate documents. However, its scope can be extended beyond linguistics by taking into account correlations between sequences of stable word combinations, as well as trends. The proposed methods require neither a preliminary study to determine the characteristics of the time series nor complex tuning of the forecasting model's input parameters. With a high level of automation, they support short- and medium-term forecasting of time series characterized by trends and cyclicality, in particular series of publication dynamics in content monitoring systems. The proposed methods can also be used to predict the parameter values of a large complex system in order to monitor its state, when the number of such parameters is large and a high level of automation of the forecasting process is therefore desirable. A significant advantage of the approach is that it does not require the time series to be stationary and has only a small number of tuning parameters. Further research may focus on various criteria for the similarity of time series fragments, the use of nonlinear similarity criteria, and ways to automatically determine a suitable quantization step for the time series.
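The abstract does not spell out the algorithm, so the following Python sketch is only a rough illustration of how an N-gram forecaster of this kind might work, under our own assumptions: the series is turned into symbols by quantizing its first differences with a fixed step `step`; the next symbol is the one that most often followed earlier occurrences of the latest N-gram (exact match, the simplest possible similarity criterion); and the last observed value is used as a fallback. The function names `symbolize`, `ngram_forecast` and `forecast_horizon` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import List, Sequence


def symbolize(series: Sequence[float], step: float) -> List[int]:
    """Quantize first differences into integer symbols.

    Assumption: encoding increments rather than levels is our choice here;
    it lets the same N-gram recur even when the series has a trend.
    """
    return [round((b - a) / step) for a, b in zip(series, series[1:])]


def ngram_forecast(series: Sequence[float], n: int = 3, step: float = 1.0) -> float:
    """One-step forecast: find earlier occurrences of the most recent n-gram
    of symbols and take the symbol that most often followed it."""
    symbols = symbolize(series, step)
    if len(symbols) <= n:
        return float(series[-1])  # not enough history: naive forecast
    context = symbols[-n:]
    # Scan every earlier window of length n (the final window is the context itself).
    followers = Counter(
        symbols[i + n]
        for i in range(len(symbols) - n)
        if symbols[i:i + n] == context
    )
    if not followers:
        return float(series[-1])  # n-gram never seen before: naive forecast
    next_symbol, _ = followers.most_common(1)[0]
    # Decode the symbol back into an increment and add it to the last value.
    return series[-1] + next_symbol * step


def forecast_horizon(series: Sequence[float], horizon: int,
                     n: int = 3, step: float = 1.0) -> List[float]:
    """Multi-step forecast by feeding each one-step forecast back into the history."""
    history = list(series)
    for _ in range(horizon):
        history.append(ngram_forecast(history, n, step))
    return history[len(series):]
```

For example, `ngram_forecast([0, 3, 2, 5, 4, 7, 6, 9, 8], n=2)` returns `11.0` even though the series keeps rising, because only the quantized increments are matched, not the levels. In this sketch `n` and `step` are the only tuning parameters, and the recursive `forecast_horizon` is one simple (assumed) way to extend a one-step forecast to medium-term horizons; how to choose `step` automatically remains open, as the abstract notes.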