Paper title
Asymptotic distribution-free change-point detection for data with repeated observations
Paper authors
Paper abstract
In change-point detection, a nonparametric framework based on scan statistics that uses graphs representing similarities among observations is gaining attention due to its flexibility and good performance for high-dimensional and non-Euclidean data sequences, which are ubiquitous in this big data era. However, this graph-based framework encounters problems when there are repeated observations in the sequence, which often happens for discrete data, such as network data. In this work, we extend the graph-based framework to solve this problem by averaging or taking the union of all possible optimal graphs resulting from repeated observations. We consider both the single change-point alternative and the changed-interval alternative, and derive analytic formulas to control the type I error for the new methods, making them fast to apply to large datasets. The extended methods are illustrated on an application in detecting changes in a sequence of dynamic networks over time. All proposed methods are implemented in an R package gSeg available on CRAN.