Paper Title

Pattern Recognition and Event Detection on IoT Data-streams

Authors

Christos Karras, Aristeidis Karras, Spyros Sioutas

Abstract

Big data streams are arguably one of the most essential underlying notions. However, data streams are often challenging to handle owing to their rapid pace and limited information lifetime. It is difficult to collect and communicate stream samples while storing, transmitting and computing a function across the whole stream or even a large segment of it. In response to this research problem, many streaming-specific solutions have been developed. Stream techniques assume a limited capacity of one or more resources, such as computing power and memory, as well as time or accuracy limits. Reservoir sampling algorithms choose and store results that are probabilistically significant. The key research goal of this work is a weighted random sampling approach that uses a generalised sampling algorithmic framework to detect unique events. Briefly, a gradually developed estimate of the joint stream distribution across all feasible components keeps k stream elements judged representative of the full stream. Once estimate confidence is high, k samples are chosen uniformly. The complexity is O(min(k, n-k)), where n is the number of items inspected. Because events are usually considered outliers, it is sufficient to extract element patterns and push them to an alternate version of k-means as proposed here. The suggested technique calculates the sum of squared errors (SSE) for each cluster, which is used not only as a measure of convergence but also as a quantification and an indirect assessment of the element distribution's approximation accuracy. This clustering enables the detection of outliers in the stream based on their distance from the usual event centroids. The findings reveal that weighted sampling and res-means outperform typical approaches for stream event identification. Detected events are shown as knowledge graphs, along with typical clusters of events.
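To make the ideas in the abstract concrete, the following Python sketch illustrates one-pass weighted reservoir sampling, per-cluster SSE, and centroid-distance outlier flagging. It is not the authors' implementation: the sampling step uses the well-known A-Res key scheme as a stand-in, the points are one-dimensional for brevity, and all names (`weighted_reservoir_sample`, `sse_per_cluster`, `flag_outliers`, `threshold`) are hypothetical choices made for illustration.

```python
import heapq
import random


def weighted_reservoir_sample(stream, k, rng=None):
    """Keep k items from an iterable of (item, weight) pairs in one pass.

    Assumption: uses the A-Res key scheme, not necessarily the paper's
    framework. Each item gets the key u ** (1 / weight), with u uniform
    in (0, 1]; the k items with the largest keys form the sample.
    """
    if k <= 0:
        return []
    rng = rng or random.Random()
    heap = []  # min-heap of (key, arrival_index, item)
    for index, (item, weight) in enumerate(stream):
        if weight <= 0:
            continue  # items with non-positive weight are never sampled
        u = 1.0 - rng.random()  # random() is in [0, 1), so u is in (0, 1]
        entry = (u ** (1.0 / weight), index, item)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry[0] > heap[0][0]:
            heapq.heapreplace(heap, entry)  # evict the smallest key
    return [item for _, _, item in heap]


def sse_per_cluster(points, centroids):
    """Assign each 1-D point to its nearest centroid; sum squared errors."""
    sse = [0.0] * len(centroids)
    for x in points:
        j = min(range(len(centroids)), key=lambda c: (x - centroids[c]) ** 2)
        sse[j] += (x - centroids[j]) ** 2
    return sse


def flag_outliers(points, centroids, threshold):
    """Return points farther than `threshold` from every centroid."""
    return [x for x in points if min(abs(x - c) for c in centroids) > threshold]
```

In a pipeline of this shape, the sampled elements would be clustered, the per-cluster SSE monitored for convergence, and points far from every centroid reported as candidate events. The threshold here is a free parameter of the sketch, not a value taken from the paper.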
