Paper title
Probabilistic Querying of Continuous-Time Event Sequences
Paper authors
Paper abstract
Continuous-time event sequences, i.e., sequences consisting of continuous timestamps and associated event types ("marks"), are an important type of sequential data with many applications, e.g., in clinical medicine or user behavior modeling. Since these data are typically modeled autoregressively (e.g., using neural Hawkes processes or their classical counterparts), it is natural to ask questions about future scenarios such as "what kind of event will occur next" or "will an event of type $A$ occur before one of type $B$". Unfortunately, some of these queries are notoriously hard to address since current methods are limited to naive simulation, which can be highly inefficient. This paper introduces a new typology of query types and a framework for addressing them using importance sampling. Example queries include predicting the $n^\text{th}$ event type in a sequence and the hitting time distribution of one or more event types. We further extend these results to estimate general "$A$ before $B$" queries. We prove theoretically that our estimation method is effectively always better than naive simulation and show empirically, on three real-world datasets, that it is on average 1,000 times more efficient than existing approaches.
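To make the contrast between naive simulation and importance sampling concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical two-mark Hawkes process with exponential kernels (the parameters mu, alpha, beta and all function names are illustrative) and estimates one hitting-time query: the probability that no event of type $A$ occurs in $[0, T]$. The naive estimator simulates full sequences and counts those without an $A$ event. The importance-sampling estimator is built on one assumed choice of proposal: simulate with the intensity of $A$ set to zero, then weight each sample by $\exp(-\int_0^T \lambda_A(t)\,dt)$, which is the likelihood ratio between the original process and that proposal.

```python
import math
import random

A, B = 0, 1  # mark indices (illustrative)


def intensity(k, t, history, mu, alpha, beta):
    """Right-continuous intensity of mark k at time t given past events.

    alpha[m][k] is the (assumed) excitation that an event of mark m adds to mark k.
    """
    return mu[k] + sum(
        alpha[m][k] * beta * math.exp(-beta * (t - s)) for s, m in history if s <= t
    )


def simulate(T, mu, alpha, beta, rng, allowed=(A, B)):
    """Ogata thinning on [0, T], generating only the marks in `allowed`."""
    history, t = [], 0.0
    while True:
        # Kernels decay, so the total intensity at t bounds it until the next event.
        bound = sum(intensity(k, t, history, mu, alpha, beta) for k in allowed)
        t += rng.expovariate(bound)
        if t > T:
            return history
        rates = [intensity(k, t, history, mu, alpha, beta) for k in allowed]
        u = rng.random() * bound
        for k, r in zip(allowed, rates):
            if u < r:  # accept the candidate and assign it mark k
                history.append((t, k))
                break
            u -= r  # falling through the loop means the candidate is rejected


def naive_estimate(n, T, mu, alpha, beta, rng):
    """Fraction of fully simulated sequences that contain no A event."""
    hits = sum(
        all(k != A for _, k in simulate(T, mu, alpha, beta, rng)) for _ in range(n)
    )
    return hits / n


def compensator_A(T, history, mu, alpha, beta):
    """Closed-form integral of the A intensity over [0, T] given the history."""
    return mu[A] * T + sum(
        alpha[m][A] * (1.0 - math.exp(-beta * (T - s))) for s, m in history
    )


def importance_estimate(n, T, mu, alpha, beta, rng):
    """Sample with A suppressed, then average the weights exp(-compensator of A)."""
    total = 0.0
    for _ in range(n):
        history = simulate(T, mu, alpha, beta, rng, allowed=(B,))
        total += math.exp(-compensator_A(T, history, mu, alpha, beta))
    return total / n


if __name__ == "__main__":
    rng = random.Random(0)
    mu, alpha, beta, T = [0.5, 1.0], [[0.2, 0.1], [0.3, 0.2]], 1.0, 2.0
    print("naive:     ", naive_estimate(2000, T, mu, alpha, beta, rng))
    print("importance:", importance_estimate(2000, T, mu, alpha, beta, rng))
```

Each naive sample contributes either 0 or 1, whereas each importance-sampling weight lies strictly between 0 and 1 and varies only through the sampled $B$ events, so its variance is typically much smaller for the same number of samples. In the special case without excitation (all alpha entries zero), every weight equals $\exp(-\mu_A T)$ and the estimate is exact.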