Title
Interactions in Information Spread
Authors
Abstract
Since the development of writing 5,000 years ago, humans have generated data at an ever-increasing pace. Classical archival methods aimed to ease information retrieval. Nowadays, archiving is no longer enough. The amount of data generated daily is beyond human comprehension and calls for new information retrieval strategies. Instead of referencing every single piece of data, as traditional archival techniques do, a more relevant approach lies in understanding the overall ideas conveyed in data flows. Spotting such general tendencies requires a precise understanding of the underlying data generation mechanisms. In the rich literature tackling this problem, the question of information interaction remains nearly unexplored. First, we investigate how frequent such interactions are. Building on recent advances in Stochastic Block Modelling, we explore the role of interactions in several social networks. We find that interactions are rare in these datasets. Then, we ask how interactions evolve over time. Earlier pieces of data should not have an everlasting influence on later data generation mechanisms. We model this using recent advances in dynamic network inference. We conclude that interactions are brief. Finally, we design a framework based on Dirichlet-Hawkes Processes that jointly models rare and brief interactions. We argue that this new class of models is well suited to modelling brief and sparse interactions. We conduct a large-scale application on Reddit and find that interactions play a minor role in this dataset. From a broader perspective, our work results in a collection of highly flexible models and in a rethinking of core concepts of machine learning. Consequently, we open a range of novel perspectives, both for real-world applications and for technical contributions to machine learning.