Title

Forecasting COVID-19 Caseloads Using Unsupervised Embedding Clusters of Social Media Posts

Authors

Felix Drinkall, Stefan Zohren, Janet B. Pierrehumbert

Abstract

We present a novel approach incorporating transformer-based language models into infectious disease modelling. Text-derived features are quantified by tracking high-density clusters of sentence-level representations of Reddit posts within specific US states' COVID-19 subreddits. We benchmark these clustered embedding features against features extracted from other high-quality datasets. In a threshold-classification task, we show that they outperform all other feature types at predicting upward trend signals, a significant result for infectious disease modelling in areas where epidemiological data is unreliable. Subsequently, in a time-series forecasting task, we fully utilise the predictive power of the caseload and compare the relative strengths of using different supplementary datasets as covariate feature sets in a transformer-based time-series model.
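The abstract does not say exactly how cluster activity becomes a feature or how an "upward trend" is labelled. The following is a minimal sketch under stated assumptions, not the authors' method. It assumes each post's sentence embedding has already been assigned a cluster id by some density-based clusterer, with -1 marking unclustered (noise) posts. It also assumes an upward trend means relative caseload growth above a threshold over a fixed horizon. The function names, the -1 convention, `horizon`, and `threshold` are all hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date


def daily_cluster_features(posts, n_clusters):
    """Hypothetical feature builder.

    posts: iterable of (day, cluster_id) pairs; cluster_id -1 marks noise.
    Returns {day: [share of that day's posts in cluster k for k in range(n_clusters)]}.
    Noise posts count toward the daily total but get no feature column.
    """
    by_day = defaultdict(Counter)
    for day, cid in posts:
        by_day[day][cid] += 1
    features = {}
    for day, counts in by_day.items():
        total = sum(counts.values())
        features[day] = [counts[k] / total for k in range(n_clusters)]
    return features


def upward_trend_labels(cases, horizon, threshold):
    """Hypothetical binary target for threshold classification.

    Day t is labelled 1 if the caseload `horizon` days later exceeds
    cases[t] by a relative margin greater than `threshold`, else 0.
    """
    labels = []
    for t in range(len(cases) - horizon):
        base, future = cases[t], cases[t + horizon]
        if base == 0:
            # Any rise from zero counts as an upward trend.
            labels.append(1 if future > 0 else 0)
        else:
            labels.append(1 if (future - base) / base > threshold else 0)
    return labels


if __name__ == "__main__":
    posts = [(date(2020, 6, 1), 0), (date(2020, 6, 1), 1),
             (date(2020, 6, 1), -1), (date(2020, 6, 2), 1)]
    print(daily_cluster_features(posts, n_clusters=2))
    print(upward_trend_labels([100, 110, 130, 120, 90], horizon=2, threshold=0.2))
```

Using per-day cluster shares rather than raw counts is one way to keep the features comparable across days with very different posting volume. Whether the authors normalise this way is not stated in the abstract.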