Paper Title

Self-supervised Pretraining and Transfer Learning Enable Flu and COVID-19 Predictions in Small Mobile Sensing Datasets

Authors

Mike A. Merrill, Tim Althoff

Abstract

Detailed mobile sensing data from phones, watches, and fitness trackers offer an unparalleled opportunity to quantify and act upon previously unmeasurable behavioral changes in order to improve individual health and accelerate responses to emerging diseases. Unlike in natural language processing and computer vision, deep representation learning has yet to broadly impact this domain, in which the vast majority of research and clinical applications still rely on manually defined features and boosted tree models or even forgo predictive modeling altogether due to insufficient accuracy. This is due to unique challenges in the behavioral health domain, including very small datasets (~10^1 participants), which frequently contain missing data, consist of long time series with critical long-range dependencies (length>10^4), and extreme class imbalances (>10^3:1). Here, we introduce a neural architecture for multivariate time series classification designed to address these unique domain challenges. Our proposed behavioral representation learning approach combines novel tasks for self-supervised pretraining and transfer learning to address data scarcity, and captures long-range dependencies across long-history time series through transformer self-attention following convolutional neural network-based dimensionality reduction. We propose an evaluation framework aimed at reflecting expected real-world performance in plausible deployment scenarios. Concretely, we demonstrate (1) performance improvements over baselines of up to 0.15 ROC AUC across five prediction tasks, (2) transfer learning-induced performance improvements of 16% PR AUC in small data scenarios, and (3) the potential of transfer learning in novel disease scenarios through an exploratory case study of zero-shot COVID-19 prediction in an independent data set. Finally, we discuss potential implications for medical surveillance testing.
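The abstract places convolutional dimensionality reduction ahead of transformer self-attention. One likely motivation, stated here as an assumption rather than taken from the paper, is that standard self-attention compares every position with every other position, so its cost grows with the square of the sequence length. The sketch below works through that arithmetic for a series of length 10^4. The kernel sizes and strides in `CONV_LAYERS` are hypothetical values chosen for illustration; they are not the authors' configuration.

```python
def conv1d_output_length(length: int, kernel_size: int, stride: int) -> int:
    """Output length of an unpadded 1-D convolution with dilation 1."""
    if length < kernel_size:
        raise ValueError("sequence is shorter than the kernel")
    return (length - kernel_size) // stride + 1


def reduced_length(length: int, layers: list[tuple[int, int]]) -> int:
    """Apply a stack of (kernel_size, stride) convolutions to a sequence length."""
    for kernel_size, stride in layers:
        length = conv1d_output_length(length, kernel_size, stride)
    return length


# Assumption: a raw series of 10^4 steps, matching the order of magnitude
# quoted in the abstract.
RAW_LENGTH = 10_000

# Assumption: hypothetical (kernel_size, stride) pairs, not the paper's settings.
CONV_LAYERS = [(5, 2), (5, 2), (2, 2)]

tokens = reduced_length(RAW_LENGTH, CONV_LAYERS)

# Standard self-attention scores every pair of positions.
raw_pairs = RAW_LENGTH ** 2
reduced_pairs = tokens ** 2

if __name__ == "__main__":
    print(f"sequence length after convolutions: {tokens}")
    print(f"attention pairs without reduction:  {raw_pairs}")
    print(f"attention pairs with reduction:     {reduced_pairs}")
    print(f"reduction factor:                   {raw_pairs / reduced_pairs:.1f}x")
```

Under these assumed settings, three strided convolutions shorten the sequence by roughly a factor of eight, which cuts the number of attention pairs by roughly a factor of sixty-four while each remaining position still summarizes a local window of the raw signal.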
