The Problem of Semantic Shift in Longitudinal Monitoring of Social Media: A Case Study on Mental Health During the COVID-19 Pandemic

Paper Title

The Problem of Semantic Shift in Longitudinal Monitoring of Social Media: A Case Study on Mental Health During the COVID-19 Pandemic

Authors

Harrigian, Keith; Dredze, Mark

Abstract

Social media allows researchers to track societal and cultural changes over time based on language analysis tools. Many of these tools rely on statistical algorithms which need to be tuned to specific types of language. Recent studies have shown the absence of appropriate tuning, specifically in the presence of semantic shift, can hinder robustness of the underlying methods. However, little is known about the practical effect this sensitivity may have on downstream longitudinal analyses. We explore this gap in the literature through a timely case study: understanding shifts in depression during the course of the COVID-19 pandemic. We find that inclusion of only a small number of semantically-unstable features can promote significant changes in longitudinal estimates of our target outcome. At the same time, we demonstrate that a recently-introduced method for measuring semantic shift may be used to proactively identify failure points of language-based models and, in turn, improve predictive generalization.
