TimeLMs: Diachronic Language Models from Twitter

Paper title

TimeLMs: Diachronic Language Models from Twitter

Authors

Daniel Loureiro, Francesco Barbieri, Leonardo Neves, Luis Espinosa Anke, Jose Camacho-Collados

Abstract

Despite its importance, the time variable has been largely neglected in the NLP and language model literature. In this paper, we present TimeLMs, a set of language models specialized on diachronic Twitter data. We show that a continual learning strategy contributes to enhancing Twitter-based language models' capacity to deal with future and out-of-distribution tweets, while making them competitive with standardized and more monolithic benchmarks. We also perform a number of qualitative analyses showing how they cope with trends and peaks in activity involving specific named entities or concept drift.
