Paper Title

TimeLMs: Diachronic Language Models from Twitter

Authors

Daniel Loureiro, Francesco Barbieri, Leonardo Neves, Luis Espinosa Anke, Jose Camacho-Collados

Abstract

Despite its importance, the time variable has been largely neglected in the NLP and language model literature. In this paper, we present TimeLMs, a set of language models specialized on diachronic Twitter data. We show that a continual learning strategy contributes to enhancing Twitter-based language models' capacity to deal with future and out-of-distribution tweets, while making them competitive with standardized and more monolithic benchmarks. We also perform a number of qualitative analyses showing how they cope with trends and peaks in activity involving specific named entities or concept drift.
