Polling Latent Opinions: A Method for Computational Sociolinguistics Using Transformer Language Models

Paper Title

Polling Latent Opinions: A Method for Computational Sociolinguistics Using Transformer Language Models

Authors

Philip Feldman, Aaron Dant, James R. Foulds, Shimei Pan

Abstract

Text analysis of social media for sentiment, topic analysis, and other analysis depends initially on the selection of keywords and phrases that will be used to create the research corpora. However, keywords that researchers choose may occur infrequently, leading to errors that arise from using small samples. In this paper, we use the capacity for memorization, interpolation, and extrapolation of Transformer Language Models such as the GPT series to learn the linguistic behaviors of a subgroup within larger corpora of Yelp reviews. We then use prompt-based queries to generate synthetic text that can be analyzed to produce insights into specific opinions held by the populations that the models were trained on. Once learned, more specific sentiment queries can be made of the model with high levels of accuracy when compared to traditional keyword searches. We show that even in cases where a specific keyphrase is limited or not present at all in the training corpora, the GPT is able to accurately generate large volumes of text that have the correct sentiment.
