Paper Title

Neural Topic Modeling with Bidirectional Adversarial Training

Authors

Rui Wang, Xuemeng Hu, Deyu Zhou, Yulan He, Yuxuan Xiong, Chenchen Ye, Haiyang Xu

Abstract

Recent years have witnessed a surge of interest in using neural topic models for automatic topic extraction from text, since they avoid the complicated mathematical derivations for model inference required by traditional topic models such as Latent Dirichlet Allocation (LDA). However, these models typically either assume an improper prior (e.g. Gaussian or Logistic Normal) over the latent topic space or cannot infer the topic distribution of a given document. To address these limitations, we propose a neural topic modeling approach, called the Bidirectional Adversarial Topic (BAT) model, which represents the first attempt to apply bidirectional adversarial training to neural topic modeling. The proposed BAT builds a two-way projection between the document-topic distribution and the document-word distribution. It uses a generator to capture the semantic patterns in texts and an encoder for topic inference. Furthermore, to incorporate word relatedness information, BAT is extended to the Bidirectional Adversarial Topic model with Gaussian (Gaussian-BAT). To verify the effectiveness of BAT and Gaussian-BAT, three benchmark corpora are used in our experiments. The experimental results show that BAT and Gaussian-BAT obtain more coherent topics, outperforming several competitive baselines. Moreover, when performing text clustering based on the extracted topics, our models outperform all the baselines, with the larger improvement achieved by Gaussian-BAT, for which an increase of nearly 6% in accuracy is observed.
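The two-way projection described in the abstract can be illustrated with a small data-flow sketch. This is not the authors' implementation: the abstract gives no architectural details, so the layer sizes, the single hidden layer per network, the Dirichlet prior on the topic side, and all names (`encoder`, `generator`, `discriminator`, `V`, `K`, `H`) are assumptions made for illustration. The weights are random and untrained; the sketch only shows which (topic distribution, word distribution) pairs a discriminator would be asked to tell apart.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: vocabulary, number of topics, hidden units.
V, K, H = 2000, 20, 100

def softmax(x):
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def init(n_in, n_out):
    # Small random weights; nothing here is trained.
    return rng.normal(0.0, 0.05, size=(n_in, n_out))

W_enc1, W_enc2 = init(V, H), init(H, K)      # encoder: words -> topics
W_gen1, W_gen2 = init(K, H), init(H, V)      # generator: topics -> words
W_dis1, W_dis2 = init(K + V, H), init(H, 1)  # discriminator on joint pairs

def encoder(d):
    # Document-word distribution -> document-topic distribution.
    h = np.maximum(0.0, d @ W_enc1)
    return softmax(h @ W_enc2)

def generator(theta):
    # Document-topic distribution -> document-word distribution.
    h = np.maximum(0.0, theta @ W_gen1)
    return softmax(h @ W_gen2)

def discriminator(theta, d):
    # Scores a concatenated (topic, word) distribution pair.
    pair = np.concatenate([theta, d], axis=-1)
    h = np.maximum(0.0, pair @ W_dis1)
    return (h @ W_dis2).squeeze(-1)

batch = 4

# "Real" side: synthetic bag-of-words documents, normalized to distributions.
counts = rng.poisson(0.05, size=(batch, V)).astype(float) + 1e-8
d_real = counts / counts.sum(axis=1, keepdims=True)
theta_real = encoder(d_real)

# "Fake" side: topics drawn from an assumed Dirichlet prior, then decoded.
theta_fake = rng.dirichlet(np.full(K, 0.5), size=batch)
d_fake = generator(theta_fake)

score_real = discriminator(theta_real, d_real)
score_fake = discriminator(theta_fake, d_fake)
print(score_real.shape, score_fake.shape)
```

In an actual adversarial setup the discriminator would be trained to separate the two kinds of pairs while the encoder and generator are trained to make them indistinguishable; the loss and optimizer are deliberately left out here because the abstract does not specify them.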
