Paper Title

Neural Topic Modeling with Cycle-Consistent Adversarial Training

Authors

Xuemeng Hu, Rui Wang, Deyu Zhou, Yuxuan Xiong

Abstract

Advances on deep generative models have attracted significant research interest in neural topic modeling. The recently proposed Adversarial-neural Topic Model models topics with an adversarially trained generator network and employs Dirichlet prior to capture the semantic patterns in latent topics. It is effective in discovering coherent topics but unable to infer topic distributions for given documents or utilize available document labels. To overcome such limitations, we propose Topic Modeling with Cycle-consistent Adversarial Training (ToMCAT) and its supervised version sToMCAT. ToMCAT employs a generator network to interpret topics and an encoder network to infer document topics. Adversarial training and cycle-consistent constraints are used to encourage the generator and the encoder to produce realistic samples that coordinate with each other. sToMCAT extends ToMCAT by incorporating document labels into the topic modeling process to help discover more coherent topics. The effectiveness of the proposed models is evaluated on unsupervised/supervised topic modeling and text classification. The experimental results show that our models can produce both coherent and informative topics, outperforming a number of competitive baselines.
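As a rough illustration of the cycle-consistency idea described in the abstract, the sketch below pairs an encoder (documents → topic distributions) with a generator (topic distributions → word distributions) and penalises reconstruction in both directions. This is a minimal PyTorch sketch, not the authors' implementation: the layer sizes, Dirichlet concentration, L1 cycle losses, and the random mini-batch are illustrative assumptions, and the adversarial discriminators and the supervised (sToMCAT) label branch are omitted.

```python
# Minimal sketch of the cycle-consistent training signal (assumed setup, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB_SIZE, NUM_TOPICS = 2000, 20  # illustrative sizes

class Encoder(nn.Module):
    """Maps a document's normalised bag-of-words vector to a topic distribution."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(VOCAB_SIZE, 256), nn.ReLU(),
                                 nn.Linear(256, NUM_TOPICS))
    def forward(self, bow):
        return F.softmax(self.net(bow), dim=-1)

class Generator(nn.Module):
    """Maps a topic distribution to a word distribution over the vocabulary."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(NUM_TOPICS, 256), nn.ReLU(),
                                 nn.Linear(256, VOCAB_SIZE))
    def forward(self, theta):
        return F.softmax(self.net(theta), dim=-1)

encoder, generator = Encoder(), Generator()

# Fake mini-batch: "real" documents as normalised bag-of-words vectors, and
# topic distributions drawn from a Dirichlet prior (concentration 0.1 is an assumption).
docs = F.normalize(torch.rand(8, VOCAB_SIZE), p=1, dim=-1)
theta_prior = torch.distributions.Dirichlet(
    torch.full((NUM_TOPICS,), 0.1)).sample((8,))

# Cycle-consistent reconstruction terms (the adversarial discriminator losses are omitted):
#   document -> encoder -> generator -> document
#   theta    -> generator -> encoder -> theta
doc_cycle = F.l1_loss(generator(encoder(docs)), docs)
theta_cycle = F.l1_loss(encoder(generator(theta_prior)), theta_prior)
cycle_loss = doc_cycle + theta_cycle
print(f"cycle-consistency loss: {cycle_loss.item():.4f}")
```

In a full training loop this cycle term would be combined with the adversarial losses so that the generator's word distributions and the encoder's topic distributions remain consistent with each other, as the abstract describes.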
