Paper Title

HyperMiner: Topic Taxonomy Mining with Hyperbolic Embedding

Authors

Yishi Xu, Dongsheng Wang, Bo Chen, Ruiying Lu, Zhibin Duan, Mingyuan Zhou

Abstract

Embedded topic models are able to learn interpretable topics even with large and heavy-tailed vocabularies. However, they generally hold the Euclidean embedding space assumption, leading to a basic limitation in capturing hierarchical relations. To this end, we present a novel framework that introduces hyperbolic embeddings to represent words and topics. With the tree-likeness property of hyperbolic space, the underlying semantic hierarchy among words and topics can be better exploited to mine more interpretable topics. Furthermore, due to the superiority of hyperbolic geometry in representing hierarchical data, tree-structure knowledge can also be naturally injected to guide the learning of a topic hierarchy. Therefore, we further develop a regularization term based on the idea of contrastive learning to inject prior structural knowledge efficiently. Experiments on both topic taxonomy discovery and document representation demonstrate that the proposed framework achieves improved performance against existing embedded topic models.
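The abstract gives no formulas, so what follows is only a minimal sketch under stated assumptions. It assumes the Poincaré ball model (curvature -1) as the hyperbolic space for word and topic embeddings, and it uses an InfoNCE-style contrastive loss that pulls a concept toward its parent in a prior taxonomy while pushing it away from unrelated concepts. The names `poincare_distance`, `contrastive_hierarchy_loss`, and the temperature `tau` are hypothetical. HyperMiner's actual hyperbolic model and regularization term may differ.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def poincare_distance(u: Vector, v: Vector, eps: float = 1e-9) -> float:
    """Geodesic distance between two points strictly inside the unit Poincare ball."""
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # d(u, v) = arcosh(1 + 2|u - v|^2 / ((1 - |u|^2)(1 - |v|^2)))
    arg = 1.0 + 2.0 * sq_diff / max((1.0 - sq_u) * (1.0 - sq_v), eps)
    return math.acosh(arg)


def contrastive_hierarchy_loss(
    child: Vector,
    parent: Vector,
    negatives: Sequence[Vector],
    tau: float = 1.0,
) -> float:
    """Hypothetical InfoNCE-style regularizer (not HyperMiner's exact term).

    The positive pair is (child, parent) from a prior taxonomy; every vector in
    `negatives` is treated as a non-ancestor. Similarity is the negative
    hyperbolic distance divided by the temperature `tau`.
    """
    pos = -poincare_distance(child, parent) / tau
    logits = [pos] + [-poincare_distance(child, n) / tau for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    # -log(exp(pos) / sum(exp(logits))) is non-negative by construction.
    return log_denominator - pos


if __name__ == "__main__":
    root = [0.0, 0.0]        # a general concept placed near the origin
    child = [0.6, 0.1]       # a more specific concept nearer the boundary
    unrelated = [-0.6, -0.5]
    print(poincare_distance(root, child))
    print(contrastive_hierarchy_loss(child, root, [unrelated]))
```

In this model, the geodesic between two points near the boundary bends toward the origin, so their distance roughly approaches the length of the path through the origin, much like the path between two leaves passing through their common ancestor in a tree. This is the tree-like behavior the abstract relies on, which Euclidean distance does not reproduce.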