Paper Title
Multi-scale Hybridized Topic Modeling: A Pipeline for Analyzing Unstructured Text Datasets via Topic Modeling
Paper Authors
Paper Abstract
We propose a multi-scale hybridized topic modeling method that finds hidden topics in transcribed interviews more accurately and efficiently than traditional topic modeling methods. Our multi-scale hybridized topic modeling method (MSHTM) approaches the data at different scales and performs topic modeling hierarchically, first with a classical method, Nonnegative Matrix Factorization (NMF), and then with a transformer-based method, BERTopic. It thereby harnesses the strengths of both NMF and BERTopic. Our method can help researchers and the public better extract and interpret the information in interviews. Additionally, it provides insights for new indexing systems based on the topic level. We deploy our method on real-world interview transcripts and obtain promising results.
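The hierarchical idea in the abstract (a classical NMF pass followed by a BERTopic pass) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the TF-IDF features, the rule of assigning each document to its single dominant NMF topic, and the `min_group_size` cutoff are all assumptions made for illustration. The abstract also mentions working at different scales; how the text is split into smaller units for the second stage is not specified there and is not shown here.

```python
from collections import defaultdict
from typing import Callable, List, Tuple

from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer


def coarse_topics_nmf(docs: List[str], n_topics: int, seed: int = 0) -> List[int]:
    """Stage 1 (assumed form): assign each document to its dominant NMF topic."""
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
    model = NMF(n_components=n_topics, init="nndsvda", random_state=seed, max_iter=500)
    weights = model.fit_transform(tfidf)  # shape: (n_docs, n_topics)
    return [int(i) for i in weights.argmax(axis=1)]


def fine_topics_bertopic(docs: List[str]) -> List[int]:
    """Stage 2 default: cluster one coarse group with BERTopic."""
    from bertopic import BERTopic  # imported lazily; heavy optional dependency

    topics, _ = BERTopic().fit_transform(docs)
    return [int(t) for t in topics]


def hybrid_topics(
    docs: List[str],
    n_coarse: int,
    fine_model: Callable[[List[str]], List[int]] = fine_topics_bertopic,
    min_group_size: int = 20,  # hypothetical cutoff; small groups skip stage 2
) -> List[Tuple[int, int]]:
    """Return a (coarse_topic, fine_topic) pair per document; fine_topic is -1 if skipped."""
    coarse = coarse_topics_nmf(docs, n_coarse)

    # Collect document indices under each coarse topic.
    groups = defaultdict(list)
    for idx, topic in enumerate(coarse):
        groups[topic].append(idx)

    labels = [(topic, -1) for topic in coarse]
    for topic, indices in groups.items():
        if len(indices) < min_group_size:
            continue
        fine = fine_model([docs[i] for i in indices])
        for i, fine_topic in zip(indices, fine):
            labels[i] = (topic, fine_topic)
    return labels
```

Calling `hybrid_topics(transcripts, n_coarse=5)` on a list of transcript strings would run BERTopic inside each sufficiently large NMF group. Passing a different `fine_model` callable swaps out the second stage, which is convenient for testing without the BERTopic dependency.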