Gaussian Latent Dirichlet Allocation for Discrete Human State Discovery

Paper Title

Gaussian Latent Dirichlet Allocation for Discrete Human State Discovery

Authors

Congyu Wu, Aaron Fisher, David Schnyer

Abstract

In this article we propose and validate an unsupervised probabilistic model, Gaussian Latent Dirichlet Allocation (GLDA), for the problem of discrete state discovery from repeated, multivariate psychophysiological samples collected from multiple, inherently distinct individuals. Psychological and medical research often involves measuring potentially related but individually inconclusive variables from a cohort of participants to derive a diagnosis, which calls for clustering analysis. Traditional probabilistic clustering models such as the Gaussian Mixture Model (GMM) assume a single global mixture of component distributions, which may not be realistic for observations from different patients. The GLDA model borrows the individual-specific mixture structure of Latent Dirichlet Allocation (LDA), a popular topic model in Natural Language Processing, and merges it with the Gaussian component distributions of GMM to suit continuous data. We implemented GLDA in Stan (a probabilistic modeling language) and applied it to two datasets, one containing Ecological Momentary Assessments (EMA) and the other containing heart measures from electrocardiography and impedance cardiography. We found that in both datasets the GLDA-learned class weights achieved significantly higher correlations with clinically assessed depression, anxiety, and stress scores than those produced by the baseline GMM. Our findings demonstrate the advantage of GLDA over conventional finite mixture models for human state discovery from repeated multivariate data, likely due to better characterization of potential underlying between-participant differences. Future work is required to validate the utility of this model on a broader range of applications.
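
The structural difference between GLDA and a GMM described in the abstract can be illustrated with a small generative sketch. This is not the authors' Stan implementation. It is a minimal NumPy simulation under stated assumptions: a symmetric Dirichlet prior `alpha` on each individual's state weights, and Gaussian components with fixed, shared means and covariances. The function name `sample_glda` and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def sample_glda(n_individuals, n_obs, means, covs, alpha, rng):
    """Draw synthetic data from a GLDA-style generative process (illustrative only)."""
    means = np.asarray(means, dtype=float)  # shape (K, D): Gaussian means shared by everyone
    covs = np.asarray(covs, dtype=float)    # shape (K, D, D): shared covariances
    n_states, dim = means.shape
    # Individual-specific mixture weights, analogous to LDA's per-document topic weights.
    # A GMM would instead use one global weight vector for all individuals.
    theta = rng.dirichlet(alpha, size=n_individuals)  # shape (P, K)
    states = np.empty((n_individuals, n_obs), dtype=int)
    obs = np.empty((n_individuals, n_obs, dim))
    for p in range(n_individuals):
        # Each repeated sample picks a discrete state from this individual's weights.
        states[p] = rng.choice(n_states, size=n_obs, p=theta[p])
        for t in range(n_obs):
            k = states[p, t]
            # The observation is drawn from the Gaussian component of that state.
            obs[p, t] = rng.multivariate_normal(means[k], covs[k])
    return theta, states, obs

# Hypothetical settings: 3 states, 2 measured variables, 5 individuals, 40 samples each.
rng = np.random.default_rng(0)
means = [[-2.0, 0.0], [2.0, 0.0], [0.0, 3.0]]
covs = [np.eye(2) * 0.5] * 3
alpha = [0.5, 0.5, 0.5]
theta, states, obs = sample_glda(5, 40, means, covs, alpha, rng)
```

Each row of `theta` plays the role of the per-individual class weights that the abstract correlates with clinical scores. Inference (recovering `theta` and the component parameters from `obs`) is the part the authors handle in Stan and is not attempted here.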
