Paper title
Determining clinically relevant features in cytometry data using persistent homology
Authors
Abstract
Cytometry experiments yield high-dimensional point cloud data that is difficult to interpret manually. Boolean gating techniques coupled with comparisons of relative abundances of cellular subsets are the current standard for cytometry data analysis. However, this approach is unable to capture more subtle topological features hidden in the data, especially if those features are further masked by data transforms, significant batch effects, or donor-to-donor variations in clinical data. Analysis of publicly available cytometry data describing non-naïve CD8+ T cells in COVID-19 patients and healthy controls shows that systematic structural differences exist between single-cell protein expression in COVID-19 patients and healthy controls. We identify proteins of interest with a decision-tree-based classifier, sample points randomly, and compute persistence diagrams from these sampled points. The resulting persistence diagrams identify regions of varying density in cytometry datasets and identify protruding structures such as 'elbows'. We compute Wasserstein distances between these persistence diagrams for random pairs of healthy controls and COVID-19 patients and find that systematic structural differences exist between COVID-19 patients and healthy controls in the expression data for T-bet, Eomes, and Ki-67. Further analysis shows that expression of T-bet and Eomes is significantly downregulated in COVID-19 patient non-naïve CD8+ T cells compared to healthy controls. This counter-intuitive finding may indicate that canonical effector CD8+ T cells are less prevalent in COVID-19 patients than in healthy controls. This method is applicable to any cytometry dataset for discovering novel insights through topological data analysis that may be difficult to ascertain with a standard gating strategy or existing bioinformatic tools.
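The sampling and persistence-diagram steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes a Vietoris-Rips filtration on Euclidean distances, computes only the 0-dimensional (connected-component) diagram, and uses the convention that an edge enters the filtration at a scale equal to its length. The function names `sample_points` and `h0_persistence` and the toy two-marker values are hypothetical.

```python
import math
import random


def sample_points(points, k, seed=0):
    """Draw k points uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(points, k)


def h0_persistence(points):
    """Return the finite H0 (birth, death) pairs of a Vietoris-Rips filtration.

    Assumption: every point is born at scale 0, and a component dies at the
    length of the edge that merges it into another component. This is
    Kruskal's algorithm with a union-find structure; the single component
    that never dies is omitted.
    """
    n = len(points)
    # All pairwise edges, sorted by Euclidean length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    diagram = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # this edge merges two components
            parent[root_i] = root_j
            diagram.append((0.0, length))
    return diagram


# Hypothetical two-marker expression values: two tight clusters far apart.
cells = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
         (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
diagram = h0_persistence(sample_points(cells, k=6))
```

In this toy example four components die at a small scale (points merging within each cluster) and one dies at a much larger scale (the two clusters merging), which is how a persistence diagram separates regions of differing density. Diagrams from two samples could then be compared with a Wasserstein distance, which this sketch does not implement.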