Paper title

Exponential family measurement error models for single-cell CRISPR screens

Paper authors

Timothy Barry, Kathryn Roeder, Eugene Katsevich

Abstract

CRISPR genome engineering and single-cell RNA sequencing have accelerated biological discovery. Single-cell CRISPR screens unite these two technologies, linking genetic perturbations in individual cells to changes in gene expression and illuminating regulatory networks underlying diseases. Despite their promise, single-cell CRISPR screens present substantial statistical challenges. We demonstrate through theoretical and real data analyses that a standard method for estimation and inference in single-cell CRISPR screens -- "thresholded regression" -- exhibits attenuation bias and a bias-variance tradeoff as a function of an intrinsic, challenging-to-select tuning parameter. To overcome these difficulties, we introduce GLM-EIV ("GLM-based errors-in-variables"), a new method for single-cell CRISPR screen analysis. GLM-EIV extends the classical errors-in-variables model to responses and noisy predictors that are exponential family-distributed and potentially impacted by the same set of confounding variables. We develop a computational infrastructure to deploy GLM-EIV across hundreds of processors on clouds (e.g., Microsoft Azure) and high-performance clusters. Leveraging this infrastructure, we apply GLM-EIV to analyze two recent, large-scale, single-cell CRISPR screen datasets, yielding several novel insights.
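
The attenuation bias that the abstract attributes to thresholded regression can be illustrated with a small simulation. The sketch below is not the authors' GLM-EIV method or code; it is a toy example under assumed settings. Each cell is perturbed with probability 0.1, gRNA counts and gene expression are Poisson, and the true log fold change is log(0.4). All parameter values and the function names `sample_poisson`, `simulate_cells`, and `thresholded_estimate` are hypothetical. Cells are called "perturbed" when their gRNA count reaches a threshold, and the effect is estimated as the log ratio of mean expression between called and uncalled cells, which is the Poisson regression slope for a single binary covariate.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's method (adequate for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_cells(n_cells, seed=0, pi=0.1, grna_bg=0.5, grna_pert=5.0,
                   expr_mean=3.0, beta=math.log(0.4)):
    """Simulate (gRNA count, expression count) pairs; all defaults are assumptions."""
    rng = random.Random(seed)
    cells = []
    for _ in range(n_cells):
        perturbed = rng.random() < pi  # latent, unobserved perturbation status
        grna = sample_poisson(rng, grna_pert if perturbed else grna_bg)
        mean_expr = expr_mean * math.exp(beta) if perturbed else expr_mean
        cells.append((grna, sample_poisson(rng, mean_expr)))
    return cells


def thresholded_estimate(cells, threshold):
    """Log fold change after calling a cell perturbed when gRNA count >= threshold."""
    called = [expr for grna, expr in cells if grna >= threshold]
    uncalled = [expr for grna, expr in cells if grna < threshold]
    mean_called = sum(called) / len(called)
    mean_uncalled = sum(uncalled) / len(uncalled)
    return math.log(mean_called / mean_uncalled)


if __name__ == "__main__":
    cells = simulate_cells(20000, seed=1)
    print("true log fold change:", math.log(0.4))
    for threshold in (1, 3):
        print("threshold", threshold, "estimate", thresholded_estimate(cells, threshold))
```

Under these assumptions, both thresholds yield estimates closer to zero than the true effect, because misclassified cells dilute the contrast between the two groups, and the amount of shrinkage depends on the threshold chosen.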
