Paper title
Learning multi-scale functional representations of proteins from single-cell microscopy data
Paper authors
Paper abstract
Protein function is inherently linked to its localization within the cell, and fluorescence microscopy data is an indispensable resource for learning representations of proteins. Despite major developments in molecular representation learning, extracting functional information from biological images remains a non-trivial computational task. Current state-of-the-art approaches use autoencoder models to learn high-quality features by reconstructing images. However, such methods are prone to capturing noise and imaging artifacts. In this work, we revisit deep learning models used for classifying major subcellular localizations, and evaluate representations extracted from their final layers. We show that simple convolutional networks trained on localization classification can learn protein representations that encapsulate diverse functional information, and significantly outperform autoencoder-based models. We also propose a robust evaluation strategy to assess the quality of protein representations across different scales of biological function.
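To make the idea of "representations extracted from final layers" concrete, the following is a minimal PyTorch sketch, not the authors' implementation. The class name `LocalizationCNN`, the helper `protein_representation`, the layer sizes, the two-channel 64x64 input, the ten localization classes, and the choice to average single-cell embeddings into one vector per protein are all assumptions made for illustration. The model here is untrained and the images are random stand-ins.

```python
import torch
import torch.nn as nn


class LocalizationCNN(nn.Module):
    """Small CNN localization classifier (hypothetical architecture, not from the paper)."""

    def __init__(self, in_channels: int = 2, num_classes: int = 10, embed_dim: int = 64):
        super().__init__()
        # Convolutional trunk followed by a dense layer whose output is
        # treated as the representation of a single-cell image.
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(32, embed_dim),
            nn.ReLU(),
        )
        # Classification head used only during supervised training.
        self.classifier = nn.Linear(embed_dim, num_classes)

    def embed(self, x: torch.Tensor) -> torch.Tensor:
        """Return activations of the last layer before the classification head."""
        return self.features(x)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Return localization class logits."""
        return self.classifier(self.embed(x))


def protein_representation(model: LocalizationCNN, cell_images: torch.Tensor) -> torch.Tensor:
    """Average single-cell embeddings into one vector (an assumed aggregation rule)."""
    model.eval()
    with torch.no_grad():
        cell_embeddings = model.embed(cell_images)
    return cell_embeddings.mean(dim=0)


torch.manual_seed(0)
model = LocalizationCNN()
cells = torch.rand(8, 2, 64, 64)  # 8 random stand-in single-cell crops
logits = model(cells)             # used with a classification loss during training
rep = protein_representation(model, cells)
print(logits.shape, rep.shape)
```

In practice the network would first be trained on localization labels with a standard classification loss, and only then would `embed` be used to obtain representations for downstream functional evaluation.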