CIDMP: Completely Interpretable Detection of Malaria Parasite in Red Blood Cells using Lower-dimensional Feature Space

Paper Title

CIDMP: Completely Interpretable Detection of Malaria Parasite in Red Blood Cells using Lower-dimensional Feature Space

Authors

Anik Khan, Kishor Datta Gupta, Deepak Venugopal, Nirman Kumar

Abstract

Predicting whether red blood cells (RBCs) are infected with the malaria parasite is an important problem in pathology. Recently, supervised machine learning approaches have been applied to this problem with reasonable success. In particular, state-of-the-art methods such as Convolutional Neural Networks automatically extract increasingly complex feature hierarchies from the image pixels. While such generalized automatic feature extraction has significantly reduced the burden of feature engineering in many domains, for niche tasks such as the one considered in this paper it causes two major problems. First, these methods use a very large number of features (which may or may not be relevant), so training such models is computationally expensive. Second, and more importantly, the large feature space makes it very hard to interpret which features are truly important for predictions. A common criticism of such methods is therefore that the learning algorithms are opaque black boxes to their users, in this case medical experts: the recommendation of such an algorithm is easy to understand, but the reason behind it is not clear. This is the problem of model non-interpretability, and the best-performing algorithms are usually the least interpretable. To address these issues, we propose an approach that extracts a very small number of aggregated features that are easy to interpret and compute, and we show empirically that it obtains high prediction accuracy even with a significantly reduced feature space.
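The abstract does not list the aggregated features themselves, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes a cell image is given as rows of grayscale intensities in [0, 1] and collapses it into three named summary statistics. The function name `aggregate_features`, the `dark_threshold` parameter, and the three features are all hypothetical choices made for this example.

```python
from statistics import fmean, pstdev


def aggregate_features(image, dark_threshold=0.3):
    """Collapse a grayscale cell image into three named summary statistics.

    `image` is a list of rows, each a list of floats in [0, 1].
    The feature set is an illustrative assumption, not taken from the paper.
    """
    pixels = [p for row in image for p in row]
    if not pixels:
        raise ValueError("image must contain at least one pixel")
    return {
        # Average brightness over the whole cell.
        "mean_intensity": fmean(pixels),
        # Population standard deviation of brightness.
        "intensity_std": pstdev(pixels),
        # Share of pixels darker than the (assumed) threshold.
        "dark_fraction": sum(p < dark_threshold for p in pixels) / len(pixels),
    }


# Synthetic 4x4 "cells": one uniform, one with a dark 2x2 blob in the middle.
uniform_cell = [[0.8] * 4 for _ in range(4)]
blob_cell = [row[:] for row in uniform_cell]
for r in (1, 2):
    for c in (1, 2):
        blob_cell[r][c] = 0.1

uniform_features = aggregate_features(uniform_cell)
blob_features = aggregate_features(blob_cell)
```

Each image is reduced to three numbers with plain-language names, so a downstream classifier trained on them can be inspected feature by feature. This is the kind of interpretability the abstract contrasts with features learned directly from pixels.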
