Interpretable Machine Learning for COVID-19: An Empirical Study on Severity Prediction Task

Paper Title

Interpretable Machine Learning for COVID-19: An Empirical Study on Severity Prediction Task

Authors

Wu, Han; Ruan, Wenjie; Wang, Jiangtao; Zheng, Dingchang; Liu, Bei; Gen, Yayuan; Chai, Xiangfei; Chen, Jian; Li, Kunwei; Li, Shaolin; Helal, Sumi

Abstract

The black-box nature of machine learning models hinders the deployment of some high-accuracy models in medical diagnosis: it is risky to put one's life in the hands of models that medical researchers do not fully understand. Through model interpretation, however, black-box models can promptly reveal significant biomarkers that medical practitioners may have overlooked amid the surge of infected patients during the COVID-19 pandemic. This research uses a database of 92 patients with SARS-CoV-2 infection confirmed by laboratory tests between 18 January 2020 and 5 March 2020 in Zhuhai, China, to identify biomarkers indicative of disease severity. By interpreting four machine learning models (decision tree, random forests, gradient boosted trees, and neural networks) with permutation feature importance, Partial Dependence Plots (PDP), Individual Conditional Expectation (ICE), Accumulated Local Effects (ALE), Local Interpretable Model-agnostic Explanations (LIME), and Shapley Additive Explanations (SHAP), we find that increases in N-terminal pro-brain natriuretic peptide (NTproBNP), C-reactive protein (CRP), and lactate dehydrogenase (LDH), together with a decrease in lymphocytes (LYM), are associated with severe infection and an increased risk of death. This is consistent with recent medical research on COVID-19 and with other studies using dedicated models. We further validate our methods on a large open Kaggle dataset of 5644 confirmed patients from the Hospital Israelita Albert Einstein in São Paulo, Brazil, and identify leukocytes, eosinophils, and platelets as three indicative biomarkers for COVID-19.
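Several of the interpretation methods named in the abstract are standard model-agnostic techniques. As a rough illustration only, the sketch below shows two of them, permutation feature importance and PDP/ICE, in plain Python. It is not the authors' code: the classifier `toy_model`, its 0.5 threshold rule, and the synthetic two-feature data are hypothetical stand-ins (feature 0 loosely plays the role of a marker such as CRP; feature 1 is pure noise). ALE, LIME, and SHAP are not shown.

```python
"""Toy, stdlib-only sketches of permutation feature importance and PDP/ICE.

Assumptions (hypothetical, not from the study): a hand-written rule stands in
for a trained classifier, and the data are synthetic uniform random values.
"""
import random


def toy_model(row):
    # Hypothetical "severity" classifier: predicts 1 (severe) when feature 0
    # exceeds 0.5; feature 1 is ignored entirely.
    return 1 if row[0] > 0.5 else 0


def accuracy(model, X, y):
    # Fraction of rows whose prediction matches the label.
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when one feature column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def ice_curves(model, X, feature, grid):
    """One curve per instance: its prediction as `feature` is swept over `grid`."""
    return [[model(row[:feature] + [v] + row[feature + 1:]) for v in grid]
            for row in X]


def partial_dependence(model, X, feature, grid):
    """PDP: the pointwise average of the ICE curves."""
    curves = ice_curves(model, X, feature, grid)
    return [sum(c[k] for c in curves) / len(curves) for k in range(len(grid))]


if __name__ == "__main__":
    rng = random.Random(42)
    X = [[rng.random(), rng.random()] for _ in range(200)]
    y = [toy_model(row) for row in X]  # labels generated by the toy rule itself
    print(permutation_importance(toy_model, X, y))
    print(partial_dependence(toy_model, X, feature=0, grid=[0.0, 0.25, 0.75, 1.0]))
```

Because the toy labels come from the toy rule itself, shuffling the noise feature leaves accuracy unchanged (importance 0.0), while shuffling feature 0 costs roughly half the accuracy; the PDP for feature 0 steps from 0 to 1 at the 0.5 threshold. With a real probabilistic classifier one would usually average predicted probabilities rather than hard labels.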
