An Integrated Optimization and Machine Learning Models to Predict the Admission Status of Emergency Patients

Paper Title

An Integrated Optimization and Machine Learning Models to Predict the Admission Status of Emergency Patients

Authors

Ahmed, Abdulaziz; Ashour, Omar; Ali, Haneen; Firouz, Mohammad

Abstract

This work proposes a framework for optimizing machine learning algorithms. The practicality of the framework is illustrated with a case study from the healthcare domain: predicting the admission status of emergency department (ED) patients (e.g., admitted vs. discharged) from patient data available at the time of triage. The proposed framework can mitigate the crowding problem by allowing the patient boarding process to be planned proactively. A large retrospective dataset of patient records is obtained from the electronic health record database, covering all ED visits over three years at three major locations of a healthcare provider in the Midwest of the US. Three machine learning algorithms are proposed: T-XGB, T-ADAB, and T-MLP. T-XGB integrates extreme gradient boosting (XGB) with Tabu Search (TS), T-ADAB integrates AdaBoost with TS, and T-MLP integrates a multi-layer perceptron (MLP) with TS. The proposed algorithms are compared with the traditional algorithms XGB, ADAB, and MLP, whose parameters are tuned using grid search. The three proposed algorithms and the three original ones are trained and tested on nine data groups obtained from different feature selection methods. In other words, 6 algorithms × 9 data groups = 54 models are developed. Performance is evaluated using five measures: area under the curve (AUC), sensitivity, specificity, F1, and accuracy. The results show that the newly proposed algorithms achieve high AUC and outperform the traditional algorithms. T-ADAB performs best among the newly developed algorithms. The AUC, sensitivity, specificity, F1, and accuracy of the best model are 95.4%, 99.3%, 91.4%, 95.2%, and 97.2%, respectively.
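
The abstract does not describe how Tabu Search is coupled to each learner, so the following is only a minimal sketch of the general idea: a tabu search over a discrete hyperparameter grid. The function name `tabu_search`, the grid values, the tabu list size, and the toy objective `toy_score` are all assumptions made for illustration and are not taken from the paper. In a real setting the score function would presumably be something like the cross-validated AUC of a model trained with the candidate parameters.

```python
from collections import deque


def tabu_search(score, grid, start, n_iter=50, tabu_size=5):
    """Maximize score(params) over a discrete grid with a simple tabu search.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    names = list(grid)
    current = dict(start)
    best, best_score = dict(current), score(current)
    # Short-term memory of recently visited points (the tabu list).
    tabu = deque([tuple(current[n] for n in names)], maxlen=tabu_size)

    for _ in range(n_iter):
        # Neighbors: move one hyperparameter one step up or down its value list.
        neighbors = []
        for name in names:
            values = grid[name]
            i = values.index(current[name])
            for j in (i - 1, i + 1):
                if 0 <= j < len(values):
                    cand = dict(current)
                    cand[name] = values[j]
                    neighbors.append(cand)

        # Skip tabu points unless they beat the best score so far (aspiration).
        scored = [(score(c), c) for c in neighbors]
        allowed = [
            (s, c) for s, c in scored
            if tuple(c[n] for n in names) not in tabu or s > best_score
        ]
        if not allowed:
            break

        # Accept the best allowed neighbor even if it is worse than the
        # current point; this is what lets the search leave local optima.
        s, current = max(allowed, key=lambda sc: sc[0])
        tabu.append(tuple(current[n] for n in names))
        if s > best_score:
            best, best_score = dict(current), s

    return best, best_score


# Toy stand-in for a validation metric (assumption): peaks at (200, 0.1).
def toy_score(p):
    return -abs(p["n_estimators"] - 200) / 100 - abs(p["learning_rate"] - 0.1)


grid = {
    "n_estimators": [50, 100, 200, 400],
    "learning_rate": [0.01, 0.05, 0.1, 0.5, 1.0],
}
best, best_score = tabu_search(
    toy_score, grid, start={"n_estimators": 50, "learning_rate": 1.0}
)
print(best, best_score)
```

Unlike grid search, which evaluates every combination, this kind of search evaluates only the neighbors of the current point at each step, and the tabu list keeps it from cycling back to recently visited settings.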

