Paper title
Imputation techniques on missing values in breast cancer treatment and fertility data
Paper authors
Paper abstract
Clinical decision support using data mining techniques has, in the last few years, offered a more intelligent way to reduce decision errors. However, clinical datasets often suffer from high missingness, which adversely affects model quality if handled improperly. Imputing missing values provides an opportunity to resolve the issue. Conventional approaches rely on simple statistical analysis, such as mean imputation, or on discarding incomplete cases; these have many limitations and thus degrade learning performance. This study examines a series of machine learning based imputation methods and suggests an efficient approach to preparing a good-quality breast cancer (BC) dataset for finding the relationship between BC treatment and chemotherapy-related amenorrhoea, with performance evaluated by prediction accuracy.
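To make the contrast in the abstract concrete, the sketch below compares mean imputation with one simple learning-style alternative, nearest-neighbour imputation. The abstract does not name the methods the study evaluates, so the choice of k-nearest neighbours, the function names `mean_impute` and `knn_impute`, and the toy rows are all assumptions made for illustration. The numbers are synthetic and carry no clinical meaning.

```python
import math


def mean_impute(rows):
    """Replace each None with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def knn_impute(rows, k=1):
    """Replace each None with the column mean over the k nearest donor rows.

    Illustrative assumption: distance is the root mean squared difference
    over the columns observed in both rows, and every missing cell has at
    least one donor row with that column observed.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, value in enumerate(row):
            if value is None:
                # Candidate donors: every other row that has column j observed.
                donors = [(distance(row, other), other[j])
                          for m, other in enumerate(rows)
                          if m != i and other[j] is not None]
                donors.sort(key=lambda d: d[0])
                nearest = [val for _, val in donors[:k]]
                new_row[j] = sum(nearest) / len(nearest)
        filled.append(new_row)
    return filled


# Synthetic two-column toy data (hypothetical features), None marks a missing value.
toy_rows = [
    [30.0, 1.0],
    [32.0, 1.2],
    [60.0, 3.0],
    [62.0, None],
    [30.5, None],
]

mean_filled = mean_impute(toy_rows)
knn_filled = knn_impute(toy_rows, k=1)

print("mean imputation:", [row[1] for row in mean_filled])
print("kNN imputation: ", [row[1] for row in knn_filled])
```

In this toy case, mean imputation assigns the same value to both incomplete rows, while the nearest-neighbour variant borrows from the most similar complete row. This is one way to see why a single column mean can distort relationships between variables. Whether a given method actually helps a downstream classifier has to be checked empirically, for example by the prediction accuracy the abstract uses as its criterion.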