Paper Title
Methodology to Create Analysis-Naive Holdout Records as well as Train and Test Records for Machine Learning Analyses in Healthcare
Paper Authors
Abstract
It is common for researchers to hold out data from a study pool for external validation and for future research, and the same need applies to studies that use machine learning models. For this discussion, the purpose of the holdout sample is to preserve data for future research studies; these records are analysis-naive and are randomly selected from the full dataset. Analysis-naive records are records that are not used for training or testing machine learning (ML) models and that do not participate in any aspect of the current machine learning study. The methodology suggested for creating holdouts is a modification of k-fold cross-validation, which takes randomization into account and efficiently allows a three-way split (holdout, test, and training) as part of the method without forcing. The paper also provides a working example using a set of automated functions in Python, along with some scenarios for applicability in healthcare.
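The three-way split described in the abstract can be illustrated with a minimal sketch. This is not the paper's own set of Python functions, which is not reproduced here; it is one plausible reading that assumes the records are shuffled once, cut into k folds, and that one fold is set aside as the analysis-naive holdout while another serves as the test set. The function name `three_way_kfold_split` and its parameters (`k`, `holdout_fold`, `test_fold`, `seed`) are hypothetical.

```python
import random


def three_way_kfold_split(record_ids, k=5, holdout_fold=0, test_fold=1, seed=42):
    """Hypothetical helper: shuffle record IDs, cut them into k folds,
    and assign the folds to holdout, test, and training roles."""
    if k < 3:
        raise ValueError("k must be at least 3 for a three-way split")
    if not (0 <= holdout_fold < k and 0 <= test_fold < k):
        raise ValueError("fold indices must be in the range 0..k-1")
    if holdout_fold == test_fold:
        raise ValueError("holdout and test must be different folds")

    ids = list(record_ids)
    # Seeded shuffle so the random assignment is reproducible.
    random.Random(seed).shuffle(ids)

    # Striding over the shuffled list yields k disjoint, near-equal folds.
    folds = [ids[i::k] for i in range(k)]

    holdout = folds[holdout_fold]  # analysis-naive: never seen by the model
    test = folds[test_fold]
    train = [
        record
        for i, fold in enumerate(folds)
        if i not in (holdout_fold, test_fold)
        for record in fold
    ]
    return holdout, test, train


# Example: 100 record IDs split into 5 folds of 20 records each.
holdout, test, train = three_way_kfold_split(range(100), k=5)
print(len(holdout), len(test), len(train))
```

Under this assumption, keeping `holdout_fold` fixed and rotating `test_fold` over the remaining folds would give a cross-validation loop in which the holdout records never take part in training or testing.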