Paper title
SurvSet: An open-source time-to-event dataset repository
Paper authors
Paper abstract
Time-to-event (T2E) analysis is a branch of statistics that models the time it takes for an event to occur. Such events can include outcomes like death, unemployment, or product failure. Most modern machine learning (ML) algorithms, like decision trees and kernel methods, have T2E modelling support in data science software (Python and R). To complement these developments, SurvSet is the first open-source T2E dataset repository designed for rapid benchmarking of ML algorithms and statistical methods. The data in SurvSet have been consistently formatted so that a single preprocessing method will work for all datasets. SurvSet currently has 76 datasets, which vary in dimensionality, time dependency, and background (the majority come from biomedicine). SurvSet is available on PyPI and can be installed with pip install SurvSet. R users can download the data directly from the corresponding git repository.
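To illustrate what "a single preprocessing method will work for all datasets" could look like in practice, here is a minimal sketch in plain Python. It assumes a naming convention that the abstract does not state: numeric features carry a `num_` prefix, categorical features carry a `fac_` prefix, and the columns `pid`, `event`, and `time` hold the identifier, event indicator, and duration. The function `preprocess` and the toy rows are hypothetical and are not part of the SurvSet package.

```python
from statistics import mean, pstdev


def preprocess(rows):
    """Standardize num_* columns and one-hot encode fac_* columns.

    rows: list of dicts that all share the same keys. Columns without a
    num_ or fac_ prefix (e.g. pid, event, time) are passed through unchanged.
    The prefix convention is an assumption made for this sketch.
    """
    if not rows:
        return []
    columns = list(rows[0].keys())
    num_cols = [c for c in columns if c.startswith("num_")]
    fac_cols = [c for c in columns if c.startswith("fac_")]
    keep_cols = [c for c in columns if c not in num_cols and c not in fac_cols]

    # Per-column mean and population standard deviation for scaling.
    # A zero standard deviation is replaced by 1.0 to avoid division by zero.
    stats = {}
    for c in num_cols:
        values = [r[c] for r in rows]
        sd = pstdev(values)
        stats[c] = (mean(values), sd if sd > 0 else 1.0)

    # Sorted category levels so the output column order is deterministic.
    levels = {c: sorted({r[c] for r in rows}) for c in fac_cols}

    out = []
    for r in rows:
        new = {c: r[c] for c in keep_cols}
        for c in num_cols:
            mu, sd = stats[c]
            new[c] = (r[c] - mu) / sd
        for c in fac_cols:
            for level in levels[c]:
                new[f"{c}_{level}"] = 1 if r[c] == level else 0
        out.append(new)
    return out


# Hypothetical toy data in the assumed layout.
toy = [
    {"pid": 0, "event": 1, "time": 5.0, "num_age": 50.0, "fac_sex": "F"},
    {"pid": 1, "event": 0, "time": 9.0, "num_age": 70.0, "fac_sex": "M"},
]
processed = preprocess(toy)
```

Because the function selects columns only by prefix, the same call would apply to any dataset that follows the assumed layout, regardless of how many features it has. The sketch does not handle missing values, which a real pipeline would need to impute or encode.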