Paper Title
Exploring the Learning Difficulty of Data: Theory and Measure
Paper Authors
Abstract
Learning difficulty is crucial for machine learning (e.g., for difficulty-based weighting strategies), and previous literature has proposed a number of learning difficulty measures. However, no comprehensive investigation of learning difficulty is available to date; as a result, nearly all existing measures are defined heuristically, without a rigorous theoretical foundation. In addition, there is no formal definition of easy and hard samples, even though these notions are crucial in many studies. This study attempts to conduct a pilot theoretical investigation of the learning difficulty of samples. First, a theoretical definition of learning difficulty is proposed on the basis of the bias-variance trade-off theory of generalization error. Theoretical definitions of easy and hard samples are then established from the proposed definition, and a practical measure of learning difficulty, inspired by the formal definition, is also given. Second, the properties of learning difficulty-based weighting strategies are explored; several classical weighting methods in machine learning can be well explained in light of these properties. Third, the proposed measure is evaluated to verify its reasonableness and superiority with respect to several main difficulty factors. The comparisons indicate that the proposed measure significantly outperforms the other measures throughout the experiments.
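As a rough illustration of the bias-variance view mentioned in the abstract, the sketch below estimates a per-sample generalization error under squared loss and splits it into a squared-bias term and a variance term. It is not the paper's measure: the bootstrap resampling, the cubic polynomial regressor, the synthetic data, and the name `per_sample_bias_variance` are all assumptions made for this example only.

```python
import numpy as np


def per_sample_bias_variance(x_train, y_train, x_eval, y_eval,
                             n_models=200, degree=3, seed=0):
    """Per-sample squared bias, variance, and expected squared error.

    Hypothetical helper: refits a polynomial regressor on bootstrap
    resamples of the training set and treats the spread of its
    predictions as the randomness over training sets.
    """
    rng = np.random.default_rng(seed)
    n = len(x_train)
    preds = np.empty((n_models, len(x_eval)))
    for m in range(n_models):
        idx = rng.integers(0, n, size=n)  # bootstrap resample of indices
        coefs = np.polyfit(x_train[idx], y_train[idx], deg=degree)
        preds[m] = np.polyval(coefs, x_eval)

    mean_pred = preds.mean(axis=0)
    bias_sq = (mean_pred - y_eval) ** 2      # squared-bias term per sample
    variance = preds.var(axis=0)             # variance term per sample
    expected_error = ((preds - y_eval) ** 2).mean(axis=0)
    return bias_sq, variance, expected_error


# Synthetic data (assumption): a smooth target with one corrupted label.
data_rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 61)
y = np.sin(3.0 * x) + 0.1 * data_rng.standard_normal(x.size)
y[30] += 3.0  # simulate a noisy label, one commonly cited difficulty factor

bias_sq, variance, difficulty = per_sample_bias_variance(x, y, x, y)
hardest = int(np.argmax(difficulty))
print("hardest sample index:", hardest)
```

Under squared loss the expected error over the refitted models equals the squared bias plus the variance, so either form can serve as a difficulty score in this sketch. Samples with larger scores would be treated as harder and those with smaller scores as easier; where to place a threshold is left open here.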