A Quantitative Perspective on Values of Domain Knowledge for Machine Learning

Paper Title

A Quantitative Perspective on Values of Domain Knowledge for Machine Learning

Authors

Yang, Jianyi; Ren, Shaolei

Abstract

With the exploding popularity of machine learning, domain knowledge in various forms has been playing a crucial role in improving the learning performance, especially when training data is limited. Nonetheless, there is little understanding of to what extent domain knowledge can affect a machine learning task from a quantitative perspective. To increase the transparency and rigorously explain the role of domain knowledge in machine learning, we study the problem of quantifying the values of domain knowledge in terms of its contribution to the learning performance in the context of informed machine learning. We propose a quantification method based on Shapley value that fairly attributes the overall learning performance improvement to different domain knowledge. We also present Monte-Carlo sampling to approximate the fair value of domain knowledge with a polynomial time complexity. We run experiments of injecting symbolic domain knowledge into semi-supervised learning tasks on both MNIST and CIFAR10 datasets, providing quantitative values of different symbolic knowledge and rigorously explaining how it affects the machine learning performance in terms of test accuracy.
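The abstract names two ingredients: a Shapley value attribution over pieces of domain knowledge, and Monte-Carlo sampling to keep the cost polynomial. The sketch below illustrates the generic permutation-sampling Shapley estimator, not the authors' exact algorithm. The names `monte_carlo_shapley`, `toy_accuracy`, `rule_a`, and `rule_b` are hypothetical, and the accuracy numbers are invented for illustration; in a real study each call to the performance function would mean training and testing a model with the given subset of knowledge injected.

```python
import random
from typing import Callable, Dict, FrozenSet, Sequence


def monte_carlo_shapley(
    items: Sequence[str],
    performance: Callable[[FrozenSet[str]], float],
    num_permutations: int = 2000,
    seed: int = 0,
) -> Dict[str, float]:
    """Estimate Shapley values by averaging marginal gains over random orderings."""
    rng = random.Random(seed)
    totals = {item: 0.0 for item in items}
    for _ in range(num_permutations):
        order = list(items)
        rng.shuffle(order)
        coalition: FrozenSet[str] = frozenset()
        previous = performance(coalition)
        for item in order:
            # Add one piece of knowledge and record how much performance changes.
            coalition = coalition | {item}
            current = performance(coalition)
            totals[item] += current - previous
            previous = current
    return {item: total / num_permutations for item, total in totals.items()}


def toy_accuracy(coalition: FrozenSet[str]) -> float:
    """Hypothetical test accuracy for a subset of knowledge (made-up numbers)."""
    accuracy = 0.60  # assumed baseline with no domain knowledge
    if "rule_a" in coalition:
        accuracy += 0.10
    if "rule_b" in coalition:
        accuracy += 0.05
    if "rule_a" in coalition and "rule_b" in coalition:
        accuracy += 0.04  # assumed synergy when both rules are injected
    return accuracy


if __name__ == "__main__":
    values = monte_carlo_shapley(["rule_a", "rule_b"], toy_accuracy)
    for name, value in values.items():
        print(f"{name}: {value:.3f}")
```

Each sampled ordering costs one performance evaluation per knowledge item, so the total cost is the number of permutations times the number of items, rather than the exponential number of subsets that the exact Shapley value requires. The estimates always sum to the full improvement over the baseline, because the marginal gains within each ordering telescope.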
