Paper title
Quantifying the Uncertainty of Precision Estimates for Rule based Text Classifiers
Paper authors
Paper abstract
Rule-based classifiers that use the presence and absence of key sub-strings to make classification decisions have a natural mechanism for quantifying the uncertainty of their precision. For a binary classifier, the key insight is to treat partitions of the sub-string set induced by the documents as Bernoulli random variables. The mean value of each random variable is an estimate of the classifier's precision when presented with a document inducing that partition. These means can be compared, using standard statistical tests, to a desired or expected classifier precision. A set of binary classifiers can be combined into a single multi-label classifier by an application of the Dempster-Shafer theory of evidence. The utility of this approach is demonstrated with a benchmark problem.
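The per-partition precision estimate described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the function names (`partition_key`, `precision_by_partition`, `binomial_tail_below`), the input format (pairs of document text and a flag saying whether the classifier's positive decision was correct), and the choice of an exact one-sided binomial test as the "standard statistical test". Each document is assumed to split the sub-string set into a present part and an absent part, so the set of present sub-strings identifies the partition.

```python
from collections import defaultdict
from math import comb


def partition_key(document, substrings):
    """Identify the partition a document induces by the sub-strings it contains.

    The absent sub-strings are the complement, so the present set is enough.
    """
    return frozenset(s for s in substrings if s in document)


def precision_by_partition(labeled_docs, substrings):
    """Count (correct, total) Bernoulli outcomes for each induced partition.

    labeled_docs: iterable of (text, is_correct) pairs, assumed to cover only
    documents the classifier labeled positive (hypothetical input format).
    """
    counts = defaultdict(lambda: [0, 0])
    for text, is_correct in labeled_docs:
        key = partition_key(text, substrings)
        counts[key][0] += int(is_correct)
        counts[key][1] += 1
    return {key: (correct, total) for key, (correct, total) in counts.items()}


def binomial_tail_below(successes, trials, p0):
    """Return P(X <= successes) for X ~ Binomial(trials, p0).

    A small value suggests the partition's precision falls below the target p0.
    """
    return sum(
        comb(trials, k) * p0**k * (1 - p0) ** (trials - k)
        for k in range(successes + 1)
    )


if __name__ == "__main__":
    substrings = ["cheap", "pills", "free"]
    docs = [
        ("cheap pills now", False),
        ("cheap pills today", True),
        ("cheap pills", True),
        ("free offer", True),
    ]
    for key, (correct, total) in precision_by_partition(docs, substrings).items():
        mean = correct / total  # sample mean of the Bernoulli variable
        p_value = binomial_tail_below(correct, total, 0.9)
        print(sorted(key), mean, p_value)
```

With so few observations per partition the test has little power. The sketch only shows where the Bernoulli mean and the comparison against a target precision enter the calculation.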