Paper title
Conservative Likelihood Ratio Estimator for Infrequent Data Slightly above a Frequency Threshold
Paper authors
Paper abstract
A naive likelihood ratio (LR) estimation using the observed frequencies of events can overestimate LRs for infrequent data. One approach to avoid this problem is to use a frequency threshold and set the estimates to zero for frequencies below the threshold. This approach eliminates the computation of some estimates, thereby making practical tasks using LRs more efficient. However, it still overestimates LRs for low frequencies near the threshold. This study proposes a conservative estimator for low frequencies, slightly above the threshold. Our experiment used LRs to predict the occurrence contexts of named entities from a corpus. The experimental results demonstrate that our estimator improves the prediction accuracy while maintaining efficiency in the context prediction task.
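To make the baseline concrete, here is a minimal sketch of a naive LR estimate with a frequency threshold, as the abstract describes it. It is not the paper's conservative estimator, which the abstract does not specify. The sketch assumes the LR is the ratio of an event's relative frequencies in two samples, that the threshold applies to the numerator-side count, and that an event unseen in the denominator sample gets an estimate of zero. The function name `naive_lr` and its parameters are hypothetical.

```python
from collections import Counter


def naive_lr(event, num_counts, den_counts, threshold=0):
    """Naive LR estimate from observed frequencies (illustrative sketch).

    Assumptions, not taken from the paper:
    - the LR is (f_num / n_num) / (f_den / n_den);
    - the threshold is applied to the numerator-side frequency f_num;
    - an event unseen in the denominator sample gets an estimate of 0.0.
    """
    f_num = num_counts.get(event, 0)
    f_den = den_counts.get(event, 0)

    # Frequencies below the threshold get a zero estimate, so the
    # ratio does not need to be computed for them at all.
    if f_num < threshold or f_den == 0:
        return 0.0

    n_num = sum(num_counts.values())
    n_den = sum(den_counts.values())
    return (f_num / n_num) / (f_den / n_den)


# Toy counts: event "a" is observed only once in each sample.
num_counts = Counter({"a": 1, "b": 7})
den_counts = Counter({"a": 1, "b": 63})

lr_no_threshold = naive_lr("a", num_counts, den_counts)
lr_thresholded = naive_lr("a", num_counts, den_counts, threshold=2)
lr_at_threshold = naive_lr("a", num_counts, den_counts, threshold=1)
```

With these toy counts, the unthresholded estimate for "a" is large even though it rests on a single observation, which is the kind of estimate the abstract warns may be too high. A threshold of 2 zeroes it out. A threshold of 1 leaves it unchanged, which corresponds to the "slightly above the threshold" case that the proposed estimator targets.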