Claim-Dissector: An Interpretable Fact-Checking System with Joint Re-ranking and Veracity Prediction

Paper title

Claim-Dissector: An Interpretable Fact-Checking System with Joint Re-ranking and Veracity Prediction

Authors

Martin Fajcik, Petr Motlicek, Pavel Smrz

Abstract

We present Claim-Dissector: a novel latent variable model for fact-checking and analysis, which, given a claim and a set of retrieved evidences, jointly learns to identify (i) the evidences relevant to the given claim and (ii) the veracity of the claim. We propose to disentangle the per-evidence relevance probability and its contribution to the final veracity probability in an interpretable way: the final veracity probability is proportional to a linear ensemble of per-evidence relevance probabilities. In this way, the individual contribution of each evidence towards the final predicted probability can be identified. Within the per-evidence relevance probability, our model can further distinguish whether each relevant evidence is supporting (S) or refuting (R) the claim. This makes it possible to quantify how much the S/R probability contributes to the final verdict, or to detect disagreeing evidence. Despite its interpretable nature, our system achieves results competitive with the state of the art on the FEVER dataset, as compared to typical two-stage system pipelines, while using significantly fewer parameters. It also sets a new state of the art on the FAVIQ and RealFC datasets. Furthermore, our analysis shows that our model can learn fine-grained relevance cues while using coarse-grained supervision, and we demonstrate this in two ways. (i) We show that our model can achieve competitive sentence recall while using only paragraph-level relevance supervision. (ii) Traversing towards the finest granularity of relevance, we show that our model is capable of identifying relevance at the token level. To do this, we present a new benchmark, TLR-FEVER, focusing on token-level interpretability: humans annotate the tokens in relevant evidences that they considered essential when making their judgment. We then measure how similar these annotations are to the tokens our model focuses on.
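
The abstract's central idea, that the final veracity probability is proportional to a linear ensemble of per-evidence relevance probabilities, can be illustrated with a toy calculation. The sketch below is not the authors' model or parameterization: the function name `aggregate_verdict`, the uniform default weights, the restriction to the two classes S and R, and the example numbers are all assumptions made purely for illustration.

```python
from typing import Dict, List, Optional, Tuple


def aggregate_verdict(
    per_evidence: List[Tuple[float, float]],
    weights: Optional[List[float]] = None,
) -> Dict[str, object]:
    """Toy linear ensemble over per-evidence (support, refute) probabilities.

    per_evidence: one (p_support, p_refute) pair per retrieved evidence.
    weights: hypothetical non-negative ensemble weights (uniform if omitted).
    """
    if weights is None:
        weights = [1.0] * len(per_evidence)
    if len(weights) != len(per_evidence):
        raise ValueError("weights and per_evidence must have the same length")

    # Each evidence adds one linear term per class.
    support_terms = [w * s for w, (s, _) in zip(weights, per_evidence)]
    refute_terms = [w * r for w, (_, r) in zip(weights, per_evidence)]

    total = sum(support_terms) + sum(refute_terms)
    if total <= 0:
        raise ValueError("at least one evidence must carry positive mass")

    # Normalizing by the total makes the verdict probabilities sum to 1,
    # while each evidence's share stays readable as an additive contribution.
    return {
        "p_support": sum(support_terms) / total,
        "p_refute": sum(refute_terms) / total,
        "support_contrib": [t / total for t in support_terms],
        "refute_contrib": [t / total for t in refute_terms],
    }


# Made-up probabilities: evidence 0 mostly supports, evidence 1 is largely
# irrelevant, evidence 2 mostly refutes (a case of disagreeing evidence).
example = [(0.8, 0.1), (0.1, 0.1), (0.2, 0.6)]
result = aggregate_verdict(example)
print(result["p_support"], result["p_refute"])
print(result["support_contrib"], result["refute_contrib"])
```

Because the verdict is a normalized sum, the entries of `support_contrib` add up to `p_support` and those of `refute_contrib` add up to `p_refute`, which is the sense in which each evidence's contribution to the final prediction can be read off directly. A large support share from one evidence alongside a large refute share from another is a simple signal of disagreeing evidence.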
