Paper Title


Knowledge Graph Question Answering Leaderboard: A Community Resource to Prevent a Replication Crisis

Authors

Aleksandr Perevalov, Xi Yan, Liubov Kovriguina, Longquan Jiang, Andreas Both, Ricardo Usbeck

Abstract


Data-driven systems need to be evaluated to establish trust in the scientific approach and its applicability. In particular, this is true for Knowledge Graph (KG) Question Answering (QA), where complex data structures are made accessible via natural-language interfaces. Evaluating the capabilities of these systems has driven the community for more than ten years and led to the creation of different KGQA benchmark datasets. However, comparing different approaches is cumbersome. The lack of existing, curated leaderboards results in a missing global view of the research field and could inject mistrust into the results. In particular, the latest and most-used datasets in the KGQA community, LC-QuAD and QALD, fail to provide central and up-to-date points of trust. In this paper, we survey and analyze a wide range of evaluation results with significant coverage of 100 publications and 98 systems from the last decade. We provide a new central and open leaderboard for any KGQA benchmark dataset as a focal point for the community: https://kgqa.github.io/leaderboard. Our analysis highlights existing problems in the evaluation of KGQA systems. Based on these findings, we point to possible improvements for future evaluations.
