Paper title
Wikinformetrics: Construction and description of an open Wikipedia knowledge graph dataset for informetric purposes
Paper authors
Paper abstract
Wikipedia is one of the most visited websites in the world and is also a frequent subject of scientific research. However, the analytical possibilities of Wikipedia information have not yet been examined while simultaneously considering a large volume of pages and attributes. The main objective of this work is to offer a methodological framework and an open knowledge graph for the large-scale informetric study of Wikipedia. Features of Wikipedia pages are compared with those of scientific publications to highlight the (dis)similarities between the two types of documents. Based on this comparison, the different analytical possibilities offered by Wikipedia and its various data sources are explored, ultimately yielding a set of metrics designed to study Wikipedia from different analytical dimensions. In parallel, a complete, dedicated dataset of the English Wikipedia was built (and shared) following a relational model. Finally, a descriptive case study is carried out on the English Wikipedia dataset to illustrate the analytical potential of the knowledge graph and its metrics.