Paper Title
Evaluating Research Dataset Recommendations in a Living Lab
Paper Authors
Paper Abstract
The search for research datasets is as important as it is laborious. Because the choice of research data shapes all further research, this decision must be made carefully. Moreover, with data volumes growing in almost all fields, research data has become a central artifact in the empirical sciences. Consequently, research dataset recommendations can usefully supplement scientific publication search. We formulated the recommendation task as a retrieval problem by focusing on broad similarities between research datasets and scientific publications. In a multistage approach, initial recommendations were retrieved using the BM25 ranking function and dynamic queries. The initial ranking was then re-ranked using click feedback and document embeddings. The proposed system was evaluated live on real user interaction data using the STELLA infrastructure in the LiLAS Lab at CLEF 2021. Before the live evaluation, the experimental system could be fine-tuned efficiently by pre-testing it against a pseudo test collection built from prior user interaction data from the live system. The results indicate that the experimental system outperforms the other participating systems.
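The multistage approach described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the dynamic queries are built or how click feedback and embeddings are combined, so the sketch assumes that the publication text serves as the query and that the re-ranking score is a simple weighted sum. The names `recommend`, `w_emb`, and `w_click` are hypothetical, and BM25 is written out by hand to keep the example self-contained.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase whitespace tokenizer (an assumption; the abstract names none).
    return text.lower().split()


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document in `docs` against `query` with Okapi BM25."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(pub_text, pub_vec, datasets, clicks, top_k=10, w_emb=1.0, w_click=1.0):
    """Recommend dataset ids for one publication.

    datasets: list of dicts with keys 'id', 'text', 'vec' (hypothetical schema).
    clicks:   dict mapping dataset id to a prior click count for this publication.
    """
    # Stage 1: BM25 retrieval, using the publication text as the query
    # (an assumption standing in for the paper's dynamic queries).
    first = bm25_scores(pub_text, [d["text"] for d in datasets])
    ranked = sorted(range(len(datasets)), key=lambda i: first[i], reverse=True)
    candidates = [i for i in ranked[:top_k] if first[i] > 0]

    # Stage 2: re-rank candidates with embedding similarity and click feedback.
    # The weighted sum is an illustrative choice, not the paper's formula.
    def rerank_score(i):
        d = datasets[i]
        return (first[i]
                + w_emb * cosine(pub_vec, d["vec"])
                + w_click * math.log1p(clicks.get(d["id"], 0)))

    return [datasets[i]["id"] for i in sorted(candidates, key=rerank_score, reverse=True)]
```

In this sketch, datasets with no lexical overlap with the publication never reach the second stage, and the logarithm dampens the click counts so that a few heavily clicked datasets do not dominate the ranking outright.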