Paper title
Declarative Experimentation in Information Retrieval using PyTerrier
Paper authors
Paper abstract
The advent of deep machine learning platforms such as TensorFlow and PyTorch, developed in expressive high-level languages such as Python, has allowed more expressive representations of deep neural network architectures. We argue that such a powerful formalism is missing in information retrieval (IR), and propose a framework called PyTerrier that allows advanced retrieval pipelines to be expressed and evaluated in a declarative manner close to their conceptual design. Like the aforementioned frameworks, which compile deep learning experiments into primitive GPU operations, our framework targets IR platforms as backends in order to execute and evaluate retrieval pipelines. Further, we can automatically optimise the retrieval pipelines to increase their efficiency to suit a particular IR platform backend. Our experiments, conducted on the TREC Robust and ClueWeb09 test collections, demonstrate the efficiency benefits of these optimisations for retrieval pipelines involving both the Anserini and Terrier IR platforms.
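To make the idea of a declarative pipeline concrete, the following is a minimal, self-contained sketch in plain Python. It is not PyTerrier's implementation and does not call any IR platform: the classes `Transformer`, `Retrieve`, `Rerank`, `Compose`, and `Cutoff`, the `optimise` function, and the toy term-overlap scoring are all hypothetical names and simplifications introduced for illustration. The `>>` (composition) and `%` (rank cutoff) operators are chosen to resemble the operator style PyTerrier uses. The sketch shows two things the abstract describes: a pipeline is first declared as an object, and that object can then be rewritten (here, by pushing a rank cutoff into the retrieval stage) before it is executed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A ranking is a list of (doc_id, score) pairs, best first.
Ranking = List[Tuple[str, float]]


class Transformer:
    """Hypothetical base class: one pipeline stage, mapping a ranking to a ranking."""

    def transform(self, query: str, ranking: Ranking) -> Ranking:
        raise NotImplementedError

    def __rshift__(self, other: "Transformer") -> "Transformer":
        # a >> b declares "feed the output of a into b"; nothing is executed yet.
        return Compose(self, other)

    def __mod__(self, k: int) -> "Transformer":
        # a % k declares "keep only the top k results of a".
        return Cutoff(self, k)


@dataclass
class Retrieve(Transformer):
    """Toy first-stage retriever over an in-memory collection (an assumption)."""
    docs: Dict[str, str]
    limit: int = 1000

    def transform(self, query: str, ranking: Ranking) -> Ranking:
        terms = query.lower().split()
        scored = []
        for doc_id, text in self.docs.items():
            tokens = text.lower().split()
            score = float(sum(tokens.count(t) for t in terms))
            if score > 0:
                scored.append((doc_id, score))
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        return scored[: self.limit]


@dataclass
class Rerank(Transformer):
    """Re-scores each (query, doc_id, score) triple with a user-supplied function."""
    fn: Callable[[str, str, float], float]

    def transform(self, query: str, ranking: Ranking) -> Ranking:
        rescored = [(d, self.fn(query, d, s)) for d, s in ranking]
        rescored.sort(key=lambda pair: (-pair[1], pair[0]))
        return rescored


@dataclass
class Compose(Transformer):
    first: Transformer
    second: Transformer

    def transform(self, query: str, ranking: Ranking) -> Ranking:
        return self.second.transform(query, self.first.transform(query, ranking))


@dataclass
class Cutoff(Transformer):
    inner: Transformer
    k: int

    def transform(self, query: str, ranking: Ranking) -> Ranking:
        return self.inner.transform(query, ranking)[: self.k]


def optimise(p: Transformer) -> Transformer:
    """Rewrite Cutoff(Retrieve, k) into a Retrieve that only returns k results."""
    if isinstance(p, Cutoff):
        inner = optimise(p.inner)
        if isinstance(inner, Retrieve):
            return Retrieve(inner.docs, min(inner.limit, p.k))
        return Cutoff(inner, p.k)
    if isinstance(p, Compose):
        return Compose(optimise(p.first), optimise(p.second))
    return p


docs = {
    "d1": "terrier retrieval platform",
    "d2": "deep learning retrieval retrieval",
    "d3": "cooking recipes",
}

# Declare the pipeline: retrieve, keep the top result, then double its score.
pipeline = (Retrieve(docs) % 1) >> Rerank(lambda q, d, s: s * 2)
optimised = optimise(pipeline)
results = optimised.transform("retrieval", [])
```

In this toy version the rewritten pipeline returns the same ranking as the original; the only difference is that the cutoff is now a parameter of the retrieval stage. In a real backend, knowing the cutoff at retrieval time is what could allow the platform to do less work, but the sketch makes no claim about how any particular platform exploits it.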