Tutorial: Modern Theoretical Tools for Understanding and Designing Next-generation Information Retrieval System

Paper Title

Tutorial: Modern Theoretical Tools for Understanding and Designing Next-generation Information Retrieval System

Authors

Xu, Da; Ruan, Chuanwei

Abstract

In the relatively short history of machine learning, the subtle balance between engineering and theoretical progress has proven critical at various stages. The most recent wave of AI has brought powerful techniques to the IR community, particularly for pattern recognition. While many have benefited from the burst of ideas as numerous tasks become algorithmically feasible, the balance is tilting toward the application side. The existing theoretical tools in IR can no longer explain, guide, or justify the newly established methodologies. The consequences can be painful: in stark contrast to how the IR industry envisioned modern AI making life easier, many are experiencing increased confusion and costs in data manipulation, model selection, monitoring, censoring, and decision-making. This reality is not surprising: without handy theoretical tools, we often lack principled knowledge of a pattern recognition model's expressivity, optimization properties, and generalization guarantees, and our decision-making process has to rely on over-simplified assumptions and occasional human judgment. It is now time to bring the community a systematic tutorial on how we have successfully adapted these tools and made significant progress in understanding, designing, and eventually productionizing impactful IR systems. We emphasize systematicity because IR is a comprehensive discipline that touches upon particular aspects of learning, causal inference, interactive (online) decision-making, and more. Because IR problems usually exhibit unique structures and definitions, imported theoretical tools require systematic calibration before they become genuinely useful for IR. We therefore designed this tutorial to systematically present what we have learned, and our successful experience, in using advanced theoretical tools to understand and design IR systems.
