Paper Title

Sparse and Dense Approaches for the Full-rank Retrieval of Responses for Dialogues

Paper Authors

Gustavo Penha, Claudia Hauff

Paper Abstract

Ranking responses for a given dialogue context is a popular benchmark in which the setup is to re-rank the ground-truth response over a limited set of $n$ responses, where $n$ is typically 10. The predominance of this setup in conversation response ranking has led to a great deal of attention being devoted to building neural re-rankers, while the first-stage retrieval step has been overlooked. Since the correct answer is always available in the candidate list of $n$ responses, this artificial evaluation setup assumes that there is a first-stage retrieval step which is always able to rank the correct response in its top-$n$ list. In this paper we focus on the more realistic task of full-rank retrieval of responses, where $n$ can be up to millions of responses. We investigate both dialogue context and response expansion techniques for sparse retrieval, as well as zero-shot and fine-tuned dense retrieval approaches. Our findings based on three different information-seeking dialogue datasets reveal that a learned response expansion technique is a solid baseline for sparse retrieval. We find the best-performing method overall to be dense retrieval with intermediate training, i.e. a step after the language model pre-training where sentence representations are learned, followed by fine-tuning on the target conversational data. We also investigate the intriguing phenomenon that harder negative sampling techniques lead to worse results for the fine-tuned dense retrieval models. The code and datasets are available at https://github.com/Guzpenha/transformer_rankers/tree/full_rank_retrieval_dialogues.
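
To make the full-rank retrieval setup described in the abstract concrete, the sketch below illustrates a zero-shot dense retrieval baseline with a bi-encoder: the dialogue context and every candidate response are embedded independently, and responses are ranked by similarity. This is a minimal illustration under stated assumptions, not the paper's implementation; it assumes the sentence-transformers library, and the model name ("all-MiniLM-L6-v2") and the toy context/response pool are placeholders. In the full-rank setting the pool can contain millions of responses, so the response embeddings would normally be pre-computed and searched with an approximate nearest-neighbour index rather than scored exhaustively as here.

```python
# Minimal sketch of zero-shot dense retrieval of responses for a dialogue context.
# Not the authors' code; assumes the sentence-transformers library and uses a
# generic sentence-embedding model plus toy data for illustration only.
from sentence_transformers import SentenceTransformer, util

# Any sentence-embedding model can stand in for the intermediate-trained encoder
# discussed in the abstract.
encoder = SentenceTransformer("all-MiniLM-L6-v2")

dialogue_context = (
    "User: My laptop won't boot after the update. "
    "Agent: Which OS version are you on? "
    "User: Windows 11."
)
response_pool = [
    "Try booting into safe mode and rolling back the latest update.",
    "You can change the wallpaper in the settings menu.",
    "Restarting the router usually fixes connectivity problems.",
]

# Encode the context and the candidate responses independently (bi-encoder setup).
context_emb = encoder.encode(dialogue_context, convert_to_tensor=True)
response_embs = encoder.encode(response_pool, convert_to_tensor=True)

# Rank every response in the pool by cosine similarity to the context.
scores = util.cos_sim(context_emb, response_embs)[0]
ranked = sorted(zip(response_pool, scores.tolist()), key=lambda x: x[1], reverse=True)
for response, score in ranked:
    print(f"{score:.3f}  {response}")
```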
