Paper Title

Neural Retriever and Go Beyond: A Thesis Proposal

Author

Luo, Man

Abstract

An information retriever (IR) aims to find the documents (e.g., snippets, passages, and articles) relevant to a given query at large scale. IR plays an important role in many tasks that need external knowledge, such as open-domain question answering and dialogue systems. In the past, search algorithms based on term matching were widely used. Recently, neural algorithms (termed neural retrievers) have gained more attention because they can mitigate the limitations of traditional methods. Despite their success, neural retrievers still face many challenges, e.g., suffering from a small amount of training data and failing to answer simple entity-centric questions. Furthermore, most existing neural retrievers are developed for text-only queries, which prevents them from handling multi-modal queries (i.e., queries composed of a textual description and images). This proposal has two goals. First, we introduce methods to address the abovementioned issues of neural retrievers from three angles: new model architectures, IR-oriented pretraining tasks, and the generation of large-scale training data. Second, we identify future research directions and propose potential corresponding solutions.
