Paper Title

Natural language technology and query expansion: issues, state-of-the-art and perspectives

Authors

Selvaretnam, Bhawani; Belkhatir, Mohammed

Abstract

The availability of an abundance of knowledge sources has spurred considerable effort in the development and enhancement of Information Retrieval techniques. Users' information needs are expressed in natural language, and successful retrieval depends heavily on how effectively the intended purpose is communicated. Natural language queries consist of multiple linguistic features which serve to represent the intended search goal. Linguistic characteristics that cause semantic ambiguity and misinterpretation of queries, as well as additional factors such as a lack of familiarity with the search environment, affect users' ability to accurately represent their information needs, a problem referred to as the concept intention gap. This gap directly affects the relevance of the returned search results, which may not be to the users' satisfaction, and is therefore a major issue impacting the effectiveness of information retrieval systems. Central to our discussion is the identification of the significant constituents that characterize the query intent and their enrichment through the addition of meaningful terms, phrases or even latent representations, either manually or automatically, to capture their intended meaning. Specifically, we discuss techniques to achieve this enrichment, in particular those utilizing information gathered from statistical processing of term dependencies within a document corpus or from external knowledge sources such as ontologies. We lay out the anatomy of a generic linguistic-based query expansion framework and propose its module-based decomposition, covering topical issues from query processing, information retrieval, computational linguistics and ontology engineering. For each of the modules, we review state-of-the-art solutions in the literature, categorized and analyzed in light of the techniques used.
