Paper Title

Assessing the behavior and performance of a supervised term-weighting technique for topic-based retrieval

Authors

Mariano Maisonnave, Fernando Delbianco, Fernando Tohmé, Ana Maguitman

Abstract

This article analyses and evaluates FDDβ, a supervised term-weighting scheme that can be applied to query-term selection in topic-based retrieval. FDDβ weights terms based on two factors representing the descriptive and discriminating power of the terms with respect to the given topic. It then combines these two factors through an adjustable parameter that makes it possible to favor different aspects of retrieval, such as precision, recall, or a balance between the two. The article makes the following contributions: (1) it presents an extensive analysis of the behavior of FDDβ as a function of its adjustable parameter; (2) it compares FDDβ against eighteen traditional and state-of-the-art weighting schemes; (3) it evaluates the performance of disjunctive queries built by combining terms selected using the analyzed methods; (4) it introduces a new public data set with news labeled as relevant or irrelevant to the economic domain. The analysis and evaluations are performed on three data sets: two well-known text data sets, namely 20 Newsgroups and Reuters-21578, and the newly released data set. It is possible to conclude that, despite its simplicity, FDDβ is competitive with state-of-the-art methods and has the important advantage of offering flexibility when adapting to specific task goals. The results also demonstrate that FDDβ offers a useful mechanism for exploring different approaches to building complex queries.
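The abstract does not give the weighting formula, so the following is only a minimal sketch of how two such factors could be combined through a single parameter. It assumes an F-beta-style weighted harmonic mean, with the descriptive factor taken as the fraction of on-topic documents containing the term (recall-like) and the discriminating factor taken as the fraction of documents containing the term that are on-topic (precision-like). Those definitions, the function names `f_beta_term_weights` and `disjunctive_query`, and the `" OR "` query syntax are all illustrative assumptions, not the authors' implementation.

```python
from collections import Counter


def f_beta_term_weights(docs, labels, beta=1.0):
    """Weight terms by an F-beta-style mix of two factors (assumed form).

    docs   -- list of token lists
    labels -- list of bools, True when the document is on-topic
    beta   -- > 1 leans toward the descriptive (recall-like) factor,
              < 1 leans toward the discriminating (precision-like) factor
    """
    n_topic = sum(labels)
    df_all = Counter()    # documents containing the term
    df_topic = Counter()  # on-topic documents containing the term
    for tokens, on_topic in zip(docs, labels):
        for term in set(tokens):
            df_all[term] += 1
            if on_topic:
                df_topic[term] += 1

    b2 = beta * beta
    weights = {}
    for term, n_with_term in df_all.items():
        hits = df_topic[term]
        if hits == 0 or n_topic == 0:
            weights[term] = 0.0  # term never appears in the topic
            continue
        descr = hits / n_topic      # assumed descriptive factor
        discr = hits / n_with_term  # assumed discriminating factor
        weights[term] = (1 + b2) * discr * descr / (b2 * discr + descr)
    return weights


def disjunctive_query(weights, k=3):
    """Join the k highest-weighted terms with OR (ties broken alphabetically)."""
    top = sorted(weights, key=lambda t: (-weights[t], t))[:k]
    return " OR ".join(top)
```

Under these assumptions, raising `beta` rewards terms that cover most of the topic's documents, which suits recall-oriented queries, while lowering it rewards terms that rarely occur outside the topic, which suits precision-oriented queries.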
