Noise tolerance of learning to rank under class-conditional label noise

Paper Title

Noise tolerance of learning to rank under class-conditional label noise

Author

Haddad, Dany

Abstract

Often, the data used to train ranking models is subject to label noise. For example, in web search, labels created from clickstream data are noisy due to issues such as insufficient information in item descriptions on the search engine results page (SERP), query reformulation by the user, and erratic or unexpected user behavior. In practice, it is difficult to handle label noise without making strong assumptions about the label generation process. As a result, practitioners typically train their learning-to-rank (LtR) models directly on this noisy data without additional consideration of the label noise. Surprisingly, we often see strong performance from LtR models trained in this way. In this work, we describe a class of noise-tolerant LtR losses for which empirical risk minimization is a consistent procedure, even in the context of class-conditional label noise. We also develop noise-tolerant analogs of commonly used loss functions. The practical implications of our theoretical findings are further supported by experimental results.
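
A minimal sketch of the class-conditional noise setting, under assumptions that are not taken from the paper: labels are binary relevance judgments; a truly relevant label is observed as non-relevant with probability rho_pos, and a truly non-relevant label is observed as relevant with probability rho_neg, independently of the item's features (rho_pos, rho_neg, and all function names below are hypothetical). Under this model the noisy posterior is P(noisy y = 1 | x) = (1 - rho_pos - rho_neg) * P(y = 1 | x) + rho_neg. Whenever rho_pos + rho_neg < 1, this is strictly increasing in the clean posterior, so sorting items by the noisy posterior reproduces the clean ordering. This is one intuition for why models trained directly on noisy labels can still rank well. It is not the paper's noise-tolerant loss construction.

```python
import random


def flip_labels(labels, rho_pos, rho_neg, rng):
    """Apply class-conditional label noise to binary relevance labels.

    rho_pos: probability that a true label 1 is observed as 0 (hypothetical name).
    rho_neg: probability that a true label 0 is observed as 1 (hypothetical name).
    The flip probability depends only on the true class, not on the item.
    """
    noisy = []
    for y in labels:
        flip_prob = rho_pos if y == 1 else rho_neg
        noisy.append(1 - y if rng.random() < flip_prob else y)
    return noisy


def noisy_posterior(clean_posterior, rho_pos, rho_neg):
    """P(noisy label = 1 | x) given P(clean label = 1 | x).

    Strictly increasing in clean_posterior only when rho_pos + rho_neg < 1.
    """
    return (1.0 - rho_pos - rho_neg) * clean_posterior + rho_neg


def empirical_noisy_rate(clean_posterior, rho_pos, rho_neg, n, rng):
    """Sample n clean labels, corrupt them, and return the noisy positive rate."""
    clean = [1 if rng.random() < clean_posterior else 0 for _ in range(n)]
    noisy = flip_labels(clean, rho_pos, rho_neg, rng)
    return sum(noisy) / n


def rank(scores):
    """Return item indices sorted by descending score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    clean = [0.9, 0.2, 0.6, 0.4]  # illustrative clean posteriors for 4 items
    rho_pos, rho_neg = 0.3, 0.1   # illustrative noise rates (sum < 1)
    noisy = [noisy_posterior(p, rho_pos, rho_neg) for p in clean]
    print(rank(clean))  # [0, 2, 3, 1]
    print(rank(noisy))  # [0, 2, 3, 1]
    print(empirical_noisy_rate(0.6, rho_pos, rho_neg, 100_000, random.Random(0)))
```

With many samples, `empirical_noisy_rate` should land close to `noisy_posterior` for the same inputs (0.46 in this example). If rho_pos + rho_neg >= 1, the coefficient on the clean posterior is zero or negative and the ordering argument no longer holds.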
