Paper Title

ELIAS: End-to-End Learning to Index and Search in Large Output Spaces

Authors

Nilesh Gupta, Patrick H. Chen, Hsiang-Fu Yu, Cho-Jui Hsieh, Inderjit S. Dhillon

Abstract

Extreme multi-label classification (XMC) is a popular framework for solving many real-world problems that require accurate prediction from a very large number of potential output choices. A popular approach for dealing with the large label space is to arrange the labels into a shallow tree-based index and then learn an ML model to efficiently search this index via beam search. Existing methods initialize the tree index by clustering the label space into a few mutually exclusive clusters based on pre-defined features and keep it fixed throughout the training procedure. This approach results in a sub-optimal indexing structure over the label space and limits the search performance to the quality of choices made during the initialization of the index. In this paper, we propose a novel method ELIAS which relaxes the tree-based index to a specialized weighted graph-based index which is learned end-to-end with the final task objective. More specifically, ELIAS models the discrete cluster-to-label assignments in the existing tree-based index as soft learnable parameters that are learned jointly with the rest of the ML model. ELIAS achieves state-of-the-art performance on several large-scale extreme classification benchmarks with millions of labels. In particular, ELIAS can be up to 2.5% better at precision@1 and up to 4% better at recall@100 than existing XMC methods. A PyTorch implementation of ELIAS along with other resources is available at https://github.com/nilesh2797/ELIAS.
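
To make the idea of a soft, graph-based index more concrete, here is a minimal inference-only sketch in plain Python. It is not the authors' implementation (see the repository linked above for that). The function name `search_soft_index`, the dot-product scorers, and the toy weights are all assumptions chosen for illustration. In a hard tree index each label would sit in exactly one cluster with weight 1; here a label may be attached to several clusters with different weights, and in ELIAS such weights would be trained jointly with the rest of the model rather than fixed by hand.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def search_soft_index(query, cluster_vecs, assign, label_vecs, beam=2, top_k=3):
    """Two-stage search over a shallow index with soft cluster-to-label edges.

    assign[c] maps label id -> edge weight for cluster c (hypothetical layout).
    Returns up to top_k (label, score) pairs, best first.
    """
    # Stage 1: score every cluster and keep only the `beam` most relevant ones.
    cluster_probs = softmax([dot(query, c) for c in cluster_vecs])
    shortlist = sorted(
        range(len(cluster_vecs)), key=lambda c: cluster_probs[c], reverse=True
    )[:beam]

    # Stage 2: a label reachable from several shortlisted clusters
    # accumulates mass over all of its paths.
    path_mass = {}
    for c in shortlist:
        for label, weight in assign[c].items():
            path_mass[label] = path_mass.get(label, 0.0) + cluster_probs[c] * weight

    # Combine path mass with a label-level relevance score (an assumed choice).
    scores = {
        label: mass * sigmoid(dot(query, label_vecs[label]))
        for label, mass in path_mass.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Toy data: 3 clusters, 4 labels; label 1 belongs to clusters 0 and 1.
query = [1.0, 0.0]
cluster_vecs = [[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0]]
assign = [{0: 0.9, 1: 0.4}, {1: 0.6, 2: 1.0}, {3: 1.0}]
label_vecs = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [-1.0, 0.0]]

for label, score in search_soft_index(query, cluster_vecs, assign, label_vecs):
    print(label, round(score, 3))
```

With a beam of 2, label 3 is never scored because its only cluster falls outside the shortlist. This is the sense in which search quality depends on the index: a label placed in the wrong cluster cannot be recovered at search time, which is the limitation of a fixed index that the abstract describes.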
