Paper Title
Online Metric Learning for Multi-Label Classification
Paper Authors
Paper Abstract
Existing research into online multi-label classification, such as the online sequential multi-label extreme learning machine (OSML-ELM) and stochastic gradient descent (SGD), has achieved promising performance. However, these works do not take label dependencies into consideration and lack a theoretical analysis of loss functions. Accordingly, we propose a novel online metric learning paradigm for multi-label classification to fill the current research gap. Generally, we first propose a new metric for multi-label classification which is based on $k$-Nearest Neighbour ($k$NN) and combined with the large margin principle. Then, we adapt it to the online setting to derive our model, which deals with massive volumes of streaming data at a higher speed online. Specifically, in order to learn the new $k$NN-based metric, we first project instances in the training dataset into the label space, which makes it possible to compare instances and labels in the same dimension. After that, we project both of them into a new lower-dimensional space simultaneously, which enables us to extract the structure of dependencies between instances and labels. Finally, we leverage the large margin and $k$NN principles to learn the metric with an efficient optimization algorithm. Moreover, we provide a theoretical analysis of the upper bound on the cumulative loss of our method. Comprehensive experiments on a number of benchmark multi-label datasets validate our theoretical approach and illustrate that our proposed online metric learning (OML) algorithm outperforms state-of-the-art methods.
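The pipeline sketched in the abstract — project instances into the label space, project instances and labels jointly into a lower-dimensional space, then classify with $k$NN in that space — can be illustrated with a minimal NumPy sketch. This is only a conceptual stand-in: the data shapes are invented, the label-space projection here is plain ridge regression, and the shared low-dimensional projection uses PCA rather than the paper's jointly learned large-margin metric or its online update.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy multi-label data (hypothetical, not from the paper):
# 60 instances, 10 features, 3 binary labels.
X = rng.normal(size=(60, 10))
Y = (X @ rng.normal(size=(10, 3)) + 0.1 * rng.normal(size=(60, 3)) > 0).astype(float)

# Step 1: project instances into the label space (here via ridge
# regression) so instances and label vectors share the same dimension.
lam = 1.0
W = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)  # (10, 3)
Z = X @ W                                                         # (60, 3)

# Step 2: project both representations into a shared lower-dimensional
# space. PCA is used as a stand-in; the paper instead learns this
# projection jointly under a large-margin objective.
stacked = np.vstack([Z, Y])
mu = stacked.mean(axis=0)
_, _, Vt = np.linalg.svd(stacked - mu, full_matrices=False)
P = Vt[:2].T                                                      # (3, 2)
Z_low = (Z - mu) @ P

# Step 3: predict a query's labels with kNN in the learned space,
# averaging the neighbours' label vectors and thresholding at 0.5.
def knn_predict(x, k=5):
    q = (x @ W - mu) @ P
    idx = np.argsort(np.linalg.norm(Z_low - q, axis=1))[:k]
    return (Y[idx].mean(axis=0) > 0.5).astype(float)

pred = np.array([knn_predict(x) for x in X])
accuracy = (pred == Y).mean()
```

In the paper's online setting the projection would be updated per streaming example rather than fit in one batch as above.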