Paper Title
MAGNeto: An Efficient Deep Learning Method for the Extractive Tags Summarization Problem
Paper Authors
Paper Abstract
In this work, we study a new image annotation task named Extractive Tags Summarization (ETS). The goal is to extract important tags from the context lying in an image and its corresponding tags. We adapt several state-of-the-art deep learning models to utilize both visual and textual information. Our proposed solution consists of widely used building blocks, such as convolutional and self-attention layers, together with a novel idea of combining auxiliary loss functions and a gating mechanism to glue these fundamental components together and form a unified architecture. In addition, we introduce a loss function that aims to reduce the imbalance of the training data, and a simple but effective data augmentation technique dedicated to alleviating the effect of outliers on the final results. Last but not least, we explore an unsupervised pre-training strategy that further boosts the model's performance by making use of the abundant amount of available unlabeled data. Our model achieves good results: a 90% $F_\text{1}$ score on the public NUS-WIDE benchmark and a 50% $F_\text{1}$ score on a noisy, large-scale, real-world private dataset. Source code for reproducing the experiments is publicly available at: https://github.com/pixta-dev/labteam
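The gating mechanism the abstract refers to can be understood as a learned, per-dimension blend of the visual and textual feature streams. The following is a minimal illustrative sketch, not the paper's actual architecture: the scalar weights `w_v`, `w_t`, and `b` stand in for learned parameters, and the gate is a simple sigmoid over both modalities.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(visual, textual, w_v, w_t, b):
    """Blend two same-length feature vectors with a per-dimension gate.

    Illustrative sketch only: a gate g in (0, 1) is computed from both
    modalities, then the output is the convex combination
    g * visual + (1 - g) * textual. Real models would use learned
    weight matrices instead of the scalar placeholders w_v, w_t, b.
    """
    fused = []
    for v, t in zip(visual, textual):
        g = sigmoid(w_v * v + w_t * t + b)   # gate value for this dimension
        fused.append(g * v + (1.0 - g) * t)  # convex combination of modalities
    return fused

# With zero weights the gate is sigmoid(0) = 0.5, i.e. an even blend:
print(gated_fusion([1.0, 0.0], [0.0, 1.0], 0.0, 0.0, 0.0))  # → [0.5, 0.5]
```

A large positive bias drives the gate toward 1, so the output approaches the visual features alone; a large negative bias favors the textual side. In the full model, such gates let the network decide how much each modality should contribute per tag.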