Paper Title
GRAM: Fast Fine-tuning of Pre-trained Language Models for Content-based Collaborative Filtering
Paper Authors
Paper Abstract
Content-based collaborative filtering (CCF) predicts user-item interactions based on both users' interaction history and items' content information. Recently, pre-trained language models (PLMs) have been used to extract high-quality item encodings for CCF. However, it is resource-intensive to train a PLM-based CCF model in an end-to-end (E2E) manner, since optimization involves back-propagating through every content encoding within a given user interaction sequence. To tackle this issue, we propose GRAM (GRadient Accumulation for Multi-modality in CCF), which exploits the fact that a given item often appears multiple times within a batch of interaction histories. Specifically, Single-step GRAM aggregates each item encoding's gradients for back-propagation, with theoretical equivalence to standard E2E training. As an extension of Single-step GRAM, we propose Multi-step GRAM, which increases the gradient update latency, achieving a further speedup with drastically less GPU memory. GRAM significantly improves training efficiency (up to 146x) on five datasets from two task domains, Knowledge Tracing and News Recommendation. Our code is available at https://github.com/yoonseok312/GRAM.
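The core idea described in the abstract, accumulating the gradients that flow into each unique item encoding and then back-propagating through the PLM once per unique item rather than once per occurrence, can be sketched in a few lines of PyTorch. The sketch below is illustrative only: `plm_encoder`, `recommender`, and the batch fields are hypothetical placeholders under assumed shapes, not the interface used in the authors' repository.

```python
# Minimal sketch of the Single-step GRAM idea (assumed names and shapes):
# encode each unique item in the batch once, let gradients from all of its
# occurrences accumulate on that single encoding, then back-propagate
# through the PLM once per unique item.
import torch

def gram_single_step(plm_encoder, recommender, batch, optimizer):
    item_ids = batch["item_ids"]                      # (B, L) item ids per interaction sequence
    unique_ids, inverse = torch.unique(item_ids, return_inverse=True)

    # 1) Encode each unique item's content exactly once with the PLM.
    #    batch["item_tokens"] is assumed to map item id -> token ids, shape (num_items, T).
    unique_tokens = batch["item_tokens"][unique_ids]  # (U, T)
    unique_enc = plm_encoder(unique_tokens)           # (U, D) item encodings

    # 2) Cut the graph: the recommender sees a leaf copy of the encodings.
    enc_leaf = unique_enc.detach().requires_grad_(True)
    seq_enc = enc_leaf[inverse]                       # (B, L, D), one gather per occurrence

    # 3) Forward/backward through the (cheap) recommender. Gradients from all
    #    occurrences of an item are summed into its single row of enc_leaf.
    loss = recommender(seq_enc, batch["labels"])      # assumed to return a scalar loss
    loss.backward()

    # 4) One backward pass through the PLM per *unique* item, using the
    #    accumulated encoding gradients, instead of one per occurrence.
    unique_enc.backward(enc_leaf.grad)

    optimizer.step()
    optimizer.zero_grad()
    return loss.detach()
```

Because every occurrence of an item gathers from the same row of `enc_leaf`, autograd sums their gradients into that row, which is the accumulation that makes a single PLM backward pass per unique item sufficient; Multi-step GRAM extends this by collecting such accumulated gradients over several steps before updating the PLM.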