Paper Title
Multi-Modal Fusion by Meta-Initialization
Paper Authors
Paper Abstract
When experience is scarce, models may have insufficient information to adapt to a new task. In this case, auxiliary information, such as a textual description of the task, can enable improved task inference and adaptation. In this work, we propose an extension to the Model-Agnostic Meta-Learning algorithm (MAML) which allows the model to adapt using auxiliary information as well as task experience. Our method, Fusion by Meta-Initialization (FuMI), conditions the model initialization on auxiliary information using a hypernetwork, rather than learning a single, task-agnostic initialization. Furthermore, motivated by the shortcomings of existing multi-modal few-shot learning benchmarks, we constructed iNat-Anim, a large-scale image classification dataset with succinct and visually pertinent textual class descriptions. On iNat-Anim, FuMI significantly outperforms uni-modal baselines such as MAML in the few-shot regime. The code for this project and a dataset exploration tool for iNat-Anim are publicly available at https://github.com/s-a-malik/multi-few.
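The core idea stated in the abstract, conditioning the model initialization on auxiliary information via a hypernetwork instead of learning one task-agnostic initialization, can be illustrated with a minimal sketch. This is not the paper's actual implementation: the dimensions, the two-layer hypernetwork, and the linear classification head are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def hyper_init(text_emb, feat_dim, n_classes, W1, W2):
    """Hypernetwork sketch: map an embedding of the task's auxiliary
    text to the initial weights of a linear classification head.
    The initialization is thus task-conditioned, unlike MAML's single
    shared initialization. W1/W2 are the hypernetwork's own parameters
    (meta-learned in practice; random here for illustration)."""
    h = np.maximum(W1 @ text_emb, 0.0)             # hidden layer with ReLU
    flat = W2 @ h                                  # all head parameters, flattened
    w = flat[: feat_dim * n_classes].reshape(n_classes, feat_dim)
    b = flat[feat_dim * n_classes :]
    return w, b

# Illustrative sizes (not from the paper).
text_dim, hidden, feat_dim, n_classes = 32, 64, 16, 5
W1 = rng.standard_normal((hidden, text_dim)) * 0.1
W2 = rng.standard_normal((feat_dim * n_classes + n_classes, hidden)) * 0.1

text_emb = rng.standard_normal(text_dim)           # e.g. an embedded class description
w0, b0 = hyper_init(text_emb, feat_dim, n_classes, W1, W2)

support = rng.standard_normal((8, feat_dim))       # features of 8 support examples
logits = support @ w0.T + b0                       # head evaluated at its initialization
print(logits.shape)  # (8, 5)
```

From this task-conditioned starting point (`w0`, `b0`), adaptation would proceed with the usual inner-loop gradient steps on the support set, as in MAML.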