Paper Title
The Case for Perspective in Multimodal Datasets
Paper Authors
Paper Abstract
This paper argues in favor of the adoption of annotation practices for multimodal datasets that recognize and represent the inherently perspectivized nature of multimodal communication. To support our claim, we present a set of annotation experiments in which FrameNet annotation is applied to the Multi30k and the Flickr30k Entities datasets. We assess the cosine similarity between the semantic representations derived from the frame annotation of both pictures and captions. Our findings indicate that: (i) frame semantic similarity between captions of the same picture produced in different languages is sensitive to whether or not one caption is a translation of the other, and (ii) picture annotation for semantic frames is sensitive to whether or not the image is annotated in the presence of a caption.
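To make the comparison described in the abstract concrete, the following is a minimal sketch (not the authors' pipeline) of how cosine similarity could be computed between two sets of FrameNet annotations, each represented as a bag-of-frames vector. The helper function and the example frame lists are illustrative assumptions.

```python
# Minimal sketch: compare two annotation sets by representing each as a
# bag-of-frames count vector and computing their cosine similarity.
# The specific frame names below are hypothetical examples, not data
# taken from the paper.
from collections import Counter
import math


def cosine_similarity(frames_a, frames_b):
    """Cosine similarity between two lists of evoked frame names."""
    a, b = Counter(frames_a), Counter(frames_b)
    # Dot product over frames occurring in both annotation sets.
    dot = sum(a[f] * b[f] for f in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical frame annotations for a caption and for the picture it describes.
caption_frames = ["Self_motion", "Wearing", "People"]
picture_frames = ["Self_motion", "People", "Clothing"]

print(cosine_similarity(caption_frames, picture_frames))
```

The same function would apply equally to caption-vs-caption comparisons across languages, since both annotation sets reduce to vectors over frame names.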