Paper Title

XL-WiC: A Multilingual Benchmark for Evaluating Semantic Contextualization

Authors

Alessandro Raganato, Tommaso Pasini, Jose Camacho-Collados, Mohammad Taher Pilehvar

Abstract

The ability to correctly model distinct meanings of a word is crucial for the effectiveness of semantic representation techniques. However, most existing evaluation benchmarks for assessing this criterion are tied to sense inventories (usually WordNet), restricting their usage to a small subset of knowledge-based representation techniques. The Word-in-Context dataset (WiC) addresses the dependence on sense inventories by reformulating the standard disambiguation task as a binary classification problem; however, it is limited to English. We put forward a large multilingual benchmark, XL-WiC, featuring gold standards in 12 new languages from varied language families and with different degrees of resource availability, opening room for evaluation scenarios such as zero-shot cross-lingual transfer. We perform a series of experiments to determine the reliability of the datasets and to set performance baselines for several recent contextualized multilingual models. Experimental results show that even when no tagged instances are available for a target language, models trained solely on the English data can attain competitive performance in the task of distinguishing different meanings of a word, even for distant languages. XL-WiC is available at https://pilehvar.github.io/xlwic/.
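The binary reformulation the abstract describes can be illustrated with a toy instance: given a target word and two contexts, the system predicts whether the word carries the same meaning in both. The field names and helper below are illustrative assumptions, not the dataset's actual schema:

```python
# Minimal sketch of a WiC-style binary classification instance.
# Keys ("target", "context_1", "context_2", "same_meaning") are
# hypothetical, not the official XL-WiC file format.

def wic_label(instance):
    """Return the gold binary label: True if the target word has the
    same meaning in both contexts, False otherwise."""
    return instance["same_meaning"]

# Toy English example: "bank" used in two different senses.
example = {
    "target": "bank",
    "context_1": "She sat on the bank of the river.",
    "context_2": "He deposited the check at the bank.",
    "same_meaning": False,  # different senses -> negative instance
}

print(wic_label(example))  # -> False
```

Because the label is a plain boolean rather than a sense identifier from an inventory such as WordNet, any representation technique that can compare two contextual usages can be evaluated, which is the point of the reformulation.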
