Compositional Evaluation on Japanese Textual Entailment and Similarity

Paper Title

Compositional Evaluation on Japanese Textual Entailment and Similarity

Authors

Hitomi Yanaka, Koji Mineshima

Abstract

Natural Language Inference (NLI) and Semantic Textual Similarity (STS) are widely used benchmark tasks for compositional evaluation of pre-trained language models. Despite growing interest in linguistic universals, most NLI/STS studies have focused almost exclusively on English. In particular, there are no available multilingual NLI/STS datasets in Japanese, which is typologically different from English and can shed light on the currently controversial behavior of language models in matters such as sensitivity to word order and case particles. Against this background, we introduce JSICK, a Japanese NLI/STS dataset that was manually translated from the English dataset SICK. We also present a stress-test dataset for compositional inference, created by transforming syntactic structures of sentences in JSICK to investigate whether language models are sensitive to word order and case particles. We conduct baseline experiments on different pre-trained language models and compare the performance of multilingual models when applied to Japanese and other languages. The results of the stress-test experiments suggest that the current pre-trained language models are insensitive to word order and case marking.
