Transfer Learning with Synthetic Corpora for Spatial Role Labeling and Reasoning

Paper Title

Transfer Learning with Synthetic Corpora for Spatial Role Labeling and Reasoning

Authors

Roshanak Mirzaee, Parisa Kordjamshidi

Abstract

Recent research shows that synthetic data as a source of supervision helps pretrained language models (PLMs) transfer what they have learned to new target tasks and domains. However, this idea is less explored for spatial language. We provide two new data resources for multiple spatial language processing tasks. The first dataset is synthesized for transfer learning on spatial question answering (SQA) and spatial role labeling (SpRL). Compared to previous SQA datasets, it includes a larger variety of spatial relation types and spatial expressions. Our data generation process is easily extendable with new spatial expression lexicons. The second is a real-world SQA dataset with human-generated questions, built on an existing corpus with SpRL annotations. This dataset can be used to evaluate spatial language processing models in realistic situations. We show that pretraining with automatically generated data significantly improves the state-of-the-art (SOTA) results on several SQA and SpRL benchmarks, particularly when the training data in the target domain is small.
