Paper Title
Learning Music Helps You Read: Using Transfer to Study Linguistic Structure in Language Models
Paper Authors
Paper Abstract
We propose transfer learning as a method for analyzing the encoding of grammatical structure in neural language models. We train LSTMs on non-linguistic data and evaluate their performance on natural language to assess which kinds of data induce generalizable structural features that LSTMs can use for natural language. We find that training on non-linguistic data with latent structure (MIDI music or Java code) improves test performance on natural language, despite no overlap in surface form or vocabulary. To pinpoint the kinds of abstract structure that models may be encoding to lead to this improvement, we run similar experiments with two artificial parentheses languages: one which has a hierarchical recursive structure, and a control which has paired tokens but no recursion. Surprisingly, training a model on either of these artificial languages leads to the same substantial gains when testing on natural language. Further experiments on transfer between natural languages controlling for vocabulary overlap show that zero-shot performance on a test language is highly correlated with typological syntactic similarity to the training language, suggesting that representations induced by pre-training correspond to cross-linguistic syntactic properties. Our results provide insights into the ways that neural models represent abstract syntactic structure, and into the kinds of structural inductive biases which allow for natural language acquisition.
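To make the contrast between the two artificial parentheses languages concrete, below is a minimal sketch of how such corpora could be generated: one corpus where open/close token pairs must be resolved in last-in-first-out order (hierarchical, recursive nesting), and a control where the same paired tokens are closed in arbitrary order (pairing without recursion). The function names and parameters (vocab_size, p_open, the cap on simultaneously open tokens) are illustrative assumptions, not the paper's exact generation procedure or hyperparameters.

```python
import random

def nested_parens_corpus(vocab_size=100, length=50, p_open=0.4, seed=0):
    """Hierarchical corpus: every opened token must be closed in
    last-in-first-out order, yielding recursive, center-embedded structure."""
    rng = random.Random(seed)
    tokens, stack = [], []
    while len(tokens) < length:
        if stack and (rng.random() >= p_open or len(stack) > 10):
            tokens.append(f"close_{stack.pop()}")   # close the most recently opened token
        else:
            sym = rng.randrange(vocab_size)
            stack.append(sym)
            tokens.append(f"open_{sym}")
    while stack:                                     # close anything still open
        tokens.append(f"close_{stack.pop()}")
    return tokens

def flat_parens_corpus(vocab_size=100, length=50, p_open=0.4, seed=0):
    """Control corpus: the same open/close token pairs, but closes may occur
    in any order, so tokens are paired without recursive nesting."""
    rng = random.Random(seed)
    tokens, open_syms = [], []
    while len(tokens) < length:
        if open_syms and (rng.random() >= p_open or len(open_syms) > 10):
            sym = open_syms.pop(rng.randrange(len(open_syms)))  # close any open token
            tokens.append(f"close_{sym}")
        else:
            sym = rng.randrange(vocab_size)
            open_syms.append(sym)
            tokens.append(f"open_{sym}")
    while open_syms:
        tokens.append(f"close_{open_syms.pop()}")
    return tokens

if __name__ == "__main__":
    print(" ".join(nested_parens_corpus(vocab_size=5, length=20)))
    print(" ".join(flat_parens_corpus(vocab_size=5, length=20)))
```

In a transfer experiment of the kind the abstract describes, a language model would be pre-trained on one of these corpora (or on MIDI or Java data) and then evaluated on natural language; how much of the pre-trained network is frozen versus re-fit on the test language is a design choice of the experimental setup, not something this sketch specifies.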