Paper title
Grammatical cues to subjecthood are redundant in a majority of simple clauses across languages
Paper authors
Paper abstract
Grammatical cues are sometimes redundant with word meanings in natural language. For instance, English word order rules constrain the word order of a sentence like "The dog chewed the bone" even though the status of "dog" as subject and "bone" as object can be inferred from world knowledge and plausibility. Quantifying how often this redundancy occurs, and how the level of redundancy varies across typologically diverse languages, can shed light on the function and evolution of grammar. To that end, we performed a behavioral experiment in English and Russian and a cross-linguistic computational analysis measuring the redundancy of grammatical cues in transitive clauses extracted from corpus text. English and Russian speakers (n=484) were presented with subjects, verbs, and objects (in random order and with morphological markings removed) extracted from naturally occurring sentences and were asked to identify which noun is the subject of the action. Accuracy was high in both languages (~89% in English, ~87% in Russian). Next, we trained a neural network machine classifier on a similar task: predicting which nominal in a subject-verb-object triad is the subject. Across 30 languages from eight language families, performance was consistently high: a median accuracy of 87%, comparable to the accuracy observed in the human experiments. The conclusion is that grammatical cues such as word order are necessary to convey subjecthood and objecthood in only a minority of naturally occurring transitive clauses; nevertheless, they (a) can provide an important source of redundancy and (b) are crucial for conveying intended meaning that cannot be inferred from the words alone, including descriptions of human interactions, where roles are often reversible (e.g., Ray helped Lu/Lu helped Ray), and expressing non-prototypical meanings (e.g., "The bone chewed the dog.").
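The classification task described in the abstract can be illustrated with a small sketch. This is not the neural network classifier used in the paper: it swaps in a simple count-based baseline that ignores the verb, and the triples, function names, and smoothing choice are all hypothetical, chosen only to show how subjecthood might be guessed from word identity once word order is hidden.

```python
import random
from collections import Counter

# Hypothetical toy (subject, verb, object) triples; NOT data from the paper.
TRAIN = [
    ("dog", "chew", "bone"),
    ("dog", "chase", "cat"),
    ("cat", "eat", "fish"),
    ("child", "read", "book"),
    ("child", "eat", "apple"),
    ("teacher", "read", "book"),
]


def fit(triples):
    """Count how often each noun occurs as a subject and as an object."""
    subj, obj = Counter(), Counter()
    for s, _verb, o in triples:
        subj[s] += 1
        obj[o] += 1
    return subj, obj


def subject_score(noun, subj, obj):
    """Add-one smoothed estimate of P(subject | noun)."""
    return (subj[noun] + 1) / (subj[noun] + obj[noun] + 2)


def predict_subject(noun_a, noun_b, subj, obj):
    """Return the more subject-like noun; ties go to noun_a."""
    if subject_score(noun_a, subj, obj) >= subject_score(noun_b, subj, obj):
        return noun_a
    return noun_b


def accuracy(triples, subj, obj, seed=0):
    """Shuffle the two nouns of each triple (hiding word order) and score."""
    rng = random.Random(seed)
    correct = 0
    for s, _verb, o in triples:
        pair = [s, o]
        rng.shuffle(pair)
        correct += predict_subject(pair[0], pair[1], subj, obj) == s
    return correct / len(triples)


if __name__ == "__main__":
    subj, obj = fit(TRAIN)
    print(predict_subject("bone", "dog", subj, obj))
    print(accuracy(TRAIN, subj, obj))
```

In this sketch a pair such as "dog"/"bone" is resolved from the counts alone, whereas a reversible pair such as "Ray"/"Lu", seen equally often in both roles, produces a tie that only a grammatical cue could break.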