Paper title
Frozen Binomials on the Web: Word Ordering and Language Conventions in Online Text
Paper authors
Abstract
There is inherent information captured in the order in which we write words in a list. The ordering of binomials --- lists of two words separated by `and' or `or' --- has been studied for more than a century. These binomials are common across many areas of speech, in both formal and informal text. Over the last century, numerous explanations have been given to describe what order people use for these binomials, from differences in semantics to differences in phonology. These rules primarily describe `frozen' binomials, which exist in exactly one ordering, and have lacked large-scale trials to determine their efficacy. Online text provides a unique opportunity to study these lists in the context of informal text at a very large scale. In this work, we expand the view of binomials to include a large-scale, quantitative analysis of both frozen and non-frozen binomials. Using this data, we then demonstrate that most previously proposed rules are ineffective at predicting binomial ordering. By tracking the order of these binomials across time and communities, we are able to establish additional, unexplored dimensions central to these predictions. Expanding beyond the question of individual binomials, we also explore the global structure of binomials in various communities, establishing a new model for these lists and analyzing this structure for non-frozen and frozen binomials. Additionally, a novel analysis of trinomials --- lists of length three --- suggests that none of the binomial analysis applies in these cases. Finally, we demonstrate how large data sets gleaned from the web can be used in conjunction with older theories to expand and improve on old questions.