Paper Title
It's All in the Name: A Character Based Approach To Infer Religion
Paper Authors
Abstract
Demographic inference from text has received a surge of attention in natural language processing over the last decade. In this paper, we use personal names to infer religion in South Asia, where religion is a salient social division and yet disaggregated data on it remain scarce. Existing work predicts religion using dictionary-based methods and therefore cannot classify unseen names. We use character-based models, which learn character patterns and can therefore classify unseen names with high accuracy. These models are also much faster and scale easily to large data sets. We improve our classifier by combining the name of an individual with that of their parent or spouse, and achieve remarkably high accuracy. Finally, we trace the classification decisions of a convolutional neural network model using layer-wise relevance propagation, which can explain the predictions of complex non-linear classifiers and circumvent their purported black-box nature. We show how the character patterns learned by the classifier are rooted in the linguistic origins of names.
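To make the contrast with dictionary lookup concrete, here is a minimal sketch of character-level features. It is an illustration only, not the models used in the paper: it swaps in plain character n-gram counts, and the function names (`char_ngrams`, `combined_features`, `shared_ngrams`), the boundary markers `^` and `$`, the `self:`/`rel:` prefixes, and the example strings are all assumptions introduced here. The point is that a name absent from any dictionary still shares sub-word patterns with names seen in training, and that a relative's name can be added as a second feature source.

```python
from collections import Counter


def char_ngrams(name, n_min=1, n_max=3):
    """Count character n-grams of a name, using ^ and $ as boundary markers.

    The markers (an assumption of this sketch) let prefixes and suffixes
    be distinguished from patterns in the middle of a name.
    """
    padded = "^" + name.lower() + "$"
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams[padded[i:i + n]] += 1
    return grams


def combined_features(name, relative_name):
    """Merge n-grams of a person's name and a relative's name.

    Each n-gram is prefixed with its source so that the same pattern in
    the two names maps to two different features.
    """
    feats = Counter()
    for gram, count in char_ngrams(name).items():
        feats["self:" + gram] += count
    for gram, count in char_ngrams(relative_name).items():
        feats["rel:" + gram] += count
    return feats


def shared_ngrams(a, b):
    """Return the n-grams two names have in common."""
    return set(char_ngrams(a)) & set(char_ngrams(b))


if __name__ == "__main__":
    # Hypothetical strings: one "seen" in training, one "unseen".
    # A dictionary lookup finds nothing for the unseen string, but the
    # two still overlap at the character level.
    print(sorted(shared_ngrams("ramesh", "suresh")))
```

Feature counts like these could be fed to any standard classifier; the paper's convolutional model instead learns such patterns directly from the character sequence.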