Paper Title

Text Mining for Processing Interview Data in Computational Social Science

Authors

Jussi Karlgren, Renee Li, Eva M. Meyersson Milgrom

Abstract

We use commercially available text analysis technology to process interview text data from a computational social science study. We find that topical clustering and terminological enrichment provide for convenient exploration and quantification of the responses. This makes it possible to generate and test hypotheses and to compare textual and non-textual variables, and saves analyst effort. We encourage studies in social science to use text analysis, especially for exploratory open-ended studies. We discuss how replicability requirements are met by text analysis technology. We note that the most recent learning models are not designed with transparency in mind, and that research requires a model to be editable and its decisions to be explainable. The tools available today, such as the one used in the present study, are not built for processing interview texts. While many of the variables under consideration are quantifiable using lexical statistics, we find that some interesting and potentially valuable features are difficult or impossible to automatise reliably at present. We note that there are some potentially interesting applications for traditional natural language processing mechanisms such as named entity recognition and anaphora resolution in this application area. We conclude with a suggestion for language technologists to investigate the challenge of processing interview data comprehensively, especially the interplay between question and response, and we encourage social science researchers not to hesitate to use text analysis tools, especially for the exploratory phase of processing interview data.
