Paper title
Location reference recognition from texts: A survey and comparison
Paper authors
Abstract
A vast amount of location information exists in unstructured texts, such as social media posts, news stories, scientific articles, web pages, travel blogs, and historical archives. Geoparsing refers to the process of recognizing location references in texts and identifying their geospatial representations. While geoparsing can benefit many domains, a summary of its specific applications is still missing. Further, a comprehensive review and comparison of existing approaches for location reference recognition, which is the first and a core step of geoparsing, is lacking. To fill these research gaps, this review first summarizes seven typical application domains of geoparsing: geographic information retrieval, disaster management, disease surveillance, traffic management, spatial humanities, tourism management, and crime management. We then review existing approaches for location reference recognition by categorizing them into four groups based on their underlying functional principle: rule-based, gazetteer matching-based, statistical learning-based, and hybrid approaches. Next, we thoroughly evaluate the correctness and computational efficiency of the 27 most widely used approaches for location reference recognition on 26 public datasets with different types of texts (e.g., social media posts and news stories) containing 39,736 location references across the world. Results from this evaluation can help inform future methodological developments for location reference recognition and guide the selection of suitable approaches based on application needs.
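To make the two steps of geoparsing concrete, the following is a minimal sketch of the gazetteer matching-based idea named in the abstract: find spans of text that match known place names, then look up a geospatial representation for each. It is not one of the 27 evaluated approaches. The gazetteer `TOY_GAZETTEER`, its rounded coordinates, and the function `recognize_locations` are all hypothetical names introduced here for illustration.

```python
import re

# Hypothetical toy gazetteer: lowercase place name -> (latitude, longitude).
# Entries and rounded coordinates are illustrative assumptions, not survey data.
TOY_GAZETTEER = {
    "new york": (40.71, -74.01),
    "york": (53.96, -1.08),
    "houston": (29.76, -95.37),
}


def recognize_locations(text, gazetteer=TOY_GAZETTEER):
    """Return (start, end, surface_form, coords) for each gazetteer match."""
    # List longer names first so "New York" is preferred over "York".
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in names) + r")\b",
        re.IGNORECASE,
    )
    return [
        (m.start(), m.end(), m.group(0), gazetteer[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    for hit in recognize_locations("Flooding reported in Houston and New York."):
        print(hit)
```

Plain string matching of this kind cannot tell a place name from an identical non-place word, and it misses any name absent from the gazetteer; these are the kinds of limitations that make a comparison across rule-based, statistical learning-based, and hybrid approaches useful.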