Paper title
Review of coreference resolution in English and Persian
Paper authors
Paper abstract
Coreference resolution (CR), the task of identifying expressions that refer to the same real-world entity, is a fundamental challenge in natural language processing (NLP). This paper surveys recent advances in CR, covering both coreference and anaphora resolution. We critically analyze the diverse corpora that have fueled CR research, highlighting their strengths, limitations, and suitability for various tasks. We examine the range of evaluation metrics used to assess CR systems, discussing their advantages and disadvantages and the need for more nuanced, task-specific metrics. Tracing the evolution of CR algorithms, we provide a detailed overview of methodologies, from rule-based approaches to recent deep learning architectures. We cover mention-pair, entity-based, cluster-ranking, sequence-to-sequence, and graph neural network models, explaining their theoretical foundations and their performance on benchmark datasets. Recognizing the unique challenges of Persian CR, we dedicate a focused analysis to this under-resourced language. We examine existing Persian CR systems and highlight the emergence of end-to-end neural models that leverage pre-trained language models such as ParsBERT. This review is an essential resource for researchers and practitioners: it offers a comprehensive overview of the current state of the art in CR, identifies key challenges, and charts a course for future research in this rapidly evolving field.