Paper Title

tmVar 3.0: an improved variant concept recognition and normalization tool

Authors

Chih-Hsuan Wei, Alexis Allot, Kevin Riehle, Aleksandar Milosavljevic, Zhiyong Lu

Abstract

Previous studies have shown that automated text-mining tools are becoming increasingly important for unlocking variant information in the scientific literature at scale. Despite multiple past attempts, existing tools remain limited in recognition scope and precision. We propose tmVar 3.0, an improved variant recognition and normalization tool. Compared to its predecessors, tmVar 3.0 recognizes a wide spectrum of variant-related entities (e.g., alleles and copy number variants) and groups different variant mentions that belong to the same concept within an article for improved accuracy. Moreover, tmVar 3.0 provides additional variant normalization options, such as allele-specific identifiers from the ClinGen Allele Registry. Evaluated on three independent benchmarking datasets, tmVar 3.0 achieves state-of-the-art performance, with F-measures above 90% in variant recognition and normalization. tmVar 3.0 is freely available for download. We have also processed the entire PubMed and PMC with tmVar 3.0 and released the annotations on our FTP site. Availability: ftp://ftp.ncbi.nlm.nih.gov/pub/lu/tmVar3
