AutoMSC: Automatic Assignment of Mathematics Subject Classification Labels

Paper Title

AutoMSC: Automatic Assignment of Mathematics Subject Classification Labels

Authors

Moritz Schubotz, Philipp Scharpf, Olaf Teschke, Andreas Kuehnemund, Corinna Breitinger, Bela Gipp

Abstract

Authors of research papers in mathematics and other math-heavy disciplines commonly employ the Mathematics Subject Classification (MSC) scheme to search for relevant literature. The MSC is a hierarchical alphanumerical classification scheme that allows librarians to assign one or more codes to publications. Digital libraries in mathematics, as well as reviewing services such as zbMATH and Mathematical Reviews (MR), rely on these MSC labels in their workflows to organize the abstracting and reviewing process. In particular, the coarse-grained classification determines the subject editor who is responsible for the actual reviewing process. In this paper, we investigate the feasibility of automatically assigning a coarse-grained primary classification using the MSC scheme by treating the problem as a multi-class classification machine learning task. We find that our method achieves an \(F_1\)-score of over 77%, which is remarkably close to the agreement between zbMATH and MR (\(F_1\)-score of 81%). Moreover, we find that the method's confidence score allows reducing the effort by 86% compared to the manual coarse-grained classification effort while maintaining a precision of 81% for automatically classified articles.
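
The trade-off described in the last sentence of the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the function name `triage_by_confidence`, the threshold value, and the toy records are hypothetical. Precision is taken here as the share of automatically classified articles whose predicted label matches the reference label, which may differ from how the paper computes it.

```python
def triage_by_confidence(predictions, threshold):
    """Split predictions into automatic and manual handling by confidence.

    predictions: list of (predicted_label, confidence, reference_label) tuples.
    Returns (effort_reduction, precision), where effort_reduction is the
    fraction of articles classified automatically and precision is the
    fraction of those whose prediction matches the reference label.
    """
    if not predictions:
        return 0.0, None

    # Articles at or above the threshold are accepted without manual review.
    automated = [(pred, ref) for pred, conf, ref in predictions if conf >= threshold]

    effort_reduction = len(automated) / len(predictions)
    if not automated:
        return effort_reduction, None

    correct = sum(1 for pred, ref in automated if pred == ref)
    return effort_reduction, correct / len(automated)


# Toy records (invented for this example): two-digit MSC-style labels.
sample = [
    ("05", 0.95, "05"),
    ("11", 0.90, "11"),
    ("68", 0.85, "03"),
    ("35", 0.40, "35"),
    ("60", 0.30, "62"),
]

effort, precision = triage_by_confidence(sample, threshold=0.8)
print(f"effort reduction: {effort:.0%}, precision: {precision:.0%}")
```

Raising the threshold generally sends more articles to manual review, which lowers the effort reduction and tends to raise precision on the automated share. The abstract reports one operating point of this kind (86% effort reduction at 81% precision).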
