Harbsafe-162. A Domain-Specific Data Set for the Intrinsic Evaluation of Semantic Representations for Terminological Data

Paper title

Harbsafe-162. A Domain-Specific Data Set for the Intrinsic Evaluation of Semantic Representations for Terminological Data

Authors

Arndt, Susanne; Schnäpp, Dieter

Abstract

The article presents Harbsafe-162, a domain-specific data set for evaluating distributional semantic models. It originates from the Harbsafe project, a cooperation between Technische Universität Braunschweig and the German Commission for Electrical, Electronic & Information Technologies of DIN and VDE. One objective of the project is to apply distributional semantic models to terminological entries, that is, complex lexical data comprising at least one or several terms, term phrases and a definition. This application is needed to solve a more complex problem: the harmonization of terminologies of standards and standards bodies (i.e. resolution of doublettes and inconsistencies). Because evaluation data sets for terminological entries were lacking, the creation of Harbsafe-162 was a necessary step towards harmonization assistance. Harbsafe-162 covers data from nine electrotechnical standards in the domains of functional safety, IT security, and dependability. An intrinsic evaluation method in the form of a similarity rating task was applied, in which two linguists and three domain experts from standardization participated. The data set is used to evaluate a specific implementation of an established sentence embedding model. This implementation proves to be satisfactory for the domain-specific data, so that the project may bring forward further implementations for harmonization assistance. Considering recent criticism of intrinsic evaluation methods, the article concludes with an evaluation of Harbsafe-162 and joins a more general discussion about the nature of similarity rating tasks. Harbsafe-162 has been made available to the community.
