Paper Title
An Algorithm for Fuzzification of WordNets, Supported by a Mathematical Proof
Paper Authors
Paper Abstract
WordNet-like Lexical Databases (WLDs) group English words into sets of synonyms called "synsets." Although standard WLDs are used in many successful text-mining applications, they have a limitation: all word senses are assumed to represent the meaning of their corresponding synset to the same degree, which is not generally true. To overcome this limitation, several fuzzy versions of synsets have been proposed. A common trait of these studies is that, to the best of our knowledge, they do not aim to produce fuzzified versions of the existing WLDs but instead build new WLDs from scratch. This has limited the attention they receive from the text-mining community, many of whose resources and applications are based on the existing WLDs. In this study, we present an algorithm for constructing fuzzy versions of WLDs of any language, given a corpus of documents and a word-sense disambiguation (WSD) system for that language. Then, using the Open American National Corpus and the UKB WSD system as inputs to the algorithm, we construct and publish online a fuzzified version of the English WordNet (FWN). We also provide a theoretical (mathematical) proof of the validity of its results.
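To make the idea of a fuzzy synset concrete, the following is a minimal sketch of one possible way to turn WSD output into membership degrees. It is not the algorithm of the paper, which the abstract does not spell out. The function name `fuzzy_synsets`, the synset identifier `"syn-car"`, the toy tagged tokens, and the choice to normalize each count by the largest count in its synset are all assumptions made purely for illustration.

```python
from collections import Counter, defaultdict


def fuzzy_synsets(tagged_tokens):
    """Map each synset id to {word: membership degree in (0, 1]}.

    `tagged_tokens` is an iterable of (word, synset_id) pairs, standing in
    for the output of a WSD system run over a corpus (hypothetical format).
    """
    # Count how often each word was disambiguated to each synset.
    counts = defaultdict(Counter)
    for word, synset_id in tagged_tokens:
        counts[synset_id][word] += 1

    # Assumed normalization: divide by the largest count in the synset,
    # so the most frequent word gets degree 1.0.
    fuzzy = {}
    for synset_id, word_counts in counts.items():
        peak = max(word_counts.values())
        fuzzy[synset_id] = {w: c / peak for w, c in word_counts.items()}
    return fuzzy


# Toy input: "syn-car" is a made-up identifier, not a real WordNet synset id.
tagged = (
    [("car", "syn-car")] * 4
    + [("auto", "syn-car")] * 2
    + [("machine", "syn-car")]
)
result = fuzzy_synsets(tagged)
```

In this toy input, "car" is tagged with the synset four times, "auto" twice, and "machine" once, so the sketch assigns them degrees 1.0, 0.5, and 0.25. A crisp WLD would instead treat all three as equal members of the synset.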