Paper Title
HyperEmbed: Tradeoffs Between Resources and Performance in NLP Tasks with Hyperdimensional Computing enabled Embedding of n-gram Statistics
Paper Authors
Paper Abstract
Recent advances in Deep Learning have led to significant performance increases on several NLP tasks; however, the models are becoming increasingly computationally demanding. This paper therefore addresses the domain of computationally efficient algorithms for NLP tasks. In particular, it investigates distributed representations of the n-gram statistics of texts. The representations are formed using hyperdimensional computing enabled embedding. These representations then serve as features, which are used as input to standard classifiers. We investigate the applicability of the embedding on one large and three small standard datasets for classification tasks using nine classifiers. The embedding achieved F1 scores on par with the conventional n-gram statistics while decreasing the time and memory requirements by several times; e.g., for one of the classifiers on a small dataset, the memory reduction was 6.18 times, while the train and test speed-ups were 4.62 and 3.84 times, respectively. For many classifiers on the large dataset, the memory reduction was ca. 100 times and the train and test speed-ups were over 100 times. Importantly, the use of distributed representations formed via hyperdimensional computing breaks the strict dependency between the dimensionality of the representation and the n-gram size, thus opening room for tradeoffs.
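
A minimal sketch of the kind of embedding the abstract describes, assuming random bipolar item vectors per character, position-wise rotation plus element-wise multiplication for binding, and summation for bundling; the function name embed_ngram_stats and its parameters (n, dim, seed) are illustrative and not taken from the paper.

import numpy as np

def embed_ngram_stats(text, n=3, dim=10000, seed=0):
    """Hyperdimensional embedding of character n-gram statistics (sketch).

    Each character gets a random {-1, +1} hypervector; an n-gram is encoded by
    rotating each character's vector by its position and binding the rotated
    vectors via element-wise multiplication; the text embedding is the sum
    (bundling) of all n-gram vectors.
    """
    rng = np.random.default_rng(seed)
    item_memory = {}  # one random bipolar hypervector per distinct character

    def hv(ch):
        if ch not in item_memory:
            item_memory[ch] = rng.choice([-1, 1], size=dim)
        return item_memory[ch]

    embedding = np.zeros(dim)
    for i in range(len(text) - n + 1):
        ngram_vec = np.ones(dim)
        for pos, ch in enumerate(text[i:i + n]):
            # Rotation by position makes the encoding order-sensitive.
            ngram_vec *= np.roll(hv(ch), pos)
        embedding += ngram_vec
    return embedding

# The resulting dim-dimensional vector serves as input features for any
# standard classifier, independently of the chosen n-gram size n; this is the
# decoupling of dimensionality and n-gram size the abstract refers to.
features = embed_ngram_stats("hyperdimensional computing", n=3, dim=10000)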