Paper Title
Adaptation of Autoencoder for Sparsity Reduction From Clinical Notes Representation Learning
Paper Authors
Paper Abstract
When dealing with clinical text classification on a small dataset, recent studies have confirmed that a well-tuned multilayer perceptron outperforms other generative classifiers, including deep learning ones. To increase the performance of a neural network classifier, feature selection on the learned representation can be used effectively. However, most feature selection methods only estimate the degree of linear dependency between variables and select the best features based on univariate statistical tests. Furthermore, the sparsity of the feature space involved in the learned representation is ignored. Goal: Our aim is therefore to assess an alternative approach that tackles sparsity by compressing the clinical representation feature space, so that a limited set of French clinical notes can also be handled effectively. Methods: This study proposed an autoencoder learning algorithm to reduce the sparsity of clinical note representations. The motivation was to determine how to compress sparse, high-dimensional data by reducing the dimension of the clinical note representation feature space. The classification performance of the classifiers was then evaluated in the trained, compressed feature space. Results: The proposed approach provided overall performance gains of up to 3% for each evaluation. Finally, the classifier achieved 92% accuracy, 91% recall, 91% precision, and a 91% F1-score in detecting the patient's condition. Furthermore, the compression mechanism and the autoencoder prediction process were explained by applying the theoretical information bottleneck framework.
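Below is a minimal sketch of the pipeline the abstract outlines, for illustration only and not the authors' implementation: an autoencoder compresses a sparse, high-dimensional clinical-note representation, and an MLP classifier is then trained in the compressed space and scored with accuracy, precision, recall, and F1. The input dimension, bottleneck size, layer widths, training schedule, and the synthetic sparse data are all assumptions; PyTorch and scikit-learn are used here for convenience.

```python
# Sketch only: compress sparse note vectors with an autoencoder, then classify
# in the bottleneck space. Sizes and data below are illustrative assumptions.
import numpy as np
import torch
import torch.nn as nn
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

INPUT_DIM, LATENT_DIM = 5000, 128  # assumed dimensions of the sparse representation

class Autoencoder(nn.Module):
    def __init__(self, input_dim: int, latent_dim: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 512), nn.ReLU(),
            nn.Linear(512, latent_dim), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 512), nn.ReLU(),
            nn.Linear(512, input_dim),
        )

    def forward(self, x):
        z = self.encoder(x)          # compressed (low-dimensional) code
        return self.decoder(z), z    # reconstruction and code

# Synthetic sparse matrix standing in for vectorized clinical notes (assumption).
rng = np.random.default_rng(0)
mask = (rng.random((1000, INPUT_DIM)) < 0.02).astype(np.float32)
X = mask * rng.random((1000, INPUT_DIM)).astype(np.float32)
y = rng.integers(0, 2, size=1000)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Train the autoencoder to reconstruct the sparse input (full-batch for brevity).
model = Autoencoder(INPUT_DIM, LATENT_DIM)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
X_train_t = torch.from_numpy(X_train)
for epoch in range(20):
    optimizer.zero_grad()
    recon, _ = model(X_train_t)
    loss = loss_fn(recon, X_train_t)
    loss.backward()
    optimizer.step()

# Encode both splits into the compressed space and fit an MLP classifier there.
with torch.no_grad():
    Z_train = model.encoder(torch.from_numpy(X_train)).numpy()
    Z_test = model.encoder(torch.from_numpy(X_test)).numpy()

clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
clf.fit(Z_train, y_train)
print(classification_report(y_test, clf.predict(Z_test)))  # precision, recall, F1
```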