Empirical Analysis of Lifelog Data using Optimal Feature Selection based Unsupervised Logistic Regression (OFS-ULR) Model with Spark Streaming

Paper title

Empirical Analysis of Lifelog Data using Optimal Feature Selection based Unsupervised Logistic Regression (OFS-ULR) Model with Spark Streaming

Authors

Tiwari, Sadhana; Agarwal, Sonali

Abstract

Recent advancements in pervasive healthcare monitoring systems have led to the real-time generation of huge amounts of lifelog data. Chronic diseases are among the most serious health challenges in both developing and developed countries; according to the WHO, they account for 73% of all deaths and 60% of the global burden of disease. Chronic disease classification models are now harnessing the potential of lifelog data to explore better healthcare practices. This paper constructs an optimal feature selection-based unsupervised logistic regression (OFS-ULR) model to classify chronic diseases. Because lifelog data are sensitive, their analysis is critical, and conventional classification models show limited performance on them. Designing new classifiers for chronic disease classification using lifelog data is therefore a pressing need. Building a good model depends largely on pre-processing the dataset, identifying important features, and then training a learning algorithm with suitable hyperparameters. The proposed approach improves on the performance of existing methods through a series of steps: (i) removing redundant or invalid instances, (ii) labelling the data by clustering and partitioning it into classes, (iii) identifying a suitable subset of features using either domain knowledge or a selection algorithm, (iv) tuning the models' hyperparameters to obtain the best results, and (v) evaluating performance in a Spark streaming environment. Two time-series datasets are used in the experiments to compute accuracy, recall, precision, and F1-score. The experimental analysis demonstrates the suitability of the proposed approach compared with conventional classifiers, and the newly constructed model achieved the highest accuracy of all the models compared while also reducing training complexity.
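The cluster-then-classify idea behind steps (i) to (iii) and the metrics in step (v) can be illustrated with a minimal, dependency-free Python sketch. This is not the authors' OFS-ULR implementation. The deduplication rule, the farthest-point k-means initialisation, the cluster-mean-gap feature filter (a stand-in for "domain knowledge or a selection algorithm"), the gradient-descent logistic regression, the learning rate and the toy readings are all assumptions chosen for illustration. Hyperparameter tuning (iv) and the Spark streaming environment (v) are not reproduced here.

```python
import math

def clean(rows):
    """Step (i), assumed rule: drop rows with missing values and exact duplicates (first copy kept)."""
    seen, out = set(), []
    for row in rows:
        if any(v is None or (isinstance(v, float) and math.isnan(v)) for v in row):
            continue
        if tuple(row) not in seen:
            seen.add(tuple(row))
            out.append([float(v) for v in row])
    return out

def standardize(rows):
    """z-score every column so no single sensor's units dominate distances."""
    stats = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        stats.append((mean, math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0))
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_labels(rows, k=2, iters=20):
    """Step (ii): pseudo-label rows with k-means (deterministic farthest-point init, an assumption)."""
    centers = [rows[0]]
    while len(centers) < k:
        centers.append(max(rows, key=lambda r: min(sqdist(r, c) for c in centers)))
    labels = [0] * len(rows)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sqdist(r, centers[c])) for r in rows]
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def select_features(rows, labels, n):
    """Step (iii), hypothetical filter: keep the n columns whose two cluster means differ most."""
    def gap(j):
        a = [r[j] for r, lab in zip(rows, labels) if lab == 0]
        b = [r[j] for r, lab in zip(rows, labels) if lab == 1]
        return abs(sum(a) / len(a) - sum(b) / len(b))
    return sorted(sorted(range(len(rows[0])), key=gap, reverse=True)[:n])

def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=300):
    """Binary logistic regression fitted by full-batch gradient descent on the log-loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def predict(w, b, X):
    return [int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5) for xi in X]

def scores(y_true, y_pred):
    """Step (v) metrics: accuracy, precision, recall and F1, with class 1 as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": sum(t == p for t, p in pairs) / len(pairs),
            "precision": precision, "recall": recall, "f1": f1}

# Hypothetical lifelog readings: [heart_rate, daily_steps_thousands, unrelated_sensor]
raw = [
    [72, 8.1, 20], [70, 7.9, 22], [68, 9.0, 20], [74, 8.4, 22],
    [96, 2.1, 20], [99, 1.8, 22], [93, 2.5, 20], [97, 2.0, 22],
    [72, 8.1, 20],   # exact duplicate -> dropped in step (i)
    [88, None, 21],  # missing value -> dropped in step (i)
]
rows = standardize(clean(raw))
labels = kmeans_labels(rows, k=2)
keep = select_features(rows, labels, n=2)
X = [[r[j] for j in keep] for r in rows]
w, b = train_logreg(X, labels)
result = scores(labels, predict(w, b, X))
print(len(rows), labels, keep, result)
```

On this toy input the duplicate and incomplete rows are removed, the two groups of readings land in separate clusters, and the unrelated third column is filtered out. Because the classifier is trained and scored against the same cluster-derived pseudo-labels, the perfect scores from this run only show that the steps are wired together correctly. They say nothing about the accuracy reported for OFS-ULR.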
