Paper Title

Feature subset selection for Big Data via Chaotic Binary Differential Evolution under Apache Spark

Authors

Vivek, Yelleti; Ravi, Vadlamani; Radhakrishna, P.

Abstract

Feature subset selection (FSS) using a wrapper approach is essentially a combinatorial optimization problem with two objectives: the cardinality of the selected feature subset, which should be minimized, and the corresponding area under the ROC curve (AUC), which should be maximized. In this study, we propose a novel multiplicative single-objective function involving cardinality and AUC. The randomness involved in Binary Differential Evolution (BDE) may yield less diverse solutions, causing the search to get trapped in local minima. Hence, we embed the Logistic and Tent chaotic maps into BDE and name the result Chaotic Binary Differential Evolution (CBDE). Designing a scalable solution to FSS is critical when dealing with high-dimensional and voluminous datasets. Hence, we propose a scalable island (iS) based parallelization approach in which the data is divided into multiple partitions/islands, the solutions evolve independently on each island, and they are eventually combined through a migration strategy. The results empirically show that the proposed Parallel Chaotic Binary Differential Evolution (P-CBDE-iS) finds better-quality feature subsets than Parallel Binary Differential Evolution (P-BDE-iS). Logistic Regression (LR) is used as the classifier owing to its simplicity and power. The speedup attained by the proposed parallel approach demonstrates its significance.
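To make the ingredients named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. The logistic and tent maps are the standard textbook definitions. Everything else is an assumption made for illustration: the abstract does not give the exact form of the multiplicative objective, nor where the chaotic numbers enter BDE, so `fitness` (AUC multiplied by the fraction of features dropped) and `chaotic_mask` (chaotic values thresholded at 0.5 in place of uniform random draws) are hypothetical stand-ins. The Spark island model and migration step are not shown.

```python
def logistic_map(x, r=4.0):
    """One step of the logistic map: x -> r * x * (1 - x)."""
    return r * x * (1.0 - x)


def tent_map(x, mu=1.99):
    """One step of the tent map.

    mu is kept slightly below 2 (an assumption of this sketch) because
    mu = 2 collapses to 0 after a few dozen steps in binary floating point.
    """
    return mu * x if x < 0.5 else mu * (1.0 - x)


def chaotic_sequence(step, x0, n):
    """Iterate a chaotic map n times from seed x0 and return the visited values."""
    values = []
    x = x0
    for _ in range(n):
        x = step(x)
        values.append(x)
    return values


def chaotic_mask(step, x0, n_features):
    """Hypothetical use of a chaotic map inside BDE: threshold chaotic values
    at 0.5 to obtain a binary feature mask instead of drawing uniform numbers."""
    return [1 if v > 0.5 else 0 for v in chaotic_sequence(step, x0, n_features)]


def fitness(mask, auc):
    """Hypothetical multiplicative objective (to be maximized):
    AUC times the fraction of features left out.
    The paper's actual formula is not stated in the abstract."""
    selected = sum(mask)
    if selected == 0:
        return 0.0  # an empty subset cannot be scored by a classifier
    return auc * (1.0 - selected / len(mask))


if __name__ == "__main__":
    mask = chaotic_mask(logistic_map, x0=0.3, n_features=10)
    # The AUC here is a placeholder; in a wrapper method it would come from
    # training a classifier (e.g. logistic regression) on the selected columns.
    print(mask, fitness(mask, auc=0.85))
```

Under this assumed objective, a subset scores higher when it keeps the AUC high while selecting fewer features, which is the trade-off that the abstract's single-objective formulation is meant to capture.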
