Applied Soft Computing 2024 | 信息素引导的粗糙超立方体并行属性约简算法

发布时间：2024-05-14 阅读次数：26 分享至：

Pheromone-guided parallel rough hypercuboid attribute reduction algorithm

Weiping Ding *, Hongcheng Yao , Hengrong Ju , Jiashuang Huang , Shu Jiang , Yuepeng Chen

MOTIVATION

In the era of big data, handling high-dimensional datasets is becoming increasingly critical for data mining and knowledge discovery. Traditional attribute reduction algorithms struggle with scalability and computational efficiency, particularly when dealing with large datasets containing uncertainty or noise. These limitations necessitate more efficient methods to extract essential information while reducing irrelevant attributes.
Conventional heuristic search strategies, such as forward or backward search, often suffer from convergence to local optima, especially in high-dimensional datasets. These methods fail to consistently find the global optimal attribute subset, leading to reduced search efficiency and classification performance. There is a clear need for enhanced algorithms that can balance local and global search capabilities to avoid these pitfalls.
With the growth of large-scale datasets, traditional single-threaded algorithms encounter memory and processing limitations. The demand for a more scalable, distributed approach has become evident to handle massive data more efficiently. Parallel computing frameworks, like Apache Spark, offer a way to distribute tasks and speed up computation, making it essential for large-scale data environments.

INNOVATION

This paper introduces the pheromone and mutation mechanisms to the traditional Binary Grey Wolf Optimizer and combines them with the rough hypercuboid approach, proposing a novel parallel attribute reduction algorithm. The enhanced algorithm exhibits strong search capabilities, effectively avoiding local optima. This improvement leads to enhanced search efficiency and accuracy. Moreover, the resulting attribute subset obtained by the algorithm is more compact and possesses higher discriminative power
This paper solves the issue of imbalanced evaluation indicators of attribute subsets in traditional attribute reduction methods by parallel computing the relevance, dependencies, and importance of attributes. This improvement leads to enhanced accuracy and reliability in attribute reduction.
To address the requirements of large-scale data processing, this paper introduces a distributed meta-heuristic attribute reduction algorithm based on the Apache Spark framework.
Furthermore, applying the proposed algorithm to the schizophrenia dataset has yielded remarkable results. This further proves the effectiveness and practicality of our algorithm in practical applications.

METHOD

Fig. 1 presents the framework of this work. This article proposes a novel parallel attribute reduction algorithm that combines the rough hypercuboid approach with an improved Grey Wolf Optimizer (IGWO), incorporating pheromone and mutation mechanisms. The process is structured to leverage the parallel computing capabilities of the Apache Spark framework to handle large-scale datasets.

Data Chunking and Partitioning: The dataset is first divided into multiple partitions (dim₁, dim₂, ..., dimₘ), each containing subsets of data with specific attribute dimensions and corresponding class labels (C₁, C₂, ..., Cₙ). These partitions are processed in parallel using Spark Executors, where fitness calculations for each partition are performed independently. The parallelized approach ensures efficient handling of large-scale data by utilizing distributed computing resources.
Rough Hypercuboid Construction:The Rough Hypercuboid model is employed to assess the relevance of attributes by constructing lower and upper approximations of decision classes, enabling the algorithm to handle uncertainty and ambiguity in large datasets effectively.
Improved Grey Wolf Optimizer (IGWO):The IGWO optimizes attribute selection by updating the positions of wolves (candidate solutions) based on their fitness values. Pheromone mechanisms guide the search toward promising solutions, while mutation operations maintain diversity and prevent convergence to local optima.