Multi-Objective Feature Selection based on Clustering and Principal Component Analysis by Enhanced Electromagnetic-likes Algorithm

Authors: M. Abdolrazzagh-Nezhad, S.P. Mahyabadi, S. Jalali-Poor, E.B. Nababan
Conference: International Conference on Data Science, Artificial Intelligence, and Business Analytics (DATABIA)
Conference date: 2020
Presentation type: Published in proceedings
Conference level: International

Abstract

This paper introduces a novel multi-objective feature selection method that integrates clustering and principal component analysis (PCA) within an enhanced electromagnetism-like mechanism (EM) optimization framework. The core challenge addressed is the high dimensionality of datasets, which often contain redundant and irrelevant features that degrade the performance and efficiency of data mining tasks. Unlike traditional single-objective approaches, the proposed method simultaneously pursues three goals: minimizing the number of selected features, minimizing the PCA coefficient (which reflects the dispersion of principal components), and maximizing the accuracy of k-medoids clustering. This multi-objective formulation is innovative, particularly in its combination of unsupervised pattern discovery via clustering with feature space analysis via PCA.

A key contribution is the adaptation of the continuous EM algorithm to solve the discrete feature selection problem. This is achieved through a heuristic discretization technique that maps continuous solution vectors to binary feature subsets, ensuring sensitivity to small changes in the continuous space. The algorithm evaluates each candidate feature subset using a composite fitness function that balances the three objectives. Specifically, the fitness function is defined as the sum of the normalized PCA coefficient and clustering error, divided by the number of selected features, thereby promoting compact subsets with high intrinsic data quality.
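
The discretization and composite fitness described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the abstract does not specify the discretization heuristic, so plain thresholding at 0.5 is swapped in as a stand-in, and the names `discretize` and `composite_fitness` are hypothetical. The normalized PCA coefficient and the clustering error are taken as precomputed inputs because the abstract does not say how they are computed, and treating an empty subset as infeasible is likewise an assumption.

```python
import math
from typing import List, Sequence


def discretize(position: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Map a continuous solution vector to a binary feature mask.

    Assumption: simple thresholding stands in for the paper's heuristic,
    which the abstract does not describe.
    """
    return [1 if value >= threshold else 0 for value in position]


def composite_fitness(
    mask: Sequence[int],
    pca_coefficient: float,
    clustering_error: float,
) -> float:
    """Return (normalized PCA coefficient + clustering error) / number of selected features.

    Both criteria are assumed to be computed elsewhere for the subset in `mask`.
    """
    n_selected = sum(mask)
    if n_selected == 0:
        # Assumption: a subset with no features is treated as infeasible.
        return math.inf
    return (pca_coefficient + clustering_error) / n_selected


# Hypothetical values, for illustration only.
position = [0.12, 0.81, 0.47, 0.93]
mask = discretize(position)                   # features 2 and 4 are selected
score = composite_fitness(mask, 0.25, 0.25)   # (0.25 + 0.25) / 2
```

In an EM-style search, each particle's continuous position would be discretized this way before evaluation, so the optimizer keeps working in continuous space while the fitness is always computed on a concrete feature subset.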

The proposed method was evaluated on 14 diverse UCI datasets and compared against three established metaheuristic algorithms: Genetic Algorithm (GA), Harmony Search (HS), and Ant Colony Optimization (ACO). Experimental results show that the EM-based approach achieves competitive or superior values of the defined fitness function on most datasets. The method not only identifies smaller feature subsets but also maintains high clustering accuracy and low PCA dispersion, indicating that the essential data structure is preserved.

Further analysis through convergence curves and box plots shows that the EM algorithm searches stably and efficiently, with fewer fluctuations than the other algorithms and a stronger tendency to avoid local optima. The study also highlights the algorithm’s flexibility and robustness across datasets with varied characteristics. In conclusion, this work shows that a multi-objective EM algorithm, enhanced with PCA and clustering criteria, offers a reliable approach to feature selection that balances dimensionality reduction with data integrity. Future research may focus on automating parameter tuning using fuzzy rules to further improve adaptability and performance.

Keywords: Feature Selection