| Authors | Abdolrazzagh-Nezhad, M., S.P. Mahyabadi, and A. Ebrahimpoor |
|---|---|
| Journal | Data Science: Journal of Computing and Applied Informatics |
| Paper Type | Full Paper |
| Published At | 2020 |
| Journal Grade | Scientific - research |
| Journal Type | Typographic |
| Journal Country | Indonesia |
Abstract
This paper proposes a novel hybrid approach for breast cancer detection by enhancing the K-Nearest Neighbor (KNN) classification algorithm using the Gases Brownian Motion Optimization (GBMO) algorithm. The primary goal is to address two well-known challenges of KNN: determining the optimal value of k and managing the high computational load associated with large datasets. By integrating GBMO with KNN, the authors aim to simultaneously perform feature selection and optimize the k parameter, thereby improving classification accuracy while reducing computational complexity.
The methodology involves encoding each solution within the GBMO framework as a gas molecule, where the molecule’s structure includes a binary representation of selected features and a value for k. The GBMO algorithm, inspired by the random Brownian motion and vibration of gas molecules, efficiently explores the search space through a balance of global and local search mechanisms. The fitness of each molecule is evaluated by running the KNN classifier on the selected feature subset with the corresponding k, using classification accuracy as the fitness score. The approach is validated on three breast cancer datasets from the UCI repository: Wisconsin Original Breast Cancer (WOBC), Wisconsin Diagnostic Breast Cancer (WDBC), and Breast Cancer Coimbra.
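The encoding and fitness evaluation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `(mask, k)` tuple layout, the function names, the zero score for an empty feature subset, the use of a held-out split for scoring, and the toy data are all assumptions made for illustration. The GBMO update rules themselves (how molecules move and vibrate) are not shown, since the summary does not specify them.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Classify x by majority vote among its k nearest training points (Euclidean)."""
    neighbours = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


def project(rows, mask):
    """Keep only the feature columns whose mask bit is 1."""
    return [[v for v, keep in zip(row, mask) if keep] for row in rows]


def fitness(molecule, train_X, train_y, test_X, test_y):
    """Score one candidate solution by KNN classification accuracy.

    molecule is a hypothetical (mask, k) pair: mask is a list of 0/1 flags,
    one per feature, and k is the number of neighbours.
    """
    mask, k = molecule
    if not any(mask):
        return 0.0  # assumption: an empty feature subset gets the worst score
    tr, te = project(train_X, mask), project(test_X, mask)
    correct = sum(knn_predict(tr, train_y, x, k) == y for x, y in zip(te, test_y))
    return correct / len(test_y)


# Toy data (invented): feature 0 separates the classes, feature 1 is noise.
train_X = [[0.0, 5.0], [0.2, -3.0], [0.1, 9.0], [1.0, -4.0], [0.9, 8.0], [1.1, 4.0]]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [[0.15, -4.0], [0.95, 9.0]]
test_y = [0, 1]

informative = fitness(([1, 0], 3), train_X, train_y, test_X, test_y)
noisy = fitness(([0, 1], 1), train_X, train_y, test_X, test_y)
print(informative, noisy)
```

On this toy data the molecule that selects only the informative feature scores higher than the one that selects only the noisy feature, which is the signal an optimizer such as GBMO would use to steer the search toward better feature subsets and values of k.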
Key experimental results demonstrate the effectiveness of the GBMO+KNN hybrid. In comparative tests against other metaheuristic-based hybrids, namely those built on the Genetic Algorithm (GA), Particle Swarm Optimization (PSO), and the Imperialist Competitive Algorithm (ICA), GBMO+KNN consistently achieved higher average and best accuracy rates across all three datasets. Notably, the method showed a small gap between training and testing accuracy, indicating good generalization and robustness. Furthermore, GBMO exhibited superior performance on high-dimensional optimization benchmark functions, supporting its scalability and efficiency in handling complex search spaces.
The study makes several important contributions. First, it successfully applies GBMO for the first time to optimize KNN in a medical diagnosis context, specifically for breast cancer detection. Second, the hybrid model effectively reduces dimensionality through intelligent feature selection, which lowers computational cost without sacrificing accuracy. Third, the results indicate that GBMO is particularly advantageous in high-dimensional problems compared to GA, PSO, and ICA. The work highlights the potential of nature-inspired optimization algorithms to enhance traditional machine learning classifiers, offering a reliable and efficient tool for early and accurate breast cancer diagnosis.
tags: KNN