A new hybridization of DBSCAN and fuzzy earthworm optimization algorithm for data cube clustering

Authors: Hosseini Rad, M. and M. Abdolrazzagh-Nezhad
Journal: Soft Computing
Paper Type: Full Paper
Published At: 2020
Journal Grade: ISI
Journal Type: Typographic
Journal Country: Germany

Abstract

This paper introduces a novel hybrid approach for clustering multidimensional data cubes, combining DBSCAN with the Earthworm Optimization Algorithm (EWOA) and fuzzy logic control. The main challenge addressed is that clustering algorithms are difficult to apply directly to three-dimensional data structures, which are common in data warehousing and OLAP systems. To overcome this, the authors propose a preprocessing step that transforms the 3D data cube into a 2D table through a technique called "dimension move," while also assigning a unique address to each cell for scalability. A new similarity metric is introduced to better capture relationships in the transformed 2D data, replacing the traditional Euclidean distance in one of the proposed variants.
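The summary does not spell out how "dimension move" works. One plausible reading, assumed here and not taken from the paper, is an unfolding (matricization) that moves the third axis alongside the second, so an I × J × K cube becomes an I × (J·K) table. The Python sketch below does that and records, for every 2D cell, the (i, j, k) coordinates it came from, as a stand-in for the per-cell addresses. The function name `dimension_move` and the address layout are hypothetical.

```python
from typing import Dict, List, Tuple

Cube = List[List[List[float]]]           # cube[i][j][k], shape I x J x K
Table = List[List[float]]                # table[row][col], shape I x (J*K)
Address = Dict[Tuple[int, int], Tuple[int, int, int]]


def dimension_move(cube: Cube) -> Tuple[Table, Address]:
    """Unfold an I x J x K cube into an I x (J*K) table.

    Hypothetical interpretation of "dimension move": axis k is moved next to
    axis j, so the column index is j * K + k. The returned address map keeps
    the original (i, j, k) coordinates of every 2D cell, so each cell stays
    uniquely identifiable after the transform.
    """
    n_i, n_j, n_k = len(cube), len(cube[0]), len(cube[0][0])
    table: Table = []
    address: Address = {}
    for i in range(n_i):
        row: List[float] = []
        for j in range(n_j):
            for k in range(n_k):
                address[(i, j * n_k + k)] = (i, j, k)
                row.append(cube[i][j][k])
        table.append(row)
    return table, address


if __name__ == "__main__":
    # A 2 x 2 x 2 toy cube, e.g. region x product x quarter (made-up values).
    cube = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
    table, address = dimension_move(cube)
    print(table)            # [[1, 2, 3, 4], [5, 6, 7, 8]]
    print(address[(1, 3)])  # (1, 1, 1)
```

Each row of the resulting table can then be treated as one object to cluster. Moving a different axis instead gives a different, equally valid unfolding with different row semantics.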

The core contribution is the hybridization of DBSCAN, a density-based clustering algorithm, with EWOA, a meta-heuristic optimization algorithm, to determine DBSCAN's critical parameters automatically: the neighborhood radius (ε) and the minimum number of points (MinPts). This hybrid, called EWOA–DBSCAN, helps overcome DBSCAN's sensitivity to parameter settings. Because EWOA has parameters of its own that must be tuned (such as the similarity factor and the number of kept earthworms), the authors also design a fuzzy logic controller (FLC) that adjusts these parameters dynamically during execution. Two versions of the FLC are implemented, based on Mamdani and Sugeno inference systems, yielding the improved algorithms EWOA–DBSCAN-Mamdani and EWOA–DBSCAN-Sugeno.
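The general EWOA–DBSCAN idea, treating (ε, MinPts) as a candidate solution scored by a clustering-quality fitness, can be illustrated with a compact Python sketch. It is deliberately not EWOA: a simple Gaussian perturbation search around the best candidate stands in for the earthworm reproduction operators, and a zero-order Sugeno controller with three hypothetical rules shrinks the perturbation scale as the search progresses, loosely mirroring how the paper's FLC adapts optimizer parameters at run time. The fitness (Davies–Bouldin index plus the fraction of noise points), the rule constants, and all function names are assumptions; distances are Euclidean.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Plain DBSCAN with Euclidean distance; label -1 marks noise."""
    n = len(points)
    labels: List[Optional[int]] = [None] * n  # None = not yet visited

    def region(i: int) -> List[int]:
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_seeds = region(j)
            if len(j_seeds) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_seeds)
    return [-1 if lab is None else lab for lab in labels]


def davies_bouldin(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Davies-Bouldin index over non-noise points (lower is better)."""
    ids = sorted({lab for lab in labels if lab >= 0})
    if len(ids) < 2:
        return math.inf  # undefined for fewer than two clusters
    centre, spread = {}, {}
    for c in ids:
        members = [p for p, lab in zip(points, labels) if lab == c]
        centre[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
        spread[c] = sum(math.dist(p, centre[c]) for p in members) / len(members)
    worst = [
        max((spread[a] + spread[b]) / max(math.dist(centre[a], centre[b]), 1e-12)
            for b in ids if b != a)
        for a in ids
    ]
    return sum(worst) / len(ids)


def sugeno_scale(progress: float) -> float:
    """Zero-order Sugeno FLC mapping search progress in [0, 1] to a step scale.

    Hypothetical rules: IF progress is early THEN 0.5; IF middle THEN 0.2;
    IF late THEN 0.05. Triangular memberships sum to 1 on [0, 1].
    """
    early = max(0.0, 1.0 - 2.0 * progress)
    middle = max(0.0, 1.0 - abs(2.0 * progress - 1.0))
    late = max(0.0, 2.0 * progress - 1.0)
    weights, outputs = (early, middle, late), (0.5, 0.2, 0.05)
    return sum(w * z for w, z in zip(weights, outputs)) / sum(weights)


def tune_dbscan(points: Sequence[Point], eps_range: Tuple[float, float],
                minpts_range: Tuple[int, int], iters: int = 20, pop: int = 8,
                seed: int = 0) -> Tuple[float, float, int]:
    """Search (eps, MinPts) minimising DBI + noise fraction (hypothetical fitness)."""
    rng = random.Random(seed)
    (e_lo, e_hi), (m_lo, m_hi) = eps_range, minpts_range

    def fitness(eps: float, min_pts: int) -> float:
        labels = dbscan(points, eps, min_pts)
        return davies_bouldin(points, labels) + labels.count(-1) / len(labels)

    cands = [(rng.uniform(e_lo, e_hi), rng.randint(m_lo, m_hi)) for _ in range(pop)]
    best = (math.inf, cands[0][0], cands[0][1])
    for t in range(iters):
        for eps, min_pts in cands:
            score = fitness(eps, min_pts)
            if score < best[0]:
                best = (score, eps, min_pts)
        scale = sugeno_scale(t / max(1, iters - 1))  # FLC adapts the step size
        _, b_eps, b_mp = best
        cands = [
            (min(e_hi, max(e_lo, b_eps + rng.gauss(0.0, scale * (e_hi - e_lo)))),
             min(m_hi, max(m_lo, b_mp + round(rng.gauss(0.0, scale * (m_hi - m_lo))))))
            for _ in range(pop)
        ]
    return best  # (fitness, eps, min_pts)


if __name__ == "__main__":
    # Two well-separated toy blobs (made-up data).
    pts = [(0, 0), (0, 0.1), (0.1, 0), (0.1, 0.1),
           (5, 5), (5, 5.1), (5.1, 5), (5.1, 5.1)]
    print(dbscan(pts, eps=0.5, min_pts=2))  # [0, 0, 0, 0, 1, 1, 1, 1]
    print(tune_dbscan(pts, (0.05, 2.0), (2, 4)))
```

Adding the noise fraction to the fitness is one simple way to stop the search from favouring settings that discard most points as noise; the paper's actual objective may differ.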

The proposed methods are evaluated on six real-world data cube datasets using three clustering validation indices: Davies–Bouldin Index (DBI), Dunn Index (DI), and Silhouette Index (SI). Experimental results demonstrate that the EWOA–DBSCAN2-Sugeno algorithm, which uses the new similarity metric and Sugeno-type fuzzy control, outperforms all other compared methods, including standard DBSCAN, GA-optimized DBSCAN, and other hybrid versions. The new similarity metric also proves more effective than Euclidean distance in most cases. Statistical significance tests confirm the superiority of the proposed approach.
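For reference, two of the three validation indices, Dunn and Silhouette, can be computed as in the short Python sketch below; higher is better for both, whereas the Davies–Bouldin index is better when lower. The sketch uses Euclidean distance and assumes that points DBSCAN labels as noise (label -1) are left out of the evaluation; the paper may treat noise differently, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def _groups(points: Sequence[Point], labels: Sequence[int]) -> Dict[int, List[Point]]:
    """Group points by cluster label, dropping noise (label -1)."""
    groups: Dict[int, List[Point]] = {}
    for p, lab in zip(points, labels):
        if lab >= 0:
            groups.setdefault(lab, []).append(p)
    return groups


def dunn_index(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Smallest between-cluster distance divided by largest cluster diameter."""
    gs = list(_groups(points, labels).values())
    if len(gs) < 2:
        return 0.0
    diameter = max(math.dist(a, b) for g in gs for a in g for b in g)
    separation = min(
        math.dist(a, b)
        for x in range(len(gs)) for y in range(x + 1, len(gs))
        for a in gs[x] for b in gs[y]
    )
    return separation / diameter if diameter > 0 else math.inf


def silhouette_index(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Mean silhouette s = (b - a) / max(a, b) over non-noise points."""
    groups = _groups(points, labels)
    if len(groups) < 2:
        return 0.0
    scores = []
    for p, lab in zip(points, labels):
        if lab < 0:
            continue
        own = groups[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # p itself contributes distance 0, so divide by len(own) - 1
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in g) / len(g)
                for c, g in groups.items() if c != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pts = [(0, 0), (0, 0.1), (0.1, 0), (0.1, 0.1),
           (5, 5), (5, 5.1), (5.1, 5), (5.1, 5.1)]
    good = [0, 0, 0, 0, 1, 1, 1, 1]
    print(dunn_index(pts, good), silhouette_index(pts, good))
```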

Overall, the study successfully presents an efficient, flexible, and automated clustering framework for data cubes. It effectively integrates meta-heuristic optimization and fuzzy logic to enhance parameter tuning and clustering quality. The work opens avenues for further research into multidimensional data analysis and adaptive clustering techniques, with potential applications in business intelligence and big data analytics.
