Data Cube Clustering with Improved DBSCAN Based on Fuzzy Logic and Genetic Algorithm

Authors: Rad, M.H. and M. Abdolrazzagh-Nezhad
Journal: Information Technology and Control
Paper type: Full Paper
Publication date: 2020
Journal rank: ISI
Journal type: Print
Country of publication: Lithuania

Abstract

This paper proposes an enhanced density-based clustering approach for multidimensional data cubes, addressing the difficulty of applying traditional pattern recognition methods to the complex, aggregated data structures found in data warehouses. The core of the work is an improvement of the DBSCAN algorithm, whose performance depends heavily on manual tuning of its two key parameters: the neighborhood radius (ε) and the minimum number of points (MinPts). To overcome this limitation, the authors introduce a hybrid method called Improved DBSCAN (IDBSCAN), which uses a Genetic Algorithm (GA) to search for these parameters automatically, thereby optimizing the clustering process for data cubes.
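
The idea of letting a GA search for ε and MinPts can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`dbscan`, `davies_bouldin`, `fitness`, `tune_dbscan`), the noise penalty, the parameter ranges, and the selection, crossover, and mutation operators are all hypothetical choices made for the example.

```python
import math
import random


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: one label per point, -1 marks noise."""
    n = len(points)
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # j is a core point, keep expanding
    return labels


def davies_bouldin(points, labels):
    """Davies-Bouldin index over non-noise points (lower is better)."""
    clusters = {}
    for point, label in zip(points, labels):
        if label != -1:
            clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        return math.inf  # the index is undefined for fewer than two clusters
    centroid = {c: [sum(axis) / len(pts) for axis in zip(*pts)]
                for c, pts in clusters.items()}
    scatter = {c: sum(math.dist(p, centroid[c]) for p in pts) / len(pts)
               for c, pts in clusters.items()}
    total = 0.0
    for a in clusters:
        worst = 0.0
        for b in clusters:
            if a != b:
                gap = math.dist(centroid[a], centroid[b])
                ratio = (scatter[a] + scatter[b]) / gap if gap > 0 else math.inf
                worst = max(worst, ratio)
        total += worst
    return total / len(clusters)


def fitness(points, eps, min_pts):
    """Score to minimize. The added noise fraction is an assumption."""
    labels = dbscan(points, eps, min_pts)
    return davies_bouldin(points, labels) + labels.count(-1) / len(labels)


def tune_dbscan(points, pop_size=12, generations=15, p_cross=0.8, p_mut=0.2, seed=0):
    """Toy GA over (eps, min_pts) with elitist truncation selection."""
    rng = random.Random(seed)
    max_eps = max(math.dist(p, q) for p in points for q in points)
    min_eps = 0.01 * max_eps

    def mutate(individual):
        eps, min_pts = individual
        eps = min(max_eps, max(min_eps, eps * rng.uniform(0.7, 1.3)))
        min_pts = min(10, max(2, min_pts + rng.choice((-1, 0, 1))))
        return (eps, min_pts)

    def score(individual):
        return fitness(points, *individual)

    population = [(rng.uniform(min_eps, max_eps), rng.randint(2, 10))
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score)
        parents = population[:pop_size // 2]  # the better half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Crossover: eps from one parent, min_pts from the other.
            child = (a[0], b[1]) if rng.random() < p_cross else a
            if rng.random() < p_mut:
                child = mutate(child)
            children.append(child)
        population = parents + children
    return min(population, key=score)
```

Calling `tune_dbscan(points)` on a list of coordinate tuples returns one `(eps, min_pts)` pair, which can then be passed to `dbscan`. The fixed `p_cross` and `p_mut` arguments here are exactly the kind of GA settings that still have to be chosen by hand in this sketch.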

Building on IDBSCAN, the authors further refine the approach by addressing the GA's own parameter-tuning problem, specifically the choice of crossover and mutation rates. This leads to the Soft Improved DBSCAN (SIDBSCAN), which incorporates a Fuzzy Logic Controller (FLC) to adjust the GA's parameters dynamically during execution. Two variants are presented: SIDBSCAN-Mamdani and SIDBSCAN-Sugeno, based on Mamdani-type and Takagi-Sugeno-type fuzzy inference systems, respectively. This fuzzy adaptation makes the meta-heuristic more responsive, balancing exploration and exploitation according to the algorithm's current state, such as iteration progress and fitness quality.
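
A fuzzy controller of the Sugeno kind can be illustrated with a very small sketch. Everything specific here is an assumption made for the example rather than taken from the paper: the function name `sugeno_rates`, the two inputs scaled to [0, 1], the linear membership functions, and the four rules with their constant outputs.

```python
def sugeno_rates(progress, quality):
    """Zero-order Sugeno controller returning (crossover_rate, mutation_rate).

    progress: fraction of GA iterations completed, in [0, 1].
    quality:  normalized fitness quality, in [0, 1], where 1 is best
              (how quality is normalized is an assumption of this sketch).
    """
    progress = min(1.0, max(0.0, progress))
    quality = min(1.0, max(0.0, quality))

    # Linear, complementary membership degrees (hypothetical choice).
    early, late = 1.0 - progress, progress
    poor, good = 1.0 - quality, quality

    # Hypothetical rule base: (firing strength, crossover, mutation).
    # The firing strength of "A AND B" is taken as min(mu_A, mu_B).
    rules = [
        (min(early, poor), 0.9, 0.30),  # early and poor: explore strongly
        (min(early, good), 0.8, 0.15),
        (min(late, poor), 0.7, 0.25),   # late but poor: keep mutating
        (min(late, good), 0.6, 0.02),   # late and good: exploit
    ]

    # Sugeno defuzzification: weighted average of the rule constants.
    total = sum(w for w, _, _ in rules)
    crossover = sum(w * c for w, c, _ in rules) / total
    mutation = sum(w * m for w, _, m in rules) / total
    return crossover, mutation
```

In a GA loop, such a function would be called once per generation, for example `p_cross, p_mut = sugeno_rates(generation / max_generations, quality)`. A Mamdani-type controller would instead produce fuzzy output sets and defuzzify them, for instance by a centroid computation, which is typically more expensive than this weighted average.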

The experimental evaluation supports the proposed methods. On six real-world data cube datasets from the UCI repository, IDBSCAN improves clustering quality over standard DBSCAN by 4% to 28%. The SIDBSCAN variants, particularly SIDBSCAN-Sugeno, achieve even better results. A Wilcoxon signed-rank test confirms that these improvements are statistically significant. Furthermore, convergence analysis shows that the SIDBSCAN algorithms are more effective than IDBSCAN at avoiding local optima and reach better final clustering quality (measured by the Davies-Bouldin Index), highlighting the benefit of dynamic, fuzzy-guided parameter adaptation.
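
For reference, the Davies-Bouldin Index in its standard form is given below. The symbols are introduced here for illustration and are not taken from the paper: k is the number of clusters, c_i the centroid of cluster C_i, S_i the average distance of the points of C_i to c_i, and d the distance between centroids. Lower values indicate more compact, better separated clusters.

```latex
\mathrm{DBI} = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} \frac{S_i + S_j}{d(c_i, c_j)},
\qquad
S_i = \frac{1}{|C_i|} \sum_{x \in C_i} \lVert x - c_i \rVert
```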

A key contribution of this research is a complete pipeline for data cube clustering: a preprocessing step normalizes the 3D data and reshapes it into a 2D format without information loss, and the novel hybrid clustering algorithms are then applied to the result. The most notable outcome is the superior performance of SIDBSCAN-Sugeno, attributed to its efficient weighted-average defuzzification and effective dynamic parameter control. Although the use of meta-heuristics increases computation time compared to standard DBSCAN, the primary aim of significantly enhancing clustering quality is met. The authors posit that this framework provides a robust foundation for efficient unsupervised analysis of complex multidimensional data, and they suggest future work on parallel processing for speed and on extending these concepts to native 3D clustering for applications such as image processing.
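
The preprocessing step can be pictured with a short sketch. The details are assumptions made for illustration: the cube is taken to be a nested list indexed as `cube[object][row][col]`, each object's 2D slice is flattened into one feature row, and min-max scaling stands in for whatever normalization the authors use. The function names are hypothetical.

```python
def cube_to_matrix(cube):
    """Flatten each 2D slice cube[i] into one row of rows*cols features."""
    return [[value for row in slab for value in row] for slab in cube]


def matrix_to_cube(matrix, n_rows, n_cols):
    """Inverse of cube_to_matrix, showing the reshape loses no information."""
    return [[flat[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]
            for flat in matrix]


def min_max_normalize(matrix):
    """Scale every column to [0, 1]; constant columns are mapped to 0.0."""
    columns = list(zip(*matrix))
    lows = [min(col) for col in columns]
    spans = [max(col) - min(col) for col in columns]
    return [[(value - low) / span if span else 0.0
             for value, low, span in zip(row, lows, spans)]
            for row in matrix]
```

For a cube with 3 objects and 2 x 2 slices, `cube_to_matrix` yields a 3 x 4 matrix, and `min_max_normalize` then puts all four features on a common scale so that no single cell dominates the distance computation in the clustering step.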

tags: DBSCAN