Improved BIRCH Clustering by Chemical Reaction Optimization Algorithm to Health Fraud Detection

Majid Abdolrazzagh-Nezhad, Mehdi Kherad

Authors	Majid Abdolrazzagh-Nezhad, Mehdi Kherad
Journal	Nashriyyah-i Muhandisi-i Barq va Muhandisi-i Kampyutar-i Iran
Paper Type	Full Paper
Published At	2020
Journal Grade	ISI
Journal Type	Typographic
Journal Country	Iran, Islamic Republic Of

Abstract

This paper addresses fraud detection in healthcare, a domain vulnerable to financial abuse because of the large volume of transactions and the complexity of medical claims. Supervised and semi-supervised methods depend on pre-labeled data, which is often scarce or unreliable, so the authors propose a fully unsupervised approach. Their solution is a hybrid algorithm that combines the BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) clustering algorithm with the Chemical Reaction Optimization (CRO) metaheuristic. The combination is designed to identify anomalous patterns that indicate fraud in large, unlabeled healthcare datasets efficiently and accurately.

The proposed method, named BIRCH-CRO, targets key shortcomings of standard BIRCH. BIRCH is efficient on large-scale data because of its linear time complexity and its Cluster Feature (CF) tree, but it depends on user-defined parameters such as the branching factor and a distance threshold, which are difficult to set well and can severely affect clustering quality. BIRCH-CRO uses the CRO algorithm to manage and optimize the tree-building process. Instead of relying on fixed thresholds, CRO guides the search for the most suitable leaf node when a new data point is inserted into the CF tree, deciding from the data's characteristics whether to create a new cluster (leaf) or reorganize existing ones. This automates the parameter tuning that is typically a major hurdle.
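For readers unfamiliar with the fixed-threshold behaviour that the paragraph above contrasts with, the following is a minimal sketch of classic BIRCH-style insertion. It is not the authors' CRO-guided procedure: it keeps a flat list of leaf cluster features instead of a full CF tree with a branching factor, and the names `ClusterFeature`, `insert_point`, and `threshold` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from math import sqrt


@dataclass
class ClusterFeature:
    """Standard BIRCH summary of a cluster: count, linear sum, sum of squared norms."""
    n: int = 0
    linear_sum: list = field(default_factory=list)
    squared_sum: float = 0.0

    def merged_with(self, point):
        """Return a new CF that also contains `point`; self is left unchanged."""
        if self.n == 0:
            ls = list(point)
        else:
            ls = [a + b for a, b in zip(self.linear_sum, point)]
        ss = self.squared_sum + sum(x * x for x in point)
        return ClusterFeature(self.n + 1, ls, ss)

    def centroid(self):
        return [s / self.n for s in self.linear_sum]

    def radius(self):
        """Root-mean-square distance of the members from the centroid."""
        c2 = sum(c * c for c in self.centroid())
        # max(..., 0.0) guards against tiny negative values from rounding.
        return sqrt(max(self.squared_sum / self.n - c2, 0.0))


def insert_point(leaves, point, threshold):
    """Absorb `point` into the nearest leaf if the merged radius stays within
    the user-chosen `threshold`; otherwise open a new leaf.
    Returns the index of the leaf that received the point."""
    if leaves:
        def dist2(i):
            return sum((a - b) ** 2 for a, b in zip(leaves[i].centroid(), point))

        nearest = min(range(len(leaves)), key=dist2)
        candidate = leaves[nearest].merged_with(point)
        if candidate.radius() <= threshold:
            leaves[nearest] = candidate
            return nearest
    leaves.append(ClusterFeature().merged_with(point))
    return len(leaves) - 1
```

In this sketch the outcome depends entirely on the hand-picked `threshold`: a value that is too small splits one natural cluster into many leaves, and a value that is too large merges distinct clusters. That sensitivity is the tuning burden the summary says BIRCH-CRO is meant to remove.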

In the experimental evaluation, BIRCH-CRO outperformed the baselines. Using a real-world public dataset on heart attack treatment payments, the authors tested the algorithm on three common types of healthcare fraud: duplicate claims, claims with anomalous provider information, and claims with non-standard pricing. Compared with the standalone clustering algorithms BIRCH, K-means, and DBSCAN, the proposed method achieved the best overall performance: the highest accuracy (0.996), precision (0.995), specificity (0.988), and F-score (0.997), along with the fastest execution time (0.38 seconds). These results indicate that BIRCH-CRO is both more accurate at separating fraudulent from legitimate claims and faster to run than the compared methods.
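The four quality measures quoted above are standard functions of a confusion matrix. The sketch below shows how they are conventionally computed, assuming that "fraudulent" is the positive class and that the F-score is the balanced F1 measure; the summary does not state either choice. The function name `classification_metrics` is hypothetical, and the counts in the usage lines are invented for illustration, not taken from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, specificity, and F1 from confusion-matrix
    counts. Assumes every denominator is nonzero."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "specificity": tn / (tn + fp),
        "f_score": 2 * precision * recall / (precision + recall),
    }


# Illustrative counts only (not the paper's data).
example = classification_metrics(tp=8, fp=2, tn=88, fn=2)
```

Specificity is the share of legitimate claims correctly left unflagged, so it is useful to read alongside precision when fraudulent claims are rare.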

The research makes four main contributions. First, it presents a fully unsupervised fraud detection system, which removes the dependency on labeled data, a major practical advantage. Second, the integration of CRO automates parameter optimization for BIRCH, making the clustering more adaptive. Third, the algorithm retains the linear time complexity of BIRCH, so it scales to large datasets, a crucial requirement in healthcare analytics. Finally, because it determines cluster radii and similarity thresholds dynamically, the system is better equipped to handle new and evolving fraud patterns without manual reconfiguration. The work offers insurers and healthcare providers an efficient framework for automated anomaly detection in combating financial fraud.

tags: Health Fraud Detection