A Hybrid Approach for Customer Segmentation Using Ensemble Methods: Bagging and Voting Classifiers
Sufaira Shamsudeen, Department of Computer Science, Karpagam Academy of Higher Education, Coimbatore, Tamil Nadu, India. sufaira@mesmarampally.org, ORCID: 0009-0003-3783-9117
Dr. K. Ranjith Singh, Department of Computer Science, Karpagam Academy of Higher Education, Coimbatore, Tamil Nadu, India. ranjithsingh.koppaiyan@kahedu.edu.in, ORCID: 0000-0003-2651-2509
Customer classification is the process of grouping diverse customers into related classes. It helps banks identify loyal customers and offer them more tailored products and services, ultimately increasing business revenue. This study explores the evolution of Machine Learning (ML) and Ensemble Learning (EL) techniques in the banking domain, progressing from basic to advanced classification methods for categorizing customers based on their credit information. The objective is to identify profitable customers and classify them into five categories: Outstanding, Excellent, Good, Satisfactory, and Bad. Among the algorithms evaluated, namely K-Nearest Neighbors (KNN), Support Vector Machines (SVM), Decision Tree (DT), Random Forest (RF), and Artificial Neural Network (ANN), the Decision Tree classifier underperforms the others. The main aim of the study is therefore to strengthen this classifier for multi-class customer segmentation using two ensemble learning methods: Voting and Bagging. First, the study integrates the strengths of neural network optimizers with the DT classifier and combines DT with the base models using a Voting Classifier. Neural network solvers, namely Stochastic Gradient Descent (SGD), Adaptive Moment Estimation (Adam), and Limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS), are used to find the best local minima for the Decision Tree algorithm. Building these ensembles with a Voting Classifier aims to improve the Decision Tree classifier's performance despite its initial shortcomings. The proposed ensemble Decision Tree models that use different optimizers (VC-DT-ADAM, VC-DT-SGD, and VC-DT-L-BFGS) and those that combine DT with the base models (VC-DT-KNN, VC-DT-SVM, and VC-DT-RF) are compared to select the best performer. Both hard and soft voting are computed and analyzed. The results show that all the ensemble models perform well.
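The voting ensembles described above can be sketched with scikit-learn's `VotingClassifier`. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic five-class placeholders rather than the credit dataset, all hyperparameters are illustrative, and "DT combined with a neural network optimizer" is read here as a Decision Tree voting alongside an `MLPClassifier` trained with the `adam`, `sgd`, or `lbfgs` solver. The helper names `partner_models` and `evaluate` are hypothetical.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.exceptions import ConvergenceWarning
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the credit data (placeholder only): the five classes
# play the role of Outstanding, Excellent, Good, Satisfactory and Bad.
X, y = make_classification(n_samples=1000, n_features=10, n_informative=6,
                           n_classes=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)


def partner_models():
    """Hypothetical helper: the estimator paired with the DT in each hybrid."""
    partners = {
        # Assumption: "DT + optimizer" = DT voting with an MLP trained by that solver.
        f"VC-DT-{name}": make_pipeline(
            StandardScaler(),
            MLPClassifier(solver=solver, max_iter=1000, random_state=0))
        for name, solver in [("ADAM", "adam"), ("SGD", "sgd"), ("L-BFGS", "lbfgs")]
    }
    partners["VC-DT-KNN"] = make_pipeline(StandardScaler(), KNeighborsClassifier())
    # probability=True gives SVC a predict_proba method, required for soft voting.
    partners["VC-DT-SVM"] = make_pipeline(
        StandardScaler(), SVC(probability=True, random_state=0))
    partners["VC-DT-RF"] = RandomForestClassifier(random_state=0)
    return partners


def evaluate(voting):
    """Fit DT + partner with the given voting rule and return test accuracies."""
    scores = {}
    for name, partner in partner_models().items():
        model = VotingClassifier(
            estimators=[("dt", DecisionTreeClassifier(random_state=0)),
                        ("partner", partner)],
            voting=voting)
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


with warnings.catch_warnings():
    # The SGD/Adam MLPs may stop at max_iter; silence that warning for the demo.
    warnings.simplefilter("ignore", ConvergenceWarning)
    results = {voting: evaluate(voting) for voting in ("hard", "soft")}

for voting, scores in results.items():
    for name, acc in scores.items():
        print(f"{voting:4s}  {name:14s} accuracy={acc:.3f}")
```

One design note: with only two voters, scikit-learn's hard voting resolves a disagreement by taking the lowest class label among the tied votes, whereas soft voting averages the two models' predicted class probabilities, so the two rules can diverge noticeably for two-member ensembles.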
Although the neural network solver model VC-DT-SGD delivers strong classification performance with an accuracy of 0.98, and the base-model ensembles also score 0.98, VC-DT-RF achieves superior performance, with an accuracy of 0.99 and an AUC of 1.000 under both hard and soft voting. The study further improves the DT classifier using the Bagging Classifier (BC) approach, yielding BC-DT. Among all hybrid models, BC-DT is superior, with an accuracy of 0.99 and an AUC of 0.999. Various evaluation metrics are measured to highlight the performance of the models and are compared to assess the efficacy of the proposed models. Taking into account evaluation metrics beyond accuracy, the Decision Tree classifier combined with a neural network optimizer outperformed all other hybrid models. The simulation results show that the hybrid models outperform the base classifier models across all key metrics.
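The BC-DT model and the metrics reported above (accuracy and AUC, plus macro F1 as one example of a metric beyond accuracy) can be sketched as follows. This is a hedged illustration, not the authors' code: the data are the same kind of synthetic five-class placeholder, `n_estimators=100` is an assumed value, and the multi-class AUC is computed one-vs-rest, which is one common convention but not necessarily the one used in the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Placeholder five-class data standing in for the credit dataset.
X, y = make_classification(n_samples=1000, n_features=10, n_informative=6,
                           n_classes=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

models = {
    "DT": DecisionTreeClassifier(random_state=0),
    # BC-DT: each tree is fit on a bootstrap resample of the training set and
    # the ensemble averages the trees' class probabilities. The base estimator
    # is passed positionally (named `estimator` in recent scikit-learn).
    "BC-DT": BaggingClassifier(DecisionTreeClassifier(random_state=0),
                               n_estimators=100, random_state=0),
}

metrics = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    proba = model.predict_proba(X_test)
    metrics[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "macro_f1": f1_score(y_test, pred, average="macro"),
        # One-vs-rest AUC, macro-averaged over the five classes.
        "auc_ovr": roc_auc_score(y_test, proba, multi_class="ovr"),
    }

for name, m in metrics.items():
    print(name, {k: round(v, 3) for k, v in m.items()})
```

A single fully grown tree outputs hard 0/1 probabilities, so its ROC curve has few operating points; averaging many bootstrapped trees produces graded probabilities, which is one reason bagging tends to raise AUC even when accuracy changes little.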