Informatica 33 (2009) 235-239

Churn Prediction Model in Retail Banking Using Fuzzy C-Means Algorithm

Džulijana Popovic
Zagrebačka banka d.d., Consumer Finance
Trg bana Josipa Jelačica 10, 10000 Zagreb, Croatia
E-mail: dzulijana.popovic@unicreditgroup.zaba.hr, www.zaba.hr

Bojana Dalbelo Bašic
University of Zagreb, Faculty of Electrical Engineering and Computing
Unska 3, 10000 Zagreb, Croatia
E-mail: bojana.dalbelo@fer.hr

Keywords: churn prediction, fuzzy c-means algorithm, fuzzy transitional condition of the first degree, fuzzy transitional condition of the second degree, distance of k instances fuzzy sum

Received: February 22, 2008

The paper presents a model based on fuzzy methods for churn prediction in retail banking. The study was carried out on real, anonymised data of 5000 clients of a retail bank. The use of real data is a major strength of the study, as many studies rely on old, irrelevant or artificial data. Canonical discriminant analysis was applied to reveal the variables that provide maximal separation between the clusters of churners and non-churners. A combination of standard deviation, canonical discriminant analysis and k-means clustering results was used for outlier detection. Due to the fuzzy nature of practical customer relationship management problems, it was expected, and shown, that fuzzy methods perform better than the classical ones. Based on the results of the preliminary data exploration and of fuzzy clustering with different values of the input parameters of the fuzzy c-means algorithm, the best parameter combination was chosen and applied to the training data set. Four different prediction models, called prediction engines, were developed. The definitions of clients in the fuzzy transitional conditions and of the distance of k instances fuzzy sums were introduced. The prediction engine using these sums performed best in churn prediction on both balanced and non-balanced test sets.

Povzetek (Summary): A fuzzy logic method for use in banking has been developed.
1 Introduction

Due to intense competition and saturated markets, companies in all industries realize that their existing client base is their most valuable asset. Retaining existing clients is the best marketing strategy for surviving in the industry, and many studies have shown that it is more profitable to keep and satisfy existing clients than to constantly attract new ones [1,4,8,11]. Churn management, the general concept of identifying the clients most prone to switching to another company, has led to the development of a variety of techniques and models for churn prediction. The next generation of such models has to concentrate on improved accuracy, robustness and lower implementation costs, as every delay in reaction means increased costs for the company [2].

The aim of this study was to show that data mining methods based on fuzzy logic can be successfully applied in retail banking analysis and, moreover, that fuzzy c-means clustering performs better than the classical clustering algorithms in the problem of churn prediction. Although clustering analysis is in fact an unsupervised learning technique, it can be used as the basis for a classification model if the data set contains the classification variable, which was the case in this study.

To the best of our knowledge, this is the first paper considering the application of fuzzy clustering to churn prediction in retail banking. Studies of churn prediction in banking are very scarce, and most papers used models based on logistic regression, decision trees and neural networks [9,11]. A useful literature review of attrition models can be found in [11]. Some authors [9] reported the percentage of correct predictions varying from 14% to 73%, depending on the proportion of churners in the validation set. Others [3] obtained AUC performance in subscription services varying from 69.4% for overall churn to 90.4%, but the latter only for churn caused by financial reasons, which is much easier to predict.
Results are not perfectly comparable due to differences in churn moment definitions, data set sizes and industries, but they can still provide valuable insight into the subject.

2 Fuzzy c-means clustering algorithm

Classical clustering assigns each observation to a single cluster, without any information on how far or near the observation is from all the other possible decisions. This type of clustering is often called hard or crisp clustering [1,10,12]. The two major classes of crisp clustering methods are hierarchical and optimization (partitive) clustering, with a number of different algorithms that were used in the study.

Fuzzy clustering methods have been developed based on fuzzy set theory, first introduced by Zadeh in 1965 [5,6,10], and on the concept of membership functions. In fuzzy clustering, entities are allowed to belong to many clusters with different degrees of membership. Fuzzy clustering of X into p clusters is characterized by p membership functions $\mu_j$, where

$$\mu_j : X \to [0,1], \quad j = 1, \dots, p,$$

$$\sum_{j=1}^{p} \mu_j(x_i) = 1, \quad i = 1, 2, \dots, n, \tag{1}$$

and

$$0 < \sum_{i=1}^{n} \mu_j(x_i) < n, \quad j = 1, \dots, p. \tag{2}$$

[...] $\varepsilon > 0$, it holds that $\mu_{h_1} - \mu_{h_2} < \varepsilon$.

Definition 2. Let p be the number of clusters in the FCM algorithm. Let us denote $\max_{j=1}^{p}\{\mu_j(x_i)\} = \mu_{\max}$. The entity $x_i$ is said to be in the fuzzy transitional condition of the 2nd degree if, for arbitrarily small $\varepsilon > 0$, it holds that $\mu_{\max} - \frac{1}{p} < \varepsilon$.

Subsets of clients in the fuzzy transitional conditions (FTC) of both degrees, with varying $\varepsilon$ values, were further analyzed, and the information gained from their membership values helped in explaining their behavior.

Four prediction models were developed, based on the main idea of the distance of the new client from the clients in the training data set. For predictive purposes in the 4th model, the definition of distance of k instances (DOKI) sums was introduced.

Definition 3. Let p be the number of clusters in the FCM algorithm and X be the set of n entities with assigned membership values $\mu_j$, $j = 1, \dots, p$.
The distance of k instances sum, i.e. the $\mathrm{DOKI}_j^k(x)$ sum, for the new entity $x_{n+1}$ is defined as the sum of the membership values $\{\mu_j\}$ in the j-th cluster of the k nearest entities from X, according to the distance metric used in FCM.

Calculation of DOKI sums requires the input parameter k, and several different values were applied. Table 3 presents the results of FCM on the training set and of the prediction engine with DOKI sums applied to the balanced and non-balanced test sets. The concept of DOKI sums might seem similar to the k nearest neighbors approach, but DOKI sums up the values of the membership functions and not the pure distances. The recall rates for the test sets were even higher than the recall rate obtained with FCM on the training set. The improvement in recall was paid for by a slight decrease in specificity. As mentioned previously, it is more important to hit churners, even at the price of hitting some percentage of loyal clients. Cost minimization can be achieved later through more intelligent and multi-level communication channels.

Data set              Recall    FP rate   Accuracy   Specificity
training              79.64%    55.61%    62.00%     44.39%
test, non-balanced    87.60%    60.82%    63.64%     39.18%
test, balanced        88.52%    62.45%    63.04%     37.55%

Table 3: Results of FCM and DOKI prediction model.

5 Conclusions and further work

It is always challenging to deal with real data and business situations, where classical methods can rarely be applied in their simplest theoretical form. The main aim of the study, to show that fuzzy logic and fuzzy data mining methods can find their place in the reality of retail banking, was fulfilled. FCM performed much better than the classical clustering and provided more hidden information about the clients, especially those in fuzzy transitional conditions. Three new definitions were introduced and had an impact on the overall work. Implementation of DOKI sums increased the hit rate (recall) by 8.88 percentage points in comparison with pure FCM (from 79.64% on the training set to 88.52% on the balanced test set). A lot of work still needs to be done.
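As an illustration of Definitions 2 and 3, the following Python sketch flags entities in the fuzzy transitional condition of the 2nd degree and computes the DOKI sums for a new entity. It is a minimal sketch and not the implementation used in the study: the function names, the toy data, the values of ε and k, and the choice of the Euclidean distance are assumptions made for the example, and the membership values are assumed to come from a previously run FCM.

```python
import math


def ftc_second_degree(memberships, eps):
    """Definition 2: flag entities whose largest membership value
    exceeds 1/p by less than eps (p = number of clusters)."""
    flags = []
    for row in memberships:
        p = len(row)
        flags.append(max(row) - 1.0 / p < eps)
    return flags


def doki_sums(train_points, train_memberships, new_point, k):
    """Definition 3: for each cluster j, sum the membership values in
    cluster j of the k training entities nearest to new_point.
    Euclidean distance is an assumption; the paper only says
    'the distance metric used in FCM'."""
    order = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], new_point),
    )
    nearest = order[:k]
    p = len(train_memberships[0])
    return [sum(train_memberships[i][j] for i in nearest) for j in range(p)]


# Hypothetical toy data: five training entities, two clusters.
train_points = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0], [0.5, 0.5]]
train_memberships = [
    [0.90, 0.10],
    [0.80, 0.20],
    [0.10, 0.90],
    [0.20, 0.80],
    [0.52, 0.48],  # almost equally shared between both clusters
]

flags = ftc_second_degree(train_memberships, eps=0.05)
sums = doki_sums(train_points, train_memberships, new_point=[0.2, 0.1], k=3)

print("FTC 2nd degree flags:", flags)
print("DOKI sums per cluster:", sums)
```

Because the membership values of each entity sum to 1 by condition (1), the DOKI sums over all clusters add up to k, so they can be read as a soft vote of the k nearest training entities rather than a count of crisp labels.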
In the near future every client and every selling opportunity will become important. Methods that require a lot of preprocessing and, above all, the removal of many outlying clients will lose the battle against more efficient and robust methods. Higher accuracy should be obtained through better exploitation of the information on clients in fuzzy transitional conditions, and not through the removal of clients. Monitoring clients in FTCs and reacting as they approach the churners could be a way towards more intelligent churn management. This requires analysis on larger data sets, the inclusion of more transactional variables in the model and tuning of $\varepsilon$. The model should also include the costs of positive and negative misclassifications. Different segments of clients, or clients having similar product lines, could be modeled on their own, to find the empirically best FCM parameters for each segment or product line.

Acknowledgement

The first author's opinions expressed in this paper do not necessarily reflect the official positions of Zagrebačka banka d.d. This work has been supported by the Ministry of Science, Education and Sports, Republic of Croatia, under the grants No. 036-1300646-1986 and 0980982560-2563.

References

[1] Berry M.J.A., Linoff G.S. (2004) Data Mining Techniques For Marketing, Sales, and Customer Relationship Management, 2nd Ed., Indianapolis: Wiley Publishing, Inc.
[2] Burez J., Van den Poel D. (2008) "Handling class imbalance in customer churn prediction", Expert Systems with Applications, In Press, available online 16 May 2008.
[3] Burez J., Van den Poel D. (2008) "Separating financial from commercial customer churn: A modeling step towards resolving the conflict between the sales and credit department", Expert Systems with Applications 35 (1-2), pp. 497-514.
[4] Coussement K., Van den Poel D. (2006) "Churn Prediction in Subscription Services: an Application of Support Vector Machines While Comparing Two Parameter-Selection Techniques", Working Paper 2006/412, Ghent University.
[5] Cox E. (2005) Fuzzy Modeling and Genetic Algorithms for Data Mining and Exploration, San Francisco: Morgan Kaufmann Publishers.
[6] De Oliveira J.V., Pedrycz W. (editors) (2007) Advances in Fuzzy Clustering and its Applications, John Wiley & Sons Ltd.
[7] Fawcett T. (2004) ROC Graphs: Notes and Practical Considerations for Researchers, Netherlands: Kluwer Academic Publishers.
[8] Hadden J., Tiwari A., Roy R., Ruta D. (2005) "Computer assisted customer churn management: State-of-the-art and future trends", Computers & Operations Research 34, pp. 2902-2917.
[9] Mutanen T., Ahola J., Nousiainen S. (2006) "Customer churn prediction - a case study in retail banking", ECML/PKDD 2006 Workshop on Practical Data Mining: Applications, Experiences and Challenges, Berlin.
[10] Theodoridis S., Koutroumbas K. (2003) Pattern Recognition, 2nd Ed., San Diego, USA: Academic Press, Elsevier Science.
[11] Van den Poel D., Lariviere B. (2004) "Customer attrition analysis for financial services using proportional hazard models", European Journal of Operational Research 157 (1), pp. 196-217.
[12] Yeo D. (2005) Applied Clustering Techniques Course Notes, Cary NC, USA: SAS Institute Inc.