Visualization and Concept Drift Detection Using Explanations of Incremental Models

Jaka Demšar, Zoran Bosnić and Igor Kononenko
University of Ljubljana, Faculty of Computer and Information Science
Večna pot 113, 1000 Ljubljana, Slovenia
E-mail: jaka.demsar0@gmail.com, {zoran.bosnic, igor.kononenko}@fri.uni-lj.si

Keywords: data stream mining, concept drift detection, visual perception

Received: October 10, 2014

The temporal dimension that is ever more prevalent in data makes data stream mining (incremental learning) an important field of machine learning. In addition to accurate predictions, explanations of the model and of individual examples are a crucial component, as they provide insight into the model's decisions and lessen its black-box nature, thus increasing the user's trust. A proper visual representation of data is also highly relevant to the user's understanding; visualisation is often used in machine learning because it shifts the balance between perception and cognition, taking fuller advantage of the brain's abilities. In this paper we review visualisation in the incremental setting and devise an improved version of an existing visualisation of explanations of incremental models. Additionally, we discuss the detection of concept drift in data streams and experiment with a novel detection method that uses the stream of the model's explanations to determine the places of change in the data domain.

Povzetek: In this article we present a new visualisation of the explanations of incremental classification models and of individual predictions. We also propose a concept drift detection method based on monitoring the stream of explanations.

1 Introduction

Data streams are becoming ubiquitous, a consequence of the growing number of automatic data feeds, sensor networks and the Internet of Things. The defining characteristics of data streams are their transient, dynamic nature and their temporal component. In contrast with the static (tabular) datasets used in batch learning, the data streams used in incremental learning can be large, semi-structured, incomplete, irregular, distributed and possibly unbounded. This poses a challenge for storage and processing, which should be done in constant time; for incremental models, the operations of increment (a fast update of the model with a new example) and decrement (forgetting old examples) are therefore vital. Concepts and patterns in the data domain can also change over time (concept drift); we must adapt to this phenomenon or the quality of our predictions deteriorates. We discuss data streams and concept drift in more detail in Section 2.1.

Bare prediction quality is not a sufficient property of a good machine learning algorithm. Explanations of individual predictions and of the model as a whole are needed to increase the user's trust in its decisions and to provide insight into the workings of the model, which can significantly increase the model's credibility. Model-independent explanation methods have been developed for both batch [12, 13] and incremental learning [2] (Section 2.2).

Data visualisation is a versatile tool in machine learning that serves sense-making and communication, as it conveys abstract concepts in a form understandable to humans. In Section 2.3 we discuss the visual display of data in an incremental setting and describe the shortcomings of an existing visualisation of explanations of incremental models. The main goal of this article is its improvement, which is presented in Section 3. An additional goal (Section 4) is to devise a concept drift detection method that monitors the stream of explanations.
Finally, we test the improved visualisation and the novel concept drift detection method on two datasets and evaluate the results (Section 5).

2 Related work

2.1 Data stream mining

In incremental learning, we observe a possibly infinite data stream of pairs (x_i, c_i), where x_i is the i-th instance and c_i is its true label. After the model makes a prediction p_i =
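To make this setting concrete, the sketch below shows a minimal test-then-train (prequential) loop over such a stream. It assumes a deliberately simple sliding-window majority-class model standing in for a real incremental learner (e.g. a Hoeffding tree); the class and function names are illustrative assumptions, not part of the method described in this paper.

```python
from collections import Counter, deque

class WindowedMajorityModel:
    """Toy incremental model: predicts the majority class within a
    sliding window. Both increment and decrement run in constant time,
    as the stream setting requires."""

    def __init__(self, window_size=1000):
        self.window = deque()        # recent (x_i, c_i) pairs
        self.counts = Counter()      # class frequencies in the window
        self.window_size = window_size

    def predict(self, x):
        # Majority class in the current window; None before any data.
        return self.counts.most_common(1)[0][0] if self.counts else None

    def increment(self, x, c):
        # Fast update of the model with the newest example.
        self.window.append((x, c))
        self.counts[c] += 1
        if len(self.window) > self.window_size:
            self.decrement()

    def decrement(self):
        # Forget the oldest example so outdated concepts fade away.
        _, c_old = self.window.popleft()
        self.counts[c_old] -= 1

def prequential_accuracy(model, stream):
    """Test-then-train: predict p_i for x_i, then learn from (x_i, c_i)."""
    correct = total = 0
    for x_i, c_i in stream:
        p_i = model.predict(x_i)     # test on the new instance first ...
        correct += (p_i == c_i)
        total += 1
        model.increment(x_i, c_i)    # ... then update the model with it
    return correct / total if total else 0.0
```

The constant-time increment and decrement operations in this sketch are exactly the properties required of incremental models in Section 1; the sliding window also gives the model a simple forgetting mechanism when the concept drifts.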