https://doi.org/10.31449/inf.v48i2.6105 Informatica 48 (2024) 289–290

An Efficient Iterative Algorithm to Explainable Feature Learning

Dino Vlahek
Faculty of Electrical Engineering and Computer Science at the University of Maribor (UM FERI)
E-mail: dino.vlahek1@um.si

Thesis Summary

Keywords: data classification, explainable artificial intelligence, feature learning

Received: April 23, 2024

This paper summarizes a doctoral thesis introducing a new iterative approach to explainable feature learning. Features are learned in three steps during each iteration: feature construction, evaluation, and selection. We demonstrated performance superior to the state of the art on 13 of 15 test cases, as well as the explainability of the learned feature representation for knowledge discovery.

Povzetek: This work summarizes the content of a doctoral dissertation in which we present an iterative approach to learning explainable features. During each iteration, features are learned through the following steps: feature construction, evaluation, and selection. On 13 of 15 test cases we demonstrated superior performance in comparison with the state of the art, as well as the explainability of the learned feature representation for knowledge discovery.

1 Introduction

Supervised feature learning describes a set of techniques that define an augmented data representation for improved utilization of classification or regression models [1]. These methods replace traditional feature engineering tasks in a wide range of machine learning applications. Supervised feature learning methods can be divided into feature selection, dimensionality reduction, supervised dictionary learning, and deep learning. Feature selection methods select a subset of relevant features from the original feature space [3]. Such methods are limited in their accuracy, as they cannot recombine features. In contrast, supervised dimensionality reduction recombines input features by mapping input samples onto linear or non-linear manifolds [4]. However, this process may introduce significant distortions into the data as a consequence of changing the distances between learning samples, so the resulting classification models are challenging to interpret. In addition, these approaches can only reduce the feature space's dimensionality [4]. On the other hand, supervised dictionary learning learns a new feature space from the input set by recombining an arbitrary number of basic elements, called atoms, that compose a dictionary [2]. It is formulated as an optimization problem in which the sparsity of the representation is maximized and the reconstruction error is minimized, the latter being defined as the difference between the learning data and their reconstruction from the sparse representation. Dictionaries can be shared or class-specific, depending on the mechanism for processing discriminatory information. Shared dictionaries are learned from the entire data set, regardless of class labels. Using such dictionaries requires an additional classifier, which significantly increases computational complexity due to the non-convex optimization problem [2]. Class-specific dictionaries, on the other hand, are learned for each class separately [6], enabling straightforward classification of unknown samples based on the reconstruction error introduced by such dictionaries. However, this can become computationally demanding as the number of classes increases, while it is challenging to extract useful knowledge when the dictionary contains a large number of atoms [2].
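For concreteness, the optimization problem mentioned above is commonly posed, in a generic shared-dictionary form (the exact objective used in the thesis is not stated in this summary), as

\[
\min_{D,\,X} \; \lVert Y - D X \rVert_F^2 \;+\; \lambda \, \lVert X \rVert_1 ,
\]

where Y denotes the learning data, D the dictionary whose columns are atoms, X the sparse representation, and λ > 0 a weight: the Frobenius-norm term penalizes the reconstruction error, while the ℓ1 penalty promotes sparsity of X.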
Similar drawbacks are also noted when considering deep learning approaches. These are based on various architectures of artificial neural networks with multiple hidden layers of neurons, which allow higher-level features to be extracted progressively from the raw input [5]. Both linear and non-linear functions can model the neurons' activations, thus optimizing the feature representation within the decision function. By increasing the number of hidden layers, artificial neural networks can approximate increasingly complex decision functions and achieve high classification accuracies. However, the presence of multiple local optima and many hyperparameters [5] also increases the complexity of the training procedure, while these methods are regarded as black-box function approximators [1]. In order to address the above-mentioned challenges, a new method was proposed in [7] that learns interpretable features from the input ones and achieves improved accuracy in comparison with the current state of the art.

2 Methodology

The proposed method [7] exploits non-linear codependencies between features in order to improve an arbitrary classifier's classification performance, while providing a meaningful feature representation for knowledge discovery. Each iteration consists of the following three steps: feature construction, which generates the new feature space; feature evaluation, which assesses the quality of each individual feature using a new metric that defines the feature's suitability for classification tasks; and feature selection, which selects high-quality, mutually dissimilar features using a new method based on a vertex cut. Here, we introduce two input parameters used to define the graph: the first represents the necessary level of feature quality for inclusion in the output feature space, and the second determines the minimal level of dissimilarity between the selected features. An illustrative sketch of one such iteration is given below.
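This summary does not specify the construction operators, the quality metric, or the vertex-cut procedure, so the following Python sketch uses stand-ins throughout: pairwise products stand in for feature construction, mutual information (scikit-learn's mutual_info_classif) for the feature-quality metric, and a greedy correlation-based filter approximates the graph-based vertex-cut selection. The function name learn_features is hypothetical; quality_thr and dissim_thr play the role of the two input parameters described above.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif


def learn_features(X, y, quality_thr=0.05, dissim_thr=0.3, n_iters=3):
    """Sketch of one iterative construct-evaluate-select feature learning loop."""
    F = X.copy()
    for _ in range(n_iters):
        # 1) Feature construction: generate candidate features as simple
        #    non-linear pairwise combinations (illustrative stand-in only).
        n = F.shape[1]
        candidates = [F[:, i] * F[:, j]
                      for i in range(n) for j in range(i + 1, n)]
        C = np.column_stack([F] + candidates) if candidates else F

        # 2) Feature evaluation: score each feature's suitability for the
        #    classification task (mutual information as a stand-in metric).
        quality = mutual_info_classif(C, y)

        # 3) Feature selection: greedily keep high-quality features that are
        #    sufficiently dissimilar (1 - |Pearson correlation|) to those
        #    already kept, approximating the vertex-cut-based selection.
        kept = []
        for idx in np.argsort(quality)[::-1]:
            if quality[idx] < quality_thr:
                break
            if all(1 - abs(np.corrcoef(C[:, idx], C[:, k])[0, 1]) >= dissim_thr
                   for k in kept):
                kept.append(idx)

        if not kept:  # no feature met the quality threshold; stop early
            break
        F = C[:, kept]
    return F
```

Under these stand-ins, a call such as F_new = learn_features(X_train, y_train) would return a transformed feature matrix on which an arbitrary classifier, e.g., a random forest, can then be trained; the actual method in [7] uses its own construction, evaluation, and selection procedures.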
3 Results and discussion

The proposed method was tested extensively on fifteen benchmark datasets. During the sensitivity analysis, optimal values of the two input parameters were identified, and the performance of five traditional classifiers was estimated on the learned features. The study showed that the learned features improved the classification accuracy of all tested classifiers in a statistically significant way, with the random forest classifier achieving the best results. As demonstrated by the experiments, the proposed method matched or exceeded the classification accuracy of six state-of-the-art methods in all test cases. The correctness of the interpretation of the learned features was also demonstrated on a well-studied dataset.

The proposed method has been applied in many settings, ranging from pure research to industrial and scientific projects. We plan to extend it to regression by introducing a new feature evaluation metric that assesses the suitability of features for regression tasks.

References

[1] Y. Bengio, A. Courville, and P. Vincent. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1798–1828, Aug 2013. doi: https://doi.org/10.1109/TPAMI.2013.50.

[2] Mehrdad J. Gangeh, Ahmed K. Farahat, Ali Ghodsi, and Mohamed S. Kamel. Supervised dictionary learning and sparse representation - a review. ArXiv, abs/1502.05928, 2015. doi: https://doi.org/10.48550/arXiv.1502.05928.

[3] Huan Liu and Hiroshi Motoda. Feature Extraction, Construction and Selection: A Data Mining Perspective. Kluwer Academic Publishers, Norwell, MA, USA, 1998.

[4] Yunqian Ma and Yun Fu. Manifold Learning Theory and Applications. CRC Press, Inc., USA, 1st edition, 2011.

[5] Michael A. Nielsen. Neural Networks and Deep Learning, page 216. Determination Press, 2018.

[6] W. Tang, A. Panahi, H. Krim, and L. Dai. Analysis dictionary learning based classification: Structure for robustness. IEEE Transactions on Image Processing, 28(12):6035–6046, Dec 2019. doi: https://doi.org/10.1109/TIP.2019.2919409.

[7] Dino Vlahek. Učinkovit iterativni algoritem učenja razložljivih značilnic za izboljšano klasifikacijo. PhD thesis, UM FERI, 2024.