https://doi.org/10.31449/inf.v48i2.6105 Informatica 48 (2024) 289–290

An Efficient Iterative Algorithm to Explainable Feature Learning

Dino Vlahek
Faculty of Electrical Engineering and Computer Science at the University of Maribor (UM FERI)
E-mail: dino.vlahek1@um.si

Thesis Summary

Keywords: data classification, explainable artificial intelligence, feature learning

Received: April 23, 2024

This paper summarizes a doctoral thesis introducing a new iterative approach to explainable feature learning. Features are learned in three steps during each iteration: feature construction, evaluation, and selection. We demonstrated performance superior to the state of the art on 13 of 15 test cases, as well as the explainability of the learned feature representation for knowledge discovery.

Povzetek: This work summarizes the content of a doctoral dissertation in which we present an iterative approach to learning explainable features. During each iteration, features are learned through the following steps: feature construction, evaluation, and selection. On 13 of 15 test cases we demonstrated superior performance in comparison with the state of the art, as well as the explainability of the learned feature representation for knowledge discovery.

1 Introduction

Supervised feature learning describes a set of techniques that define an augmented data representation for improved utilization of classification or regression models [1]. These methods replace traditional feature engineering tasks in a wide range of machine learning applications. Supervised feature learning methods can be divided into feature selection, dimensionality reduction, supervised dictionary learning, and deep learning. Feature selection methods select a subset of relevant features from the original feature space [3]. Such methods are limited in their accuracy, as they cannot recombine features. In contrast, supervised dimensionality reduction recombines input features by mapping input samples onto linear or non-linear manifolds [4]. However, this process may introduce significant distortions into the data as a consequence of changing the distances between learning samples, so the resulting classification models are challenging to interpret. In addition, these approaches can only reduce the feature space's dimensionality [4]. On the other hand, supervised dictionary learning learns a new feature space from the input set by recombining an arbitrary number of basic elements, called atoms, that compose a dictionary [2]. It is formulated as an optimization problem in which the sparsity of the representation is maximized and the reconstruction error is minimized, the latter being defined as the difference between the learning data and their reconstruction from the sparse representation. Dictionaries can be shared or class-specific, depending on the mechanism for processing discriminatory information. Shared dictionaries are learned from the entire data set, regardless of class labels. Using such dictionaries requires an additional classifier, which significantly increases computational complexity due to the non-convex optimization problem [2]. Class-specific dictionaries, on the other hand, are learned for each class separately [6], enabling straightforward classification of unknown samples based on the reconstruction error introduced by such dictionaries. However, this can become computationally demanding as the number of classes increases, while it is challenging to extract useful knowledge when the dictionary contains a large number of atoms [2].
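For concreteness, the optimization problem mentioned above is commonly posed, in a generic shared-dictionary form (the exact objective used in the thesis is not stated in this summary), as

\[
\min_{D,\,X} \; \lVert Y - D X \rVert_F^2 \;+\; \lambda \, \lVert X \rVert_1 ,
\]

where Y denotes the learning data, D the dictionary whose columns are atoms, X the sparse representation, and λ > 0 a weight: the Frobenius-norm term penalizes the reconstruction error, while the ℓ1 penalty promotes sparsity of X.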
Similar drawbacks are also noted when considering deep learning approaches. These are based on various architectures of artificial neural networks with multiple hidden layers of neurons, which allow higher-level features to be extracted progressively from the raw input [5]. Both linear and non-linear functions can model the neurons' activations, thus optimizing the feature representation within the decision function. By increasing the number of hidden layers, artificial neural networks can approximate increasingly complex decision functions and achieve high classification accuracies. However, the presence of multiple local optima and many hyperparameters [5] also increases the complexity of the training procedure, while these methods are regarded as black-box function approximators [1]. In order to address the above-mentioned challenges, a new method was proposed in [7] that learns interpretable features from the input ones and achieves improved accuracy in comparison with the current state of the art.

2 Methodology

The proposed method [7] exploits non-linear codependencies between features in order to improve an arbitrary classifier's classification performance, while providing a meaningful feature representation for knowledge discovery. Each iteration consists of the following three steps: feature construction, which generates the new feature space; feature evaluation, which assesses the quality of each individual feature using a new metric that defines the feature's suitability for classification tasks; and feature selection, which selects high-quality, mutually dissimilar features using a new method based on a vertex cut. Here, we introduce two input parameters used to define the graph: the first represents the necessary level of feature quality for inclusion in the output feature space, and the second determines the minimal level of dissimilarity between the selected features. An illustrative sketch of one such iteration is given below.
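This summary does not specify the construction operators, the quality metric, or the vertex-cut procedure, so the following Python sketch uses stand-ins throughout: pairwise products stand in for feature construction, mutual information (scikit-learn's mutual_info_classif) for the feature-quality metric, and a greedy correlation-based filter approximates the graph-based vertex-cut selection. The function name learn_features is hypothetical; quality_thr and dissim_thr play the role of the two input parameters described above.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif


def learn_features(X, y, quality_thr=0.05, dissim_thr=0.3, n_iters=3):
    """Sketch of one iterative construct-evaluate-select feature learning loop."""
    F = X.copy()
    for _ in range(n_iters):
        # 1) Feature construction: generate candidate features as simple
        #    non-linear pairwise combinations (illustrative stand-in only).
        n = F.shape[1]
        candidates = [F[:, i] * F[:, j]
                      for i in range(n) for j in range(i + 1, n)]
        C = np.column_stack([F] + candidates) if candidates else F

        # 2) Feature evaluation: score each feature's suitability for the
        #    classification task (mutual information as a stand-in metric).
        quality = mutual_info_classif(C, y)

        # 3) Feature selection: greedily keep high-quality features that are
        #    sufficiently dissimilar (1 - |Pearson correlation|) to those
        #    already kept, approximating the vertex-cut-based selection.
        kept = []
        for idx in np.argsort(quality)[::-1]:
            if quality[idx] < quality_thr:
                break
            if all(1 - abs(np.corrcoef(C[:, idx], C[:, k])[0, 1]) >= dissim_thr
                   for k in kept):
                kept.append(idx)

        if not kept:  # no feature met the quality threshold; stop early
            break
        F = C[:, kept]
    return F
```

Under these stand-ins, a call such as F_new = learn_features(X_train, y_train) would return a transformed feature matrix on which an arbitrary classifier, e.g., a random forest, can then be trained; the actual method in [7] uses its own construction, evaluation, and selection procedures.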
3 Results and discussion

The proposed method was tested extensively on fifteen benchmark datasets. During the sensitivity analysis, optimal values of the two input parameters were identified, and the performance of five traditional classifiers was estimated on the learned features. The study showed that the learned features improved the classification accuracy of all tested classifiers in a statistically significant way, with the random forest classifier achieving the best results. As demonstrated by the experiments, the proposed method matched or exceeded the classification accuracy of six state-of-the-art methods in all test cases. The correctness of the interpretation of the learned features was also demonstrated on a well-studied dataset.

The proposed method has been applied in many settings, ranging from pure research to industrial and scientific projects. We plan to extend it to regression by introducing a new feature evaluation metric that assesses the suitability of features for regression tasks.

References

[1] Y. Bengio, A. Courville, and P. Vincent. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1798–1828, Aug 2013. doi: https://doi.org/10.1109/TPAMI.2013.50.

[2] Mehrdad J. Gangeh, Ahmed K. Farahat, Ali Ghodsi, and Mohamed S. Kamel. Supervised dictionary learning and sparse representation - a review. ArXiv, abs/1502.05928, 2015. doi: https://doi.org/10.48550/arXiv.1502.05928.

[3] Huan Liu and Hiroshi Motoda. Feature Extraction, Construction and Selection: A Data Mining Perspective. Kluwer Academic Publishers, Norwell, MA, USA, 1998.

[4] Yunqian Ma and Yun Fu. Manifold Learning Theory and Applications. CRC Press, Inc., USA, 1st edition, 2011.

[5] Michael A. Nielsen. Neural Networks and Deep Learning, page 216. Determination Press, 2018.

[6] W. Tang, A. Panahi, H. Krim, and L. Dai. Analysis dictionary learning based classification: Structure for robustness. IEEE Transactions on Image Processing, 28(12):6035–6046, Dec 2019. doi: https://doi.org/10.1109/TIP.2019.2919409.

[7] Dino Vlahek. Učinkovit iterativni algoritem učenja razložljivih značilnic za izboljšano klasifikacijo. PhD thesis, UM FERI, 2024.