ELEKTROTEHNIŠKI VESTNIK 91(1-2): 47–52, 2024
ORIGINAL SCIENTIFIC PAPER

Advancements in Gait Recognition: A Study on Gait Energy Images and Gait Entropy Images

Stella Dumenčić 1, Domagoj Pinčić 1, Diego Sušanj 1, and Žiga Emeršič 2,†
1 University of Rijeka, Faculty of Engineering, Rijeka, Croatia
2 University of Ljubljana, Faculty of Computer and Information Science, Ljubljana, Slovenia
† E-mail: ziga.emersic@fri.uni-lj.si

Abstract. Gait recognition is a promising biometric modality because it is non-invasive and difficult to disguise. However, its performance still lags behind that of other, well-established biometric modalities. This paper presents the results of our study on gait recognition, focusing on the comparison between Gait Energy Images (GEI) and Gait Entropy Images (GEnI) under various conditions. Different methodologies for feature extraction and classification are explored, including deep learning techniques and Vision Transformers (ViTs). The popular CASIA-B dataset is used to evaluate the performance across different walking conditions and entropy measures. The effectiveness of gait recognition systems in accurately identifying individuals is shown, highlighting the potential of GEnI for enhancing the recognition performance under varying conditions.

Keywords: Gait biometrics, GEI, Gait Entropy Image, CNN

Napredek pri prepoznavanju hoje: študija o energijskih slikah hoje in entropijskih slikah hoje (Progress in gait recognition: a study of gait energy images and gait entropy images)

Gait recognition is a promising biometric modality owing to its non-invasive nature and the difficulty of faking it. Its performance is still worse than that of other, well-established biometric approaches. This article presents the results of our study on gait recognition, which focuses on the comparison between gait energy images (GEI) and gait entropy images (GEnI) under various conditions.
An analysis of different methodologies for feature extraction and classification, including deep learning techniques and vision transformers (ViT), is presented. The popular CASIA-B dataset is used to evaluate performance across different walking conditions and entropy measures. The effectiveness of gait recognition systems in accurately identifying individuals is shown, which highlights the potential of GEnI for improving recognition performance under varying conditions.

Received 22 February 2024
Accepted 18 March 2024

1 INTRODUCTION

Each individual has a distinctive gait, that is, a unique way of walking. Compared with other biometric traits, such as facial features, iris patterns, ear shapes and fingerprints, gait has several distinctive advantages. First, it can be captured at a greater distance and does not require direct interaction with a sensor such as a camera. Moreover, the inherent difficulty of altering one's gait increases its reliability as a biometric identifier and reduces the possibility of fraud. Gait data can be extracted even with low-resolution sensors, e.g. surveillance cameras. Gait recognition is used to identify individuals, which is useful in access control mechanisms, surveillance operations, and criminal investigations.

However, the implementation of gait biometrics in real-life scenarios is hampered by several limitations and challenges. First, environmental factors that affect the images, such as lighting changes, shadows and occlusions, can significantly distort a person's perceived gait, as is the case for some other biometric modalities [1], [2], [3]. Second, different camera angles can result in different appearances of the gait, even though the individual's gait signature is unchanged. Third, commonly worn or carried items such as bags, coats, hats or other accessories can visually alter a person's gait and thus complicate recognition.
In addition, the use of gait recognition raises issues of privacy, bias and discrimination based on physical characteristics or movement impairments.

Two families of methods for gait recognition are described in the literature. The first compresses the silhouettes corresponding to a single gait cycle of an individual into one consolidated image that represents the gait features [4], [5]. Han and Bhanu [4] introduce the Gait Energy Image (GEI), which compresses binary silhouettes, extracted by background subtraction from video frames, into a unified representation of the gait. The second treats the gait as a sequence of individual silhouettes, each of which is used as an input to a feature extractor [6], [7], [8], [9], [10], [11].

Newer methods are mostly based on deep learning techniques. Since their breakthrough on ImageNet in 2012 [12], Convolutional Neural Networks (CNNs) have significantly influenced image-based deep learning and are now one of the standards for gait recognition methodologies [6], [7], [13], [11]. Wu et al. [13] utilize CNNs for gait feature extraction using similarity learning. Chao et al. [6] propose GaitSet, a custom CNN framework with triplet loss that learns gait representations by treating a person's silhouettes as a set. A detailed explanation of a standard gait recognition pipeline is given in previous works [9], [11].

Figure 1. Overview of the proposed evaluation pipeline of two different gait recognition approaches: GEI vs. GEnI. (Diagram blocks: CASIA-B, Preprocessing, GEI, GEnI, Traditional CNNs, Vision Transformers, Classification.)

In recent years, Vision Transformers (ViTs) have emerged as a direct challenger to CNNs for image classification tasks. Dosovitskiy et al. [14] introduce the ViT architecture. The method applies the standard transformer encoder used in natural language processing to the computer vision domain and specifically targets image classification tasks.
ViTs demonstrate a robust generalization capability [14]. Compared with CNNs, ViTs are reported to require fewer computational resources for training while offering better modeling capabilities [14].

Gait Entropy Images (GEnI) were developed as an improvement of GEI. They help to capture the most significant motion information. A GEnI encodes the variability of the pixel values in the silhouette images over an entire gait cycle in a single image. In this way, the motion data is preserved and remains resilient to changes in covariate conditions that affect the appearance. Bashir et al. [15] present GEnIs for capturing the motion information and show that they are resilient to changes in appearance. Their extensive experiments show that GEnIs outperform other methods, especially in scenarios involving significant appearance changes. However, the performance of GEnI is susceptible to covariates that directly affect the gait itself.

Rokanujjaman et al. [16] investigate gait signatures by segmenting the human body into smaller fragments and examining the effectiveness of combining these fragments step by step. By using dynamically weighted entropy-based gait representations as input features, their approach outperforms both the whole-based and the part-based gait recognition methods. The results remain consistent even with a subset of features, indicating the robustness of the approach and the importance of specific body parts.

Jeevan et al. [17] propose an approach that uses Gait Pal and Pal Entropy (GPPE) features for gait recognition. Principal Component Analysis (PCA) is then used to extract salient features, followed by training and testing with a Support Vector Machine (SVM). In experiments on both the Treadmill dataset and the CASIA datasets A, B, and C, the method demonstrates superior effectiveness in gait representation, highlighting its potential for robust individual identification.
The contributions of this paper are:
• The utility of Gait Entropy Images (GEnI) over the traditional Gait Energy Images (GEI) in handling appearance changes in gait recognition is demonstrated.
• The performance of different entropy measures (Shannon, Renyi, Tsallis) in the context of gait recognition, with a focus on resilience to appearance changes, is evaluated.
• The efficacy of two feature extraction methods, i.e. PCA-LDA and Vision Transformers (ViTs), in enhancing the gait recognition accuracy is compared.
• Extensive experiments on the CASIA-B dataset are conducted to assess the gait recognition performance across various walking conditions and viewing angles.
• Insights into the implications of using ViTs for gait feature extraction are provided, demonstrating their potential over conventional methods in certain scenarios.

2 MATERIALS AND METHODS

2.1 Datasets

The dataset used is CASIA-B [18]. It contains 124 subjects, three distinct walking conditions and 11 different camera viewing angles (from 0° to 180° in increments of 18°). The walking conditions are divided into three categories: normal walking (NM), walking while carrying a bag (BG) and walking with a coat or jacket (CL). NM has six sequences per subject, while the BG and CL conditions have two sequences per subject each. With 10 sequences recorded from 11 angles, each subject has a total of 110 sequences, resulting in a total of nearly 1,118,000 silhouette images.

The dataset is divided into a train and a test subset, with the first 74 subjects used for training and the remaining 50 subjects used for testing. The first four sequences of the NM condition are assigned to the gallery, while the remaining two NM sequences together with the BG and CL sequences are assigned to the queries. In our study, no differentiation is made between the camera viewing angles.
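The split described above can be written down as a small lookup. The following is only an illustrative sketch: the subject numbering (1 to 124), the condition labels "nm", "bg" and "cl", the 1-based sequence numbers and the function names are assumptions made for this example and are not taken from the dataset tooling. It also assumes that the NM sequences not assigned to the gallery (two of the six) go to the queries.

```python
# Assumed labels: subjects 1..124, conditions "nm"/"bg"/"cl",
# 1-based sequence numbers. All names here are hypothetical.
VIEW_ANGLES = list(range(0, 181, 18))      # 0, 18, ..., 180 degrees: 11 views
SEQUENCES_PER_CONDITION = {"nm": 6, "bg": 2, "cl": 2}
N_TRAIN_SUBJECTS = 74                      # first 74 subjects are used for training
N_GALLERY_NM = 4                           # first four NM sequences form the gallery


def assign_role(subject_id, condition, sequence_no):
    """Return 'train', 'gallery' or 'query' for one recorded sequence."""
    if subject_id <= N_TRAIN_SUBJECTS:
        return "train"
    if condition == "nm" and sequence_no <= N_GALLERY_NM:
        return "gallery"
    return "query"


def list_sequences(subject_id):
    """Enumerate (condition, sequence_no, angle, role) for one subject."""
    return [
        (condition, seq, angle, assign_role(subject_id, condition, seq))
        for condition, count in SEQUENCES_PER_CONDITION.items()
        for seq in range(1, count + 1)
        for angle in VIEW_ANGLES
    ]
```

Under these assumptions, every test subject contributes 4 × 11 = 44 gallery sequences and 6 × 11 = 66 query sequences, which together give the 110 sequences per subject stated above.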
2.2 Gait Energy Image

Standard image preprocessing methods [9], [19], [20] are applied to the dataset. First, the image noise is filtered. Then the silhouettes of each subject are extracted in a binary format, typically using methods such as background subtraction. The images are then standardized to ensure a uniform height and horizontal alignment of all silhouettes. In the next step, the gait cycle is estimated to generate a final representation of the gait.

The image-based gait features are used in the form of GEIs [4]. GEIs capture the static features of a gait sequence, for example the subject's body shape, as well as the dynamic aspects, including the frequency and phase variations during locomotion. The GEI representation G for a specific gait cycle is calculated as:

$$G(i,j) = \frac{1}{N} \sum_{t=1}^{N} I(i,j,t), \tag{1}$$

where N is the number of silhouette images in the gait cycle, t is the frame index within the gait cycle, and I(i,j,t) is the value of the original silhouette image at the pixel coordinates (i,j) in frame t. Examples of the GEI representations for all three conditions in CASIA-B are shown in Figure 2.

Figure 2. CASIA-B Gait Energy Images, with walking conditions divided into three categories: (a) normal walking (NM), (b) walking while carrying a bag (BG) and (c) walking with a coat or jacket (CL).

2.3 Gait Entropy Image

The Shannon entropy [21] quantifies the uncertainty associated with a random variable. Given the size-normalized and centered silhouettes representing a gait cycle, it is calculated for each pixel of the silhouette images. By treating the intensity value of the silhouettes at a specific pixel position as a discrete random variable, the entropy of this variable over the gait cycle is:

$$H_S(i,j) = -\sum_{k=1}^{K} p_k(i,j) \cdot \log_2 p_k(i,j), \tag{2}$$

where i and j are the pixel coordinates and p_k(i,j) is the probability that the pixel takes on the k-th value [15].
Since the silhouette images are binary, the number of levels used for the entropy calculation is K = 2.

Figure 3. Resulting GEnI images for each entropy type and walking condition. Columns: normal walking (NM), walking while carrying a bag (BG), and walking with a coat or jacket (CL). Rows: Shannon entropy (panels a to c), Renyi entropy with α = 0.9 (panels d to f), and Tsallis entropy with q = 0.5 (panels g to i).

Since the Shannon-based GEnI images show an improvement over the GEI images, the effects of other types of entropy, namely the Renyi and Tsallis entropy measures, are analyzed. All three entropies have been applied to, and compared on, a variety of problems [22], [23], [24], [25], [26], [27]. Both the Renyi and the Tsallis entropy measures are generalizations of the Shannon entropy.

The Renyi-based [28] GEnI image is calculated as follows:

$$H_{R,\alpha}(i,j) = \frac{1}{1-\alpha} \cdot \log_2 \left( \sum_{k=1}^{K} p_k(i,j)^{\alpha} \right), \tag{3}$$

where α is the order of the entropy measure, i.e. a parameter that determines its behavior.

The Tsallis entropy [29] is calculated using the equation:

$$H_{T,q}(i,j) = \frac{1}{q-1} \cdot \left( 1 - \sum_{k=1}^{K} p_k(i,j)^{q} \right), \tag{4}$$

where q is the parameter that controls the degree of non-extensivity and has a similar effect to the parameter α in the Renyi entropy.

For each entropy, the Gait Entropy Image G_E(i,j) is derived by scaling and discretizing H(i,j) so that its values fall in the range from 0 to 255:

$$G_E(i,j) = \frac{(H(i,j) - H_{\min}) \cdot 255}{H_{\max} - H_{\min}}, \tag{5}$$

where H_min is the minimum and H_max is the maximum calculated value for each GEnI image. Since GEnI is calculated based on an entire gait cycle, there are no issues with the temporal alignment.
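Equations (1) to (5) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in the study: the function names are hypothetical, the silhouettes are assumed to be a binary array of shape (N, height, width) with foreground pixels equal to 1, and the convention 0 · log 0 = 0 is assumed for static pixels. With K = 2, the foreground probability of a pixel is simply its GEI value.

```python
import numpy as np


def gait_energy_image(silhouettes):
    """Eq. (1): per-pixel mean over the N frames of one gait cycle."""
    return np.asarray(silhouettes, dtype=np.float64).mean(axis=0)


def pixel_probabilities(silhouettes):
    """Stack the background and foreground probabilities (K = 2 levels)."""
    p_foreground = gait_energy_image(silhouettes)
    return np.stack([1.0 - p_foreground, p_foreground])


def shannon(p):
    """Eq. (2); zero-probability levels contribute 0 (assumed convention)."""
    safe = np.where(p > 0, p, 1.0)          # log2(1) = 0 avoids log2(0)
    return -(p * np.log2(safe)).sum(axis=0)


def renyi(p, alpha):
    """Eq. (3); not defined for alpha = 1."""
    return np.log2((p ** alpha).sum(axis=0)) / (1.0 - alpha)


def tsallis(p, q):
    """Eq. (4); not defined for q = 1."""
    return (1.0 - (p ** q).sum(axis=0)) / (q - 1.0)


def to_gray(entropy):
    """Eq. (5): min-max scale to 0..255 and round to 8-bit integers."""
    span = entropy.max() - entropy.min()
    if span == 0:                           # flat image: nothing to scale
        return np.zeros(entropy.shape, dtype=np.uint8)
    scaled = (entropy - entropy.min()) * 255.0 / span
    return np.round(scaled).astype(np.uint8)
```

In this sketch, a pixel that is foreground in exactly half of the frames has a Shannon entropy of 1 bit, and its Renyi entropy is also 1 bit for any admissible α, while a pixel that never changes has zero entropy under all three measures. The GEnI therefore highlights the moving parts of the silhouette and suppresses the static ones.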
The resulting GEnI images for each entropy type and walking condition are shown in Figure 3.

2.4 Feature Extraction and Classification

After the initial silhouette images are preprocessed and converted into GEI and GEnI images, feature extraction and classification are performed using two feature extractors, i.e. the PCA-LDA combination and a Vision Transformer (ViT) based method.

For the first extractor, each GEI or GEnI image obtained from the video sequence is converted into a one-dimensional array by stacking the image columns on top of each other. The dimensionality of this array is reduced using Principal Component Analysis (PCA) followed by Linear Discriminant Analysis (LDA). This process is described in detail by Lenac et al. [9] and Hofmann et al. [30]. Based on a preliminary experimental assessment of the computation time and accuracy, and on our previous research, the number of components extracted using PCA and LDA is set to 50 and 10, respectively.

The second feature extractor is based on a self-supervised learning paradigm for learning discriminative gait features. The DINO method is used, which shows promising results on various computer vision tasks such as image classification and retrieval [31]. To adapt DINO to the gait-specific data, the input sizes and the augmentations used in training are modified. DINO demonstrates the ability to segment objects in the foreground, which is important in gait recognition scenarios where people stand out from the background. Since gait datasets contain too little data to train ViT models from scratch, a fine-tuning strategy is applied in which DINO is first trained on the ImageNet dataset and then fine-tuned on the gait data. With this approach, DINO can be used as a feature extractor that generates discriminative features for the subsequent classification. The small ViT model configured by Touvron et al. [32], with a patch size of 8, is used.

PCA is applied to the training set.
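The PCA-LDA branch of this section can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, PCA is fitted on the training images and LDA on the PCA-projected gallery, LDA is implemented as an eigen-decomposition of the regularized within-class and between-class scatter matrices, and k = 1 is assumed for the nearest-neighbor classifier because the value of k is not stated. Inputs are flattened images, one per row.

```python
import numpy as np


def fit_pca(x, n_components):
    """Return (mean, components) of a PCA fitted on the rows of x."""
    mean = x.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:n_components]


def fit_lda(z, labels, n_components, reg=1e-6):
    """Return a projection matrix (dim, n_components) from scatter matrices."""
    dim = z.shape[1]
    mean = z.mean(axis=0)
    s_within = reg * np.eye(dim)        # small ridge keeps the matrix invertible
    s_between = np.zeros((dim, dim))
    for label in np.unique(labels):
        zc = z[labels == label]
        mc = zc.mean(axis=0)
        s_within += (zc - mc).T @ (zc - mc)
        s_between += len(zc) * np.outer(mc - mean, mc - mean)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(s_within, s_between))
    order = np.argsort(eigvals.real)[::-1][:n_components]
    return eigvecs[:, order].real


def nearest_neighbor(gallery, gallery_ids, query):
    """1-NN with squared Euclidean distance (k = 1 is an assumption)."""
    dist = ((query[:, None, :] - gallery[None, :, :]) ** 2).sum(axis=2)
    return gallery_ids[dist.argmin(axis=1)]


def pca_lda_identify(train, gallery, gallery_ids, query, n_pca=50, n_lda=10):
    """Predict an identity for every query row."""
    mean, components = fit_pca(train, n_pca)
    g = (gallery - mean) @ components.T     # n_pca features per gallery image
    q = (query - mean) @ components.T       # n_pca features per query image
    w = fit_lda(g, gallery_ids, n_lda)
    return nearest_neighbor(g @ w, gallery_ids, q @ w)
```

The defaults n_pca=50 and n_lda=10 mirror the component counts given above. Note that LDA can provide at most one fewer discriminant direction than there are gallery identities, so smaller values are needed when experimenting with only a handful of subjects.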
Both the gallery and the query images are transformed into 50 features per image. LDA is used to learn class-conditional densities from the transformed gallery and further transforms the query into 10 features per image, which are used for the classification.

The DINO feature extractor is trained with the images from the training subset. The gait features of the gallery and query images are then extracted and used for the classification. The classification is performed using the k-nearest neighbors (kNN) algorithm.

3 RESULTS AND DISCUSSION

The tests are performed with GEI and GEnI images. For the GEnI images, the Shannon, Renyi and Tsallis entropies are used. For the Renyi and Tsallis entropies, different values of α and q are examined, ranging from 0.1 to 5.0. Table 1 compares the results of the GEI and GEnI preprocessing approaches with the two feature extraction pipelines for all three walking conditions.

With the PCA-LDA features, GEI outperforms all GEnI approaches. With ViT-8, GEI achieves the best result for the BG condition, Tsallis GEnI with q = 1.5 achieves the highest accuracy for the NM condition, and Renyi GEnI with α = 0.9 is the best for both the CL condition and the overall accuracy.

A comparison between the entropies shows that the Shannon entropy does not perform best in any of the categories. In terms of overall accuracy, Tsallis performs best with PCA-LDA and Renyi with ViT-8. For the PCA-LDA feature extractor, GEnI based on the Shannon entropy achieves a higher overall accuracy than Renyi for three of the α values and than Tsallis for two of the q values. With ViT-8, every Renyi and Tsallis variant in Table 1 exceeds both the Shannon GEnI (69.38) and GEI (68.90) in overall accuracy, although the margins stay below 1.3 percentage points.

4 CONCLUSION

Our analysis reveals that Gait Energy Images (GEI) excel when coupled with traditional feature extraction methods such as PCA-LDA, affirming their compatibility with the established analytical frameworks.
Conversely, Vision Transformers (ViTs) demonstrate a notable preference for Gait Entropy Images (GEnI) processed with the Renyi and Tsallis entropies. This distinction suggests that the advanced learning capabilities of ViTs, particularly in recognizing and distinguishing shapes, are better leveraged by the nuanced information captured by the Renyi and Tsallis entropies. By providing a detailed representation of the variability around the silhouette of a person, these entropies enhance the model's ability to discern subtle differences in the gait patterns. Besides highlighting the evolving landscape of gait recognition technologies, this study also points to a promising direction for future research in optimizing feature extraction techniques to leverage the strengths of contemporary deep learning models.

| Image type | α, q | PCA-LDA NM | PCA-LDA BG | PCA-LDA CL | PCA-LDA Overall | ViT-8 NM | ViT-8 BG | ViT-8 CL | ViT-8 Overall |
|---|---|---|---|---|---|---|---|---|---|
| GEI | - | 90.45 | 66.48 | 40.82 | 65.92 | 99.18 | 82.24 | 25.27 | 68.90 |
| GEnI: Shannon | - | 88.27 | 55.83 | 35.91 | 60.01 | 99.18 | 77.86 | 31.09 | 69.38 |
| GEnI: Renyi | 0.5 | 85.73 | 57.93 | 35.82 | 59.82 | 98.73 | 79.04 | 31.91 | 69.89 |
| GEnI: Renyi | 0.7 | 87.91 | 57.66 | 36.09 | 60.55 | 98.73 | 79.22 | 32.09 | 70.01 |
| GEnI: Renyi | 0.9 | 87.91 | 55.75 | 36.00 | 59.88 | 98.91 | 78.68 | 32.82 | 70.14 |
| GEnI: Renyi | 1.5 | 87.64 | 53.37 | 33.18 | 58.07 | 99.18 | 78.14 | 31.09 | 69.47 |
| GEnI: Tsallis | 0.5 | 86.91 | 58.93 | 36.27 | 60.70 | 98.55 | 79.04 | 32.36 | 69.98 |
| GEnI: Tsallis | 0.7 | 88.27 | 57.20 | 36.45 | 60.64 | 98.82 | 78.68 | 32.09 | 69.87 |
| GEnI: Tsallis | 0.9 | 87.91 | 55.75 | 36.09 | 59.92 | 99.00 | 78.87 | 32.27 | 70.05 |
| GEnI: Tsallis | 1.5 | 88.36 | 54.47 | 35.64 | 59.49 | 99.27 | 78.32 | 31.82 | 69.80 |

Table 1. Accuracy scores (%) of both feature extraction approaches on GEI and GEnI input images for all walking conditions of CASIA-B, where NM, BG and CL denote normal walking, walking while carrying a bag, and walking with a coat or jacket, respectively.

REFERENCES

[1] Ž. Emeršič, T. Ohki, M. Akasaka, T. Arakawa, S. Maeda, M. Okano, Y. Sato, A. George, S. Marcel, I.
Ganapathi et al., “The unconstrained ear recognition challenge 2023: Maximizing performance and minimizing bias.”
[2] A. Hrovatič, P. Peer, V. Štruc, and Ž. Emeršič, “Efficient ear alignment using a two-stack hourglass network,” IET Biometrics, 2023.
[3] Ž. Emeršič, D. Sušanj, B. Meden, P. Peer, and V. Štruc, “ContexedNet: Context-aware ear detection in unconstrained settings,” IEEE Access, pp. 1–1, 2021.
[4] J. Han and B. Bhanu, “Individual recognition using gait energy image,” IEEE transactions on pattern analysis and machine intelligence, vol. 28, no. 2, pp. 316–322, 2005.
[5] C. Wang, J. Zhang, L. Wang, J. Pu, and X. Yuan, “Human identification using temporal information preserving gait template,” IEEE transactions on pattern analysis and machine intelligence, vol. 34, no. 11, pp. 2164–2176, 2011.
[6] H. Chao, Y. He, J. Zhang, and J. Feng, “Gaitset: Regarding gait as a set for cross-view gait recognition,” in Proceedings of the AAAI conference on artificial intelligence, vol. 33, no. 01, 2019, pp. 8126–8133.
[7] C. Fan, Y. Peng, C. Cao, X. Liu, S. Hou, J. Chi, Y. Huang, Q. Li, and Z. He, “Gaitpart: Temporal part-based model for gait recognition,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 14225–14233.
[8] R. Liao, S. Yu, W. An, and Y. Huang, “A model-based gait recognition method with body pose and human prior knowledge,” Pattern Recognition, vol. 98, p. 107069, 2020.
[9] K. Lenac, D. Sušanj, A. Ramakić, and D. Pinčić, “Extending appearance based gait recognition with depth data,” Applied Sciences, vol. 9, no. 24, p. 5529, 2019.
[10] A. Ramakić, D. Sušanj, K. Lenac, and Z. Bundalo, “Depth-Based Real-Time Gait Recognition,” Journal of Circuits, Systems and Computers, p. 2050266, 2020.
[11] D. Pinčić, D. Sušanj, and K. Lenac, “Gait recognition with self-supervised learning of gait features based on vision transformers,” Sensors, vol. 22, no. 19, p.
7140, 2022.
[12] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” Advances in neural information processing systems, vol. 25, 2012.
[13] Z. Wu, Y. Huang, L. Wang, X. Wang, and T. Tan, “A comprehensive study on cross-view gait based human identification with deep cnns,” IEEE transactions on pattern analysis and machine intelligence, vol. 39, no. 2, pp. 209–226, 2016.
[14] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly et al., “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv preprint arXiv:2010.11929, 2020.
[15] K. Bashir, T. Xiang, and S. Gong, “Gait recognition using gait entropy image,” 2009.
[16] M. Rokanujjaman, M. A. Hossain, M. R. Islam, and M. S. Islam, “Effective part definition for gait identification using gait entropy image,” in 2013 International Conference on Informatics, Electronics and Vision (ICIEV). IEEE, 2013, pp. 1–4.
[17] M. Jeevan, N. Jain, M. Hanmandlu, and G. Chetty, “Gait recognition based on gait pal and pal entropy image,” in 2013 IEEE International Conference on Image Processing. IEEE, 2013, pp. 4195–4199.
[18] S. Yu, D. Tan, and T. Tan, “A framework for evaluating the effect of view angle, clothing and carrying condition on gait recognition,” in 18th International Conference on Pattern Recognition (ICPR’06), vol. 4. IEEE, 2006, pp. 441–444.
[19] R. Liao, W. An, Z. Li, and S. S. Bhattacharyya, “A novel view synthesis approach based on view space covering for gait recognition,” Neurocomputing, vol. 453, pp. 13–25, 2021.
[20] J. Kovač, V. Štruc, and P. Peer, “Frame-based classification for cross-speed gait recognition,” Multimedia Tools and Applications, vol. 78, pp. 5621–5643, 2019.
[21] C. Shannon and W. Weaver, “The mathematical theory of communication,” Urbana: University of Illinois Press, 1949.
[22] M.
Masi, “A step beyond tsallis and rényi entropies,” Physics Letters A, vol. 338, no. 3-5, pp. 217–224, 2005.
[23] T. Maszczyk and W. Duch, “Comparison of shannon, renyi and tsallis entropy used in decision trees,” in Artificial Intelligence and Soft Computing–ICAISC 2008: 9th International Conference Zakopane, Poland, June 22-26, 2008 Proceedings 9. Springer, 2008, pp. 643–651.
[24] C. F. L. Lima, F. M. de Assis, and C. P. de Souza, “An empirical investigation of attribute selection techniques based on shannon, rényi and tsallis entropies for network intrusion detection,” American Journal of Intelligent Systems, vol. 2, no. 5, pp. 111–117, 2012.
[25] D. Sušanj, V. Tuhtan, L. Lenac, G. Gulan, I. Kožar, and Ž. Jeričević, “Using entropy information measures for edge detection in digital images,” in 2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO). IEEE, 2015, pp. 352–355.
[26] J.-H. Ou and Y. K. Ho, “Shannon, rényi, tsallis entropies and onicescu information energy for low-lying singly excited states of helium,” Atoms, vol. 7, no. 3, p. 70, 2019.
[27] O. Olendski, “Rényi and tsallis entropies: three analytic examples,” European Journal of Physics, vol. 40, no. 2, p. 025402, 2019.
[28] A. Rényi, “On measures of entropy and information,” in Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics, vol. 4. University of California Press, 1961, pp. 547–562.
[29] C. Tsallis, “Possible generalization of boltzmann-gibbs statistics,” Journal of statistical physics, vol. 52, pp. 479–487, 1988.
[30] M. Hofmann, S. Bachmann, and G. Rigoll, “2.5D gait biometrics using the Depth Gradient Histogram Energy Image,” in IEEE Fifth International Conference on Biometrics: Theory, Applications and Systems (BTAS 2012), Arlington, Virginia, Sep 2012, pp. 399–403.
[31] M. Caron, H.
Touvron, I. Misra, H. Jégou, J. Mairal, P. Bojanowski, and A. Joulin, “Emerging properties in self-supervised vision transformers,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 9650–9660.
[32] H. Touvron, M. Cord, M. Douze, F. Massa, A. Sablayrolles, and H. Jégou, “Training data-efficient image transformers & distillation through attention,” in International Conference on Machine Learning. PMLR, 2021, pp. 10347–10357.

Stella Dumenčić is a researcher and PhD student at the University of Rijeka, Faculty of Engineering. She received her Bachelor's degree in Computing in 2021 and her Master's degree in Computing in 2023 from the University of Rijeka, Faculty of Engineering. Her research interests are in the fields of machine learning and mathematical optimization.

Dr. Domagoj Pinčić received the B.S., M.S. and Ph.D. degrees in Computer Engineering and Science from the Faculty of Engineering, University of Rijeka, in 2015, 2018 and 2022, respectively. His research interests include computer vision, machine learning, and image processing.

Dr. Diego Sušanj is a Postdoctoral Researcher at the University of Rijeka, Faculty of Engineering. He received his Bachelor's degree in Computer Engineering in 2013, his Master's degree in Computer Engineering in 2015 and his PhD in Computer Science in 2021 from the University of Rijeka, Faculty of Engineering. His research interests are in the fields of computer vision, machine learning, image processing and embedded systems.

Dr. Žiga Emeršič graduated from the Faculty of Computer and Information Science, University of Ljubljana, where he also holds the position of assistant professor. His research interests include deep learning and biometrics, which intersect with his teaching areas. He has co-authored more than 40 research papers and received multiple awards for teaching and research, including the European Association for Biometrics Max Snijder Award 2021.