https://doi.org/10.31449/inf.v46i2.3820 Informatica 46 (2022) 151–168

Unsupervised Deep Learning: Taxonomy and Algorithms

Aida Chefrour 1,2,* and Labiba Souici-Meslati 2
E-mail: aida.chefrour@univ-soukahras.dz, souici_labiba@yahoo.fr
1 Computer Science Department, Mohamed Cherif Messaadia University, Souk Ahras, Algeria
2 LISCO Laboratory, Computer Science Department, Badji Mokhtar University, B.P-12 Annaba, 23000, Algeria

Overview paper
Keywords: clustering, deep learning, autoencoder, taxonomy
Received: November 10, 2021

Clustering is a fundamental problem in many data-driven application fields and in machine learning. Clustering performance depends strongly on the data distribution, so deep neural networks can be used to learn data representations that are better suited to clustering. Many recent studies have focused on employing deep neural networks to learn a clustering-friendly representation, which has resulted in a significant improvement in clustering performance. In this study, we present a systematic survey of clustering with deep learning. We then propose a taxonomy of deep clustering, together with some representative algorithms. Finally, we discuss some promising future directions for clustering with deep learning and offer some concluding remarks.

Povzetek (summary): This article describes deep clustering methods and proposes a taxonomy of deep clustering.

1 Introduction

Clustering is one of the most important aspects of unsupervised machine learning. Its main goal is to separate a data set into subsets (clusters) of similar objects, so that data values in the same cluster share some common characteristics or attributes. The objects in the same cluster are more similar to each other than to those in other clusters.
Clustering is widely used in artificial intelligence, pattern recognition, statistics, and other information processing fields. The input of a cluster analysis system is a set of samples and a measure of similarity (or dissimilarity) between two samples. The output is a set of clusters that form a partition, or a structure of partitions, of the data set. Finding clusters is generally not a simple task, and current clustering algorithms take a long time when they are applied to large databases [1]. In addition, because the similarity measures used by these algorithms are often ineffective on the raw input, dimensionality reduction and representation learning, which transform the input data into a feature space where separation is easier in the context of the problem, have been widely applied to clustering. Existing data transformation methods generally include linear transformations such as Principal Component Analysis (PCA) and non-linear transformations such as kernel approaches and spectral methods [2].

* Corresponding author

To address this problem, Deep Neural Networks (DNNs) are used to learn non-linear mappings that transform the data into clustering-friendly representations, because they provide highly expressive non-linear transformations. In this paper, we refer to clustering approaches involving deep learning as deep clustering for simplicity. Our research focuses on deep clustering, a family of clustering algorithms that adopt deep neural networks to learn cluster-oriented features [3]. Deep clustering has recently become popular as a method for data classification and feature representation discovery, and as a solution for large-scale and high-dimensional learning problems [4,5]. We were particularly interested in the studies conducted in deep clustering for image recognition. We give an overview of deep clustering that reviews most methods and implementations in this field.
The main contributions treated in this paper are:
• using a deep autoencoder to embed the data into a lower-dimensional space;
• integrating the intermediate feature extraction phase with the clustering phase of the traditional clustering algorithm;
• exploiting the similarity of representation features that are assigned to the same cluster;
• combining dimensionality reduction and temporal clustering in a single unsupervised learning framework;
• applying the strong ability of deep clustering to handle unsupervised learning for structure analysis of high-dimensional visual data;
• solving the subspace clustering problem by partitioning data drawn from a union of multiple subspaces.

The contribution of this study is (1) to provide an overview of various deep learning-based clustering algorithms, including an explanation of the most recent improvements in unsupervised clustering, and (2) to propose a taxonomy of methods that use deep learning for clustering. We chose to synthesize studies published in the previous 3-4 years, since they used deep learning to increase unsupervised clustering performance. On the MNIST dataset, several algorithms achieve more than 96% accuracy without using a single labeled data point. However, for more difficult datasets like CIFAR-10 and ImageNet, they are still a long way from achieving good accuracy.

We go over the most recent deep learning-based clustering approaches in this article. The aim of most of these strategies is to discover feature representations and solve the problem of large-scale, high-dimensional learning, as well as to respond to the contributions mentioned above.

The rest of the paper is organized as follows. In the next section, we briefly survey the literature on deep clustering overviews. We present the most recent works using unsupervised deep learning in section 3, with a synthesis of all of this work in section 4.
In section 5, we describe the proposed taxonomy of clustering with deep learning algorithms and we introduce some representative methods. Section 6 includes a conclusion and proposals for further research.

2 Related work

Several custom taxonomies for clustering with deep learning have been proposed in the literature. In this section, we outline the best known and most recent ones.

[6] review deep learning for multimodal data fusion and provide readers with the fundamentals of multimodal deep learning fusion methods. This study summarizes the representative architectures (DBN, SAE, CNN, and RNN) that are fundamental to understanding multimodal deep learning fusion models. It also summarizes the pioneering multimodal deep learning fusion models from the task, model framework, and data set perspectives, and groups them by the deep learning architecture used.

[2] divide deep clustering algorithms into four categories: AE-based (Autoencoder), CDNN-based (Clustering DNN), VAE-based (Variational Autoencoder), and GAN-based (Generative Adversarial Network) deep clustering. Each category has some representative methods as well.
• (a) AE-based deep clustering includes (1) Deep Clustering Network (DCN), which combines an autoencoder with the k-means algorithm; (2) Deep Embedding Network (DEN), which uses a deep autoencoder to extract effective representations for clustering; (3) Deep Subspace Clustering Networks (DSC-Nets), which introduce a novel autoencoder architecture; (4) Deep Multi-Manifold Clustering (DMC); (5) Deep Embedded Regularized Clustering (DEPICT); and (6) Deep Continuous Clustering (DCC);
• (b) CDNN-based deep clustering algorithms can be divided into three categories according to the way the network is initialized: unsupervised pre-trained (Deep Nonparametric Clustering (DNC), Deep Embedded Clustering (DEC), Discriminatively Boosted Clustering (DBC)), supervised pre-trained (Clustering Convolutional Neural Network (CCNN)), and randomly initialized, i.e., non-pre-trained (Information Maximizing Self-Augmented Training (IMSAT), Joint Unsupervised Learning (JULE), and Deep Adaptive Image Clustering (DAC));
• (c) VAE-based deep clustering, which can be considered a generative variant of AE, includes two algorithms: (1) Variational Deep Embedding (VaDE) and (2) Gaussian Mixture VAE (GMVAE);
• (d) GAN-based deep clustering includes (1) Deep Adversarial Clustering (DAC), (2) Categorical Generative Adversarial Network (CatGAN), and (3) Information Maximizing Generative Adversarial Network (InfoGAN).

[7] propose a taxonomy of clustering algorithms that employ deep learning. Their taxonomy helps the user to see what methods are available and to create new ones by combining the best characteristics of existing methods in a simple framework. The main principle of this taxonomy is representation learning with DNNs and the use of these representations as input to a specific clustering approach.
Every method is divided into the following parts, each of which has a variety of options:
(1) Architecture of the main neural network branch (Multilayer perceptron (MLP), Convolutional neural network (CNN), and Deep Belief Network (DBN));
(2) Set of deep features used for clustering (one layer, several layers);
(3) Non-clustering loss (No non-clustering loss, Autoencoder reconstruction loss);
(4) Clustering loss (No clustering loss, k-Means loss, Cluster assignment hardening, Balanced assignments loss, Locality-preserving loss, Group sparsity loss, Cluster classification loss, and Agglomerative clustering loss);
(5) Method to combine the losses (Pre-training, fine-tuning, Joint training, and Variable schedule);
(6) Cluster updates (Jointly updated with the network model, and Alternatingly updated with the network model);
(7) After network training (Clustering a similar dataset and Obtaining better results).
The methods described with this taxonomy are Deep Embedded Clustering (DEC), Deep Clustering Network (DCN), Discriminatively Boosted Clustering (DBC), Joint Unsupervised Learning of Deep Representations and Image Clusters (JULE), and Clustering CNN (CCNN).

[8] propose a simplified taxonomy based on the overall procedural structure or design of deep clustering algorithms. According to this taxonomy, deep clustering techniques can be classified into three broad families:
(a) Sequential multistep Deep Clustering approaches: these approaches have two basic steps. The first step learns a richer deep (also known as latent) representation of the input data, and the second step performs clustering on this deep or latent representation;
(b) Joint Deep Clustering approaches: instead of two independent processes for representation learning and clustering, this family of approaches includes a step where the representation learning is tightly coupled with the clustering.
Tight coupling is usually achieved by optimizing a combined or joint loss function that promotes good reconstruction while accounting for some form of data grouping, clustering, or codebook representation;
(c) Closed-loop multistep Deep Clustering approaches: similar to the first family (sequential multistep Deep Clustering), this family of algorithms has two key phases, but they alternate in an iterative loop rather than being executed once in a single feedforward sequence.

3 Contributions of deep clustering

In recent years, many applications of deep learning have used unsupervised learning algorithms for image recognition. We now discuss some of the most common deep clustering approaches.

[9] find that existing deep clustering algorithms either do not take sufficient advantage of convolutional neural networks or do not sufficiently preserve the local structure of the data-generating distribution in the learned feature space. As a solution, they propose a deep convolutional embedded clustering method. They build a convolutional autoencoder structure to learn embedded features in an end-to-end manner. Then, a clustering-oriented loss defined on the embedded features accomplishes feature refinement and cluster assignment simultaneously. They keep the decoder, which can preserve the local structure of data in feature space, to prevent the feature space from being distorted by the clustering loss. In summary, they simultaneously minimize the reconstruction loss of the convolutional autoencoder and the clustering loss. The resulting optimization problem can be solved effectively by mini-batch stochastic gradient descent with back-propagation. Experiments on benchmark datasets (MNIST-full, MNIST-test, and USPS) empirically verify the usefulness of local structure preservation and the power of convolutional autoencoders for feature learning in terms of accuracy (ACC) and normalized mutual information (NMI).
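To make the idea of a joint reconstruction and clustering objective concrete, the following is a minimal sketch, not the implementation of [9]. It assumes a DEC-style clustering loss (Student's t soft assignments, a sharpened target distribution, and a KL divergence), and the weight gamma and all function names are hypothetical. The encoder and decoder are left out; the code only shows how the two loss terms could be combined once embeddings and reconstructions are available.

```python
import math

def soft_assign(z, centers, alpha=1.0):
    """Student's t soft assignment q_j of one embedded point z to each centre."""
    weights = []
    for mu in centers:
        d2 = sum((a - b) ** 2 for a, b in zip(z, mu))
        weights.append((1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0))
    total = sum(weights)
    return [w / total for w in weights]

def target_distribution(q):
    """Sharpened targets: p_ij proportional to q_ij^2 / f_j, with f_j = sum_i q_ij."""
    n_clusters = len(q[0])
    freq = [sum(row[j] for row in q) for j in range(n_clusters)]
    p = []
    for row in q:
        w = [row[j] ** 2 / freq[j] for j in range(n_clusters)]
        s = sum(w)
        p.append([v / s for v in w])
    return p

def clustering_loss(p, q):
    """KL(P || Q) summed over all samples."""
    return sum(pij * math.log(pij / qij)
               for p_row, q_row in zip(p, q)
               for pij, qij in zip(p_row, q_row) if pij > 0.0)

def reconstruction_loss(x, x_hat):
    """Mean squared error between inputs and decoder outputs."""
    n = sum(len(row) for row in x)
    return sum((a - b) ** 2
               for x_row, h_row in zip(x, x_hat)
               for a, b in zip(x_row, h_row)) / n

def joint_loss(x, x_hat, p, q, gamma=0.1):
    """Reconstruction loss plus gamma times the clustering loss (gamma is an assumed weight)."""
    return reconstruction_loss(x, x_hat) + gamma * clustering_loss(p, q)
```

In a training loop of this kind, the reconstruction term is what keeps the decoder in play, while the clustering term pulls embedded points toward their cluster centres; the balance between the two is controlled by the assumed weight gamma.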
DeepCluster [10] is a clustering algorithm that jointly learns the parameters of a neural network and the cluster assignments of the resulting features. DeepCluster uses a standard clustering technique, k-means, to iteratively group the features, and uses the resulting assignments as supervision to update the network's weights. The authors use DeepCluster for the unsupervised training of convolutional neural networks on large datasets such as ImageNet and YFCC100M, with accuracy as the evaluation criterion. On all standard benchmarks, the resulting model outperforms the state of the art by a significant margin.

The study in [11] starts from the observation that data representation affects the performance of subspace clustering. Data representation for subspace clustering maps data from one space to another with higher separability. Many new data representation techniques have emerged in recent years; low-rank representation (LRR) and autoencoders are two examples. LRR is a linear representation method with a low-rank constraint that captures the global structure of data. An autoencoder, on the other hand, uses a neural network to nonlinearly map data into a latent space by minimizing the difference between the reconstruction and the input. The authors propose a data representation approach for subspace clustering that combines the benefits of LRR (globality) and of an autoencoder (self-supervision-based locality). The low-rank constrained autoencoder (LRAE) introduced in this research forces the latent representation of the neural network to be of low rank, and the low-rank constraint is derived as a prior from the input space. One of the most significant advantages of the LRAE is that the learned data representation not only preserves the local properties of the data but also preserves the underlying low-rank global structure.
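As an illustration only, a low-rank constraint on the latent representation can be written as a penalized autoencoder objective. The form below is an assumption and not necessarily the LRAE objective of [11]: it uses the nuclear norm, a common convex surrogate for rank, and the symbols f (encoder), g (decoder), Z (matrix of latent codes), and λ (trade-off weight) are introduced here for the sketch.

```latex
\min_{\theta_e,\,\theta_d}\;
  \bigl\lVert X - g_{\theta_d}\bigl(f_{\theta_e}(X)\bigr) \bigr\rVert_F^2
  \;+\; \lambda \,\lVert Z \rVert_{*},
\qquad Z = f_{\theta_e}(X),
\qquad \lVert Z \rVert_{*} = \sum_{i} \sigma_i(Z)
```

The first term is the usual reconstruction error, which captures local structure, and the second term penalizes the sum of the singular values of Z, which encourages a low-rank (global) structure in the latent space.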
Extensive subspace clustering experiments were carried out on a variety of datasets (MNIST, COIL-100, and ORL), using ACC, NMI, and the adjusted Rand index (ARI). They showed that the proposed LRAE significantly outperformed state-of-the-art subspace clustering approaches.

The researchers in [12] created a hybrid autoencoder (BAE) model for image clustering by combining three AE-based models: the convolutional autoencoder (CAE), the adversarial autoencoder (AAE), and the stacked autoencoder (SAE). The MNIST and CIFAR-10 datasets are used to test the proposed models and compare their results to those of other researchers. In the numerical experiments, the proposed models outperform the others according to the clustering criteria ACC, NMI, and ARI.

Figure 1: The structure of the proposed Convolutional AutoEncoders (CAE) for MNIST [9].

GANs have demonstrated great performance in a variety of unsupervised learning problems, and clustering is an important unsupervised learning problem. While the latent-space back-projection in GANs could be used to cluster, the authors of [13] show that the cluster structure is not preserved in the GAN latent space. They propose ClusterGAN, a new mechanism for clustering using GANs. They achieve clustering in the latent space by sampling latent variables from a mixture of one-hot encoded variables and continuous latent variables, together with an inverse network (which projects the data to the latent space) trained jointly with a clustering-specific loss. According to their findings, GANs can preserve latent space interpolation across categories, even though the discriminator is never exposed to such vectors.
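The latent sampling described for ClusterGAN [13] can be sketched as follows. This is a minimal illustration under stated assumptions: the continuous part is drawn from a zero-mean Gaussian with a small standard deviation sigma (the value 0.1 is an assumption), the categorical part is a uniformly chosen one-hot vector, and the function names are hypothetical.

```python
import random

def sample_latent(n_continuous, n_clusters, sigma=0.1, rng=random):
    """Return (z, k): z concatenates Gaussian noise with a one-hot code for cluster k."""
    z_n = [rng.gauss(0.0, sigma) for _ in range(n_continuous)]
    k = rng.randrange(n_clusters)
    z_c = [1.0 if j == k else 0.0 for j in range(n_clusters)]
    return z_n + z_c, k

def latent_cluster(z, n_clusters):
    """Read the cluster off a latent code: index of the largest categorical coordinate."""
    code = z[-n_clusters:]
    return max(range(n_clusters), key=lambda j: code[j])
```

With such a latent space, a cluster label for a data point could be obtained by passing it through the inverse network and applying latent_cluster to the categorical part of the recovered code.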
They evaluated their method on a variety of clustering benchmarks (MNIST, Synthetic, Fashion-10, Fashion-5, 10x_73k, and Pendigits) and showed that it outperformed the baselines on both synthetic and real-world datasets according to the evaluation criteria ACC, NMI, and ARI.

The work in [14] proposes a new approach in which the embedding is performed using a differentiable model such as a deep neural network. By rewriting the k-means clustering method as an optimal transport problem and adding an entropic regularization, the authors obtain a fully differentiable loss function that can be minimized with respect to both the embedding parameters and the cluster parameters via stochastic gradient descent. They show that, by including constraints on cluster sizes, this new formulation generalizes a previously proposed state-of-the-art soft-k-means technique. According to empirical evaluations on image classification benchmarks (MNIST, CIFAR-10), their optimal transport-based technique provides higher unsupervised accuracy than state-of-the-art methods and does not require a pre-training step.

The researchers of [15] present a deep Generative Adversarial Clustering Network (ClusterGAN), which addresses the challenges of training unsupervised deep clustering models. ClusterGAN is made up of three networks: a discriminator, a generator, and a clusterer (i.e., a clustering network). They set up an adversarial game between these three players in which the generator synthesizes realistic samples given discriminative latent variables, and the clusterer learns the inverse mapping of the real samples to the discriminative embedding space. Furthermore, they use a conditional entropy minimization loss to increase intra-cluster sample similarity and decrease inter-cluster sample similarity.
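A conditional entropy minimization loss of the kind mentioned above can be sketched in a few lines. The exact formulation used in [15] is not given here, so this is a generic version: the average Shannon entropy of the per-sample soft cluster assignments, which is small when every sample is assigned confidently to one cluster. The function name and the eps stabilizer are assumptions.

```python
import math

def conditional_entropy(assignments, eps=1e-12):
    """Average entropy of per-sample soft cluster assignments (each row sums to 1)."""
    total = 0.0
    for row in assignments:
        total -= sum(p * math.log(p + eps) for p in row)
    return total / len(assignments)
```

Minimizing this quantity pushes each row toward a one-hot vector, so it is usually combined with other terms that prevent all samples from collapsing into a single cluster.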
Because the ground-truth similarities in the clustering task are unknown, they propose a new balanced self-paced learning algorithm that gradually incorporates data into training from easy to hard while taking into account the diversity of the selected samples across all clusters. Their unsupervised learning approach makes it possible to train deep clusterers efficiently. According to experimental results on numerous datasets (MNIST, USPS, FRGC, CIFAR-10, and STL-10), ClusterGAN produces competitive results compared with state-of-the-art models, as measured by ACC and NMI.

The work in [3] builds on the observation that deep clustering outperforms conventional clustering by combining feature learning and cluster assignment. Although several deep clustering algorithms have been developed for various purposes, the majority of them fail to learn robust cluster-oriented features, resulting in poor final clustering performance. To overcome this challenge, the authors propose a two-stage deep clustering technique (ASPC-DA) that incorporates data augmentation and self-paced learning. In the first stage, they learn robust features by training an autoencoder with examples that have been augmented by randomly shifting and rotating the clean examples. In the second stage, they alternate between fine-tuning the encoder with augmented examples and updating the cluster assignments of the clean examples, to encourage the learned features to be cluster-oriented. During fine-tuning of the encoder, the target of each augmented example in the loss function is the center of the cluster to which the corresponding clean example is assigned. These targets may be computed incorrectly, and examples with incorrect targets could mislead the encoder network. To stabilize network training, they therefore use adaptive self-paced learning to select the most confident examples in each iteration.
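The selection of confident examples can be illustrated with a minimal self-paced sketch. It is not the procedure of [3]: it assumes that confidence is measured by the squared distance between an embedded example and its nearest cluster centre, and that an example is kept when this per-sample loss is at most a threshold lam. The threshold, the function names, and the way lam would be adapted over iterations are assumptions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_center(z, centers):
    """Index of the cluster centre closest to the embedded point z."""
    return min(range(len(centers)), key=lambda k: squared_distance(z, centers[k]))

def select_confident(embeddings, centers, lam):
    """Keep (index, cluster) pairs whose per-sample loss is at most lam."""
    selected = []
    for i, z in enumerate(embeddings):
        k = nearest_center(z, centers)
        if squared_distance(z, centers[k]) <= lam:
            selected.append((i, k))
    return selected
```

Raising lam gradually admits harder examples into training, which is the general easy-to-hard idea behind self-paced learning.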
Extensive experiments show that their algorithm outperforms competing methods on four image datasets (MNIST-full, MNIST-test, USPS, and Fashion) in terms of ACC and NMI.

The authors of [16] present Kingdra, a system for improving unsupervised clustering performance using semi-supervised models. To use semi-supervised models, they must first create pseudo-labels, which are automatically generated labels. Prior approaches to creating pseudo-labels have been found to degrade clustering performance because of their low precision. Instead, the authors generate a similarity graph using an ensemble of deep networks, from which they extract high-accuracy pseudo-labels. The process of using ensembles to find high-quality pseudo-labels and training the semi-supervised model is iterated, resulting in continual improvement. They show that their approach beats state-of-the-art clustering results on numerous image and text datasets. To evaluate their method, they used accuracy as the evaluation criterion on five datasets (MNIST, STL, CIFAR10, Reuters, and 20news). They reached 54.6% accuracy on CIFAR-10 and 43.9% on 20news.

Figure 2: ClusterGAN architecture [13].

Figure 3: Kingdra overview. In step 1, they train all the models using the unlabeled samples. In step 2, they construct a graph modeling the pairwise agreement of the models. In step 3, they obtain k high-confidence clusters by pruning out data points on which the models do not agree. In step 4, they take the high-confidence clusters and generate pseudo-labels. In step 5, they train the models using both unlabeled samples and pseudo-labeled samples. They iterate from step 2 to step 5 until the final clusters are generated [16].

According to [17], discriminative models are the most common in the literature, and they produce the best results. These algorithms learn a deep discriminative neural network classifier with latent labels.
As is common in supervised learning, they typically use multinomial logistic regression posteriors and parameter regularization. Discriminative objective functions (e.g., those based on mutual information or KL divergence) are generally thought to be more flexible than generative approaches (e.g., K-means) in that they make fewer assumptions about data distributions and, as a result, produce much better unsupervised deep learning results. Several contemporary discriminative models may appear to be unrelated to K-means at first glance. The paper demonstrates that, under mild conditions and for common posterior models and parameter regularization, these models are in fact equivalent to K-means. The authors show that, for the commonly used logistic regression posteriors, maximizing the L2-regularized mutual information via an approximate alternating direction method (MI-ADM) is equivalent to minimizing a soft and regularized K-means loss. Their theoretical study not only ties numerous recent state-of-the-art discriminative models directly to K-means but also leads to a novel soft and regularized deep K-means algorithm that performs well on a variety of image clustering benchmarks. They used accuracy and normalized mutual information as evaluation criteria on five datasets: USPS, MNIST, YTF, CMU-PIE, and FRGC.

The researchers of [18] introduced a new clustering objective that trains a neural network classifier from scratch using only unlabeled input samples. On eight unsupervised clustering benchmarks spanning image classification and segmentation, the model discovers clusters that accurately match semantic classes, delivering state-of-the-art performance. These include STL10, an unsupervised variant of ImageNet, and CIFAR10, on which the model outperforms its closest competitors by 6.6 and 9.5 absolute percentage points, respectively.
The strategy is not limited to computer vision and can be applied to any dataset of paired samples; in their experiments, they used random transforms to generate a pair from each image. Instead of high-dimensional representations that require further processing to be usable for semantic clustering, the trained network outputs semantic labels directly. The goal is simple: to maximize the mutual information between the class assignments of each pair. The objective is simple to use and firmly rooted in information theory, so it easily avoids the degenerate solutions that other clustering algorithms are prone to. The experiments used four datasets: STL10, CIFAR10, CIFAR100-20, and MNIST. In addition to the unsupervised mode, they examine two semi-supervised settings. The first achieves a global state-of-the-art accuracy of 88.8% in STL10 classification, surpassing all existing approaches (whether supervised, semi-supervised, or unsupervised). The second shows that the method can withstand 90 percent reductions in label coverage, which is useful for applications in which only a few labels are available.

In [19], the authors discuss a variant of variational autoencoders with a superstructure of latent variables on top of the features of the autoencoder. Their model is based on a tree structure that consists of multiple super latent variables. When the superstructure contains only one super latent variable, the model reduces to one that assumes the latent features are generated by a Gaussian mixture model. The model, known as the Latent Tree Variational Autoencoder (LTVAE), learns multiple partitions of the data, each associated with a super latent variable. It is thus a deep learning method that can partition high-dimensional data in multiple ways. To evaluate this model, they used four datasets (MNIST, STL-10, Reuters, and HHAR), with clustering accuracy as the criterion.
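The mutual-information objective of [18] discussed above can be made concrete with a small sketch. It assumes that each sample and its transformed counterpart have been mapped to soft class assignments, that the joint distribution over class pairs is estimated by averaging outer products and then symmetrized, and that the objective is the mutual information of this joint distribution. The function names are hypothetical, and a real implementation would operate on network outputs rather than fixed lists.

```python
import math

def joint_distribution(p_a, p_b):
    """Symmetrised average of outer products of paired soft class assignments."""
    n, k = len(p_a), len(p_a[0])
    joint = [[0.0] * k for _ in range(k)]
    for a, b in zip(p_a, p_b):
        for i in range(k):
            for j in range(k):
                joint[i][j] += a[i] * b[j] / n
    return [[(joint[i][j] + joint[j][i]) / 2.0 for j in range(k)] for i in range(k)]

def mutual_information(joint, eps=1e-12):
    """Mutual information of a k x k joint distribution; entries below eps are skipped."""
    k = len(joint)
    row = [sum(joint[i]) for i in range(k)]
    col = [sum(joint[i][j] for i in range(k)) for j in range(k)]
    mi = 0.0
    for i in range(k):
        for j in range(k):
            p = joint[i][j]
            if p > eps:
                mi += p * math.log(p / (row[i] * col[j]))
    return mi
```

The sketch also shows why degenerate solutions are discouraged: if every sample is assigned to the same class, the joint distribution has a single non-zero entry and the mutual information is zero, whereas consistent and balanced assignments attain the maximum value log k.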
To address the difficulty of clustering high-dimensional datasets, the authors of [20] describe a clustering approach that performs nonlinear dimensionality reduction and clustering simultaneously. A deep autoencoder embeds the data in a lower-dimensional space, and the autoencoder is optimized as part of the clustering process. The resulting network produces clustered data. The proposed method, Deep Continuous Clustering (DCC), does not rely on knowing the number of ground-truth clusters in advance. Nonlinear dimensionality reduction and clustering are combined by optimizing a global continuous objective. As a result, the authors avoid the discrete reconfigurations of the objective that characterize previous clustering algorithms. Experiments on six datasets (MNIST, Coil100, YTF, YaleB, Reuters, and RCV1), using adjusted mutual information (AMI) as the evaluation criterion, show that the proposed approach outperforms current clustering approaches such as k-means, DBSCAN, AC-W, SEC, LDMGI, GDL, and RCC, as well as deep network-based approaches.

Deep clustering through a Gaussian-mixture variational autoencoder (VAE) with graph embedding is proposed by the authors of [21]. They use the Gaussian mixture model (GMM) as the prior in the VAE to make clustering easier, and they use graph embedding to handle data with a complex distribution. Their hypothesis is that graph information, which captures local data structures, is an excellent complement to the deep GMM. When the two are combined, the network can learn more powerful representations that follow global models and local structural constraints. As a result, their method unites model-based and similarity-based clustering approaches. To combine graph embedding with the probabilistic deep GMM, they propose a novel stochastic extension of graph embedding: they treat samples as nodes on a graph and minimize the weighted distance between their posterior distributions.
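A weighted distance between the posterior distributions of neighbouring samples can be sketched as follows, using the Jensen-Shannon divergence as the distance. This is an illustration rather than the loss of [21]: the edge list with weights, the use of discrete cluster posteriors, and the function names are assumptions.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0.0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, bounded by log 2, zero iff p equals q."""
    m = [(a + b) / 2.0 for a, b in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def graph_embedding_loss(posteriors, edges):
    """Sum of w_ij * JS(p_i, p_j) over weighted graph edges (i, j, w_ij)."""
    return sum(w * js_divergence(posteriors[i], posteriors[j]) for i, j, w in edges)
```

Minimizing such a term encourages samples that are connected in the graph to receive similar cluster posteriors, which is how local structure can complement a global mixture model.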
The distance is calculated using the Jensen-Shannon divergence. They combine this divergence minimization with the log-likelihood maximization of the deep GMM, and derive formulations that yield a unified objective in which deep representation learning and clustering happen at the same time. Their results on four datasets (MNIST, STL-10, Reuters, and HHAR), in terms of accuracy, show that the proposed DGG outperforms recent deep Gaussian mixture approaches (model-based) and deep spectral clustering techniques (similarity-based). These findings highlight the benefits of integrating model-based and similarity-based clustering, as advocated in the paper.

The authors of [22] present a joint learning framework for discriminative embedding and spectral clustering. To embed the inputs into a latent space for clustering, they first build a dual autoencoder network that enforces the reconstruction constraint for the latent representations and their noisy versions. As a result, the learned latent representations can be more robust to noise. Then, mutual information estimation is used to extract more discriminative information from the inputs. Furthermore, a deep spectral clustering method is used to embed the latent representations into the eigenspace and then cluster them, which fully exploits the relationships among inputs to achieve optimal clustering results. Experiments on benchmark datasets (MNIST-full, MNIST-test, USPS, Fashion-10, and YTF) show that their strategy significantly outperforms state-of-the-art clustering algorithms (k-means, NMF, and others) in terms of ACC and NMI.
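Clustering accuracy (ACC), used as an evaluation criterion throughout this section, is commonly computed by finding the best one-to-one mapping between predicted clusters and ground-truth classes. The sketch below is a minimal illustration that searches all mappings by brute force, which is only practical for a handful of clusters; in practice the Hungarian algorithm is normally used for this matching step. The function name is hypothetical.

```python
from itertools import permutations

def clustering_accuracy(y_true, y_pred):
    """Best accuracy over all one-to-one mappings from cluster ids to class labels."""
    classes = sorted(set(y_true))
    clusters = sorted(set(y_pred))
    # Pad with dummy classes so every cluster can be mapped when clusters outnumber classes.
    padded = classes + [None] * max(0, len(clusters) - len(classes))
    counts = {}
    for t, c in zip(y_true, y_pred):
        counts[(c, t)] = counts.get((c, t), 0) + 1
    best = 0
    for perm in permutations(padded, len(clusters)):
        hits = sum(counts.get((c, t), 0) for c, t in zip(clusters, perm))
        best = max(best, hits)
    return best / len(y_true)
```

Because the cluster identifiers produced by an unsupervised method are arbitrary, a clustering that recovers the classes exactly but numbers them differently still obtains an accuracy of 1.0 under this definition.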
The researchers of [23] propose a clustering framework called deep comprehensive correlation mining (DCCM), which analyzes and exploits various types of correlations behind unlabeled data from three perspectives: 1) pseudo-label supervision is proposed, in addition to using only pair-wise information, to investigate category information and learn discriminative features; 2) the robustness of the features to image transformations in the input space is fully explored, which aids network learning and significantly boosts performance; 3) triplet mutual information among features is introduced for the clustering problem to lift the recently discovered instance-level deep mutual information to a triplet-level formulation, which further helps to learn more discriminative features. Extensive experiments on a variety of challenging datasets (CIFAR-10, CIFAR-100, STL-10, ImageNet-10, Imagenet-dog-15, and Tiny-ImageNet) in terms of ACC, NMI, and ARI show that their method works well, with 62.3% clustering accuracy on CIFAR-10, which is 10.1% better than the state-of-the-art results (k-means, AE, and others).

Deep clustering algorithms combine representation learning with clustering by jointly optimizing a clustering loss and a non-clustering loss. In such systems, a deep neural network for representation learning is combined with a clustering network. Rather than using this framework to improve clustering performance, the researchers of [24] propose a simpler method that optimizes the entanglement of the latent code representation learned by an autoencoder. They define entanglement as how close pairs of points belonging to the same class or structure are, relative to pairs of points belonging to different classes or structures. To measure the entanglement of data points, they use the soft nearest neighbor loss and extend it by adding an annealing temperature factor.
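A minimal sketch of a soft nearest neighbor loss with a temperature parameter is given below. It follows the commonly used form of this loss, in which each point contributes the negative log of the ratio between its similarity to same-class neighbours and its similarity to all neighbours. The annealing schedule of [24] is not reproduced: the temperature is simply passed as an argument, and the function name and the eps stabilizer are assumptions.

```python
import math

def soft_nearest_neighbor_loss(points, labels, temperature=1.0, eps=1e-12):
    """Average of -log(same-class neighbour mass / all-neighbour mass) over all points."""
    n = len(points)
    total = 0.0
    for i in range(n):
        same, every = 0.0, 0.0
        for j in range(n):
            if j == i:
                continue
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            w = math.exp(-d2 / temperature)
            every += w
            if labels[j] == labels[i]:
                same += w
        total -= math.log((same + eps) / (every + eps))
    return total / n
```

A low value means that the nearest neighbours of each point mostly share its class or structure. An annealing schedule would change the temperature over training, which controls how strongly distant points influence the loss.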
With their proposed approach, the test clustering accuracy was 96.2% on the MNIST dataset, 85.6% on the Fashion-MNIST dataset, and 79.2% on the EMNIST Balanced dataset, beating their baseline models.

According to the researchers of [25], Matching Priors and Conditionals for Clustering (MPCC) is a GAN-based model featuring an encoder that infers latent variables and cluster categories from data, and a flexible decoder that generates samples from a conditional latent space. With MPCC, they show that a deep generative model can compete with or outperform discriminative approaches in clustering tasks, surpassing the state of the art on a variety of benchmark datasets (MNIST, CIFAR10). On CIFAR10, their experiments show that adding a learnable prior and increasing the number of encoder updates improves the quality of the generated samples, resulting in an inception score of 9.49 ± 0.15 and a 46.9% improvement in the Fréchet inception distance over the state of the art.

The researchers of [26] show that greedy or local methods of maximizing mutual information (such as stochastic gradient optimization) find local optima of the mutual information criterion; as a result, the resulting representations are less than ideal for complex downstream tasks. This problem had not been identified or addressed in previous research. They introduced deep hierarchical object grouping (DHOG), which computes a number of distinct discrete representations of images in a hierarchical order and thereby generates representations that better optimize the mutual information objective. They also found that these representations are better suited to the task of grouping objects into underlying object classes. They tested DHOG on unsupervised clustering, which is a natural downstream test because the target representation is a discrete labeling of the data.
They produced new state-of-the-art scores on the three key benchmarks (CIFAR-100-20, STL-10, and SVHN) without any of the pre-filtering or Sobel-edge detection that many earlier approaches needed to work. They obtained accuracy improvements of 4.3% on CIFAR-10, 1.5% on CIFAR-100-20, and 7.2% on SVHN.

The researchers of [27] tackle a Federated Learning (FL) problem in which users are distributed and partitioned into clusters. This configuration represents scenarios in which separate groups of users have their own goals (learning tasks), but by aggregating their data with those of others in the same cluster (same learning task), they can take advantage of the power of numbers to perform more efficient Federated Learning. They present the Iterative Federated Clustering Algorithm (IFCA), a new framework that alternates between estimating the cluster identities of the users and optimizing the model parameters of the user clusters by gradient descent. They investigate the algorithm's convergence rate in a linear model with squared loss, as well as for generic strongly convex and smooth loss functions. They demonstrate that IFCA converges at an exponential rate in both scenarios with good initialization, and they discuss the optimality of the statistical error rate. When the clustering structure is uncertain, they propose training the models by combining IFCA with the weight-sharing strategy from multi-task learning. In the experiments, they show that their technique can succeed even if the initialization requirements are relaxed by using random initialization and repeated restarts. They also offer experimental results demonstrating the efficiency of their technique in non-convex problems such as neural networks. On several clustered FL benchmarks (Rotated MNIST, Rotated CIFAR), they show that IFCA outperforms the baselines in terms of accuracy.
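The alternation described above (estimate each user's cluster, then take a gradient step per cluster) can be sketched for the linear model with squared loss. This is a simplified, single-machine illustration rather than the authors' implementation: the function name `ifca`, the learning rate, the number of rounds, the toy data, and the initial models are all assumptions.

```python
import numpy as np

def ifca(user_data, init_models, lr=0.1, rounds=100):
    """Toy clustered federated learning loop for linear regression.

    user_data: list of (X, y) pairs, one per user.
    init_models: list of initial weight vectors, one per cluster.
    """
    models = [np.array(m, dtype=float) for m in init_models]
    assignments = []
    for _ in range(rounds):
        # Step 1: each user picks the cluster model with the lowest local loss.
        assignments = []
        for X, y in user_data:
            losses = [np.mean((X @ m - y) ** 2) for m in models]
            assignments.append(int(np.argmin(losses)))
        # Step 2: average the gradients of the users in each cluster and step.
        for j in range(len(models)):
            grads = [2.0 * X.T @ (X @ models[j] - y) / len(y)
                     for (X, y), a in zip(user_data, assignments) if a == j]
            if grads:
                models[j] = models[j] - lr * np.mean(grads, axis=0)
    return models, assignments

# Toy setup (hypothetical): 10 users, alternating between two true models.
rng = np.random.default_rng(0)
true_models = [np.array([2.0, -1.0]), np.array([-2.0, 1.0])]
user_data = []
for u in range(10):
    X = rng.normal(size=(50, 2))
    user_data.append((X, X @ true_models[u % 2]))

models, assignments = ifca(user_data, init_models=[[1.0, 0.0], [-1.0, 0.0]])
```

With initial models that are already closer to the correct cluster, as here, the cluster estimates are right from the first round and each model converges to its cluster's true weights, which mirrors the role of good initialization mentioned in the text.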
The work in [28] starts from the observation that unsupervised image classification is a difficult computer vision task. Deep learning-based algorithms have produced excellent results, with the most recent technique combining embedding and class assignment losses. Because these two processes have fundamentally distinct goals, optimizing them jointly may result in a suboptimal solution. To overcome this problem, the researchers suggest the IIC model (Invariant Information Clustering), a novel two-stage approach in which a pretraining embedding module is followed by a refining module that performs embedding and class assignment simultaneously. When evaluated on different datasets (CIFAR-10, CIFAR-100-20, and STL-10), their model outperforms the state of the art in unsupervised tasks, with an accuracy of 81.0% on CIFAR-10 (an increase of 19.3 percentage points), 35.3% on CIFAR-100-20 (9.6 pp), and 66.5% on STL-10 (6.9 pp).

Deep clustering has demonstrated an excellent ability to deal with unsupervised learning for structure analysis of high-dimensional visual data by learning visual features and data grouping at the same time. Local learning constraints based on inter-sample relations and/or self-estimated pseudo-labels are commonly used in existing deep clustering algorithms. This is vulnerable to unavoidable errors distributed in the neighborhood, as well as to error propagation during training. Based on the observation that assigning samples from the same semantic categories to different clusters reduces both intra-cluster compactness and inter-cluster diversity, i.e., lowers partition confidence, the authors of [29] propose to solve this problem by learning the most confident clustering solution among all possible separations. In particular, they present a deep clustering method called PartItion Confidence mAximisation (PICA).
It is based on the principle of learning the most semantically plausible data separation, in which all clusters can be mapped one-to-one to the ground-truth classes, by maximizing the "global" partition confidence of the clustering solution. This is accomplished by introducing a differentiable partition uncertainty index and its stochastic approximation, as well as a principled objective loss function that minimizes this index; together, these allow standard deep networks and mini-batch based model training to be applied directly. Extensive testing on six frequently used clustering benchmarks (CIFAR-10, CIFAR-100, STL-10, ImageNet-10, ImageNet-dogs, and Tiny-ImageNet) demonstrates that their model outperforms a wide range of state-of-the-art techniques in terms of ACC, NMI, and ARI.

Figure 4: Methods for unsupervised image classification. (a) The sequential method embeds data points and assigns them to classes one step after the other, whereas (b) the joint technique embeds data points and assigns them to classes at the same time. (c) The proposed technique performs embedding learning as a pretraining step to determine a suitable initialization, then optimizes the embedding and class assignment processes simultaneously. Their two-stage design introduces distinct losses in the pretraining stage [28].

Figure 5: Unsupervised deep clustering using the proposed PartItion Confidence mAximisation (PICA) approach. (a) Given the input data and the CNN model's decision boundaries, (b) PICA computes the cluster-wise Assignment Statistics Vector (ASV) in the forward pass from a mini-batch of data and its randomly perturbed copy. (c) To minimize the partition uncertainty index (PUI), (d) PICA is trained with a dedicated objective loss function to discriminate the ASV of all clusters on the hypersphere, so as to discover the most confident and potentially promising clustering solution [29].

The study in [30] starts from the difficulty that there is no obvious, simple cost function that can capture the major factors of variation and similarity in unsupervised learning. Because natural systems feature smooth dynamics, an opportunity is missed if an unsupervised objective function remains static during the training process. In the absence of concrete supervision, smooth dynamics should be introduced. Dynamic objective functions, as opposed to static cost functions, make better use of the gradual and uncertain knowledge gained through pseudo-supervision. In this study, they present Dynamic Autoencoder (DynAE), a new deep clustering model that overcomes the clustering-reconstruction trade-off by gradually and smoothly eliminating the reconstruction objective function in favor of a construction one. In comparison to the most relevant deep clustering algorithms, experimental evaluations on benchmark datasets (MNIST-full, MNIST-test, USPS, and Fashion-MNIST) reveal that their methodology achieves state-of-the-art results in terms of ACC and NMI.

The paper [31] addresses the following problem. Clustering with deep autoencoders has received a lot of attention in recent years. Current methods rely on learning embedded features and clustering data points in the latent space at the same time. Although many deep clustering algorithms beat shallow models and achieve good results on a variety of high-semantic datasets, a major flaw in such models has gone unnoticed. In the absence of concrete supervisory signals, the embedded clustering objective function may distort the latent space by learning from faulty pseudo-labels. As a result, the network can learn non-representative features, which lowers its discriminative ability and results in inferior pseudo-labels.
Modern autoencoder-based clustering papers advocate using the reconstruction loss for pretraining and as a regularizer during the clustering phase to mitigate the effect of random discriminative features. However, a clustering-reconstruction trade-off can cause Feature Drift. The authors suggest ADEC (Adversarial Deep Embedded Clustering), a novel autoencoder-based clustering model that uses adversarial training to handle a dual problem, namely Feature Randomness and Feature Drift. They use real benchmark datasets (MNIST-full, MNIST-test, USPS, Fashion-MNIST, Reuters-10K, and Mice Protein) to illustrate empirically the suitability of their model for dealing with these difficulties. Their model outperforms state-of-the-art autoencoder-based clustering approaches in terms of ACC and NMI.

For image clustering, the authors of [32] suggest a self-supervised Gaussian ATtention network (GATCluster). Rather than extracting intermediate features first and then applying a standard clustering technique, GATCluster delivers semantic cluster labels directly, without further post-processing. The Label Feature Theorem is used to ensure that the learned features are one-hot encoded vectors and that trivial solutions are avoided. To train GATCluster in an unsupervised manner, they design four self-learning tasks with the constraints of transformation invariance, separability maximization, entropy analysis, and attention mapping. The transformation invariance and separability maximization tasks, in particular, are used to learn the relationships between sample pairs. The goal of the entropy analysis task is to avoid trivial solutions. To capture object-oriented semantics, they design a self-supervised attention method that incorporates a parameterized attention module and a soft attention loss. During the training process, all of the guiding signals for clustering are self-generated.
Furthermore, they create a memory-efficient two-step learning approach for clustering large-size images. Extensive experiments show that their suggested method outperforms the current state of the art on image clustering benchmarks (CIFAR-10, CIFAR-100, STL-10, ImageNet-10, ImageNet-dogs, and Tiny-ImageNet) in terms of ACC, NMI, and ARI.

Deep learning has recently demonstrated its ability to learn strong feature representations for images. The task of image clustering requires appropriate feature representations to capture the data distribution and, as a result, to distinguish data points from one another. Often, these two aspects are dealt with independently, and thus traditional feature learning alone does not suffice to partition the data meaningfully. Variational Autoencoders (VAEs) naturally lend themselves to learning data distributions in a latent space. Since they seek to discriminate efficiently between distinct clusters in the data, the authors of [33] suggest a method based on VAEs that uses a Gaussian mixture prior to help cluster the images accurately. They learn the parameters of both the prior and posterior distributions at the same time. Their method represents a true Gaussian Mixture VAE. In this way, their system learns a prior that captures the latent distribution of the images as well as a posterior that aids in discriminating between data points. They also suggest a new reparametrization of the latent space that includes both discrete and continuous variables. One important takeaway is that, unlike existing methods, their method generalizes well across diverse datasets without the use of pre-training or learned models, allowing it to be trained from scratch in an end-to-end manner. They demonstrate the efficacy and generalizability of their method experimentally by achieving state-of-the-art results among unsupervised approaches on a variety of datasets.
To the best of their knowledge, they are the first to use VAEs for image clustering on real image datasets (MNIST, Fashion-MNIST, STL-10, CIFAR10, CIFAR100, and FRGCv2) in an unsupervised manner; results are evaluated with the accuracy criterion.

Figure 6: GATCluster framework. CNN is a convolutional neural network, GP means global pooling, Mul represents channel-independent multiplication, Conv is a convolution layer, FC is a fully connected layer, and AFG represents an attention feature generator [32].

The authors of [34] deviate from current work by advocating the SCAN method (Semantic Clustering by Adopting Nearest neighbors), a two-step strategy in which feature learning and clustering are separated. First, a self-supervised task from representation learning is used to obtain semantically relevant features. Second, they employ the obtained features as a prior in a learnable clustering strategy. In doing so, they remove the ability of cluster learning to rely on low-level features, which is present in existing end-to-end learning systems. In terms of classification accuracy, they surpass state-of-the-art approaches by substantial margins: +26.6% on CIFAR10, +25.0% on CIFAR100-20, and +21.3% on STL10. Furthermore, their method is the first to successfully classify images on a large-scale dataset.

In [35], the authors offer Deep Clustering with Category-Style representation (DCCS), a new deep image clustering framework for unsupervised image clustering. It learns a category-style latent representation in which the category information is decoupled from the image style and may be used directly for cluster assignment. Mutual information maximization is used to embed relevant information in the latent representation to achieve this goal.
Furthermore, an augmentation-invariant loss is used to separate the representation into two parts: category and style. Last but not least, a prior distribution is imposed on the latent representation to ensure that the elements of the category vector can be used as probabilities over clusters. Extensive tests show that the suggested method significantly outperforms state-of-the-art approaches on a variety of public datasets (MNIST and Fashion-MNIST) in terms of ACC, NMI, and ARI.

The authors of [36] propose Deep Robust Clustering (DRC). Unlike existing methods, DRC approaches deep clustering from two perspectives, semantic clustering assignment and representation features, which can simultaneously increase inter-class diversity and decrease intra-class diversity. Furthermore, by examining the internal relationship between mutual information and contrastive learning, they establish a generic framework that can turn maximizing mutual information into minimizing a contrastive loss. They apply it in DRC to learn invariant features and robust clusters. Extensive tests on six widely used deep clustering benchmarks (CIFAR-10, CIFAR-100, STL-10, ImageNet-10, ImageNet-dogs, and Tiny-ImageNet) show that DRC outperforms state-of-the-art methods in terms of both stability and accuracy. For example, on CIFAR-10, they achieve a mean accuracy of 71.6%, which is 7.1% higher than state-of-the-art results.

In [37], the authors introduce Contrastive Clustering (CC), a one-stage online clustering algorithm that performs explicit instance- and cluster-level contrastive learning. More precisely, for a given dataset, the positive and negative instance pairs are created using data augmentation and then projected into a feature space. There, instance- and cluster-level contrastive learning are carried out in the row and column space, respectively, by maximizing positive pair similarities while minimizing negative pair similarities.
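The row and column contrastive objectives described above can be sketched with a generic normalized temperature-scaled cross-entropy loss applied once to rows (instances) and once to columns (clusters). This is a minimal NumPy illustration under stated assumptions: the function names, the temperature defaults, and the toy inputs are hypothetical, and any additional regularization used by the original method is omitted.

```python
import numpy as np

def nt_xent(a, b, temperature):
    """Contrastive loss where (a[i], b[i]) are positives and all else negatives."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    m = a.shape[0]
    v = np.concatenate([a, b], axis=0)
    v = v / (np.linalg.norm(v, axis=1, keepdims=True) + 1e-12)  # cosine geometry
    sim = v @ v.T / temperature
    np.fill_diagonal(sim, -np.inf)                 # a vector is not its own pair
    positives = np.concatenate([np.arange(m, 2 * m), np.arange(0, m)])
    log_denominator = np.log(np.sum(np.exp(sim), axis=1))
    log_numerator = sim[np.arange(2 * m), positives]
    return float(np.mean(log_denominator - log_numerator))

def contrastive_clustering_loss(z_a, z_b, y_a, y_b, tau_instance=0.5, tau_cluster=1.0):
    """z_*: (n, d) instance features; y_*: (n, k) soft cluster assignments."""
    instance_loss = nt_xent(z_a, z_b, tau_instance)                     # rows
    cluster_loss = nt_xent(np.transpose(y_a), np.transpose(y_b), tau_cluster)  # columns
    return instance_loss + cluster_loss

z = np.eye(3)                                      # three toy instance features
aligned = nt_xent(z, z, 0.5)                       # views agree
shuffled = nt_xent(z, np.roll(z, 1, axis=0), 0.5)  # views mismatched
y = np.array([[0.9, 0.1], [0.2, 0.8], [0.8, 0.2]])
total = contrastive_clustering_loss(z, z, y, y)
```

The loss is lower when the two views of each instance agree (`aligned`) than when they are mismatched (`shuffled`); the same function is reused on the transposed assignment matrices so that each column plays the role of a cluster representation.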
Their main finding is that the rows of the feature matrix can be regarded as soft labels of instances, and the columns can be regarded as cluster representations. The model learns representations and cluster assignments in an end-to-end way by optimizing the instance- and cluster-level contrastive losses at the same time. On six challenging image benchmarks (CIFAR-10, CIFAR-100, STL-10, ImageNet-10, ImageNet-dogs, and Tiny-ImageNet), extensive experimental data show that CC beats 17 competitive clustering approaches. On the CIFAR-10 (CIFAR-100) dataset, in particular, CC obtains an NMI of 0.705 (0.431), which is a performance gain of up to 19% (39%) over the best baseline.

The authors of [38] propose learning an autoencoder embedding and then using it to search for the underlying manifold. For simplicity, they then cluster this manifold with a shallow clustering technique rather than a deeper network. They investigate a variety of local and global manifold learning methods on both raw data and autoencoder embeddings, concluding that UMAP in their framework is capable of finding the most clusterable manifold of the embedding. This suggests that applying local manifold learning to an autoencoder embedding is an effective way to find higher-quality clusters. They show empirically that their method is competitive with the existing state of the art on a variety of image and time-series datasets (MNIST, MNIST-test, USPS, Fashion, Pendigits, and HAR), outperforming it on several of them in terms of ACC and NMI. They believe these findings point to a viable research direction in deep clustering.

SPICE, a Semantic Pseudo-labeling framework for Image ClustEring, is presented in [39]. SPICE generates pseudo-labels by self-learning and directly employs the pseudo-label-based classification loss to train a deep clustering network, rather than relying on the indirect loss functions used by recently proposed approaches.
The core idea behind SPICE is to use a semantically-driven paradigm to improve the clustering network by combining the discrepancy between semantic clusters, similarity across instance samples, and semantic consistency of local samples in an embedding space. First, a semantic-similarity-based pseudo-labeling approach is presented to train a clustering network on top of unsupervised representation learning. Then, a local semantic consistency principle is employed to pick a set of consistently labeled samples based on the initial clustering results, and a semi-pseudo-labeling technique (SPICE-Semi) is adopted for performance boosting. On six typical benchmark datasets (STL10, CIFAR10, CIFAR100-20, ImageNet-10, ImageNet-Dog, and Tiny-ImageNet), extensive studies show that SPICE outperforms existing approaches. In terms of adjusted Rand index, normalized mutual information, and clustering accuracy, the proposed SPICE technique improves the existing best results by roughly 10% on average.

Figure 7: Contrastive Clustering framework. Two data augmentations are used to create data pairs. Given the data pairs, one shared deep neural network is used to extract features from the different augmentations. Two distinct MLPs (with ReLU activations and a Softmax operation that produces soft labels) project the features into the row and column space, where instance- and cluster-level contrastive learning are carried out, respectively [37].

Unsupervised image clustering approaches are prone to incorrect predictions and overconfident outcomes since they use alternate objectives to train the model indirectly. To address these issues, the study in [40] provides a new model, RUC, that is based on robust learning. RUC is unique in that it uses the pseudo-labels of existing image clustering algorithms as a noisy dataset with potentially misclassified samples.
Its retraining method can correct misaligned knowledge and reduce the problem of overconfidence in predictions. The model's flexible structure allows it to be used as an add-on module to any off-the-shelf unsupervised clustering algorithm, allowing it to perform better on a variety of datasets (CIFAR-10, CIFAR-20, STL-10). Extensive studies show that the suggested approach can better calibrate model confidence and gain additional robustness against adversarial noise. RUC separates clustered data points into clean and noisy sets before fine-tuning the clustering results. SCAN and TSUC, two state-of-the-art unsupervised clustering algorithms, exhibited considerable performance increases with RUC (STL-10: 86.7%, CIFAR-10: 90.3%, CIFAR-20: 54.3%).

In [41], the authors use instance discrimination and feature decorrelation to propose a clustering-friendly representation learning approach. The principles of classical spectral clustering inspired their deep-learning-based representation learning method. Instance discrimination learns similarities among data, whereas feature decorrelation eliminates redundant correlation between features. They employ a form of instance discrimination in which learning to classify individual instances leads to learning similarities between instances. Through comprehensive experimentation and examination of the benchmark datasets (CIFAR-10, CIFAR-100, STL-10, ImageNet-10, and ImageNet-Dog), they show that the methodology can be extended to learning a latent space for clustering. For learning, they create new softmax-formulated decorrelation constraints. Their method achieves an accuracy of 81.5% and 95.4% in image clustering tests on CIFAR-10 and ImageNet-10, respectively. They also demonstrate that the softmax-formulated constraints work with a variety of neural networks.
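One way to read a softmax-formulated decorrelation constraint is as a classification problem over feature dimensions: each dimension, viewed as a vector across the batch, should be most similar to itself and dissimilar to all other dimensions. The sketch below follows that reading. The exact form, the averaging, the function name, and the temperature value are assumptions for illustration and may differ from the formulation in [41].

```python
import numpy as np

def feature_decorrelation_loss(features, temperature=1.0):
    """Softmax-style decorrelation loss for a batch of embeddings (n, d).

    Each feature dimension is treated as a vector over the batch; the loss is
    low when different dimensions are mutually uncorrelated (orthogonal).
    """
    f = np.asarray(features, dtype=float).T               # (d, n): one row per dimension
    f = f / (np.linalg.norm(f, axis=1, keepdims=True) + 1e-12)
    logits = f @ f.T / temperature                        # (d, d) similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # stable log-softmax
    log_softmax = logits - np.log(np.sum(np.exp(logits), axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_softmax)))

decorrelated = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
redundant = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [-1.0, -1.0]])
loss_decorrelated = feature_decorrelation_loss(decorrelated)
loss_redundant = feature_decorrelation_loss(redundant)
```

In the toy batches, the two orthogonal feature dimensions give a lower loss than the two identical (fully redundant) dimensions, whose loss equals log 2.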
The authors of [42] introduce Mixture of Contrastive Experts (MiCE), a unified probabilistic clustering approach that simultaneously exploits the discriminative representations learned by contrastive learning and the semantic structures captured by a latent mixture model. Motivated by the mixture of experts, MiCE uses a gating function to partition an unlabeled dataset into subsets according to latent semantics, and multiple experts to discriminate between the instances in the subsets assigned to them in a contrastive learning manner. To overcome the nontrivial inference and learning challenges caused by the latent variables, they design a scalable form of the Expectation-Maximization (EM) algorithm for MiCE and provide a proof of convergence. They test MiCE's clustering performance empirically on four frequently used natural image datasets (CIFAR-10, CIFAR-100, STL10, and ImageNet-Dog). MiCE outperforms a variety of earlier approaches and a strong contrastive learning baseline in terms of ACC, NMI, and ARI.

The study in [43] observes that unsupervised feature learning has made significant progress with contrastive learning based on instance discrimination and invariant mapping, as measured on curated class-balanced datasets. Natural data, on the other hand, may be highly correlated and long-tailed. The presumed instance distinction clashes with natural between-instance similarity, resulting in inconsistent training and poor performance. The idea is to identify and integrate between-instance similarity into contrastive learning, not through direct instance grouping, but through cross-level discrimination (CLD) between instances and local instance groups. While attraction within each instance's augmented views enforces invariant mapping, between-instance similarity emerges from common repulsion against instance groups.
The batch-wise and cross-view comparisons also help to increase contrastive learning's positive/negative sample ratio and produce improved invariant mapping. To achieve both goals, they impose the grouping and discrimination objectives on features obtained separately from a shared representation. They also present, for the first time, normalized projection heads and unsupervised hyper-parameter tuning. Extensive experiments demonstrate that CLD is a lean and powerful add-on to existing methods (e.g., NPID, MoCo, InfoMin, BYOL) on highly correlated, long-tail, or balanced datasets. It not only sets a new state of the art on benchmarks (CIFAR-10, CIFAR-100, and ImageNet) for self-supervision, semi-supervision, and transfer learning, but also outperforms, in terms of accuracy, every result reported for MoCo v2 and SimCLR, which were achieved with far more compute. With CLD, unsupervised learning is effectively extended to natural data, bringing it closer to real-world applications.

Figure 8: Representation of the SPICE framework. (a) SPICE-Self uses pseudo-labeling to train a classification model, with the CNN fixed after pretraining using representation learning. (b) SPICE-Semi retrains the classification model by semi-pseudo-labeling, in which reliable labels are chosen from the SPICE-Self results based on the local consistency of nearby samples. (c) A simple example of pseudo-labeling, with red, green, and blue indicating different clusters [39].
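Most of the results surveyed above are reported as clustering accuracy (ACC). Because cluster indices are arbitrary, ACC is usually computed under the best one-to-one mapping between clusters and ground-truth classes. The sketch below finds that mapping by brute force over permutations, which is only practical for a handful of clusters; the function name is an assumption, and in practice the mapping is typically found with the Hungarian algorithm.

```python
from itertools import permutations

def clustering_accuracy(true_labels, cluster_ids):
    """Accuracy under the best one-to-one mapping of clusters to classes."""
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    if len(clusters) > len(classes):
        raise ValueError("this sketch assumes no more clusters than classes")
    best = 0
    # Try every injective assignment of clusters to distinct classes.
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        correct = sum(mapping[c] == t for t, c in zip(true_labels, cluster_ids))
        best = max(best, correct)
    return best / len(true_labels)

perfect = clustering_accuracy([0, 0, 1, 1, 2, 2], [1, 1, 2, 2, 0, 0])
partial = clustering_accuracy([0, 0, 1, 1], [1, 1, 1, 0])
```

Here `perfect` is 1.0, since the clustering matches the classes up to a renaming of the cluster indices, and `partial` is 0.75, since one of the four points falls into the wrong cluster under the best mapping.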
4 Discussion

Based on this short and selective survey of deep clustering algorithms, we make the following observations:
• most deep clustering techniques have been tested in the area of image recognition;
• these techniques perform very well in terms of recognition accuracy, as in the study of [35], where the recognition accuracy reaches 98.9%;
• most studies improve the embedding of the data into a lower-dimensional space;
• several researchers use the MNIST database for experimentation and the k-means algorithm for comparison of results;
• the hybrid version of the autoencoder also gives satisfactory results;
• deep learning is a technology that continues to mature and has been applied to pattern recognition to great effect;
• for each approach, we have identified the name of the proposed method, the category to which it belongs, the datasets used, and the methods it is compared with; these are shown in Table 1;
• Table 1 summarizes the works, sorted in chronological order.

We observe in Table 1 that the MNIST dataset yields better results than other databases such as USPS, CIFAR-10, and CIFAR-100.

Table 1: General comparison of various deep clustering algorithms for image recognition.

Reference | Method | Category | Dataset | Compared results with | Obtained results
[24] | SNNL: Soft Nearest Neighbor Loss | AE | MNIST; Fashion-MNIST; EMNIST Balanced | SNNL-2; SNNL-4; Baseline AE; DEC; VaDE; N2D; ClusterGAN; ... | 1. The best accuracy (acc) = 96.2% with MNIST; 2. The best NMI = 90.3% with MNIST; 3. The best ARI = 91.8% with MNIST
[25] | MPCC: Matching Priors and Conditionals for Clustering | AE | MNIST; Omniglot; FMNIST; CIFAR-10; CIFAR-20 | DEC; VADE; InfoGAN; ClusterGAN; DAC; IMSAT (VAT); ADC; SCAE; IIC | The best accuracy (acc) = 98.76 ± 0.03% with MNIST
[10] | DeepCluster, a new clustering strategy for large-scale end-to-end convnet training | AE | ImageNet; Places | The methods have a standard AlexNet architecture | The best is 73.7% on classification with DeepCluster
[11] | Low-rank Constrained Deep Autoencoder for Subspace Clustering (LRAE) | AE | MNIST; COIL-100; ORL | SSC; LRR; LRSC; LSR; AESC; PARTY | 1. The best accuracy (acc) = 81.49 ± 2.19 with ORL; 2. The best NMI = 90.77 ± 2.01 with ORL; 3. The best ARI = 73.92 ± 2.11 with ORL
[12] | Hybrid Autoencoder (BAE), the combination of three AE-based models: the convolutional autoencoder (CAE), adversarial autoencoder (AAE), and stacked autoencoder (SAE) | AE | MNIST; CIFAR-10 | FCM (fuzzy objective function algorithm); SC (spectral clustering); LRR (low-rank representation); LSR1 and LSR2 (variants of least-squares regression, LSR); SLRR (scalable LRR); LSC-R and LSC-K (variants of large-scale spectral clustering, LSC); NMF (non-negative matrix factorization); ZAC (Zeta function based agglomerative clustering); DEC (deep embedding clustering) | 1. The best accuracy (acc) = 83.67% with MNIST; 2. The best NMI = 80.85% with MNIST
[14] | Clustering with Optimal Clustering (OT), a new approach where the embedding is performed by a differentiable model such as a deep neural network | GAN | MNIST; CIFAR10 | k-means; AE + k-means; soft k-means; soft k-means (p) | The best NMI = 85.10% with MNIST
[15] | ClusterGAN, a deep Generative Adversarial Clustering Network | GAN | MNIST; USPS; FRGC; CIFAR-10; STL-10 | K-means; N-Cuts; SC-LS; AC-PIC; SEC; LDMGI | 1. The best accuracy (acc) = 97% with USPS; 2. The best NMI = 93.10% with USPS; 3. The best accuracy (acc) = 96.4% with MNIST; 4. The best NMI = 92.10% with MNIST
[27] | IFCA, a new framework dubbed the Iterative Federated Clustering Algorithm | AE | Rotated MNIST; Rotated CIFAR | The global model for IFCA; local model | The best accuracy (acc) = 95.25 ± 0.40% with Rotated MNIST
[9] | Deep Convolutional Embedded Clustering (DCEC) | AE | MNIST-full; MNIST-test; USPS | 1. Deep Embedded Clustering (DEC); 2. K-means; 3. Stacked AutoEncoders (SAE) | 1. The best accuracy (acc) = 88.97% with MNIST-full; 2. The best NMI = 88.49% with MNIST-full
[3] | ASPC-DA, Adaptive Self-Paced deep Clustering with Data Augmentation | AE | MNIST-full; MNIST-test; USPS; Fashion | | 1. The best accuracy (acc) = 98.8% with MNIST-full; 2. The best NMI = 96.6% with MNIST-full
[16] | Kingdra, a framework that leverages semi-supervised models | AE | MNIST; STL; CIFAR10; Reuters; 20news | k-means; AC; DEC; Deep RIM; IMSAT; ... | The best accuracy (acc) = 98.5% with MNIST
[28] | A novel two-stage algorithm in which an embedding module for pretraining precedes a refining module that concurrently performs embedding and class assignment | AE | CIFAR-10; CIFAR-20; STL-10 | Random network; k-means; Autoencoder (AE); SWWAE; GAN; JULE; DEC; DAC; DeepCluster; ADC; IIC | The best accuracy (acc) = 81% with CIFAR-10
[29] | PICA, a novel deep clustering method named PartItion Confidence mAximisation | AE | CIFAR-10; CIFAR-100; STL-10; ImageNet-10; ImageNet-Dogs; Tiny-ImageNet | K-means; SC; AC; NMF; AE; DAE; IIC; ... | 1. The best accuracy (acc) = 87% with ImageNet-10; 2. The best NMI = 80.2% with ImageNet-10; 3. The best ARI = 76.1% with ImageNet-10
[17] | MIADM, an approximate alternating direction method | AE | USPS; MNIST-test; MNIST-full; YTF; CMU-PIE; FRGC | SR-K-means; DEPICT; DCN (K-means based); DEC (KL based) | 1. The best accuracy (acc) = 97.9% with USPS; 2. The best NMI = 94.8% with USPS
[18] | IIC: Invariant Information Clustering | AE | STL10; CIFAR10; CFR100-20; MNIST | Random network; K-means; Spectral clustering; Triplets; Variational Bayes AE; DeepCluster 2018; ... | The best accuracy (acc) = 99.2% with MNIST
[19] | LTVAE: latent tree variational autoencoder | VAE | MNIST; STL-10; Reuters; HHAR | AE+GMM; VAE+GMM; DEC; DCN | The best accuracy (acc) = 90% with STL-10
[37] | CC: Contrastive Clustering, an online clustering method | AE | CIFAR-10; CIFAR-100; STL-10; ImageNet-10; ImageNet-Dogs; Tiny-ImageNet | k-means; SC; AC; NMF; DEC; JULE; VAE; DCGAN; DeCNN; DCCM; IIC; PICA; ... | 1. The best accuracy (acc) = 89.3% with ImageNet-10; 2. The best NMI = 85.9% with ImageNet-10; 3. The best ARI = 82.2% with ImageNet-10
[38] | N2D: (Not Too) Deep Clustering via Clustering the Local Manifold of an Autoencoded Embedding | AE | MNIST; MNIST-test; USPS; Fashion; Pendigits; HAR | k-means; SC; GMM; DEC; DCN; JULE; VaDE; DEPICT; DBC; ASPC-DA; ... | 1. The best accuracy (acc) = 97.9% with MNIST; 2. The best NMI = 94.2% with MNIST
[30] | DynAE: Dynamic Autoencoder, a novel model for deep clustering that addresses a clustering-reconstruction trade-off | AE | MNIST-full; MNIST-test; USPS; Fashion-MNIST | K-Means; GMM; LSNMF; AC; SSC-OMP; EnSC; LMVSC; RBF K-Means; DEC; JULE; DEPICT; ... | 1. The best accuracy (acc) = 98.7% with MNIST-full; 2. The best NMI = 96.4% with MNIST-full
[31] | ADEC (Adversarial Deep Embedded Clustering), a novel autoencoder-based clustering model | AE | MNIST-full; MNIST-test; USPS; Fashion-MNIST; REUTERS-10K; Mice Protein | DEC*; IDEC*; k-means; GMM; LSNMF; AC; RBF k-means; ... | 1. The best accuracy (acc) = 98.6% with MNIST-full; 2. The best NMI = 96.1% with MNIST-full
[13] | ClusterGAN, a new mechanism for clustering using GANs (Generative Adversarial Networks) | GAN | Synthetic data; MNIST; Fashion-MNIST; 10x_73k; Pendigits | WGAN (normal); WGAN (One-Hot); InfoGAN | The best accuracy (acc) = 95% with MNIST
[39] | SPICE, a Semantic Pseudo-labeling framework for Image ClustEring | AE | STL10; ImageNet-10; ImageNet-Dog-15; CIFAR10; CIFAR100-20; Tiny-ImageNet-200 | JULE; DEC; DAC; DeepCluster; DDC; IIC; DCCM; GATCluster; PIC; CC | 1. The best accuracy (acc) = 93.8% with STL10; 2. The best NMI = 87.2% with STL10; 3. The best ARI = 87% with STL10
[40] | RUC, inspired by robust learning; its novelty lies in using the pseudo-labels of existing image clustering models as a noisy dataset | AE | CIFAR-10; CIFAR-20; STL-10 | k-means; SC; Triplets; AE; GAN; JULE; DAC; DEC; DeepCluster; IIC; TSUC; SCAN; ... | The best accuracy (acc) = 90.1% with CIFAR-10
[33] | A VAE-based method that uses a Gaussian mixture prior to help cluster the images accurately | VAE | STL-10; CIFAR10; MNIST; Fashion-MNIST | k-means; AE+k-means; DEC | The best accuracy (acc) = 98.4% with MNIST
[20] | DCC: Deep Continuous Clustering | AE | MNIST; Coil100; YTF; YaleB; Reuters; RCV1 | k-means++; AC-W; DBSCAN; SEC; LDMGI; ... | 1. The best accuracy (acc) = 91.3% with MNIST; 2. The best accuracy (acc) = 98.5% with YaleB

5 Proposed taxonomy of deep clustering

Figure 9 illustrates the taxonomy of Deep Clustering techniques that we describe, which in turn indicates the study's structure. The basic algorithmic structure, network architecture, loss functions, and training optimization methodologies (i.e., how the parameters are learned) vary across deep clustering systems. We focus on deep learning for clustering approaches in this paper, where those approaches either use deep learning for grouping (or partitioning) the data and/or creating low-rank deep representations or embeddings of

Figure 9: The proposed taxonomy. [Diagram: Taxonomy of Deep clustering (DC), with branches "AE based DC", "GAN based DC", and "VAE based DL", and the methods DCEC, DCN, HAE, DGG, VaDE, and LTVAE.]

Table 1 (continued).

Reference | Method | Category | Dataset | Compared results with | Obtained results
[41] | IDFD, a clustering-friendly representation learning method using instance discrimination and feature decorrelation | AE | CIFAR-10; CIFAR-100; STL-10; ImageNet-10; ImageNet-Dog | AE; DEC; DAC; DCCM; ID; IIC; IDFO; SCAN | 1. The best accuracy (acc) = 95.4% with ImageNet-10; 2. The best NMI = 89.8% with ImageNet-10; 3. The best ARI = 90.1% with ImageNet-10
[42] | MiCE: Mixture of Contrastive Experts, a unified probabilistic clustering framework | AE | CIFAR-10; CIFAR-100; STL-10; ImageNet-Dog | K-means; AE; DHOG; DAC; DCCM; MMDC; IIC; IDFO; MoCo | 1. The best accuracy (acc) = 83.5% with CIFAR-10; 2. The best NMI = 73.7% with CIFAR-10; 3. The best ARI = 69.8% with CIFAR-10
[34] | SCAN: Semantic Clustering by Adopting Nearest neighbors | AE | CIFAR10; CIFAR100-20; STL10; ImageNet | k-means; SC; Triplets; JULE; AEVB; SAE; DAE; GAN; DAC; IIC | 1. The best accuracy (acc) = 88.3% with CIFAR10; 2. The best NMI = 79.7% with CIFAR10; 3. The best ARI = 77.2% with CIFAR10
[43] | CLD: cross-level discrimination | AE | STL10; CIFAR10; CIFAR100; ImageNet100 | DeepCluster; MoCo; Exemplar; Inv. Spread; NPID; BYOL; ... | 1. The best retrieval = 78.6% with CIFAR-10; 2. The best NMI = 69% with CIFAR-10; 3. The best kNN = 86.7% with CIFAR-10
[23] | DCCM: deep comprehensive correlation mining | AE | CIFAR-10; CIFAR-100; STL-10; ImageNet-10; ImageNet-dog-15; Tiny-ImageNet | K-means; SC; AC; NMF; AE; DAE; ... | 1. The best accuracy (acc) = 60.8% with ImageNet-10; 2. The best NMI = 71% with ImageNet-10; 3. The best ARI = 55.5% with ImageNet-10
[21] | DGG: Deep clustering via a Gaussian mixture variational autoencoder (VAE) with Graph embedding | VAE | MNIST; STL-10; Reuters; HHAR | AE+GMM; DEC; IMSAT; VaDE; SpectralNet; LTVAE | The best accuracy (acc) = 97.58 ± 0.1% with MNIST
[22] | A joint learning framework for discriminative embedding and spectral clustering | AE | MNIST-full; MNIST-test; USPS; Fashion-10; YTF | K-means; SC-Ncut; SC-LS; NMF; AC-GDL; DASC; ... | 1. The best accuracy (acc) = 98% with MNIST-test; 2.
The best NMI=94.6% with MNIST-test; [35] DCCS a novel deep image clustering framework to learn a category-style latent representation AE MNIST; and Fashion- MNIST k-means; SC; AC; NMF; DEC; JULE; VaDE; DEPICT; IMSAT; ClusterGan; IIC; and DLS- clustering;... 1. The best accuracy (acc)= 98.9% with MNIST; 2. The best NMI= 97% with MNIST; 3. The best ARI= 97.6% with MNIST; [36] DRC Deep Robust Clustering AE CIFAR-10; CIFAR-100; STL-10; ImageNet-10; Imagenet-dog- 15; and Tiny- ImageNet k-means; SC; AC; NMF; DEC; JULE; VaDE; DEPICT; IMSAT; DCCM; IIC; and PICA;... 1. The best accuracy (acc)= 88.4% with ImageNet- 10; 2. The best NMI= 83% with ImageNet-10; 3. The best ARI= 79.8% with ImageNet-10; Unsupervised Deep Learning: Taxonomy and Algorithms Informatica 46 (2022) 151–168 165 the data, which could play a significant supporting role as a building block of supervised learning, among other goals. There are numerous approaches to developing a taxonomy of deep clustering algorithms; in this study, we took the approach of seeing the methods as a process. As a result, we provide a simplified taxonomy based on deep clustering algorithms' overall procedural structure or architecture. Beginners and experienced readers will benefit from the simplified classification. We have chosen to propose to divide deep learning into three categories: AE-based deep clustering: Artificial neural networks (ANNs) are a type of machine learning model made up of numerous nodes grouped in layers that compute an output depending on node activation mediated by weights in the connections between them. ANNs are capable of solving a variety of machine learning tasks, including classification, regression, and dimensionality reduction [44]. A neural network that has been trained to duplicate its input to its output is called an autoencoder. It has a hidden layer h on the inside that defines the code used to represent the input. 
The network is made up of two parts: an encoder function $h = f(x)$ and a decoder function $r = g(h)$ that provides a reconstruction. Figure 10 illustrates this architecture. If an autoencoder only succeeds in learning to set $g(f(x)) = x$ everywhere, it isn't particularly useful. Autoencoders are instead designed to be incapable of perfect copying. They are usually restricted in some way, so that they can copy only approximately and only inputs that resemble the training data. Because the model must prioritize which features of the input should be copied, it frequently discovers useful properties of the data. The following is an overview of representative autoencoder-based methods:

1. Deep Convolutional Embedded Clustering (DCEC): the DCEC system is composed of a convolutional autoencoder (CAE) and a clustering layer that is connected to the embedded layer of the CAE [9]. The clustering layer maps each embedded point $z_i$ of the input image $x_i$ into a soft label. The clustering loss $L_c$ is then defined as the Kullback-Leibler (KL) divergence between the distribution of soft labels and a predefined target distribution. The CAE is used to learn embedded features, and the clustering loss guides the embedded features to be prone to forming clusters. The objective of DCEC is:

$$L = L_r + \gamma L_c \qquad (1)$$

where $L_r$ and $L_c$ are the reconstruction loss and the clustering loss, respectively, and $\gamma > 0$ is a coefficient that controls the degree of distortion of the embedded space. When $\gamma = 1$ and $L_r \equiv 0$, (1) reduces to the objective of DEC.

2. Deep Clustering Network (DCN): this method [45], which combines the autoencoder and the k-means algorithm, is one of the most remarkable in the field. It pre-trains an autoencoder in the first stage. The reconstruction loss and the k-means loss are then optimized jointly. Because k-means relies on discrete cluster assignments, it requires an alternating optimization procedure.
Compared with other methods, DCN's objective is simple and its computational complexity is modest;

3. Hybrid Autoencoder (HAE) [46]: combines the advantages of three autoencoders, namely the CAE (convolutional autoencoder), the AAE (adversarial autoencoder), and the SAE (stacked autoencoder), to learn low- and high-level feature representations.

Figure 10: The structure of deep convolutional embedded clustering (DCEC). It is composed of convolutional autoencoders and a clustering layer connected to the embedded layer of autoencoders [9].

GAN-based deep clustering: In recent years, the Generative Adversarial Network (GAN) has become a popular deep generative model. A GAN [47] sets up a min-max adversarial game between two neural networks: a generative network, G, and a discriminative network, D. The generative network attempts to map a sample z from a prior distribution p(z) to the data space, whereas the discriminative network attempts to compute the probability that an input is a real sample from the data distribution rather than one created by the generative network. The GAN is an appealing idea because it offers an adversarial approach to matching the distribution of data, or of its representations, to an arbitrary prior distribution.

Figure 11: GAN-based deep clustering [47].

VAE-based deep clustering: The VAE [48] is a generative variant of the AE, since it forces the AE's latent code to follow a predetermined distribution. The VAE combines variational Bayesian methods with the flexibility and scalability of neural networks. It uses neural networks to model the conditional posterior and optimizes the variational inference objective with stochastic gradient descent and standard backpropagation. It uses a reparameterization of the variational lower bound to produce a simple, differentiable, unbiased lower bound estimator.
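The reparameterization described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of [48]: it assumes a diagonal Gaussian approximate posterior and a standard normal prior (a common choice, but an assumption here), and the function names and the numeric encoder outputs are hypothetical. The sample z is written as a deterministic function of the encoder outputs plus independent noise, which is what makes the lower bound differentiable with respect to those outputs.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), one value per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ) for a diagonal Gaussian."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical encoder outputs for one input (assumed values, 2 latent dimensions).
mu = [1.0, -2.0]
log_var = [0.0, math.log(0.25)]

# Monte Carlo check: the empirical mean of the samples should approach mu.
samples = [reparameterize(mu, log_var, rng) for _ in range(20000)]
mean_z = [sum(s[d] for s in samples) / len(samples) for d in range(len(mu))]

# Regularization term of the variational lower bound under the assumptions above.
kl = kl_to_standard_normal(mu, log_var)
```

In an actual VAE, `mu` and `log_var` would be produced by the encoder network, and an automatic differentiation framework would propagate gradients through `reparameterize`; the sketch only shows the sampling and the KL term.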
In nearly every model with continuous latent variables, this estimator can be used for efficient approximate posterior inference. Representative VAE-based methods include:

1. Deep clustering via a Gaussian mixture VAE with Graph embedding (DGG) [21]: a VAE-based model that combines a Gaussian mixture VAE with graph embedding;
2. Variational Deep Embedding (VaDE): a VAE-based generative model that assumes the latent variables follow a mixture of Gaussians with trainable means and variances [49];
3. Latent Tree Variational Autoencoder (LTVAE): a VAE-based model that assumes the latent variables have a tree structure [19].

6 Conclusion and perspectives

Deep learning comprises a number of well-known and effective models that are used to solve a variety of problems [50]. In this article, we have presented an introductory study of the main deep unsupervised learning algorithms for clustering published in the literature over the last three to four years. We have given an overview of deep clustering methods and algorithms. We noted the large number of contributions in the area of image recognition, and we studied and synthesized recent works in this context. We have proposed a taxonomy of deep clustering algorithms based on previous studies and on representative methods covered in the survey. This study is the first step of our research, for which we can consider several future extensions, such as exploring the possibilities of hybridization between different deep clustering approaches and their application to evolving patterns. We also plan to carry out a comparative study of the performance of autoencoder-based deep learning approaches, along the lines of the work in [51], and to apply deep clustering in fields such as face recognition [52].
Acknowledgment

The authors would like to thank the DGRSDT (General Directorate of Scientific Research and Technological Development) - MESRS (Ministry of Higher Education and Scientific Research), ALGERIA, for the financial support of LISCO Laboratory.

References

[1] A. Chefrour, and L. Souici-Meslati (2019). AMF-IDBSCAN: Incremental Density Based Clustering Algorithm Using Adaptive Median Filtering Technique. Informatica, vol. 43(4). https://doi.org/10.31449/inf.v43i4.2629
[2] E. Min, X. Guo, Q. Liu, G. Zhang, J. Cui, and J. Long (2018). A survey of clustering with deep learning: From the perspective of network architecture. IEEE Access, vol. 6, pp. 39501-39514. https://doi.org/10.1109/access.2018.2855437
[3] X. Guo, X. Liu, E. Zhu, X. Zhu, M. Li, X. Xu, and J. Yin (2019). Adaptive self-paced deep clustering with data augmentation. IEEE Transactions on Knowledge and Data Engineering, vol. 32(9), pp. 1680-1693. https://doi.org/10.1109/tkde.2019.2911833
[4] C. C. Wang, K. L. Tan, C. T. Chen, Y. H. Lin, S. S. Keerthi, D. Mahajan, and C. J. Lin (2018). Distributed newton methods for deep neural networks. Neural Computation, vol. 30(6), pp. 1673-1724. https://doi.org/10.1162/neco_a_01088
[5] Z. Shen, H. Yang, and S. Zhang (2021). Deep network with approximation error being reciprocal of width to power of square root of depth. Neural Computation, vol. 33(4), pp. 1005-1036. https://doi.org/10.1162/neco_a_01364
[6] J. Gao, P. Li, Z. Chen, and J. Zhang (2020). A survey on deep learning for multimodal data fusion. Neural Computation, vol. 32(5), pp. 829-864. https://doi.org/10.1162/neco_a_01273
[7] E. Aljalbout, V. Golkov, Y. Siddiqui, M. Strobel, and D. Cremers (2018). Clustering with deep learning: Taxonomy and new methods. arXiv preprint arXiv:1801.07648.
[8] G. C. Nutakki, B. Abdollahi, W. Sun, and O. Nasraoui (2019). An introduction to deep clustering. Clustering Methods for Big Data Analytics, pp. 73-89. Springer, Cham. https://doi.org/10.1007/978-3-319-97864-2_4
[9] X. Guo, X. Liu, E. Zhu, and J. Yin (2017, November). Deep clustering with convolutional autoencoders. In International Conference on Neural Information Processing, pp. 373-382. Springer, Cham. https://doi.org/10.1007/978-3-319-70096-0_39
[10] M. Caron, P. Bojanowski, A. Joulin, and M. Douze (2018). Deep clustering for unsupervised learning of visual features. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 132-149. https://doi.org/10.1007/978-3-030-01264-9_9
[11] Y. Chen, L. Zhang, and Z. Yi (2018). Subspace clustering using a low-rank constrained autoencoder. Information Sciences, vol. 424, pp. 27-38. https://doi.org/10.1016/j.ins.2017.09.047
[12] P. Y. Chen, and J. J. Huang (2019). A hybrid autoencoder network for unsupervised image clustering. Algorithms, vol. 12(6), pp. 122. https://doi.org/10.3390/a12060122
[13] S. Mukherjee, H. Asnani, E. Lin, and S. Kannan (2019, July). ClusterGAN: Latent space clustering in generative adversarial networks. In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, pp. 4610-4617.
[14] A. Genevay, G. Dulac-Arnold, and J. P. Vert (2019). Differentiable deep clustering with cluster size constraints.
[15] K. Ghasedi, X. Wang, C. Deng, and H. Huang (2019). Balanced self-paced learning for generative adversarial clustering network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4391-4400. https://doi.org/10.1109/CVPR.2019.00452
[16] D. Gupta, R. Ramjee, N. Kwatra, and M. Sivathanu (2019, September). Unsupervised clustering using pseudo-semi-supervised learning. In International Conference on Learning Representations.
[17] M. Jabi, M. Pedersoli, A. Mitiche, and I. B. Ayed (2019). Deep clustering: On the link between discriminative models and k-means. IEEE Transactions on Pattern Analysis and Machine Intelligence. https://doi.org/10.1109/TPAMI.2019.2962683
[18] X. Ji, J. F. Henriques, and A. Vedaldi (2019). Invariant information clustering for unsupervised image classification and segmentation. https://doi.org/10.1109/ICCV.2019.00996
[19] X. Li, Z. Chen, L. K. Poon, and N. L. Zhang (2018). Learning latent superstructures in variational autoencoders for deep multidimensional clustering. arXiv preprint arXiv:1803.05206. ICLR 2019.
[20] S. A. Shah, and V. Koltun (2018). Deep continuous clustering. arXiv preprint arXiv:1803.01449.
[21] L. Yang, N. M. Cheung, J. Li, and J. Fang (2019). Deep clustering by gaussian mixture variational autoencoders with graph embedding. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6440-6449. https://doi.org/10.1109/ICCV.2019.00654
[22] X. Yang, C. Deng, F. Zheng, J. Yan, and W. Liu (2019). Deep spectral clustering using dual autoencoder network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4066-4075.
[23] J. Wu, K. Long, F. Wang, C. Qian, C. Li, Z. Lin, and H. Zha (2019). Deep comprehensive correlation mining for image clustering. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8150-8159. https://doi.org/10.1109/ICCV.2019.00824
[24] A. F. Agarap, and A. P. Azcarraga (2020, July). Improving k-Means Clustering Performance with Disentangled Internal Representations. In 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1-8. IEEE. https://doi.org/10.1109/IJCNN48605.2020.9207192
[25] N. Astorga, P. Huijse, P. Protopapas, and P. Estévez (2020, August). MPCC: Matching Priors and Conditionals for Clustering. In European Conference on Computer Vision, pp. 658-677. Springer, Cham. https://doi.org/10.1007/978-3-030-58592-1_39
[26] L. N. Darlow, and A. Storkey (2020). DHOG: Deep Hierarchical Object Grouping. arXiv preprint arXiv:2003.08821.
[27] A. Ghosh, J. Chung, D. Yin, and K. Ramchandran (2020). An efficient framework for clustered federated learning. In 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada.
[28] S. Han, S. Park, S. Park, S. Kim, and M. Cha (2020, August). Mitigating Embedding and Class Assignment Mismatch in Unsupervised Image Classification. In 16th European Conference on Computer Vision, ECCV 2020. Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-58586-0_45
[29] J. Huang, S. Gong, and X. Zhu (2020). Deep semantic clustering by partition confidence maximisation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8849-8858. https://doi.org/10.1109/CVPR42600.2020.00887
[30] N. Mrabah, N. M. Khan, R. Ksantini, and Z. Lachiri (2020). Deep clustering with a Dynamic Autoencoder: From reconstruction towards centroids construction. Neural Networks, vol. 130, pp. 206-228. https://doi.org/10.1016/j.neunet.2020.07.005
[31] N. Mrabah, M. Bouguessa, and R. Ksantini (2020). Adversarial deep embedded clustering: on a better trade-off between feature randomness and feature drift. IEEE Transactions on Knowledge and Data Engineering. https://doi.org/10.1109/TKDE.2020.2997772
[32] C. Niu, J. Zhang, G. Wang, and J. Liang (2020, August). Gatcluster: Self-supervised gaussian-attention network for image clustering. In European Conference on Computer Vision, pp. 735-751. Springer, Cham. https://doi.org/10.1007/978-3-030-58595-2_44
[33] V. Prasad, D. Das, and B. Bhowmick (2020, July). Variational Clustering: Leveraging Variational Autoencoders for Image Clustering. In 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1-10. IEEE. https://doi.org/10.1109/IJCNN48605.2020.9207523
[34] W. Van Gansbeke, S. Vandenhende, S. Georgoulis, M. Proesmans, and L. Van Gool (2020, August). Scan: Learning to classify images without labels. In European Conference on Computer Vision, pp. 268-285. Springer, Cham. https://doi.org/10.1007/978-3-030-58607-2_16
[35] J. Zhao, D. Lu, K. Ma, Y. Zhang, and Y. Zheng (2020, August). Deep Image Clustering with Category-Style Representation. In European Conference on Computer Vision, pp. 54-70. Springer, Cham. https://doi.org/10.1007/978-3-030-58568-6_4
[36] H. Zhong, C. Chen, Z. Jin, and X. S. Hua (2020). Deep robust clustering by contrastive learning. arXiv preprint arXiv:2008.03030.
[37] Y. Li, P. Hu, Z. Liu, D. Peng, J. T. Zhou, and X. Peng (2021). Contrastive Clustering. In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35(10), pp. 8547-8555.
[38] R. McConville, R. Santos-Rodriguez, R. J. Piechocki, and I. Craddock (2021, January). N2D: (not too) deep clustering via clustering the local manifold of an autoencoded embedding. In 25th International Conference on Pattern Recognition (ICPR), pp. 5145-5152. IEEE. https://doi.org/10.1109/ICPR48806.2021.9413131
[39] C. Niu, and G. Wang (2021). SPICE: Semantic Pseudo-labeling for Image Clustering. arXiv preprint arXiv:2103.09382.
[40] S. Park, S. Han, S. Kim, S. Kim, D. Park, S. Hong, and M. Cha (2021). Improving Unsupervised Image Clustering With Robust Learning. Accepted at Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG).
[41] Y. Tao, K. Takagi, and K. Nakata (2021). Clustering-friendly Representation Learning via Instance Discrimination and Feature Decorrelation. arXiv preprint arXiv:2106.00131. ICLR 2021 Workshop on Embodied Multimodal Learning (EML).
[42] T. W. Tsai, C. Li, and J. Zhu (2021). MiCE: Mixture of Contrastive Experts for Unsupervised Image Clustering. ICLR 2021.
[43] X. Wang, Z. Liu, and S. X. Yu (2021). Unsupervised Feature Learning by Cross-Level Instance-Group Discrimination. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12586-12595.
[44] L. Amado, and F. Meneguzzi (2018). Q-Table compression for reinforcement learning. The Knowledge Engineering Review, vol. 33. https://doi.org/10.1017/S0269888918000280
[45] B. Yang, X. Fu, N. D. Sidiropoulos, and M. Hong (2017, July). Towards k-means-friendly spaces: Simultaneous deep learning and clustering. In International Conference on Machine Learning, pp. 3861-3870.
[46] K. Gupta, M. Y. Raghuprasad, and P. Kumar (2018). A hybrid variational autoencoder for collaborative filtering. arXiv preprint arXiv:1808.01006.
[47] F. Shoeleh, N. M. Yadollahi, and M. Asadpour (2020). Domain adaptation-based transfer learning using adversarial networks. The Knowledge Engineering Review, vol. 35. https://doi.org/10.1017/S0269888920000107
[48] K. L. Lim, X. Jiang, and C. Yi (2020). Deep clustering with variational autoencoder. IEEE Signal Processing Letters, vol. 27, pp. 231-235. https://doi.org/10.1109/LSP.2020.2965328
[49] Z. Jiang, Y. Zheng, H. Tan, B. Tang, and H. Zhou (2016). Variational deep embedding: An unsupervised and generative approach to clustering. arXiv preprint arXiv:1611.05148.
[50] W. Etaiwi, D. Suleiman, and A. Awajan (2021). Deep Learning Based Techniques for Sentiment Analysis: A Survey. Informatica, vol. 45(7). https://doi.org/10.31449/inf.v45i7.3674
[51] A. S. Gaafar, J. M. Dahr, and A. K. Hamoud (2022). Comparative Analysis of Performance of Deep Learning Classification Approach based on LSTM-RNN for Textual and Image Datasets. Informatica, vol. 46(5). https://doi.org/10.31449/inf.v46i5.3872
[52] H. Ni (2020). Face recognition based on deep learning under the background of big data. Informatica, vol. 44(4). https://doi.org/10.31449/inf.v44i4.3390