https://doi.org/10.31449/inf.v45i5.3321 Informatica 45 (2021) 675–686 675 An Improved Pattern Mining Technique For Graph Pattern Analysis Using a Novel Behavior of Artificial Bee Colony Algorithm Shriya Sahu, Meenu Chawla and Nilay Khare Manit, Bhopal, India E-mail: s.shriya88@gmail.com , chawlam@manit.ac.in, nilay.khare@rediffmail.com Keywords: pattern mining, swarm intelligence, machine learning Received: September 28, 2020 Rising data complexity and volume in the network has attracted researchers towards substructure analysis. Subgraph mining is an area that has gained remarkable attention in the last couple of years to offer an intelligent analysis of more massive graphs and complicated data structures. It has been observed that graph pattern mining faces issues regarding the matching ruleset and complex instruction set execution problem. This paper introduces modern-day intelligence architecture based on Swarm Intelligence that is cross-validated by supervised machine learning mechanisms. A new behavior incorporated with a new inter and intra hive behavior is incorporated in Swarm based Artificial Bee Colony. The proposed work model is evaluated over two different datasets with more than 4900 nodes in the graph. The proposed framework is evaluated using True Detection Rate, False Detection Rate, precision, and F-Measure, demonstrating an average improvement of 9.8%, 8.35%, 8.35% and 9.15% against existing GraMi work that represent an enhanced performance of the proposed pattern mining technique. Povzetek: Uporabljene so raznovrstne metode umetne inteligence in strojnega učenja za iskanje vzorcev, tj. podgrafov v grafu. 1 Introduction Frequent Subgraph Mining (FSM) plays a central role in solving complex problems for various applications such as text retrieval, computer vision, social networks, computational chemistry, and bioinformatics. Additionally, FSM also caters to the graphical problems in data mining tasks such as designing database, clustering, and classification of graphs. The main objective of the mining graphs is to compute the subgraphs whose appearances exceeds a certain threshold. Such a perspective is quite useful in understanding real-life applications. For example, protein-protein structures and their interactions easily modeled by labeled graphs. But it is a challenging task for uncertain graphs. Therefore, researchers focussed on efficient mining of frequent patterns on such graphs (Chen et al., 2018). Recently, there is a quite interest of the researchers to study the relationship between the entities and attributes in a social graph. Such a relation widely used in social media marketing likewise 90%, 14%, and 60% users said that customer’s trust, advertisement, and Twitter respectively play a critical role in shopping. Nonetheless, the mining subgraph in social graphs is more related than rules in case of itemsets. Consequently, the researcher of bioinformatics may determine the substructures within protein interaction graphs and structures. Such graphs have nodes and edges which represent proteins and their interactions. In addition, these graphs are updated whenever there is a need to represent the interactions of new proteins. However, a critical task for the researcher is to forecast the working of a newly added protein without any experimentation. But it is possible only through frequent mining by interacting with the new proteins having identical interactions. The problem of FSM is categorized into two phases, such as determining frequent patterns in either (a) graphical database having multiple inputs (Protein interaction or chemical compounds) or (b) large graph having single input (e.g., social media,) (Elseidy et al., 2014). The main task of FSM is to calculate all the subgraphs having support or frequency exceeds the minimum frequency threshold. In the case of multiple graphs, frequency is the count of pattern graphs (Ingalalli et al., 2018). But it is quite challenging to define the support notion in a single large graph. Thus, it is not enough to define the pattern that exists in a graph, whether it exists or not. Therefore, it is vital to determine all the isomorphisms (I) of A, which are distinct in nature from the pattern graph (G). Actually, ‘I’ is the exact match of A in the graph, which is used to pair the nodes, and edges with their respective labels (Cheng et al., 2014). For instance, if we talk about the collaboration graph (G) as depicted in Figure 1, subgraph (U 1) is having four isomorphisms. However, a typical approach to mine frequent subgraphs is using grow and store method which includes different phases such as (Gu et al., 2016; Yuan et al., 2012; Li et al., 2012) (a) Computation of all the nodes which exists at least user-defined threshold (£) and load their appearances (b) Frequently, extend the loaded 676 Informatica 45 (2021) 675–686 S. Sahu et al. appearances to build large, frequent subgraphs and then assess their frequency (c) new frequent subgraphs appearances storage (d) Repeat the phase 2 until no more these subgraphs detected (Rehman et al., 2018; Abdelhamid et al., 2017). A graph is the simplest form to represent the data by modeling relationships between the various objects. The interesting problem that comes in concern is pattern matching when handling graphical data. This matching sorted using the problem of subgraph isomorphism. For instance, there are two graphs, Y and Z, the role of subgraph isomorphism is to describe whether Z includes a subgraph which is isomorphic to Y, and it is an NP-hard problem. There are various algorithms proposed in the literature to solve this complex problem using genetic algorithm, MapReduce and Pregel (Bhuiyan & Hasan, 2015; Zhao et al., 2016; Choi et al., 2019). Moreover, generalized subgraph problem of the isomorphism sorted by developing some algorithms but these are limited to uncertain graphs which increases complexity, limited scalability and work with redundant data having supplementary data such as attributes or edge labels. Alternatively, researchers rely on metaheuristic algorithms such as Genetic algorithms to address the consequence of this problem. Most of the algorithms provide quality solutions with less time, but these are limited to the search capability in case of large space for the problem of subgraph isomorphism (Choi et al., 2019). Therefore, this paper solves this problem in frequent pattern mining using the Cuckoo search algorithm. The main advantage of using this algorithm is that it works efficiently within a large space in case of an NP-hard problem. The leveraged search capability detected the frequent patterns in a subgraph and evaluated the problem of subgraph isomorphism. Preliminaries A collaborative graph 𝐺 = (𝑆 , 𝑇 , 𝐾 ) contains various nodes S, edges T with a labeling function K which assigns labels to S and T. A subgraph of G consists of Y and Z such as 𝑌 ⊆ 𝑆 , 𝑇 and 𝑍 ⊆ 𝑆 , 𝑇 if 𝑆 𝑌 𝑎𝑛𝑑 𝑆 𝑍 ⊆ 𝑆 and 𝑇 𝑌 𝑎𝑛𝑑 𝑇 𝑍 ⊆ 𝑇 . 1.1 Subgraph Isomorphism (SI) Definition 1: There are two graphs given such as G: Y= (S Y, T Y) and Z = (S Z, T Z), the SI is an injective function such as d: 𝑆 𝑌 → 𝑆 𝑍 such as (m, n) Є 𝑇 𝑌 in case of (d(m), d(n))Є 𝑇 𝑅 where 𝑅 = (𝑆 𝑅 , 𝑇 𝑅 ) ⊆ 𝑍 . However, d is an induced SI in case if (𝑚 , 𝑛 ) ∉ 𝑇 𝑌 , then( d(m), d(n)) ∉ 𝑇 𝑅 . The basic difference between SI and induced SI is that edge absence in Y corresponds to the presence of an edge in Z must not present in case of induced SI. This mapping preserves the nodes and edges labels. For instance, subgraph Y has four isomorphism (𝑚 1 2 𝑚 2 6𝑚 3 10𝑚 4 ) with respect to collaborative graph G, and (𝑛 1 2 𝑛 2 6 𝑛 3 20𝑛 4 ; 𝑛 4 10𝑛 5 4 𝑛 6 𝑎𝑛𝑑 𝑛 0 10𝑛 6 10 𝑛 7 ; 𝑛 8 6 𝑛 9 10𝑛 10 ). But, an intuitive way to determine the frequency of a subgraph in a graph is to count the number of isomorphism. In a given graph, the SI problem is the computation of subgraphs 𝑍 ⊆ 𝑅 such that 𝑓 : 𝑆 𝑌 → 𝑆 𝑍 is an isomorphism from Y to R. This is rather a complex problem. Consequently, it is not an anti- monotone as the graphical representation shows that extension exceeds the subgraphs. For instance, in a given graph, node A appears 3 times while its extension (B) appears 4 times such that 𝐴 4 𝐵 . The graph having such anti monotonic nature is of prime importance as it provides various methods without avoiding a situation. There are several anti-monotone metrics developed in literature (Talukder and Zaki, 2016; Elseidy et al., 2014). Definition 2: There are two directed graphs such as G: Y= (S Y, T Y) and Z = (S Z, T Z) where ⌈𝑆 𝑌 ≤ 𝑆 𝑍 ⌉, the problem of SI represented by SI (Y, Z) is to determine an injective function 𝑑 : 𝑆 𝑌 → 𝑆 𝑍 that reduces the value of fitness function (f). The optimal solution using the Cuckoo search having f=0 is the SI from Y to Z.𝑠 𝐺 (𝑈 ) = min {𝑡 |𝑡 = ⌊𝐹 (𝑚 )| 𝑓𝑜𝑟 𝑎𝑙𝑙 𝑚 ∈ 𝑌 The fitness function (f) is defined as edges count, which match or may not match during the mapping. This Figure 1: Directed Graphs with their isomorphism (a) Collaborative Graph (b) Subgraph 1 with four extensions (c) Subgraph 2 with three isomorphisms (d) Subgraph 3 with their appendices. An Improved Pattern Mining Technique For Graph... Informatica 45 (2021) 675–686 677 function is used to solve the real-world problems by constructing into SI problem and then solved it using the metaheuristic approach. For instance, subgraph U 1 in Figure 1b and the graph G given in Figure 1(a) have 𝐹 (𝑚 2 ) = {𝑛 2 , 𝑛 6 , 𝑛 7 }; 𝐹 (𝑚 1 ) = {𝑛 4 , 𝑛 5 , 𝑛 0 }; 𝐹 (𝑚 3 ) = {𝑛 4 , 𝑛 5 , 𝑛 9 }; 𝐹 (𝑚 4 ) = {𝑛 0 , 𝑛 9 , 𝑛 10 }. Thus, 𝑠 𝐺 (𝑈 1 ) = 4. In order to compare, the respective minimum support function is 2 𝑛 2 4𝑛 4 10 𝑛 5 𝑎𝑛𝑑 𝑛 6 4𝑛 5 10𝑛 0 as its isomorphism overlap and minimum support function regarded as unity. So, the main problem of FSM is given as follows: Problem 1: In Figure 1, graph G is given with a minimum threshold (£), so the frequent subgraph mining problem is to compute all the subgraphs (U) in Graph G, such as 𝑠 𝐺 (𝑈 ) ≥ (£). Actual appearances which exceed the (£) does not compute in the given problem. This is quite impressive, and it is useful for many applications, but some prefer actual appearances such as graph indexing. Definition 1 relies on matching the labels of edges and nodes. For instance, subgraph U 2 has only a single isomorphism constructed by nodes 𝑛 2 , 𝑛 3 , 𝑛 4 . However, research argues that developed matching is restrictive in nature and maintained by developing indirect relationships and differences of edges graphs and subgraphs. Such matching may also be possible for 𝑛 7 4𝑛 9 6 𝑛 8 , as seen in subgraph U 3. Rather, there is an indirect relation between A and C. This match often recognized as a pattern. For frequent mining patterns in this document, we use the definition from past research (Cheng et al., 2014). Specifically, a distance matric has been employed, which computes the distance between two nodes, as given in the graph. In Figure 2 for graph G, a distance function that connects the m and n has been defined as ∆ ℎ (𝑚 , 𝑛 ). The solid lines represent the relation using graph edges while dotted lines depict the transition. Let us consider that ∆ ℎ (𝑛 0 , 𝑛 3 ) =4, then it is easy to use ∆ 𝑝 (𝑚 , 𝑛 ) as the minimum sum of inversely proportional to the edge weights between the paths m and n. For instance, ∆ 𝑝 (𝑛 7 , 𝑛 8 ) = 1 4 + 1 6 = 0.4. Thus, a shorter distance belongs to robust collaboration. Definition 3: In a pattern graph, 𝑄 = (𝑆 𝑄 , 𝑇 𝑄 , 𝐾 𝑄 ) of a graph G (S,T,K) if 𝑆 𝑄 ⊆ 𝑆 , 𝐾 𝑄 (𝑚 ) = 𝐾 (𝑚 )𝑓𝑜𝑟 𝑎𝑙𝑙 𝑚 ∈ 𝑆 𝑄 𝑎𝑛𝑑 𝐾 𝑄 (𝑒𝑑𝑔𝑒𝑠 ) =∝ 𝑓𝑜𝑟 𝑎𝑙𝑙 𝑒𝑑𝑔𝑒𝑠 ∈ 𝑇 𝑄 . In Figure 2, a pattern corresponds to a subgraph without any edge labels. However, Figure 2 (b) shows a pattern graph of a G. Definition 4: Let us consider a pattern graph 𝑄 = (𝑆 𝑄 , 𝑇 𝑄 , 𝐾 𝑄 )of a graph G = (S, T, K), and ∆ is a distance metric with a user − defined threshold (£). An injective function (𝜑 )from pattern Q to G is 𝑆 𝑄 → 𝑆 𝑜𝑛𝑙𝑦 𝑠𝑎𝑡𝑖𝑠𝑓𝑖𝑒𝑑 𝑓𝑜𝑟 𝐾 𝑄 (𝑚 ) = 𝐾 (𝜑 (𝑚 )) for all nodes 𝑚 ∈ 𝑆 𝑄 and ∆( 𝜑 (𝑛 ), 𝜑 (𝑚 )) ≤ £. The frequency and minimum support function of a pattern graph denoted by 𝜕 𝐺 (𝑄 )𝑐𝑎𝑛 𝑏𝑒 𝑒𝑎𝑠𝑖𝑙𝑦 𝑑𝑒𝑡𝑒𝑟𝑚𝑖𝑛𝑒𝑑 𝑢𝑠𝑖𝑛𝑔 𝑑𝑒 𝑓 𝑖𝑛𝑖𝑡𝑖𝑜𝑛 2. Let us consider a threshold value £ = 0.3 then we get 𝜕 𝐺 (𝑄 ) = 2. Notify that there are only two constraints satisfy through this pattern graph such as ∆( 𝜑 (𝑛 ), 𝜑 (𝑚 )) ≤ £. In this paper, the GRAMI approach used in conjunction with the optimization technique to address the frequent mining problem in graphs. Additionally, GRAMI is a novel approach that solves the frequent mining problem by satisfying the constraints without affecting isomorphism in the graph. This paper introduces a new algorithmic structure for the identification of the isomorphic patterns through the Cuckoo search algorithm. The rest of the paper is organized in the following manner. Section 3 represents the related work section, whereas section 4 represents the issue and solution as the proposed methodology of this paper. Figure 2: (a) Computation of distance for Graph G, as given in Figure 1. (b) Pattern graph. 678 Informatica 45 (2021) 675–686 S. Sahu et al. 2 Related work In today's era, graph pattern mining is a frequent problem that comes in concern due to wide applications across various domains. In the literature, various approaches developed to address this problem, but still, there are enough gaps related to the isomorphism problem. For instance, Zhao had developed the Pregel based frequent subgraph mining approach to improve the scalability. Pregel is a computational model used to process the vertex graphs. A modern, distributed framework developed using this model to overcome the mining problem. The robust results obtained, but this approach still does not solve the constrained subgraph patterns on massive pattern graphs (Zhao et al., 2016). Aridhi and Nguifo had presented a study that summarized the existing data mining and graph processing techniques that could address the challenges faced by big graphs. Further, they provided a detailed classification of various graph processing designs along with vivid large-scale patterns or subgraph mining approaches (Aridhi and Nguifo, 2016). Moussaoui et al. had addressed the problem faced when subgraphs similarity could not be established. In this regard, researchers had proposed a flexible approach based on probabilistic graph mining to identify similar subgraphs. In this approach, probabilistic matching was implemented in comparison to the traditional exact similarity check. Experimentation against a real dataset of vivid domains had established that the proposed probabilistic model demonstrated better performance in terms of time processing and similar subgraph mining (Moussaoui et al., 2018). It has been observed that the structure and shape of the graph vary with respect to their applications. In this context, Jena et al. had introduced the SparkFSM approach that was proficient in dealing with isomorphism as well as directed and undirected graphs related to Spark or Scala technologies (Jena et al., 2018). Islam group had proposed WFSM-MaxPWS as an effective approach for subgraph mining based on weighted graphs. The mining approach proved to be very efficient in subgraph pruning. The approach was evaluated against different graph datasets representing normal and negative exponential weight distributions. Results had demonstrated that the runtime has significantly improved in comparison to the MaxW pattern mining approach (Islam et al., 2018). Iyer et al. had presented ASAP as an approximation-based subgraph and pattern mining technique. The authors also constructed an Error Latency Profile to specify the fluctuations observed for accuracy and current state in addition to approximating the graph patterns. Experimentation demonstrated that ASAP could successfully handle higher degree graphs comprising of billions of edges (Iyer et al., 2018). Researchers developed the metaheuristic-based algorithm to solve the isomorphism problem. The design issues have been considered to address the problem, which helps to decompose the consequences of a problem into the substructure. The optimal structure obtained using the hybrid genetic algorithm, which shows better results, but this approach has limited scalability and works in a concise search space (Choi et al., 2019). Preti et al. addressed the issue of pattern mining in large graphs representing multiple weight patterns. In the study, a scoring function-based pruning strategy was proposed that exemplified approximate as well as exact results to present subgraph mining (Preti et al., 2019). Detection strategies related to graphs were considered to be a very challenging task by Rao and Mishra. They had implemented pattern mining based on Edge Weight Detection (EdWePat) approach for identifying the subgraph patterns present in a weighted graph (Rao and Mishra, 2019). Li et al. had addressed the complex relationships existing in big graphs by introducing a fuzzy approach to traditional graph and pattern mining strategies. Authors had presented a multi- fuzzy based optimization using the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO). The experimental evaluation demonstrated the effectiveness of the proposed strategy over the existing approaches (Li et al., 2019). Ray et al. had addressed the issue faced by subgraph mining that needs to be repeated frequently with respect to streaming larger graphs. In the process, they had developed a sampling design that could successfully mine out the subgraphs that represent the latest modification in the larger graph. Authors had involved 5 large graph datasets and a network motif mining algorithm to evaluate the proposed design. The results demonstrated that the proposed design could speedily identify the changing patterns (Ray et al. 2019). Priyadarshini and Rodda had proposed a Geometric Multi-Way Frequent Subgraph Mining (GMFSM) method. This method took advantage of the Frequent Subgraph Mining and filtration technique to shortlist the subgraphs from a single large database. The approach proved to be very effective and robust in achieving the required results and reduced the mining time from 1 3 ⁄ rd to 1 2 ⁄ in comparison to the existing approaches (Priyadarshini and Rodda, 2020). Le and his group had postulated a Weighted Graph Mining (WeGraMi) algorithm as an effective approach for subgraph pruning. The design first calculated the weights of the pruned subgraphs, followed by applying search space analytics for subgraph pruning. The subgraph mining approach, based on the weighted threshold, had effectively addressed the issues concerning storage space and processing time (Le et al., 2020). Consequently, the probabilistic approach was investigated for frequent mining patterns on the uncertain graphs. An enumeration evaluation algorithm was proposed to address the semantic problem. Additionally, the computation sharing approach was used to obtain better performance. The issue of mining and proposed solution 2.1 Dataset The dataset is gathered from the following data sources. a) http://data-mining.philippe-fournier- viger.com/subgraph-mining-datasets/ (dataset-1) b) http://www.kaggle.com (dataset-2) Both the dataset links have more than 5000 data elements and are open for download and processing. The dataset-1 contains two standard subgraph mining data, which is provided for a small graph dataset. The file An Improved Pattern Mining Technique For Graph... Informatica 45 (2021) 675–686 679 is available in the form of text, which composed of one or more graph. The graph is available in different format such as t≠ 𝑁 , vML, ePQL. t≠ 𝑁 → It represents the first line and is the N th graph in the file. vML → It represents the M th vertex of the recent graph with a label L ePQL → This attribute represents an edge with P th and Q th vertex for L number of labels. Dataset-2 has been collected from Kaggle site. Comma-separated list is the simplest and supported file type available in Kaggle. Kaggle-loaded CSV’s must have a header column with field names which can be easily read by human. The CSV file composed of two columns each contains metadata and description of data. 2.2 Issue and solution Big graphs have always been an era of interest for different research world field experts. In order to understand the exact laying pattern of the big graphs, the normal mapping will result in a faulty rate of classification architecture. As the false placement of the pattern value can be done smartly and hence the standard mining architecture is not suitable enough for such kind of processing. This paper presents an improved behavior of the Artificial Bee Colony (ABC) algorithm to identify the pattern of the graphs. The general architecture of artificial bee colony has three kinds of bees as follows a) The employed bee b) The onlooker bee c) The scout bee The employed bee is the one who is responsible for the food collection, onlooker bee is for the monitoring purpose, and scout bee is searching for food sources randomly. This paper presents a new behavioral architecture of the artificial bee colony. At the initial phase, the entire graph is divided into 4 subsequent parts taking the initial point to be random, as shown in Figure 3(a) and (b). As shown in Figure 3(b), the entire graph is divided into 4 different populations as Area 1, 2, 3, and 4. Now the bee colony algorithm will form 4 hives in each section, and the inter, as well as intra mining, will be formed. There is a semi queen for each population area, which determines the threshold of the mapped graph in each section, proceeded by the 20% selection rule. Apart from this inter clustering mechanism for bees, there is an intra mechanism as well. Pseudo Code 1 illustrates the working of ABC for the intracluster region. PSEUDO CODE 1: 𝐹𝑢𝑛𝑐𝑡𝑖𝑜𝑛 𝐴𝑝𝑝𝑙 𝑦 𝐴𝐵𝐶 1. 𝐼𝑛𝑝𝑢𝑡𝑠 : 𝑆𝑢 𝑏 𝐺𝑟𝑎𝑝 ℎ 𝑁𝑜𝑑𝑒𝑠 , 𝑂𝑢𝑡𝑝𝑢𝑡 : 𝑀𝑎𝑡𝑐 ℎ𝑒 𝑑 𝑃𝑎𝑡𝑡𝑒𝑟𝑛 2. 𝐹𝑜𝑟𝑒𝑎𝑐 ℎ 𝑛𝑜𝑑𝑒 𝑖𝑛 𝑆𝑢 𝑏 𝐺𝑟𝑎𝑝 ℎ // For every node in Node List 3. 𝐼𝑛𝑖𝑡𝑖𝑎𝑙 𝑁 𝑒𝑐𝑡𝑎 𝑟 𝑃𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛 = 𝑅𝑎𝑛𝑑𝑜 𝑚 𝑆𝑎𝑚𝑝𝑙𝑒 // Initializing the Nectar 4. 𝐸𝑚𝑝𝑙𝑜𝑦𝑒 𝑑 𝐵𝑒𝑒 = 𝑛𝑜𝑑𝑒 . 𝐸𝑑𝑔 𝑒 𝑊𝑒𝑖𝑔 ℎ𝑡 // The employed bee will be the edge weight of containing nectar 5. 𝐹𝑖𝑛𝑑 𝐼 𝑛 𝐷𝑒𝑔𝑟𝑒𝑒 ; // Find inputs to the containing vertex 6. 𝐹𝑖𝑛𝑑 𝑂𝑢 𝑡 𝐷𝑒𝑔𝑟𝑒𝑒 ; // Find outs to the containing vertex 7. 𝑇𝑜𝑡𝑎 𝑙 𝐹𝑜𝑜 𝑑 𝑃𝑒 𝑟 𝑁𝑒𝑐𝑡𝑎𝑟 = 𝐼 𝑛 𝐷𝑒𝑔𝑟𝑒𝑒 . 𝐸𝑑𝑔𝑒 𝑊𝑒𝑖𝑔 ℎ𝑡 + 𝑂𝑢 𝑡 𝐷𝑒𝑔𝑟𝑒𝑒 . 𝐸𝑑𝑔𝑒 𝑊𝑒𝑖𝑔 ℎ𝑡 // Total food in the hive will be equal to the edge weight of inward degree and the outward degree 8. 𝑆𝑡𝑜𝑟𝑒 𝑡𝑜 𝐹𝑜𝑜 𝑑 𝐶𝑜𝑛𝑡𝑎𝑖𝑛𝑒𝑟 // Add the calculated value to the Food Container 9. 𝐸𝑛𝑑 10. 𝐹𝑜𝑟𝑒𝑎𝑐 ℎ 𝐵𝑒 𝑒 𝐹𝑜𝑜𝑑 𝑖𝑛 𝑇𝑜𝑡𝑎 𝑙 𝐹𝑜𝑜 𝑑 𝑃𝑒 𝑟 𝑁𝑒𝑐𝑡𝑎𝑟 // Two different ranges are created C B A B B B A C A B B 0.1 0.05 0.6 0.1 0.16 0.16 n 1 n 2 n 3 n 4 n 0 n 5 n 6 n 7 n 8 n 9 n 10 G Figure 3: (a). Normal Pattern which has subsequent sections. 680 Informatica 45 (2021) 675–686 S. Sahu et al. 11. 𝑅𝑎𝑛𝑔𝑒 1 = 𝐵𝑒 𝑒 𝐹𝑜𝑜𝑑 + 𝐵𝑒 𝑒 𝐹𝑜𝑜𝑑 ∗ .20 // The first is 20 % above the provided belt 12. 𝑅𝑎𝑛𝑔𝑒 2 = 𝐵𝑒 𝑒 𝐹𝑜𝑜𝑑 – 𝐵𝑒 𝑒 𝐹𝑜𝑜𝑑 ∗ .20 // Second is 20 % below the provided belt 13. 𝑆𝑒𝑎𝑟𝑐 ℎ 𝑅𝑒𝑠𝑢𝑙𝑡 𝑅𝑎𝑛𝑔𝑒 1 <= 𝐷𝑎𝑡 𝑎 𝑉𝑎𝑙𝑢𝑒 <= 𝑅𝑎𝑛𝑔𝑒 2 // Searching any other value in the same range 14. 𝐹𝑜𝑟𝑒𝑎𝑐 ℎ 𝑆𝑅 𝑖𝑛 𝑆𝑒𝑎𝑟𝑐 ℎ_𝑅𝑒𝑠𝑢𝑙𝑡 15. 𝑆𝑢𝑠𝑝𝑒𝑐 𝑡 𝑀𝑎𝑡𝑐 ℎ + + // This could be a suspect similar graph pattern 16. 𝐸𝑛 𝑑 𝐹𝑜𝑟 The artificial bee colony creates a random population for the processing of the graph pattern. Each edge weight value will act as food to the nectar. The food calculation is done by summing up the edge weights of the in-degree and the out-degree of the nectar. The in-degree is increased by one if any node gets an edge from any other node in the graph. The out-degree is then incremented by one if the current node has an edge for any other node in the graph. Two range belts are created out of which the first proposed belt is 20% above and the second belt is 20% below the given belt. The search is done on the base of the calculated two new belts. The working is also represented by the flowchart, which is illustrated in Figure 4. The found architectures could be a match of graph pattern, but it can't be termed as a final match. To find whether it is an exact match or not, the connecting edge value is passed to neural network. The ordinal measures of neural are defined in Table 1. The pseudo-code for the architecture of neural network is given by Pseudo Code 2. PSEUDO CODE 2 1. 𝐹𝑢𝑛𝑐𝑡𝑖𝑜𝑛 𝐴𝑝𝑝𝑙𝑦 𝑁𝑒𝑢𝑟𝑎𝑙 𝑁𝑒𝑡𝑤𝑜𝑟 𝑘 𝑠 2. 𝐼𝑛𝑝𝑢𝑡 : 𝑆𝑢𝑠𝑝𝑒𝑐𝑡 𝑁𝑜𝑑𝑒𝑠 𝑂𝑢𝑡𝑝𝑢𝑡 : 𝑀𝑎𝑡𝑐 ℎ𝑒𝑑 𝐺𝑟𝑎𝑝 ℎ 𝑉𝑎𝑙𝑢𝑒 3. 𝑆𝑒𝑡 𝑇𝑟𝑎𝑖𝑛𝑖𝑛 𝑔 𝑉𝑎𝑙𝑢𝑒 𝑡𝑜 𝐸𝑚𝑝𝑡𝑦 // Initialize the Training Value to Empty, the matched architecture's edge weight will be passed as the training value 4. 𝑆𝑒𝑡 𝑇𝑎𝑟𝑔𝑒 𝑡 𝑉𝑎𝑙𝑢𝑒 𝑡𝑜 𝐸𝑚𝑝𝑡𝑦 // The associated target value will be initialized to null 5. 𝐴𝑠𝑠𝑖𝑔𝑛 𝑇𝑟𝑎𝑖𝑛𝑖𝑛 𝑔 𝑉𝑎𝑙𝑢𝑒 𝑎𝑠𝑠𝑜𝑐𝑖𝑎𝑡𝑒𝑑 𝑤𝑖𝑡 ℎ 𝑆𝑢𝑠𝑝𝑒𝑐𝑡 𝑁𝑜𝑑𝑒𝑠 6. 𝐹𝑜 𝑟 𝑒𝑎𝑐 ℎ 𝑠𝑝 𝑖𝑛 𝑆𝑢𝑠𝑝𝑒𝑐 𝑡 𝐿𝑖𝑠𝑡 7. 𝑇𝑟𝑎𝑖𝑛𝑖𝑛 𝑔 𝑉𝑎𝑙𝑢𝑒 . 𝐴𝑝𝑝𝑒𝑛𝑑  𝑆𝑒𝑡 . 𝐸𝑑𝑔 𝑒 𝑉𝑎𝑙𝑢𝑒 // Assigning Edge Value 8. 𝐴𝑠𝑠𝑜𝑐𝑖𝑎𝑡𝑒 𝑑 𝑇𝑎𝑟𝑔𝑒 𝑡 𝑉𝑎𝑙𝑢𝑒  𝑆𝑒 𝑡 . 𝑠𝑝 . 𝐼𝑑 9. // Setting the target value as the edge value 10. 𝑆𝑡𝑎𝑟𝑡 𝑇𝑟𝑎𝑖𝑛𝑖𝑛 𝑔 𝐴𝑟𝑐 ℎ𝑖𝑡𝑒𝑐𝑡𝑢𝑟𝑒 // Starting the training architecture as per Ordinal Measures of Table 1 11. 𝐼𝑓 𝑖𝑠 𝑠𝑎𝑡𝑖𝑠𝑓𝑖𝑒𝑑 𝐺𝑟𝑎𝑑𝑖𝑒𝑛𝑡 𝑉𝑎𝑙𝑢𝑒 // Check whether the gradient is satisfied or not 12. 𝑇𝑟𝑎𝑖𝑛𝑖𝑛 𝑔 𝐶𝑜𝑚𝑝𝑙𝑒𝑡𝑒 // If the gradient is satisfied, the training is complete 13. 𝐸𝑛𝑑 14. 𝑆𝑡𝑜𝑟𝑒 𝑇𝑟𝑎𝑖𝑛𝑒𝑑 𝐴𝑟𝑐 ℎ𝑖𝑡𝑒𝑐𝑡𝑢𝑟𝑒 𝑎𝑠 𝑝𝑒𝑟 𝑆𝑢𝑝𝑒𝑟𝑣𝑖𝑠𝑒𝑑 𝑀𝑎𝑐 ℎ𝑖𝑛𝑒 𝐿𝑒𝑎𝑟𝑛𝑖𝑛𝑔 // Applying Machine Learning as per 15. 𝑈𝑝𝑙𝑜𝑎𝑑 𝑎𝑙𝑙 𝑡𝑟𝑎𝑖𝑛𝑖𝑛 𝑔 𝑑𝑎𝑡𝑎 𝑎𝑠 𝑇𝑒𝑠𝑡 𝐷𝑎𝑡𝑎 // Uploading the test data 16. 𝐶𝑙𝑎𝑠𝑠𝑖𝑓𝑦 𝑢𝑠𝑖𝑛𝑔 𝑇𝑟𝑎𝑖𝑛𝑒 𝑑 𝐴𝑟𝑐 ℎ𝑖𝑡𝑒𝑐𝑡𝑢𝑟𝑒 // Classify the test data as per stored trained value C B A B B B A C A B B 0.1 0.05 0.6 0.1 0.16 0.16 n 1 n 2 n 3 n 4 n 0 n 5 n 6 n 7 n 8 n 9 n 10 G Population Area 1 Population Area 2 Population Area 3 Population Area 4 Figure 3: (b). Initially divided sections. Propagation Iterations 100-500 Hidden Neuron Count 20-100 Hidden Layer Count 2 Back Propagation Architecture Levenberg Satisfaction Criteria Gradient Back Propagation Parameter Mean Squared Error Table 1: Ordinal Measures of Neural Network. An Improved Pattern Mining Technique For Graph... Informatica 45 (2021) 675–686 681 17. 𝐼𝑓 𝐶𝑙𝑎𝑠𝑠𝑖𝑓𝑖𝑒𝑑 𝐿𝑎𝑏𝑒𝑙 𝑖𝑠 𝑛𝑜𝑡 𝑠𝑖𝑚𝑖𝑙𝑎𝑟 𝑡𝑜 𝑡 ℎ𝑒 𝑡𝑟𝑎𝑖𝑛𝑖𝑛𝑔 𝑙𝑎𝑏𝑒𝑙 // If the classified value is not similar to trained label 18. 𝑀𝑎𝑡𝑐 ℎ𝑒 𝑑 𝑃𝑎𝑡𝑡𝑒𝑟𝑛 + + // The architecture is similar to other architecture 19. 𝐸𝑛𝑑 𝐹𝑢𝑛𝑐𝑡𝑖𝑜𝑛 The learning and classification mechanism is represented in Figure 5. The flow chart is labeled from 1-10 as per its process occurrence. Based on the proposed algorithm, the evaluated results are discussed in Section 5. 3 Results The performance of the proposed subgraph mining approach is evaluated in terms of time performance, memory overhead and number of subgraphs pruned with variation in the supported threshold frequency. The supported threshold is varied to investigate its effect in returning a non-empty set of patterns or subgraphs. Time performance of the proposed work is evaluated against the four existing studies namely, Ingalalli et al., 2018, Qiao et al., 2018, Abdelhamid et al., 2017, Elseidy et al.,2014 and Le et al., 2020. The considered studies have proposed subgraph pruning strategies inspired by Elseidy et al., 2014 work, who had proposed GraMi for subgraph from larger complex graphs based on the supported threshold frequency. Ingalalli et al., 2018 had proposed MuGraM as an algorithm to identify frequent subgraph patterns from multigraph structure. Qiao et al., 2018 has proposed SSiGraM as a parallel subgraph mining algorithm that was based on Apache Spark framework. Abdelhamid et al., 2017 proposed IncGM+ as a fast incremental system for frequent subgraph mining to resolve the challenges of evolving graphs. Le et al., 2020 developed a Weighted Graph Mining algorithm for subgraph pruning that was named as WeGraMi. In this approach, the weighted graph mining was followed by search space analytics for subgraph pruning. The experiments are conducted for 10 frequency thresholds that are plotted on X-axis against the running time on Y-axis to evaluate the effectiveness of the proposed work as shown in Figure 6. It is observed that original GraMi required highest running time; however, MuGraM, SSiGraM, IncGM+, WeGraMi including proposed work involved lower running time over different supported thresholds. Further it is also established that the proposed work exhibited the lowest time for subgraph mining on the threshold values under study. This establishes the fact that the proposed work not only outperformed the GraMi but also proved to be better than most of the existing works that were inspired by GraMi. In addition to running time, memory consumption is another important parameter that decides the feasibility of the proposed technique. Figure 7 compares the memory overhead of the proposed work with IncGM+ and WeGraMi over the supported threshold frequency. It is observed that with decrease in the threshold, the memory consumption rises for all the works. However, this trend is very gradual in case of proposed work. Overall, minimum memory usage is found for the proposed work in comparison to IncGM+ and WeGrami. START Divide Graph into Different Hives For each Hive Value , Calculate In Degree and Out Degree In degree  In coming edge Out Degree  Outgoing Edge Calculate the Total Degree Edge Weight as Bee Food Bee Nector Range 1= Bee Food + 20% 2.= Bee Food – 20% Find Similar BeeFoods If Similar Food Found Yes Add to Suspect List No Process Next All nodes are Processed No Pick Next Node Yes STOP Figure 4: The working architecture of Proposed ABC. 682 Informatica 45 (2021) 675–686 S. Sahu et al. Figure 8 compares the number of subgraphs pruned using various approaches. In addition to above evaluations, the performance of the proposed work is also estimated using quality parameters in terms of True Detection Rate (TDR), False Detection Rate (FDR) and F- Measure in comparison to GraMi. The parametric values are calculated using as follows: 𝑇𝐷𝑅 = 𝑇𝑜𝑡𝑎 𝑙 𝑡𝑟 𝑢 𝑒 𝑑𝑒𝑡𝑒𝑐𝑡𝑖𝑜𝑛𝑠 𝑇𝑜𝑡𝑎𝑙 𝑁𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝐷𝑒𝑡𝑒𝑐𝑡𝑖𝑜𝑛𝑠 (1) Where, 𝑇𝐷𝑅 is the ratio of the total number of true matchings to the total number of detections. 𝐹𝐷𝑅 = 𝑇𝑜𝑡𝑎 𝑙 𝐹𝑎𝑙𝑠 𝑒 𝐷𝑒𝑡𝑒𝑐𝑡𝑖𝑜𝑛𝑠 𝑇𝑜𝑡𝑎𝑙 𝑁𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝐷𝑒𝑡𝑒𝑐𝑡𝑖𝑜𝑛𝑠 (2) Where, 𝐹𝐷𝑅 is the total number of false detections observed to the total number of detections. 𝐹 − 𝑀𝑒𝑎𝑠𝑢𝑟𝑒 = 2∗𝑇𝐷𝑅 ∗𝐹𝐷𝑅 𝑇𝐷𝑅 +𝐹𝐷𝑅 (3) Where, 𝐹 − 𝑀𝑒𝑎𝑠𝑢𝑟𝑒 is twice the product of 𝑇𝐷𝑅 and 𝐹𝐷𝑅 to the summed-up value of 𝑇𝐷𝑅 and 𝐹𝐷𝑅 . Table 2 and Table 3 summarises the analysis of the data for precision, TDR, FDR, and f-measure. Precision and f-measure values observed for both GraMi and proposed work are listed in Table 2. The range of nodes for evaluation lies from 100 to 5000.The parametric values observed for precision calculation are plotted in Figure 9. The parametric values of precision are plotted against a number of nodes from 100 to 5000. It is observed that GraMi achieved an average precision of 66.76%, whereas the average precision of the proposed work is 75.11%. Overall, the proposed work achieved an enhanced precision of 8.35%. F-measure values are compared in Figure 10. It is observed that f-measure for GraMi lies in the range of 0.633 to 0.645, and for proposed work, it lies in the range of 0.721 to 0.751. An average f-measure observed for GraMi and proposed work is 0.65 and 0.74 respectively. START 1 For every suspected Node in the List 2 Calculate Edge Weight and assign target label 3 Initialize Neural Training with 20- 100 Neurons Layer Count=2 4 Start Propagation 5 If is satisfied the gradient 6-Yes Stop Neural Training Store the Propagated Structure 6' 6'-No Continue Propagation Assign Test Data  Multi level Training Data 7 Reconstruct Group Size Simulate Using Trained Structure If Simulated Group == Re constructed Group 10 No match 10' Similar Pattern Identified 8 9 STOP 11 11' Figure 5: Process Diagram of Neural Networks. An Improved Pattern Mining Technique For Graph... Informatica 45 (2021) 675–686 683 F-measure can be understood in terms of the harmonic mean of precision and TDR. It is observed that both values are higher for the proposed work as compared to the GraMi. Therefore f-measure is also higher for the proposed work. On average, there are 0.0915 differences in the f-measure values between the two works. True Detection Rates and False Detection Rates for GraMi and Proposed work are summarized in Table 3. TDR values for GraMi and proposed work are listed in columns 2 and 3 while FDR values of GraMi and proposed work are listed in column 4 and column 5. The numbers of nodes are in the range from 100 to 5000. TDR of the GraMi and proposed work are compared in Figure 11. The parametric values of TDR are plotted on Y-axis against the number of nodes plotted on the X-axis. GraMi achieved an average TDR of 0.624 as compared to an average TDR of 0.722 for the proposed work. On average, it is concluded that the proposed work had 9% better TDR as compared to the GraMi. FDR observed for GraMi and proposed work are comparatively plotted in Figure 12. The graph shows that the proposed work demonstrates comparatively low FDR as compared to GraMi. On average, FDR of 0.3324 and 0.2488 is observed for GraMi and proposed work respectively. In other words, the proposed work achieved an average lower FDR of 8.35%. 4 Conclusion The paper has addressed the challenges faced by subgraph pattern mining of larger network graphs. The authors had designed and evaluated the performance of the proposed Figure 6: Comparison of Time Performance. Figure 7: Comparison of Memory Overhead. 10 100 1000 10000 4,5 4,3 4,1 3,9 3,4 3,5 3,3 3,1 2,9 2,7 Time in seconds (log scale) Supported Threshold (in 1000) Time Performance MuGraM GraMi SSiGraM 0 20 40 60 80 100 3,5 3,3 3,1 2,9 2,7 2,5 2,3 Memory Overhead in MB Supported Threshold (in 1000) Memory Overhead IncGM+ WeGraMi Proposed 684 Informatica 45 (2021) 675–686 S. Sahu et al. structure, which is a combination of Swarm Intelligence and Machine Learning for pattern mining. A new fitness function and a inter and intra hive behavior are introduced for Artificial bee Colony and are cross-validated by Machine learning based Feed Forward Back Propagation Neural Network. The performance of the proposed work is evaluated in terms of TDR, FDR, precision, and f- measure. A range from 100 to 5000 nodes are being analyzed for both proposed and GraMi. It is observed that both proposed work and GraMi achieve an average precision of 75.114% and 66.76%, TDR of 0.7215 and 0.6225, FDR of 0.2488 and 0.3324, and f-measure of 0.736 and 0.645. It is observed that an improved average precision, TDR, FDR, and f-measure of 8.35%, 9.8%, 8.35%, and 9.15% have been demonstrated by the proposed work in comparison to the GraMi. Hence, it is concluded that the proposed work outperformed the existing work. References [1] Chen, Y., Zhao, X., Lin, X., Wang, Y. and Guo, D., 2018. Efficient Mining of Frequent Patterns on Uncertain Graphs. IEEE Transactions on Knowledge and Data Engineering, 31(2), pp.287-300. https://doi.org/10.1109/tkde.2018.2830336 Figure 8: Comparison of subgraph pruning. 0 10 20 30 40 50 60 70 80 3,5 3,3 3,1 2,9 2,7 2,5 Number of Patterns Supported Threshold (in 1000) Subgraph Pruning MuGraM WeGraMi Proposed Number of Nodes Precision F-measure GraMi Proposed GraMi Proposed 100 0.6571 0.7315 0.633 0.721 500 0.6598 0.7414 0.635 0.726 1000 0.6642 0.7465 0.642 0.73 2000 0.6689 0.7546 0.65 0.738 3000 0.6711 0.7587 0.653 0.743 4000 0.6734 0.7599 0.657 0.747 5000 0.6787 0.7654 0.645 0.751 Table 2: Precision and f-measure for both the datasets. Number of Nodes TDR FDR GraMi Proposed GraMi Proposed 100 0.6101 0.7089 0.3429 0.2685 500 0.6111 0.7102 0.3402 0.2586 1000 0.6211 0.7141 0.3358 0.2535 2000 0.6314 0.7214 0.3311 0.2454 3000 0.6354 0.7276 0.3289 0.2413 4000 0.6412 0.7329 0.3266 0.2401 5000 0.6144 0.7356 0.3213 0.2346 Table 3: TDR and FDR for both the datasets. An Improved Pattern Mining Technique For Graph... Informatica 45 (2021) 675–686 685 [2] Elseidy, M., Abdelhamid, E., Skiadopoulos, S., & Kalnis, P. (2014). Grami: Frequent subgraph and pattern mining in a single large graph. Proceedings of the VLDB Endowment, 7(7), 517-528. https://doi.org/10.14778/2732286.2732289 [3] Ingalalli, V., Ienco, D. and Poncelet, P., 2018. Mining frequent subgraphs in multigraphs. Information Sciences, 451, pp.50-66. https://doi.org/10.1016/j.ins.2018.04.001 [4] Cheng, H., Yan, X., & Han, J. (2014). Mining graph patterns. In Frequent pattern mining (pp. 307-338). Springer, Cham. https://doi.org/10.1007/978-3-319-07821-2_13 [5] Gu, Y., Gao, C., Wang, L., & Yu, G. (2016). Subgraph similarity maximal all-matching over a large uncertain graph. World Wide Web, 19(5), 755- 782. https://doi.org/10.1007/s11280-015-0358-9 [6] Yuan, Y., Wang, G., Chen, L., & Wang, H. (2012). Efficient subgraph similarity search on large probabilistic graph databases. Proceedings of the VLDB Endowment, 5(9), 800-811. https://doi.org/10.14778/2311906.2311908 [7] Li, J., Zou, Z., &Gao, H. (2012). Mining frequent subgraphs over uncertain graph databases under probabilistic semantics. The VLDB Journal, 21(6), 753-777. https://doi.org/10.1007/s00778-012-0268-8 [8] Rehman, S.U., Asghar, S. and Fong, S.J., 2018. Optimized and Frequent Subgraphs: How Are They Related? IEEE Access, 6, pp.37237-37249. https://doi.org/10.1109/access.2018.2846604 [9] Abdelhamid, E., Canim, M., Sadoghi, M., Bhattacharjee, B., Chang, Y. C., &Kalnis, P. (2017). Incremental frequent subgraph mining on large evolving graphs. IEEE Transactions on Knowledge and Data Engineering, 29(12), 2710-2723. https://doi.org/10.1109/tkde.2017.2743075 [10] Bhuiyan, M. and Hasan, M.A. (2015) An iterative mapreduce based frequent subgraph mining algorithm. IEEE Trans. Knowl. Data Eng., 27, 608– 620. https://doi.org/10.1109/tkde.2014.2345408 [11] Zhao, X., Chen, Y., Xiao, C., Ishikawa, Y. and Tang, J., 2016. Frequent subgraph mining based on Pregel. The Computer Journal, 59(8), pp.1113-1128. https://doi.org/10.1093/comjnl/bxv118 [12] Talukder, N. and Zaki, M.J., 2016. A distributed approach for graph mining in massive networks. Data Mining and Knowledge Discovery, 30(5), pp.1024- 1052. https://doi.org/10.1007/s10618-016-0466-x [13] Aridhi, S., & Nguifo, E. M. (2016). Big graph mining: Frameworks and techniques. Big Data Research, 6, 1- 10. https://doi.org/10.1016/j.bdr.2016.07.002 [14] Moussaoui, M., Zaghdoud, M. and Akaichi, J., 2018. A New Framework of Frequent Uncertain Subgraph Figure 12: Precision. 0,6 0,62 0,64 0,66 0,68 0,7 0,72 0,74 0,76 0,78 100 500 10002000300040005000 Precision Number of Nodes GraMi Proposed Figure 9: F-measure. 0,6 0,62 0,64 0,66 0,68 0,7 0,72 0,74 0,76 F-measure Number of Nodes GraMi Proposed Figure 11: True Detection Rate 0,5 0,55 0,6 0,65 0,7 0,75 100 500 10002000300040005000 True Detection Rate Number of Nodes GraMi Proposed Figure 10: False Detection Rate. 0,2 0,22 0,24 0,26 0,28 0,3 0,32 0,34 0,36 100 500 10002000300040005000 False Detection Rate Number of Nodes GraMi Proposed 686 Informatica 45 (2021) 675–686 S. Sahu et al. Mining. Procedia Computer Science, 126, pp.413- 422. https://doi.org/10.1016/j.procs.2018.07.275 [15] Jena, B., Khan, C., & Sunderraman, R. (2018, November). SparkFSM: A Highly Scalable Frequent Subgraph Mining Approach using Apache Spark. In 2018 IEEE International Conference on Data Mining Workshops (ICDMW) (pp. 990-997). IEEE. https://doi.org/10.1109/icdmw.2018.00143 [16] Islam, M. A., Ahmed, C. F., Leung, C. K., & Hoi, C. S. (2018, June). WFSM-MaxPWS: an efficient approach for mining weighted frequent subgraphs from edge-weighted graph databases. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (pp. 664-676). Springer, Cham. https://doi.org/10.1007/978-3-319-93040-4_52 [17] Iyer, A. P., Liu, Z., Jin, X., Venkataraman, S., Braverman, V., & Stoica, I. (2018). {ASAP}: Fast, Approximate Graph Pattern Mining at Scale. In 13th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 18) (pp. 745- 761). [18] Choi, H., Kim, J., Yoon, Y. and Moon, B.R., 2019. Investigation of incremental hybrid genetic algorithm with subgraph isomorphism problem. Swarm and Evolutionary Computation, 49, pp.75-86. https://doi.org/10.1016/j.swevo.2019.05.004 [19] Preti, G., Lissandrini, M., Mottin, D., & Velegrakis, Y. (2019). Mining patterns in graphs with multiple weights. Distributed and Parallel Databases, 1-39. https://doi.org/10.1007/s10619-019-07259-w [20] Rao, B., & Mishra, S. (2019). An Approach to Detect Patterns (Subgraphs) with Edge Weight in Graph Using Graph Mining Techniques. In Computational Intelligence in Data Mining (pp. 807-817). Springer, Singapore. https://doi.org/10.1007/978-981-10-8055-5_71 [21] Li, L., Zhang, F., & Liu, G. (2019). Multi-fuzzy- objective graph pattern matching with big graph data. Journal of Database Management (JDM), 30(4), 24- 40. https://doi.org/10.4018/jdm.2019100102 [22] Ray, A., Holder, L. B., & Bifet, A. (2019). Efficient frequent subgraph mining on large streaming graphs. Intelligent Data Analysis, 23(1), 103-132. https://doi.org/10.3233/ida-173705 [23] Priyadarshini, S., & Rodda, S. (2020). Geometric Multi-Way Frequent Subgraph Mining Approach to a Single Large Database. In Smart Intelligent Computing and Applications (pp. 233-244). Springer, Singapore. https://doi.org/10.1007/978-981-32-9690-9_23 [24] Le, N. T., Vo, B., Nguyen, L. B., Fujita, H., & Le, B. (2020). Mining weighted subgraphs in a single large graph. Information Sciences, 514, 149-165. https://doi.org/10.1016/j.ins.2019.12.010