https://doi.org/10.31449/inf.v46i5.3817 Informatica46 (2022) 85–94 85 EvaluatingGroupDegreeCentralityandCentralizationinNetworks Mario Karlovˇ cec 1 , Matjaž Krnc 2 and Riste Škrekovski 3 E-mail: mario.karlovcec@ijs.si, matjaz.krnc@gmail.com, skrekovski@gmail.com 1 Artificial Intelligence Laboratory, Jozef Stefan Institute and Jozef Stefan International Postgraduate School Jamova 39, 1000 Ljubljana, Slovenia 2 University of Primorska, Faculty of Mathematics, Natural Sciences and Information Technologies, Koper, Slovenia 3 Faculty of information studies, Novo Mesto, Slovenia Keywords: degree centralization, group centrality, network analysis Received: November 8, 2021 Given a networkG, the importance of groups can be modeled by group centrality measures. Freeman’s centralization is a way to normalize any given centrality or group centrality measure, which enables us to compare individuals or groups from different networks. We focus on a degree-based measure of group centrality and centralization, presented by Krnc and Škrekovski (2020). We describe its efficient imple- mentation and study the behaviour of various real-world networks within this context. We conclude that very small groups, as well as very big ones, are not very central, i.e. as the group is growing, its value is increasing but, at some point it starts decreasing. Such unimodular behaviour is confirmed by our analysis of of group degree centralization of six real-world networks. At the end, we provide some challenges for future work. Povzetek: Glede na podano omrežjeG lahko pomembnost skupin modeliramo s pomoˇ cjo mer skupinske centralnosti. Freemanova centralizacija predstavlja naˇ cin normalizacije za poljubno obstojeˇ co mero cen- tralnosti, kot tudi za poljubno obstojeˇ co mero skupinske centralnosti. Omenjena normalizacija omogoˇ ca primerjavo posameznikov oz. skupin razliˇ cnih omrežij. ˇ Clanek se osredotoˇ ca se na mero centralnosti in centralizacije skupine, ki temelji na stopnji, kot je predstavljeno v Krnc in Škrekovski (2020). Opišemo njegovo uˇ cinkovito izvajanje in preuˇ cimo obnašanje razliˇ cnih omrežij v resniˇ cnem svetu, glede na to mero. Ugotavljamo, da zelo majhne skupine, kot tudi tudi zelo velike, nimajo velike centralnosti — ko skupina raste, njena vrednost narašˇ ca, a na neki toˇ cki zaˇ cne upadati. Takšno unimodularno vedenje potrjuje naša analiza centralizacije skupinske stopnje šestih omrežij iz realnega sveta. Na koncu podamo nekaj izzivov za prihodnje delo. 1 Introduction In most networks some vertices are more central than the others. To model this intuitive feeling, centrality indices were introduced. The first mathematical concept of cen- trality of graphs was introduced almost 150 years ago by Jordan [13]. There are many ways to provide a measure of the relative “importance” of a node in a network, thus dif- ferent motivations lead to different centrality measures that were developed in several areas of science. A social network is typically represented as a graph, where individual persons or nodes are represented as ver- tices and the relationships between pairs of individuals as edges. In the paper, we will therefore freely interchange the terms vertex/node and graph/network, without any mean- ingful difference. Various vertex-based measures of cen- trality have been proposed to determine the relative impor- tance of a vertex within a graph. Arguably, the most common branch of centrality indices is based on the distance between the nodes of the net- work. Some of the standard centrality indices from this branch are degree, betweenness, closeness and eccentric- ity. Among other measures of node centrality, a few bet- ter known in network analysis are: eigenvector centrality, Google PageRank, Katz centrality, Alpha centrality, and others. For detailed definitions and discussions on various centrality indices, we refer the reader to [5, 1, 2, 14, 22, 23]. In large networks, measuring a vertex centrality with re- spect to the whole network is often not relevant, nor com- putationally feasible. In this sense, another concept of ver- tex centrality with respect to some subset of vertices has been introduced and studied throughout the last decade. The personalization, introduced in 2003 (see [25]), is a measure that shows how central an individual is accord- ing to a given subset R (group of important people) in a given social network. In 2005, the subgraph centrality [6] was introduced, which characterizes the participation of each node in all subgraphs in a network and is calcu- lated from the spectra of the adjacency matrix of the net- work. In the same year, Everett et al. [8] introduced the 86 Informatica46 (2022) 85–94 M. Karlovˇ cec et al. core centrality measure, where they evaluate the extent to which a network revolves around a core group of nodes. Fi- nally, very recently Bell [3] introduced the concept called the subgroup centrality, where centrality (of one vertex) is calculated only on a restricted set of vertices. Arguably, the most basic measure is the degree of a vertex, which we study in this paper. In 1999, Everett and Borgatti [7] introduced the concept of group centrality which enables researchers to answer questions such as “how central is the engineering depart- ment in the informal influence network of this company?” or “among middle managers in a given organization, which are more central, men or women?” With these measures we can also solve the inverse problem: given a network of ties among organization members, how can we form a team that is maximally central? In [7], the authors introduced group centrality for measures of degree, closeness and be- tweenness centrality, which we use in this paper. In 2006, Borgatti introduced an important group centrality measure (usually called KPP) that is motivated by the key players problem (see [4]). In his paper he focused on finding a set of vertices for the purpose of optimally diffusing infor- mation through the network by using selected vertices as seeds, or to maximally fragment the network by removing the key nodes. Interestingly, Borgatti claims that previously mentioned group closeness and betweenness are not proper tools to define KPP centrality. He therefore used tools such as graph fragmentation and information entropy to define KPP centrality. In his study, Freeman [10] realized that despite all of the vertex-centrality indices defined up to that point, there was a need for a normalization which could measure a relative importance of a given vertex in a network and would be based on any chosen centrality index. Hence, he defined a centralization measure based on normalized variance in vertex centrality of any chosen centrality measure, with an aim to allow a comparison of whole networks on the basis of their highest vertex-centralization scores. One may also consider his approach as another type of vertex-centrality which measures the extent of how some vertex in a net- work stands out from others in terms of a given centrality index. This is useful since it arguably allows us to compare the centralization scores of nodes that belong to different networks. In the same paper Freeman remarked that the centraliza- tions of degree centrality, betweenness centrality and close- ness centrality achieve their maximum if and only ifG is a star. The statement was later proved in detail by Everett, Sinclair and Dankelmann [9]. In order to compare central- ization values of graphs with different sizes, in the defini- tion of centralization, Freeman used a normalized formula, where the normalizing divisor is based on the theoretically largest centrality variance in any graph from a given class of graphs [10]. 1.1 Relatedwork Following Freeman’s approach, the group centralization notion was introduced in [16], where the authors stud- ied some extremal graphs regarding several group central- ity measures, including group degree centralization. They showed that the maxumum value of group degree central- ization is attained in a star graph, with centrality value of (k + 1) n 1 k+1 , see Proposition 2. Our work is related with the domination theory. Among the relevant papers Miyano et al. [21] discussed the prob- lem of finding the best group for the so called k-vertex maximumdominationproblem (ork-maxVD, in short). Al- though they claim thatk-maxVD is a new variant of vertex- domination problem, it is in fact equivalent to maximizing the group degree centrality (introduced 12 years earlier by Everett and Borgatti [7]), with the score further increased by a constantk. In the paper authors reduce the problem to a Maximum Coverage problem, which is nicely discussed and analyzed by V orha and Hall [24], or by Hochbaum and Pathria [12]. In [15] authors provide a conceptual description of a greedy procedure for estimating maximal k-group degree centralization values, for any value ofk. Those results are based on a greedy appriximation algorithm for MAX COV- ERAGE or MAX k-VERTEX DOMINATION problems (see [21, 12]). While this greedy approach gives approximation ratio of 1 1=e for both mentioned algorithms, it is shown in [15] that any such constant is not attainable for the k- group degree centrality. The historical overview of the above-mentioned results is illustrated in Table 1, while the approximability and time complexity of Algorithm 1, is described as follows. Theorem 1 (Krnc and Škrekovski [15]). Given a graph G,thegreedyalgorithmfork-groupdegreecentralityover all set sizes 1 k n altogether runs in linear time and achievesthek-groupdegreecentralityvalueofatleast (1 1=e)(w k),wherew isthemaximizingk-groupdegree centralityofG. The rest of this paper is structured as follows. In Prelim- inaries we provide notations and definitions from [16, 15] that we use. We give definitions for Freeman centralization of group degree centrality and centralization. And revisit some relevant results from the field. In the next section we develop an efficient greedy algorithm for finding a group with approximate maximal degree centralization over all group sizes k from 1 to n and altogether runs in linear time. We describe the procedure in detail and provide com- plexity analysis of the algorithm. In Section 4 we continue our exposition by describing the behaviour of the imple- mented algorithm when used on some real-world complex networks. We describe the experiments made with six net- works, ranging up to 3.9 million vertices and 16 million edges. We present the datasets used and discuss the results. In experiments we observe that, in the examples studied, there seems to be only one candidate for optimal value of k in each network. In other words, the shape of a function Evaluating Group Degree Centrality. . . Informatica46 (2022) 85–94 87 Combinatorial Optimization 1993 Vorha and Hall [24] Maximal covering location 1998 Hochbaum and Pathria [12] problems and analysis 2011 Miyano and Ono [21] Max k-vertex domination Network Analysis 1869 Jordan [13] Network centrality 1979 Freeman [10] Network centralization 1999 Everett and Borgatti [7] Group degree centrality 2004 Everett et al. [9] Extremal networks for degree centrality 2015 Krnc and Skrekovski [16] Group degree centralization 2020 Krnc and Skrekovski [15] Conceptual description of algorithm Present paper Big data empirical analysis Table 1: The milestones from the fields of Network Analysis and Combinatorial Optimization, leading to the present paper. of group degree centralization over a group size seems to contain only one evident local minimum. We conclude the paper with remarks and future work. 2 Preliminaries LetG n be a family of all graphs onn vertices, letG2G n be a graph onm edges, and letS V (G). According to [7], the group degree centrality is defined as GD G (S) = [ v2S N(v)nS : Given a graph G and integer k, let S k be one of the sets from V(G) k that achieves the maximum value of group degree centrality, i.e. GD G (S k ) = max S2( V (G) k ) GD G (S). Whenever the graphG is known from the context, we omit the subscript from the notations of centrality or centraliza- tion. According to [10, 16], GD 1 (S;G) stands for the group degree centralization. Define k :=jSj, and observe that GD 1 (S;G) is equal to P S 0 2( V (G) k ) (GD G (S) GD G (S 0 )) max H2Gn P S 00 2( V (H) k ) (GD H (S k ) GD(S 00 )) : (1) According to Freeman [10], the denominator is needed to efficiently normalize centralization to the interval [0; 1], for better relative comparison. Clearly GD 1 (S;G) is max- imized whenever GD(S;G) is maximized, and by Krnc et al. [16] we have that the maximum value of the denomi- nator corresponds to the star graph S n and a maximizing setS k corresponds to anyk-set containing the center of the star. In Fig. 1 the calculation of the group degree central- ization is demonstrated. Denote the maximizing group size dc(G) to be the posi- tive integer for whichS dc(G) achieves the maximum value of the group degree centralization, i.e. GD 1 (S dc(G) ;G) = max k2[n] GD 1 (S k ;G), and also denoteS :=S dc(G) . Let (G) be the minimum cardinality of any set that domi- nates a graph G (also known as the domination number). The notation ( G) stands for the highest vertex-degree, i.e. ( G) = max v2V(G) deg G (v). A functionf is said to be unimodal if it contains only a single local maximum in f. 2.1 Evaluatingdegreecentralization The goal of this section is to optimize the procedure of cal- culating the group degree centralization for a given graph and an input integerk. To this end we need to revisit certain important parts of its definition which will be later used in our procedures. In [16] authors describe the denominator of (1) in a closed form, as follows. Proposition2 (Krnc and Škrekovski [16]). LetGbeastar onn vertices with the centerc, and letS 2 V(G) k such thatc2S . Then X S 0 2( V (G) k ) [GD(S ) GD(S 0 )] = (k + 1) n 1 k + 1 : In order to compute the sum P S 0 2( V (G) k ) GD(S 0 ;G) from (1) efficiently, we need the following claim. Proposition 3 (Krnc and Škrekovski [15]). Let G be a graph on n vertices, and let k n be a positive integer. Itholdsthat P S 0 2( V (G) k ) GD(S 0 ;G)isequalto n n 1 k X v2V(G) n deg(v) 1 k : 88 Informatica46 (2022) 85–94 M. Karlovˇ cec et al. Figure 1: The Eiffel-tower graph and the calculation of its group degree centralization for groups of sizes one and two. Inparticular,thesum P S 0 2( V (G) k ) GD(S 0 ;G)canbecom- putedinO(n). The results from Propositions 2 and 3 can be joined to further develop (1), i.e. GD 1 (S;G) evaluattes to n k GD(S;G) + P v2V(G) n deg(v) 1 k n n 1 k (k + 1) n 1 k+1 ; (2) which can be computed inO(n) steps. It is easy to see that finding the best possible groupS that maximizes the group degree centrality (and hence the group degree centralization) can be computed inO(n k+1 ), traversing over all k-tuples and computing group degree centrality at each step. Note that, since an input integerk is in the exponent, this straightforward approach is far from polynomial. In fact it is easy to see that the mentioned problem is NP-hard. This can be done reducing a well-knownNP- problem of determining the existence of ak-dominating set to our problem of findingS k . Let us assume that there ex- ists a polynomial algorithm for finding ak-setS k V (G) such that GD G (S k ) = max S2( V (G) k ) GD G (S): Now observe that the existence of a k-dominating set is equivalent to the property GD G (S k ) =n k: As group degree centrality of a given fixed set S k can be computed in polynomial timeO(nk), it is clear that the setS k provides us an answer regarding the existence of a k-dominating set. This trivially implies: Proposition4. TheproblemthatdeterminesasetS k fora giveninputgraphGandanintegerk isNP-hard. In the last section we present an efficient algorithm that achieves the best possible linear-time approximation for calculating group degree centrality scores for all group sizes. 3 Algorithmicapproach For reasons described in Introduction, finding the group with the biggest degree centralization may be needed for many real-world networks, which can be very large in some cases (the largest network from the experiments contains 16 million edges). As shown in (2), calculating the group de- gree centralization for a given setS can be computed effi- ciently, while on the other hand, finding a maximizing set is NP-hard (see Proposition 4). In this section we present an efficient implementation and demonstration of calculating an estimate of the group degree centralization for a given network. In particular we implement and study in detail a greedy approximate algorithm presented in [15], for find- ing a group with maximal degree centralization. To calculate the group degree centralization efficiently, in [15] authors use a greedy algorithm for Maximum Cov- erage Problem which is a polynomial time (1 1=e)- approximation algorithm, see [12] or [21]. Here we de- scribe the implementation of such algorithm in more de- tail, while retaining the same time efficiency. Particularly, more emphasis is given on efficient calculation of group centralization, as not all details are described in [15]. An implementation of the procedure that calculates an approxi- mation for the group with the biggest degree centralization, for all meaningful group sizes, is given by Algorithm 1. Let us now describe the parts of the algorithm. 3.1 Algorithmdescription We start with k = 0 and in each step of the main while loop increment the size of group S. Every time when k increases, we add some greedily chosen vertex to the setS and remove it fromG while maintaining some dictionaries that we use (contribution andhistogram, in particular). Note that by the data structure of a dictionaryD we mean thatD is a set of keys with additionally defined valuesD [i] for eachi2D. In the first phase of Algorithm 1 (throughout lines 1–12), we construct graphG from an input graphG 0 by orienting Evaluating Group Degree Centrality. . . Informatica46 (2022) 85–94 89 Algorithm1 Finding a group with the maximal degree centralization Input: a graphG 0 . Output: a list centralization of group degree centralization scores, where centralization[i] is an approximation of max S2( V (G 0 ) i ) GD(S;G 0 ). 1: n jV (G 0 )j;S ; . Centrality variables initialization. 2: dominated ;;histogram ; 3: G a directed instance of graphG 0 4: forallv inV (G 0 )do 5: addv tohistogram[deg G 0(v)] 6: dominated[v] False 7: centralization ; . Centralization variables initialization. 8: A 1=(n 1) 9: C n=(n 1) 10: foralli2histogramdo 11: sum[i] 1=(n 1) 12: degDistribution[i] jhistogram[i]j 13: forall 0 k ndo . Main loop 14: v any vertex fromhistogram with the highest contribution 15: S S[v 16: k k + 1 17: GD GD+CONTRIBUTION(v) 18: forallu2N G (v)do 19: DECREASECONTRIBUTION(u) 20: forallu2N + G (v)do 21: DECREASECONTRIBUTION(u) 22: dominated[u] True 23: forallw2N G (u)do 24: DECREASECONTRIBUTION(w) 25: E(G) E(G) uw 26: G G v 27: UPDATECENTRALIZATIONVARIABLES() 28: centralization[k] AGD +B C . Computing centralization. 29: returncentralization 90 Informatica46 (2022) 85–94 M. Karlovˇ cec et al. every edge ofG 0 in both directions. We also initialize the starting values of all dictionaries and other variables. In the main loop (lines 13–28) we first chose the vertexv to add to our groupS, update variablesS andk accordingly (lines 14–16), and then calculate the group degree centrality for the increased setS (in line 17). Then, throughout the lines 18–26 we update the dictionary histogram, and also re- movev and some additional directed edges from the graph. Finally, in lines 27 and 28, we update some centralization variables and calculate the Freeman centralization of the centrality from the line 17. We now present some details of how we maintain some values and graph properties. For each group, we efficiently calculate the group de- gree centralization using Propositions 2 and 3. While the directed graphG that we work with is changing with each iteration, notice that the original instance of the original graphG 0 stays the same throughout the algorithm. LetS be the group of vertices whose group degree centralization we are calculating, and setk :=jSj. Note that we initially have deg G (v) = deg + G (v) = deg G 0(v) for all vertices in the network. Before the beginning of each iteration of the mainloop, the existence of a directed edgeuv means that – in the initial graphG 0 , we haveuv2E(G 0 ); – neither ofv;u is a member ofS, and – in the initial graphG 0 , vertexv is not connected with any vertex fromS, i.e.v = 2[ v2S N(v). For any vertexv2 V (G)nS, we define the contribution ofv to be the value GD (S[fvg;G) GD (S;G) and ob- serve that GD (S[fvg;G) GD (S;G) evaluates to ( deg + G (v); ifv is not dominated, deg + G (v) 1; otherwise. The calculation of the value of the contribution is done by a short functioncontribution (see Algorithm 3). As differ- ent nodes have various contributions, we define the dictio- naryhistogram, initialized in line 5 of Algorithm 1, where the keys are all possible values of contribution (for any key i, it clearly holds that 1 i ( G)), and the values are unordered sets of nodes with corresponding contribu- tions. While the dictionaryhistogram is initially indeed a degree histogram, we update it with each modification of variablesG orS. The goal of the algorithm is to calculate the value of (2) for each S. We implement this by intro- ducing variablesA;B;C;GD, each of them assigned to a different part of the expression (2), in particular, A = n (n k) (n k 1) ; GD = GD(S;G); B = (n k 2)! (n 1)! X v2V(G) (n deg(v) 1)! (n k deg(v) 1)! ; C = n n k 1 : Hence, for algorithmic purposes (see Propositions 2 and 3, and Eq. (2)), we may write GD 1 (S;G) =AGD +B C: Clearly, whenever the group S increases, the values of A;B;C;GD also change. To handle the change of variable B, we introduce dictionaries degDistribution and sum. The keys of both are all possible degrees of the vertices in G, and the values are defined as sum[i] = (n i 1)! (n k i 1)! (n k 2)! (n 1)! ; degDistribution[i] =jfv :deg G (v) =igj: Algorithm2 Updating variablesA;B;C andsum. 1: function UPDATECENTRALIZATIONVARIABLES 2: A n (n k)(n k 1) 3: C n n k 1 4: foralli insumdo 5: sum[i] sum[i] n i k 1 n k 2 6: B P i sum[i] degDistribution[i] Note thatsum needs to be refreshed with every change ofk. We first initialize both dictionaries before entering the main loop, and then we maintain their values by using the functionupdateCentralizationVariables (we treat these variables as global variables, therefore no parameters are needed). To avoid using big numbers, we update the value of sum[i] by just multiplying it by (n k i 1) (n k 2) whenever k increases by one. Using this, B can be calculated by a simple addition B = X i sum[i] degDistribution[i]; see line 6 of Algorithm 2. Note that while the graph is changing and the group S is increasing, the contributions of the remaining ver- tices also change. While the initial contribution of a ver- tex v is equal to its degree deg G (v), during the main loop the contribution of v may decrease by one several times. We handle these changes by defining a function decreaseContribution(v), see Algorithm 3. 4 Experiments In this section we describe the experiments made with six real-world networks. We present the datasets used and describe their domain. We implement the algorithm in C++ and Python with the help of the libraries “SNAP” and “SNAP.Py”, respectively. Finally, we discuss the results and establish the correlation of the results with the uni- modality law. Evaluating Group Degree Centrality. . . Informatica46 (2022) 85–94 91 Algorithm 3 Functions contribution and decreaseContribution. The former outputs the con- tribution of v while the latter refreshes the dictionary histogram whenever the contribution of v decreases by one. 1: function CONTRIBUTION(v) 2: ifv is dominatedthen 3: return deg + G (v) 1 4: else 5: return deg + G (v) 6: function DECREASECONTRIBUTION(v) 7: c CONTRIBUTION(v) 8: histogram[c] histogram[c]nfvg 9: histogram[c 1] histogram[c 1][fvg 4.1 Datasets Here we describe the datasets used for testing the degree centralization algorithm. We used six real-world networks ranging from several hundred thousand up to more than 16 million edges. Facebook is the smallest dataset in experiments contain- ing 4039 nodes and 88,234 edges. It is anonymized data collected by survey participants using a Facebook applica- tion with ten ego networks combined. The dataset was gen- erated to study social circles in ego networks [19]. Since it is generated by combining 10 ego networks, we expect its maximal centralization results to be relatively high. Cobiss dataset is a graph of scientific co-authoring of the complete national research database in Slovenia from 1970 to 2013. Two authors are connected if they publish at least one pa- per together. The graph contains 25,301 nodes and 316,587 edges. The dataset was generated by using the database maintained by ARRS (Slovenian Research Agency) and IZUM (Institute of Information Science, Maribor, Slove- nia). Twitter dataset contains a graph of followers, with 81,306 nodes and 1,768,149 edges. The dataset was col- lected from public sources for the purpose of analyzing the social circles [19]. Amazon dataset is a graph of fre- quently co-purchased products based on the Amazon web- site in June, 2003. The mentioned graph has 403,394 nodes and 3,387,388 edges and was generated for the study of viral marketing dynamics [17]. YouTube dataset con- tains a graph of user subscriptions. The graph contains 1,134,890 nodes and 2,987,624 edges. The dataset was prepared by Mislove et al. [20]. Patents dataset is a cita- tion graph of patents granted between 1975 and 1999. The graph contains 3,923,922 nodes (patents) and 16,522,438 edges (citations). The dataset was generated for the pur- pose of studying graph evolution [18], using the U.S. patent dataset maintained by the National Bureau of Economic Research [11]. 4.2 Experimentalresults Figure 2 shows the values of the centralization with dif- ferent sizes of the group, while Table 2 gives precise val- ues of results and optimal dataset sizes. In Cobiss network (Figure 2b) the maximal centralization is attained by a rel- atively small group size, and after that point adding mem- bers to the group causes a drastic decrease of the centraliza- tion. Amazon (Figure 2d) has a similar shape, but increas- ing and decreasing of centralization is less intensive. In YouTube network (Figure 2e), a relatively small group size has a high degree centralization. Further increase of the group size slowly increases centralization, which is maxi- mal only after the group dominates all the nodes. In Face- book network (Figure 2a) the maximal centralization is also achieved when the group dominates all the nodes. Patents network (Figure 2f) is similar to YouTube, but the maximal centralization is achieved before the nodes are dominated by the group. In Twitter (Figure 2c), network centraliza- tion increases up to the maximal point and then decreases with the same intensity. By its construction, the group degree centralization is comparable among various groups from different networks. Among the analyzed networks, one can observe that the biggest centralization score (0:9) is attained by the Face- book network, with the corresponding group of size 10. This result was expected, since the Facebook network was generated by combining 10 ego networks and the centers of the ego networks are the group members identified by our algorithm. YouTube and Twitter have relativity similar and high maximal centrality, which is obtained by 214,003 and 803 group members, respectively. Cobiss and Amazon graphs are in the middle range of our centralization experi- ments, while the lowest maximal centrality has the Patents network, which is also the largest and the sparsest network among our experiments. By looking at the centralization values for all networks in the experiments and thousands of different sizes of groups, we observed a correlation between the plots in Figure 2 and the unimodality law. Indeed, the shape of most of the plots in Figure 2 is unimodal, i.e. it is monotonically increasing up to a maximizing group size, while at the value dc(G) the plot begins to have a negative slope. As discussed above, this confirms the natural intuition regarding the group cen- trality scores, mentioned in unimodality law. While this property is certainly not present in all networks, as it is easy to find many graphs that do not follow the unimodality law, we believe that many of the real-world networks should in- deed have this unimodality property and that it should be studied further. 5 Discussion With respect to the important branches of the centrality the- ory mentioned above, we seek to identify the most ‘cen- tral’ group of nodes in a network, without assuming that we know in advance its cardinality. For a given networkG, 92 Informatica46 (2022) 85–94 M. Karlovˇ cec et al. (a) Facebook (b) Cobiss (c) Twitter (d) Amazon (e) YouTube (f) Patents Figure 2: The graphic representation of the Freeman centralization of the group degree centrality of six networks with different sizes of groups. G jV (G)j jE(G)j dc(G) GD 1 (S dc(G) ;G) GD(S dc(G) ;G) Facebook 4,039 88,234 10 0.905393 4029 Cobiss 25,301 316,587 204 0.654997 19635 Twitter 81,306 1,768,149 803 0.773615 78,811 Amazon 403,394 3,387,388 12,810 0.542647 320,133 YouTube 1,134,890 2,987,624 214003 0.777639 920,887 Patents 3,923,922 16,522,438 464,298 0.470009 3,105,485 Table 2: Some statistics and centrality results from our experiments. we mainly deal with three correlated questions: Q1. For a fixedk, whichk-subsetS of members ofG rep- resents the most influential group? Q2. Among all possible values ofk find the one for which the corresponding setS from Q1 is most influential. Q3. How to efficiently compute bothk andS from Q2 in real-world scenarios? Group centrality measures often have the property that, for a given groupS, one can find a personx = 2S, such that S[x has higher centrality score thanS. While this intuition applies to most of known group centrality indices, it does not provide the desired result, as the best group will in most cases correspond to the whole network. In the sense of Q2, it seems natural that groups of very small sizes should not be very influential, and we would expect the influence to grow while the size of the group increase. Stating this more explicitly, we have: Law of small groups. For most small groups S and any personx = 2 S, the extended groupS[x should be more centralthanS. However, at some point we would like to notice that the further increase of the group size starts decreasing its score in the sense of given group centrality measure, as the group is too big and consequently less tractable. The intuitive phenomena mentioned above is expressed by the following rules: Law of large groups. For most large groups S and any person x = 2 S, the extended group S[x should be less centralthanS. Of course above two laws are not meant to hold strictly, and are therefore stated in a rather mathematically non- precise way. Anyway, in order to capture both above laws, a shape of the plot of group centrality measure with respect to the group size should resemble the shape of unimodal Evaluating Group Degree Centrality. . . Informatica46 (2022) 85–94 93 functions, i.e. as the group S is growing, its value is in- creasing but, at some point it starts decreasing (see Prelim- inaries). Law of unimodality. A group centrality measure should havethepropertyofunimodality. We believe that the Law of unimodality should naturally hold for most of real-world networks, whenever a reason- able group measure is used. Again, the above law is not meant to hold strictly and is therefore not stated in a math- ematically precise form. Although the above mentioned laws seem very natural, we are not familiar with any group centrality mechanism to automatically deal with the groups of different sizes. It seems to us that the Freeman central- ization approach [16] can be accustomed to achieve this, and is hence encapsluated in this paper. We focus on a par- ticular type of group centralization, namely group degree centralization, as it is the simplest one to deal with. Our experiments on various real-world networks shows how the values of group degree centralization change with respect to the group sizes. In the experiments, the law of unimodality is observed, which suggests that the Freeman centralization of degree may be a reasonable measure to consider while solving Q2. 6 Conclusionsandfuturework With respect to the important branches of the centrality the- ory mentioned in Introduction, we seek to identify the most ‘influential’ group of nodes in a network, without fixing the group sizek in advance. While some of the related work re- garding group centrality measures (i.e. question Q1), is al- ready discussed in the Group centrality section above, there is (to our knowledge) no literature that would discuss the questions Q2 and Q3. The main problem is, that the values of two centrality measures may not be comparable, if the corresponding group cardinalities are not equal. Further- more, whileLawofsmallgroups applies to most of known group centrality indices, this is not the case with Law of unimodality. With regard to this, the Freeman centralization approach seems to be the only established method that efficiently normalize the result, so that the centrality scores can be compared in a meaningful way, regardless of their corre- sponding group cardinalities. For further work we propose the following. While in the paper we only study the group degree centralization, one may study group centrality measures of some other central- ity indices such as betweenness, closeness or eccentricity. In particular, it would be very interesting to verify whether for real-world networks the centralization variants of those proposed measures satisfy the above discussed laws, as well as the unimodality property. We study the Freeman centralization approach, as it seems the only established procedure of normalization, compatible with Q2. Alterna- tively, one could put aside the Freeman centralization and consider a different type of normalization for group cen- trality measures, that could preferably be more efficient to calculate. References [1] Bavelas, A.: A mathematical model for group structures. Hum. Organ. 7(3), 16–30 (1948), https://doi.org/10.17730/humo.7.3. f4033344851gl053 [2] Bavelas, A.: Communication patterns in task-oriented groups. J. Acoust. Soc. Am. pp. 725–730 (1950), https://doi.org/10.1121/1.1906679 [3] Bell, J.R.: Subgroup centrality measures. Network Science 2(02), 277–297 (2014), https://doi. org/10.1017/nws.2014.15 [4] Borgatti, S.P.: Identifying sets of key players in a social network. Comput. Math. Organ. Theory 12(1), 21–34 (2006), https://doi.org/10. 1007/s10588-006-7084-x [5] Brandes, U., Erlebach, T.: Network analysis: method- ological foundations, vol. 3418. Springer Science & Business Media (2005), https://doi.org/10. 1007/b106453 [6] Estrada, E., Rodriguez-Velazquez, J.A.: Subgraph centrality in complex networks. Physical Review E 71(5), 056103 (2005), https://doi.org/10. 1103/physreve.71.056103 [7] Everett, M.G., Borgatti, S.P.: The centrality of groups and classes. J. Math. Sociol. 23(3), 181–201 (1999),https://doi.org/10.1080/ 0022250x.1999.9990219 [8] Everett, M.G., Borgatti, S.P.: Extending central- ity. Models and methods in social network analy- sis35(1), 57–76 (2005),https://doi.org/10. 1017/cbo9780511811395.004 [9] Everett, M.G., Sinclair, P., Dankelmann, P.: Some centrality results new and old. J. Math. Sociol.28(4), 215–227 (2004), http://doi.org/10.1080/ 00222500490516671 [10] Freeman, L.C.: Centrality in social networks conceptual clarification. Social Networks 1(3), 215–239 (1979),https://doi.org/10.1016/ 0378-8733(78)90021-7 [11] Hall, B.H., Jaffe, A.B., Trajtenberg, M.: The NBER Patent Citation Data File: Lessons, Insights and Methodological Tools. NBER Working Papers 8498, National Bureau of Economic Research, Inc (2001), https://doi.org/10.3386/w8498 94 Informatica46 (2022) 85–94 M. Karlovˇ cec et al. [12] Hochbaum, D.S., Pathria, A.: Analysis of the greedy approach in problems of maximum k- coverage. Naval Research Logistics 45(6), 615– 627 (1998), https://doi.org/10.1002/ (SICI)1520-6750(199809)45:6<615:: AID-NAV5>3.0.CO;2-5 [13] Jordan, C.: Sur les assemblages de lignes. J. Reine Angew. Math70(185), 81 (1869),https://doi. org/10.1515/crll.1869.70.185 [14] Koschützki, D., Lehmann, K.A., Peeters, L., Richter, S., Tenfelde-Podehl, D., Zlotowski, O.: Cen- trality indices. In: Network analysis, pp. 16–61. Springer (2005),https://doi.org/10.1007/ 978-3-540-31955-9_3 [15] Krnc, M., Škrekovski, R.: Group degree cen- trality and centralization in networks. Mathemat- ics8(10) (2020),https://doi.org/10.3390/ math8101810 [16] Krnc, M., Škrekovski, R.: Group centralization of network indices. Discrete Appl. Math. 186, 147–157 (2015), https://doi.org/10.1016/j.dam. 2015.01.007 [17] Leskovec, J., Adamic, L.A., Huberman, B.A.: The dynamics of viral marketing. ACM Trans. Web 1(1) (2007), https://doi.org/10.1145/ 1134707.1134732 [18] Leskovec, J., Kleinberg, J., Faloutsos, C.: Graphs over time: Densification laws, shrinking diam- eters and possible explanations. In: Proceed- ings of the Eleventh ACM SIGKDD Interna- tional Conference on Knowledge Discovery in Data Mining. pp. 177–187. KDD ’05, ACM, New York, NY , USA (2005), https://doi.org/10. 1145/1081870.1081893 [19] Leskovec, J., Mcauley, J.J.: Learning to discover so- cial circles in ego networks. In: Bartlett, P., Pereira, F., Burges, C., Bottou, L., Weinberger, K. (eds.) Advances in Neural Information Processing Systems 25, pp. 548–556 (2012),https://doi.org/10. 1145/2556612 [20] Mislove, A., Marcon, M., Gummadi, K.P., Druschel, P., Bhattacharjee, B.: Measurement and Analysis of Online Social Networks. In: Proceedings of the 5th ACM/Usenix Internet Measurement Conference (IMC’07). San Diego, CA (2007), https://doi. org/10.1145/1298306.1298311 [21] Miyano, E., Ono, H.: Maximum domination prob- lem. In: Proceedings of the Seventeenth Computing: The Australasian Theory Symposium-V olume 119. pp. 55–62. Australian Computer Society, Inc. (2011) [22] Proctor, C.H., Loomis, C.P.: Research Methods in So- cial Relations, chap. Analysis of sociometric data, pp. 561–586. Dryden Press, New York (1951) [23] Seeley, J.R.: The net of reciprocal influence; a prob- lem in treating sociometric data. Can. J. Psychol. (3), 234–240 (1949),https://doi.org/10.1037/ h0084096 [24] V ohra, R.V ., Hall, N.G.: A probabilistic analy- sis of the maximal covering location problem. Discrete Applied Mathematics 43(2), 175– 183 (1993), https://doi.org/10.1016/ 0166-218x(93)90006-a [25] White, S., Smyth, P.: Algorithms for estimating relative importance in networks. In: Proceedings of the ninth ACM SIGKDD international confer- ence on Knowledge discovery and data mining. pp. 266–275. ACM (2003), https://doi.org/10. 1145/956750.956782