Informatica 40 (2016) 353–364 353 
 
Tie Persistence in Academic Social Networks 
Djamila Mohdeb, Adelhak Boubetra and Mourad Charikhi 
Department of Computer Science, University of Bordj Bou Arreridj, Bordj Bou Arreridj, Algeria 
E-mail: djamila.mhb@gmail.com, {boubetraabd, mcharikhi}@yahoo.fr 
 
Keywords: link persistence, link prediction, link strength, social network evolution, academic social network 
Received: May 12, 2016 
This paper attempts to shed light on the importance of some social academic-related factors in 
determining the strength of links in academic social networks. Our purpose is to assess the extent to 
which the frequency of the tie, the academic closeness between its actors, and the scientific 
contributions of the actors in the tie can affect the scientific collaboration relationship between them. 
We propose a model that relies on these three link strength indicators in order to predict tie 
persistence in academic social networks. We evaluated the model on a social network extracted from 
the DBLP computer science bibliographic network. We compared the output of the model with that of 
the link prediction baseline methods.  The results show better performance of the proposed model. 
Povzetek: Prispevek analizira vpliv socialnih povezav v omrežjih na akademski uspeh s pomočjo DBLP.  
 
1 Introduction 
The investigation of academic networks is an increasingly 
important topic in the area of social network mining. 
Comprehending these complex networks is important to 
understand the trends of knowledge production through 
the world. A typical academic network contains a set of 
multi-typed entities (scientists, papers, journals, 
institutions, etc.) linked by a set of multi-typed 
associations (Figure 1-a). The collaboration network is 
the most commonly used social projection of the scientific 
academic network. It consists of a set of nodes 
representing scientists, and a set of links representing 
collaboration relations between nodes. Frequently, 
researchers use co-authorship relations to construct 
collaboration networks as they denote formal cooperation 
between scientific actors. A collaboration network is 
composed by connecting every set of authors who share 
the same publications (Figure 1-b). Networks of this type 
generally exhibit the same characteristics as social 
networks. They are of the “small world” type, where the 
clustering coefficient, which describes the transitivity in 
the network, is high, and the average distance 
between any two scientists in the network is short, 
usually not exceeding five or six degrees [33]. They are 
also scale-free, following a power law in several node 
properties, and their structures are affected by the 
preferential attachment phenomenon [18, 37]. 
Studying the evolution and the dynamics of collaboration 
networks remains a continuing concern in social 
networks mining since the advances of science depend 
crucially on this type of interactions between scientists 
[23]. Studies in this field focus on the analysis of the 
observed changes in the network structure caused by 
both the links and the nodes. Among link analysis tasks, 
the link prediction problem [28] is one of the most studied 
subjects in the link mining literature. A link prediction model 
attempts to predict the appearance, the persistence, and 
the disappearance of a social network's links relying on 
some of its past snapshots. However, in this 
paper, we do not address the entire link prediction 
problem but only the sub-question that concerns the 
driving factors behind the persistence of the ties in 
academic social networks. Tie persistence appears to 
be a rarely studied problem despite its 
importance. This importance is related mainly to the 
existence of a minority of nodes and links that always 
persist in spite of the rapid dynamics of the network 
over time. Identifying the driving factors behind the 
structure persistence is as important as identifying the 
driving factors behind the structure evolution. Thus, this 
work attempts to resolve the link persistence problem 
using a link strength based technique that can measure 
the collaborative importance of the existent collaboration 
relationships in the network.  This technique relies on 
three strength indicators that have been proposed in the 
social psychology literature [8, 34]: the frequency of 
interactions between the actors, their contributions in the 
relation, and the social closeness between them. 
Furthermore, the validity of an important 
relation is checked against two academic-related 
attributes that should be taken into account 
in the context of scientific collaborations: the scientific 
productivity of the relation and the professional rank 
(status) of the scientists involved. Our proposed 
tie persistence prediction model combines these link 
strength indicators to assess the strength of the scientific 
collaboration relationships between researchers in order 
to identify the persistent ties in a dynamic, time-varying 
academic social milieu. 
The remaining part of the paper is organized as follows: 
we begin with a brief overview of the previous research 
in the area. Then we explain our methodology for 
predicting the persistence of collaboration relations. We 
continue by presenting the performed experiments to 
validate the proposed model. Next, we report the findings 
of the research and discuss their implications. Then, we 
investigate the influence of the parameters of the model 
on its performance. Finally, we conclude with a brief 
summary of the findings and some suggestions for 
further research. 
2 Related work 
The social network evolution problem addresses the 
question of how a social network evolves over time. 
Several sub-questions arise from this 
problem. The most important concern the laws that 
govern the evolution and the factors that influence it. In 
this context, the study of the evolution of academic social 
networks has been a popular research topic in 
recent years. In the literature, the evolution of 
academic social networks may be analyzed on two 
levels: the macro level (the entire network) and the micro 
level (the simplest components of the network) [3]. Our 
work focuses on understanding the micro-level changes 
at the actor level. Specifically, it aims to predict the 
persistence of a tie between two nodes. This issue is a 
sub-question of the well-known link prediction problem, 
which addresses predicting the new links that join a 
social network in a given future time. Naturally, the link 
persistence issue is not independent of link prediction. 
This is because an actor’s future links (which a 
link prediction model tries to predict) may also include 
the old links that continue to be present in the future. 
The earliest studies on link prediction are those 
by Adamic and Adar [1] and by Liben-Nowell 
and Kleinberg [28] for social networks. They proposed 
unsupervised models based on computing similarity 
scores between the network nodes using graph-based 
similarity measures that rely mainly on the topological 
structure of the network or on the node attributes. Later, 
the work of Hasan et al. [20] has argued for the 
effectiveness of the supervised models rather than the 
unsupervised ones. In addition to these two classical 
approaches, researchers have developed several models 
following various paradigms, including 
similarity-based models, feature-based models, 
probabilistic models, relational models, graphical 
models, linear algebraic models, and random-walk-based 
models [4, 16]. An in-depth survey of these 
approaches may be found in [21]. Link prediction 
models can predict link persistence 
between two nodes if their application is restricted 
to direct (1-hop) neighbors. However, since the 
main concern of the link prediction problem is to predict 
new relations and not repeated ones, predicting 
persistent links with a link prediction model risks 
yielding modest results. It is therefore 
worthwhile to develop dedicated models whose 
only goal is to identify the factors that drive the persistence 
of the tie between two nodes. 
In this regard, the social psychology literature provides 
important evidence about the various factors that 
influence link persistence and decay. These factors 
are mostly: structural embeddedness (common 
acquaintances) [11, 14, 32], homophily [11, 32], social 
support [38], frequent contact (interaction) [38], social 
closeness [38], distance [32], status, and experience [11]. 
Moreover, there is some evidence of the “liability of 
newness”, which means that newly formed ties tend to 
decay more quickly than long-standing ties [11]. On the 
other hand, few models have tried to address the 
link persistence prediction problem. For instance, 
Hidalgo et al. [22] used a rule-based technique to predict 
the tie persistence in mobile phone social networks 
relying on their observations about the correlation 
between network topological variables (degree, 
clustering, reciprocity and topological overlap) with the 
tie persistence. Akoglu and Dalvi [2] proposed a logistic 
regression-based model for tie persistence prediction in 
large phone and SMS networks, which takes into account 
the fact that node and link attributes like neighborhood 
overlap, reciprocity, clustering coefficient, and node 
degree affect the link persistence between the actors. 
 
Figure 1: An example of an academic social network (b) extracted from a bibliographic network (a) (Authors (A) 
who share the same papers (P) in the bibliographic bipartite graph (a) are connected with co-authorship links in 
the academic social network (b)). 
Using decision tree and logistic regression based models, 
Raeder et al. [35] demonstrated that persistent ties in a 
cell-phone network are those characterized by high 
levels of interaction frequency coupled with relatively 
constant re-activations of the tie over time. By 
contrast, ties that are candidates to decay are 
characterized by relatively low levels of interaction 
and non-reciprocity. Apart from mobile phone social 
networks, Kirvan-Swaine et al. [26] studied tie decay 
in the online social network of Twitter. Their findings 
revealed that reciprocity, embeddedness, power, and 
status significantly influence the breaking of 
follower-followee links in the online social network. 
Unlike these studies, this paper proposes an 
unsupervised model for tie persistence prediction in 
academic social networks. The proposed approach 
combines academic-related tie and node attributes to 
estimate the strength of relations between scientists. The 
model then relies on its expectation about the 
scientists’ likely collaboration preferences to validate or 
invalidate the collaborative importance of relations and 
their probable continuity in the future. 
 
Algorithm 1. Tie persistence prediction model 
  Input   𝑠: source author 
  Output  𝐼𝑚𝑝(𝑡): collaborative importance score of each target author 𝑡 
For each author 𝑠 in the network do 
     Extract the bylines of all publications of 𝑠 
     Calculate the collaborative importance of each byline 
     For each co-author 𝑡 belonging to the publication bylines of 𝑠 do 
          Calculate the collaborative productivity of the relation between 𝑠 and 𝑡 
          If “Productive collaboration” then 
               Assign 𝑡 the collaborative importance of the publication byline he belongs to 
          Else 
               If the professional rank (academic age) of 𝑠 equals "𝑆𝑒𝑛𝑖𝑜𝑟" then 
                    𝐼𝑚𝑝(𝑡) = 0            // 𝑠 has no need for 𝑡 
               Else 
                    Assign 𝑡 the collaborative importance of the publication byline he belongs to 
               End if 
          End if 
     End for 
End for 
3 Tie persistence prediction model 
The strength of social links may be estimated using 
various strength indicators that can reflect the different 
dimensions of a given relationship. In the case of 
academic links, the strength may be better captured using 
indicators that are related to the knowledge production 
since the knowledge production represents the ultimate 
goal of academic collaboration relations [23]. 
To predict the persistence or the dissolution of an 
existent collaboration relationship between two 
scientists, our proposed model follows two steps (see 
Algorithm 1). First, it measures the collaborative 
importance of the existing relations of a given scientist 
using three strength indicators: the frequency of the 
relation, the contribution of the concerned actor within, 
and the social-academic closeness between its actors. 
Second, the model decides to retain or to terminate the 
existing relation depending on its expectation about the 
behavior of the concerned author toward this relation 
given its collaborative importance to him. Formally, 
for each author 𝑎, we collect the publication 
bylines from his papers. The publication byline is the list 
of 𝑛 authors who have co-written a given paper 𝑝. For 
each set of authors in each paper, the model measures the 
collaborative importance of the relation according to the 
author 𝑎. Given the collaborative importance of the 
relation, the model then verifies its collaborative 
productivity and checks the professional rank of the 
author in order to decide finally whether it is better to 
keep or to terminate the existing relation. 
Therefore, if a given relation passes the two steps 
successfully, every co-author who belongs to its related 
publication byline takes the collaborative importance 
value of the publication byline; otherwise, the model 
assumes that a future collaboration between the 
concerned co-authors is of no use. Below, we give a 
detailed explanation of the model. 
3.1 Computing collaborative importance 
 
$$\mathit{Imp}(pbl, a) = \frac{\mathit{freq}(pbl)}{\mathit{nbrPub}(a)} \cdot \frac{\mathit{contrib}(a, pbl)}{\mathit{freq}(pbl)} \cdot \frac{\mathit{cl}(pbl)}{\mathit{freq}(pbl)} \tag{1}$$
Where:  
- 𝑎 is the author, 𝑝𝑏𝑙 is the publication byline to 
which an author 𝑎 belongs. 
- 𝑓𝑟𝑒𝑞(𝑝𝑏𝑙) is the number of times the author 𝑎 
has published papers having the byline 𝑝𝑏𝑙. 
- 𝑛𝑏𝑟𝑃𝑢𝑏(𝑎) is the total number of publications 
of the author 𝑎. 
- 𝑐𝑜𝑛𝑡𝑟𝑖𝑏(𝑎, 𝑝𝑏𝑙) is the contribution of the author 
𝑎 in the paper in comparison with his co-authors 
in the publication byline 𝑝𝑏𝑙.  
- 𝑐𝑙(𝑝𝑏𝑙) is the social-academic closeness factor 
of the publication byline 𝑝𝑏𝑙. 
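The product in Eq. (1) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function and parameter names are hypothetical, and it assumes that the contribution and closeness values passed in are totals accumulated over the papers sharing the byline, so that dividing by the frequency yields per-paper averages.

```python
def collaborative_importance(freq, nbr_pub, contrib, closeness):
    """Evaluate Eq. (1) for one byline and one author.

    freq      -- number of papers the author published with this byline
    nbr_pub   -- total number of publications of the author
    contrib   -- author's contribution for the byline (assumed: summed over papers)
    closeness -- closeness factor of the byline (assumed: summed over papers)
    """
    if freq <= 0 or nbr_pub <= 0:
        raise ValueError("freq and nbr_pub must be positive")
    return (freq / nbr_pub) * (contrib / freq) * (closeness / freq)


# Hypothetical values: 3 papers with this byline out of 12 publications.
score = collaborative_importance(freq=3, nbr_pub=12, contrib=1.5, closeness=2.25)
```

With these illustrative inputs the three factors are 0.25, 0.5, and 0.75, so the score is their product.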
3.1.1 Frequency 
The frequency [8, 34] is an intuitive indicator of link 
strength. It represents the number of times a set of 
authors have participated in the publication of the same 
papers. A high frequency value for a publication byline 
indicates some trust between its members. 
3.1.2 Social-academic closeness factor 
The closeness [8, 34] encompasses a wide variety of 
meanings characterizing the social proximity between 
actors in social networks. To estimate this proximity, 
relationship scholars have conceptualized multiple 
measures such as RCI (Relationship Closeness 
Inventory) [7], IOS (Inclusion-of-Other-in-Self Scale) 
[3], and URCS (Unidimensional Relationship Closeness 
Scale) [13]. These measures are not deterministic models 
but scoring systems relying on questionnaires attempting 
to capture the various dimensions of the relationship. 
In [3], Aron et al. (the developers of the IOS measure) 
postulated that in a close relationship “people are 
motivated to include another in the self in order to 
include that other’s resources”. These resources may be 
anything that can “facilitate the achievement of goals”. 
In academic social networks, knowledge 
is the valuable resource a scientist hopes others will share 
with him/her. Co-authored publications characterize 
scientific relations, but the type of publications may 
reveal the social-academic closeness between the actors 
of these relations. The concept of closeness in our model 
is oriented to estimate mainly the familiarity between the 
collaborators. Therefore, we suppose that a book type 
publication is more important than a journal paper type, 
and a journal paper type is more important than a 
conference paper type. This is based on the relevance of 
the “book” type as the most valuable publication and on 
previous observations [17, 18] that have shown that 
authors sharing journal papers are professionally and 
socially closer than authors sharing common conference 
papers. This is for the reason that journal papers have a 
much higher impact than conference papers as they 
receive more citations [17]. In addition, a relevant work 
requires more time to be produced and the relative length 
of the time spent in the publication production may 
multiply the chance of familiarity between the paper’s 
co-authors.  
Formally, the social-academic closeness factor for a 
given publication having a byline 𝑝𝑏𝑙 is expected to 
respect the constraint: 
$$\mathit{cl}(pbl, \mathit{type\_pub} = \text{“book”}) > \mathit{cl}(pbl, \mathit{type\_pub} = \text{“journal paper”}) > \mathit{cl}(pbl, \mathit{type\_pub} = \text{“conference paper”})$$
 
Estimating the social-academic closeness 
from the type of publication 
We use in our model a scoring system somewhat similar to the 
psychological measures described above in order to 
assess the social-academic closeness between publication 
co-authors. First, we construct an ordered list arranging 
publication types according to their relevance: 𝐿 =
{1: 𝑏𝑜𝑜𝑘, 2: 𝑗𝑜𝑢𝑟𝑛𝑎𝑙 𝑝𝑎𝑝𝑒𝑟, 3: 𝑐𝑜𝑛𝑓𝑒𝑟𝑒𝑛𝑐𝑒 𝑝𝑎𝑝𝑒𝑟}. 
Then, we penalize a given type of publication by 
discounting from its initial default value 𝑉 a portion 
equal to 𝜃 (a model parameter) multiplied by 𝑘 − 1, 
where 𝑘 is the order of the publication type in the 
list 𝐿. 
 
$$\mathit{cl} = V - (k - 1)\,\theta \qquad (k \geq 1,\ (k - 1)\,\theta \leq V) \tag{2}$$
 
- 𝑉 is the default value of publication. It is 
estimated to be 𝑉 = 1. 
- 𝜃 is the fixed portion of the publication value 𝑉 
that is lost at each step down from one publication type 
to the next. 
- 𝑘 is the order of the type of publication in the 
arrangement list. 
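As a small illustration of Eq. (2), the sketch below maps a publication type to its closeness factor. The dictionary and function names are hypothetical; the default 𝜃 = 0.25 is the value the paper reports using in its experimental setup.

```python
# Order k of each publication type in the list L (1 = most relevant).
PUBLICATION_ORDER = {"book": 1, "journal paper": 2, "conference paper": 3}


def closeness_factor(pub_type, theta=0.25, default_value=1.0):
    """Eq. (2): cl = V - (k - 1) * theta, with (k - 1) * theta <= V."""
    k = PUBLICATION_ORDER[pub_type]
    penalty = (k - 1) * theta
    if penalty > default_value:
        raise ValueError("penalty exceeds the default publication value V")
    return default_value - penalty


factors = {t: closeness_factor(t) for t in PUBLICATION_ORDER}
```

With 𝜃 = 0.25 and 𝑉 = 1, the factors are 1, 0.75, and 0.5 for books, journal papers, and conference papers, which respects the ordering constraint stated before Eq. (2).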
3.1.3 Author contribution in the relation 
The investment in the relation is another relevant 
strength indicator proposed in [8, 34]. Contribution is a 
domain-specific concept that can take different meanings 
according to the context it is used in. In academic social 
networks, the contribution of a scientist in a collaboration 
relation can be reflected in the credit that he deserves in 
the related publication in comparison with his co-authors. 
The proposed model estimates this credit using the 
Network-Based Allocation (NBA) model of co-
authorship credit proposed by Kim and Diesner [24].  
The NBA model uses the position of the author in the 
publication byline, in addition to the length of the author 
list, to calculate his final credit. Note that in 
many research fields, the order reveals reliable 
information about the contribution of the author to the 
publication, with the exception of some disciplines such 
as Mathematics, Economics or High Energy Physics, 
which list the authors of their publications in alphabetical 
order [12, 24]. 
The NBA model is flexible in partitioning the credit 
between the co-authors of a given paper. It is based on 
the idea that each author belonging to a publication 
byline of length 𝑁 and having an initial co-authorship 
credit equal to 𝑣 distributes a portion of his credit 
(equal to 𝑣𝑡) in equal amounts to the authors preceding him 
on the byline. We can calculate the final credit of each 
coauthor as follows: 
 
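$$v = V/N \qquad (1 \leq V,\ 2 \leq N)$$
$$v_t = d \cdot v \qquad (0 \leq d \leq 1)$$
$$v_r^N = \begin{cases}
v + v_t \sum_{n=1}^{N-r} \dfrac{1}{N-n} & (r = 1,\ 2 \leq N) \\[6pt]
(v - v_t) + v_t \sum_{n=1}^{N-r} \dfrac{1}{N-n} & (1 < r < N,\ 2 \leq N) \\[6pt]
v - v_t & (r = N,\ 2 \leq N)
\end{cases} \tag{3}$$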
Where: 
- 𝑣 is the initial co-authorship credit given to each 
author. 
- 𝑉 is the value of the paper (assumed equal to 
1), 𝑁 is the number of authors on a paper. 
- 𝑣𝑡 is the transferable credit, calculated by 
applying a distribution factor 𝑑 ∈ [0,1] to the 
initial co-authorship credit 𝑣. The distribution 
factor 𝑑 is the ratio of initial credit that should 
be distributed by each coauthor. 
- 𝑟 is the position of the author in the byline. 
 
Co-writers receive equal co-authorship credits in a given 
publication if 𝑑 = 0. If 𝑑 = 1, the first 
author has the highest possible contribution value in 
the publication and the role of non-first authors is 
negligible. 
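The credit allocation of Eq. (3) can be sketched as follows. This is an illustrative reading of the formula as printed here, not the reference implementation of the NBA model; the function name and argument names are hypothetical.

```python
def nba_credits(n_authors, d=0.5, paper_value=1.0):
    """Return the final credit of each byline position r = 1..N under Eq. (3)."""
    if n_authors < 2:
        raise ValueError("Eq. (3) is defined for N >= 2")
    if not 0.0 <= d <= 1.0:
        raise ValueError("d must lie in [0, 1]")
    v = paper_value / n_authors   # initial credit of each author
    vt = d * v                    # transferable credit
    credits = []
    for r in range(1, n_authors + 1):
        # Credit received from the authors who follow position r.
        received = vt * sum(1.0 / (n_authors - n)
                            for n in range(1, n_authors - r + 1))
        # The first author has no preceding author and gives nothing away.
        given = 0.0 if r == 1 else vt
        credits.append(v - given + received)
    return credits


three_authors = nba_credits(3, d=1.0)
```

For three authors and 𝑑 = 1, the credits are 5/6, 1/6, and 0, and for any 𝑑 the credits sum to the paper value 𝑉, since every transferred portion is received by a preceding author.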
3.2 Predicting scientist’s collaboration 
preferences 
A collaboration relationship between two scientific 
actors is subject to some academic considerations that 
determine whether it is continued or terminated, even if 
it seems strong. These considerations are mainly related 
to the academic attributes 
of the author and the effect of the output of this relation 
on his scientific career. Our approach assumes that the 
persistence of an important collaboration relation 
between an author and his coauthor depends on two 
relevant academic-related factors: the professional rank 
of the author (status) and the collaborative productivity 
of the relationship. Relying on early observations [15, 
30], our model assumes that a newcomer or a junior 
researcher needs to conserve his important relations 
despite their unproductivity, because his rank as a 
beginner obliges him to develop his coauthor network 
by exploiting these important relations for his benefit. By 
contrast, an experienced researcher always has the 
possibility to terminate any unproductive relation that 
cannot offer him a scientific advantage. 
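The decision rule described in this paragraph can be written as a short function. It is a sketch under stated assumptions: the names are hypothetical, the rank is passed as a plain string, and the productivity test is supplied as a boolean computed elsewhere.

```python
def retained_importance(byline_importance, productive, source_rank):
    """Return the importance score kept for a co-author of the source author.

    byline_importance -- collaborative importance of the shared byline
    productive        -- True if the relation is a "productive collaboration"
    source_rank       -- "newcomer", "junior", or "senior" (assumed labels)
    """
    if productive:
        return byline_importance
    if source_rank == "senior":
        # An experienced author is assumed to drop unproductive relations.
        return 0.0
    # Newcomers and juniors are assumed to keep important relations anyway.
    return byline_importance
```

Only the combination of an unproductive relation and a senior source author sets the score to zero; every other case keeps the byline's importance.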
3.2.1 Author professional rank 
The professional rank or the status of an author is related 
to his scientific and professional career. It naturally 
influences the collaboration preferences [5, 9] since the 
collaboration choices of an experienced scientist differ 
widely from the collaboration choices of a novice 
scientist. The reason behind this is the relatively large or 
small scientific network that a senior or a beginner 
scientist has, respectively. 
3.2.2 Collaborative productivity 
The productivity rate is an essential factor to validate the 
importance of a collaboration relation. Nevertheless, 
the weight given to this factor differs according to the academic 
professional experience of the scientist [5, 9]. The 
proposed model considers a co-authorship relation as a 
“productive collaboration” if the number of 
collaborations between the author and the coauthor 
is at least the median of the duration of their 
relationship. Note that the duration is a useful 
indicator for estimating the strength of links 
between actors in social networks [8, 34]. 
 
$$R(a, c) \text{ is a “Productive collaboration”} \iff \mathit{nbrCollab}(a, c) \geq (d + 1)/2, \qquad d = t_l - t_f \tag{4}$$
Where: 
- 𝑛𝑏𝑟𝐶𝑜𝑙𝑙𝑎𝑏 (𝑎, 𝑐) is the number of 
collaborations between the author 𝑎 and his 
coauthor 𝑐. 
- 𝑑 is the duration of the relationship between 𝑎 
and 𝑐. 
- 𝑡𝑙 is the time (in year) of the last collaboration 
between the author and the coauthor. 
- 𝑡𝑓 is the time (in year) of the first collaboration 
between the author and the coauthor. 
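The productivity criterion of Eq. (4) can be checked directly from the publication years of the joint papers. The sketch below is illustrative only: the function name is hypothetical, and it assumes one list entry per co-authored paper.

```python
def is_productive(collab_years):
    """Eq. (4): nbrCollab >= (d + 1) / 2, with d = t_l - t_f in years.

    collab_years -- publication year of each joint paper (one entry per paper)
    """
    if not collab_years:
        return False
    duration = max(collab_years) - min(collab_years)   # d = t_l - t_f
    return len(collab_years) >= (duration + 1) / 2


# Three joint papers over a five-year span meet the threshold of 3.
example = is_productive([2003, 2005, 2008])
```

Two joint papers over the same five-year span would fall below the threshold, while a single joint paper (duration 0) trivially satisfies it.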
4 Experiments 
4.1 Dataset 
To demonstrate the performance of our approach, data 
are extracted from the well-known DBLP Computer 
Science Bibliography database, a huge digital library 
from the University of Trier, which covers publications 
in various computer science fields.  
We randomly selected from the DBLP database a set 
of 2250 authors from different research areas in 
computer science who appeared between 1993 and 
2008. After that, we divided this subset equally into three 
author sets based on the academic age of the authors in 
the DBLP bibliographic network. We measured the 
academic age of a scientist as the number of years since 
his first publication. The academic age obtained from 
DBLP does not exactly reflect the professional rank of the 
author but can offer a hint about his experience (except 
for cases where an author’s publications are not indexed 
by DBLP). 
Therefore, we had the following sets: 
- The Newcomers set: authors with an academic 
age less than six years. 
- The Juniors set: authors with an academic age 
between six and ten years. 
- The Seniors set: authors with an academic age 
greater than ten years. 
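The three sets can be reproduced with a simple threshold function. This is a minimal sketch: the function name is hypothetical, and the reference year against which the academic age is measured is an assumption, since the text does not state it.

```python
def professional_rank(first_pub_year, reference_year):
    """Classify an author by academic age (years since first publication)."""
    age = reference_year - first_pub_year
    if age < 6:
        return "newcomer"
    if age <= 10:
        return "junior"
    return "senior"


# Hypothetical author whose first DBLP entry dates from 2000, seen in 2008.
rank = professional_rank(2000, 2008)
```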
Table 1 summarizes dataset statistics. 
Table 1: Dataset statistics. 

Set         N        E         S      Avg. C 
Seniors     18 834   62 550    750    26.54 
Juniors     17 699   57 655    750    22.68 
Newcomers   17 076   56 417    750    21.78 
All sets    53 609   176 622   2250   23.66 

N, E: number of nodes and edges in the full network; S: 
number of source authors; Avg. C: average number of 
co-authors per source. 
 
We draw the reader’s attention to the fact that the quality 
of our data can be affected by the performance of the 
method used by DBLP to resolve the author name 
disambiguation problem [27]. According to the recent 
study of Kim and Diesner [25], the value ranges of the 
findings may vary if a different method is used for name 
disambiguation. Fortunately, as the latter study 
confirmed, this may not have a distortive effect on the 
general trend of the network evolution on which our 
findings depend. 
4.2 Experimental setup 
First, for learning the link persistence prediction 
model, we formed a general network combining all the 
co-authorship networks from 2003 to 2014 that 
correspond to the scientists belonging to the three author 
subsets. Let an author pair be (𝑠, 𝑡), we call 𝑠 the source 
author, and 𝑡 the target author. The source authors are 
those who belong to the three author sets mentioned 
above (newcomers, juniors, and seniors). The target 
authors are the direct neighbors (i.e. 1-hop neighbors) of 
the source authors. Then, we chose the sub-network data 
between 2003 and 2008 as training set, and the sub-
network data between 2009 and 2014 as testing set. 
Second, the general parameters that we used for framing 
the link persistence model are the following: 
- The professional rank of an author was 
calculated according to his academic age. 
- As in the default setting of NBA co-authorship 
credit model [24], we assumed 𝑑 =  0.5 as the 
ratio of the initial credit that should be 
distributed by each author belonging to the 
publication byline of a given paper (Eq. 3). The 
advantage of this setting is maximizing the 
contribution of first authors as well as avoiding 
neglecting the contribution of non-first authors.  
- For the social-academic closeness factor in Eq. 
2, we assumed 𝜃 = 0.25 as the ratio of the initial 
value of the publication an author loses 
according to the type of the publication. As 
such, the social-academic closeness factor is 
1, 3/4, and 1/2 for the publication types book, 
journal paper, and conference paper, 
respectively. We recall that our estimation of the 
social-academic closeness factor is based on a 
simple intuitive scoring system because we 
cannot measure this value exactly due to its 
complex psychological nature. 
4.3 Evaluation framework 
In order to show the effectiveness of our tie persistence 
prediction method for social academic networks, we 
compared its performance with the baseline methods 
used for the link prediction problem since they can also 
measure link persistence.  The baseline methods 
considered are Common Neighbor (CN), Jaccard's 
Coefficient (JC), Adamic/Adar (AA), Preferential 
Attachment (PA), and PageRank (PR). Formal 
descriptions of these methods are given in Table 2. 
Common Neighbor [19] is a simple metric that counts the 
number of shared neighbors (i.e. the number of paths of 
length 2) between two nodes. The Jaccard's coefficient 
[36] divides the common neighbors of a pair of nodes by 
the size of the union of their neighbors. The 
Adamic/Adar measure [1] weighs the rarer common 
features more heavily. These three metrics are related to 
the positive impact of the common acquaintances 
(structural embeddedness) on the tie formation and 
persistence between social actors. From the perspective 
of the tie persistence problem, they provide information 
about the social and scientific circles where a scientist 
moves. It is then reasonable to assume that, if two related 
scientists deal with the same scientific entourage, it is 
likely that their relation persists. The Preferential 
attachment [6] of two nodes is the product of their 
degrees. In our context, it is used to assume that a 
scientist tends to keep relations with highly connected 
scientists who have a better status [9, 12]. The PageRank 
algorithm [10] ranks a node proportionally to the 
probability that it will be reached through a random walk 
on the network. 
Table 2: Link prediction baseline metrics. 

Metric                          Description 
Common Neighbor (CN)            $CN(x, y) = |\Gamma(x) \cap \Gamma(y)|$ 
Jaccard’s coefficient (JC)      $JC(x, y) = |\Gamma(x) \cap \Gamma(y)| \,/\, |\Gamma(x) \cup \Gamma(y)|$ 
Adamic/Adar (AA)                $AA(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} 1 / \log |\Gamma(z)|$ 
Preferential attachment (PA)    $PA(x, y) = |\Gamma(x)| \cdot |\Gamma(y)|$ 
PageRank (named Rooted          Similarity score between 𝑥 and 𝑦 is measured as the 
PageRank in [28])               stationary probability of 𝑦 under the following random 
                                walk: with probability 𝛽, return to 𝑥; with probability 
                                1 − 𝛽, move to a random neighbor. 

𝑥 and 𝑦 denote two given nodes in the social network. 
𝛤(𝑥), 𝛤(𝑦) represent the sets of neighbors of 𝑥 and 𝑦, 
respectively. 
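The four neighbor-based scores of Table 2 can be computed from an adjacency map, as in the sketch below. The graph representation (a dictionary mapping each node to the set of its neighbors) and the function name are illustrative assumptions; Rooted PageRank is omitted for brevity.

```python
import math


def neighbor_scores(adj, x, y):
    """Compute CN, JC, AA, and PA for the node pair (x, y)."""
    gx, gy = adj[x], adj[y]
    common = gx & gy
    union = gx | gy
    return {
        "CN": len(common),
        "JC": len(common) / len(union) if union else 0.0,
        # A common neighbor has degree >= 2 when x != y, so log(degree) > 0;
        # the guard only protects against malformed input.
        "AA": sum(1.0 / math.log(len(adj[z]))
                  for z in common if len(adj[z]) > 1),
        "PA": len(gx) * len(gy),
    }


# Toy co-authorship graph with four authors.
toy_graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c"},
}
scores = neighbor_scores(toy_graph, "a", "b")
```

In this toy graph, authors a and b share the single neighbor c, which has degree 3, so the pair gets CN = 1, JC = 1/4, AA = 1/log 3, and PA = 6.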
 
For evaluating the methods used in this study, we 
employed a threshold curve metric: AUCPR (Area under 
the Precision Recall Curve) and two fixed threshold 
metrics: Precision and Recall. Precision is the probability 
that a randomly selected positive prediction by the 
classifier is correct. Recall is the probability that a 
randomly selected positive instance is detected by the 
classifier. A Precision-Recall (PR) curve plots precision 
vs. recall. AUCPR is thought to give a more reliable and 
informative view of an algorithm's performance than 
the other common performance 
evaluation measures, especially for the link prediction 
task [29]. This is mainly due to its fairness and 
efficiency in handling class imbalance, which is 
less pronounced in the tie persistence prediction 
problem but very frequent in the link prediction problem. 
A high area under the curve characterizes both high 
recall and high precision. 
𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 = 𝑇𝑃/(𝑇𝑃 + 𝐹𝑃) 
𝑅𝑒𝑐𝑎𝑙𝑙 = 𝑇𝑃/(𝑇𝑃 + 𝐹𝑁) 
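The two fixed-threshold metrics can be computed from the set of ties predicted to persist and the set of ties that actually persist. The sketch below is illustrative; the function name and the set-based representation of ties are assumptions.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for predicted vs. actual persistent ties."""
    tp = len(predicted & actual)    # predicted to persist and did persist
    fp = len(predicted - actual)    # predicted to persist but did not
    fn = len(actual - predicted)    # persisted but was not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical tie identifiers.
p, r = precision_recall({1, 2, 3, 4}, {2, 3, 5})
```

Here two of the four predicted ties persist (precision 1/2), and two of the three persistent ties are recovered (recall 2/3).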
5 Results and discussion 
Performance results measured in Precision, Recall, and 
AUCPR for all baseline methods and link persistence 
prediction method (LPP) are presented in Tables 3, 4, 
and 5. The values in bold face indicate the best overall 
prediction performance for the corresponding dataset. 
It is evident that Precision, Recall, and AUCPR agree 
about the best method and show in general the same 
performance trend. We note the good 
performance of the link persistence prediction method 
(LPP): about 57% of persisting ties are correctly 
classified by the model (recall) and about 40% of the 
ties that the model predicts to persist do in fact 
persist (precision). 
The proposed link persistence prediction method shows
the best performance on all three datasets (except for
its precision value on the Newcomers set) and provides a
significant enhancement over the link prediction baseline
methods. AUCPR shows a notable gain for the proposed
model in comparison with the link prediction baselines,
with a relative improvement of approximately 8%. The
gain is remarkable from the perspective of Recall, where
it reaches 10.5%, but somewhat minimal from the
perspective of Precision, with a relative improvement of
only 1%. Apart from our proposed method, we note the
insufficient performance of the path-based method
PageRank, which can be explained by the fact that the
target authors are direct neighbors. A low distance such
as a 1-hop distance may have a negative impact on the
effectiveness of a random-walk-based predictor. In
contrast, we note the good performance of the
neighbor-based link prediction methods Common Neighbor,
Adamic/Adar, Jaccard’s coefficient, and Preferential
attachment. The results also show that the performance
of the neighbor-based methods is comparable in terms of
Precision, but Adamic/Adar beats the three other metrics
in terms of Recall and AUCPR. Existing work on link
prediction has already reported the good performance of
Adamic/Adar in co-authorship networks [28]; the results
of this predictor and of the other neighbor-based metrics
are also consistent with previous studies that pointed
out the positive effect of structural embeddedness and
of the actors’ status on tie persistence in social
networks [2, 11, 14, 26, 32, 35].
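
The relative improvements quoted above can be checked against the
averaged rows of Tables 3, 4, and 5. The sketch below assumes that
"relative improvement" means the gain of LPP over the strongest
baseline on the averaged score; the paper does not spell out the
definition, so this reading and the helper name are assumptions.
Under it, the gains come out at roughly 1.5% for Precision, 10.7%
for Recall, and 8.3% for AUCPR, which is close to, though not
identical with, the rounded figures in the text.

```python
# Averaged scores copied from the last rows of Tables 3, 4, and 5.
averages = {
    "Precision": {"LPP": 0.4021, "AA": 0.3841, "CN": 0.3963,
                  "JC": 0.3962, "PA": 0.3894, "PR": 0.2919},
    "Recall":    {"LPP": 0.5763, "AA": 0.5207, "CN": 0.4653,
                  "JC": 0.4653, "PA": 0.5113, "PR": 0.3642},
    "AUCPR":     {"LPP": 0.5330, "AA": 0.4920, "CN": 0.4697,
                  "JC": 0.4695, "PA": 0.4698, "PR": 0.3717},
}

def relative_improvement(scores, method="LPP"):
    """Return (best baseline, relative gain of `method` over it)."""
    baselines = [(name, value) for name, value in scores.items() if name != method]
    best_name, best_value = max(baselines, key=lambda item: item[1])
    return best_name, (scores[method] - best_value) / best_value

for metric, scores in averages.items():
    baseline, gain = relative_improvement(scores)
    print(f"{metric}: {gain:.1%} over {baseline}")
```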
Turning to the academic context of our model, these
findings may be explained from different angles. An
important thing to notice is that academic social
networks are extremely dynamic networks. Few nodes or
links stay with a scientist for a long period after the
first collaboration, even if they are academically
strong. Moreover, the decay of a strong collaboration
relation during a given time period can sometimes be
confused with tie inactivity. So, a possible explanation
of our findings may be related to the social independence
of the author as a researcher. A scientist, in an
environment where there is no need for emotional support
as in real social networks, always seeks scientific
support, which can be obtained from
Table 3: PRECISION performance results.
Dataset         LPP     AA      CN      JC      PA      PR
Newcomers       0.4597  0.4429  0.4615  0.4613  0.4535  0.3198
Juniors         0.3621  0.3494  0.3542  0.3542  0.3504  0.2770
Seniors         0.3846  0.3601  0.3731  0.3731  0.3644  0.2790
Avg. Precision  0.4021  0.3841  0.3963  0.3962  0.3894  0.2919
LPP: Link Persistence Prediction method, AA: Adamic-Adar, CN: Common Neighbor,
JC: Jaccard’s Coefficient, PA: Preferential Attachment, PR: PageRank
 
Table 4: RECALL performance results.
Dataset         LPP     AA      CN      JC      PA      PR
Newcomers       0.6161  0.5571  0.5003  0.5002  0.5306  0.3918
Juniors         0.5704  0.5080  0.4561  0.4561  0.5060  0.3583
Seniors         0.5424  0.4970  0.4394  0.4394  0.4973  0.3423
Avg. Recall     0.5763  0.5207  0.4653  0.4653  0.5113  0.3642
 
Table 5: AUCPR performance results.
Dataset         LPP     AA      CN      JC      PA      PR
Newcomers       0.6264  0.5748  0.5557  0.5551  0.5558  0.4071
Juniors         0.4823  0.4549  0.4281  0.4281  0.4283  0.3533
Seniors         0.4902  0.4461  0.4254  0.4254  0.4253  0.3546
Avg. AUCPR      0.5330  0.4920  0.4697  0.4695  0.4698  0.3717
 
different scientific entities or scientists who share
the same ideas or the same research interests with him.
Indeed, as he advances in his career, his knowledge and
expertise increase, his research interests develop, and
his collaboration network evolves and gets larger. This
explains why Recall is higher than Precision for all the
tested methods: they are better at recognizing a relevant
collaboration relation than at predicting whether it
will continue in the future.
This becomes apparent when observing the impact of the
author’s professional experience on the persistence of his
direct collaboration relations. In general, the results show
that the higher the author’s professional rank, the lower
the persistence prediction accuracy. This is reasonable,
since it is more difficult to predict the collaboration
strategies of an experienced author: his choices are
mainly independent and irregular, and do not follow clear
trends. By contrast, a scientist early in his career
deals with only a small number of collaborators for a
limited number of years. Mostly, his collaborators
consist of his advisor and some colleagues who work with
the same advisor. As the expertise of the scientist
increases, his scientific network expands to include a
growing number of novice, junior, and senior researchers,
yielding a large collaboration network with a high number
of weak ties and a small number of strong ties. Logically,
at a later stage, many links will decay and few links
will persist.
Certainly, the model needs to be refined and improved,
but the present findings illustrate that the frequency
of collaboration, the contribution of the authors, and
the type of the publication (which expresses the
social-academic closeness between the co-authors) play a
significant role in determining the persistence of a
scientific collaboration relation. Furthermore, they
encourage the consideration of the probable collaboration
preferences that a scientist may pursue during his
scientific career with regard to some academic-related
attributes, such as academic experience and collaborative
productivity, in order to provide an additional gain in
the prediction performance.
6 Influence of the model parameters 
Next, we investigate the impact of the co-authorship
credit and of the social-academic closeness factor on the
performance of the proposed tie persistence predictor.
Figures 2, 3, and 4 plot the performance (Precision,
Recall, and AUCPR) resulting from applying different
values of the parameters 𝑑 (NBA co-authorship credit
model) and 𝜃 (social-academic closeness factor), and
from changing the relevance order of publication types
(social-academic closeness factor). When the effect of a
parameter is under experimentation, the other parameters
are assigned the default values described previously in
Section 4.2. Table 6 reports the overall Precision,
Recall, and AUCPR on the complete dataset (which combines
the three author sets Newcomers, Juniors, and Seniors)
for the various values of the aforementioned parameters.
6.1 The parameter 𝒅 
Our results (Figure 2, Table 6) indicate that the
performance of the proposed method varies inversely
with 𝑑. The further the co-authorship credits depart
from equality between the co-authors of the same paper
(𝑑 = 0), the lower the prediction accuracy of the
proposed model tends to be. The only exception is Recall,
which reaches its best value when 𝑑 = 0.5 (i.e. the
first co-author has more than 50% of the contribution in
a given paper). Without careful investigation, it is
difficult to conclude that computer scientists share
credit equally with their co-authors, because the first
author of a publication generally makes the greatest
contribution [24]. Instead, we can assume that
contributions may take other forms beyond the formal one
(co-authoring the publication). These include, for
example, informal discussions between co-authors,
supervision (advising), technical or academic
assistance, and all other practices that strengthen the
academic relations between scientists but are
unfortunately difficult to estimate formally.
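
The trend described above can be read directly off the 𝑑 rows of
Table 6. The following sketch is only a convenience check on the
published averages, with hypothetical helper names; it does not
re-run the model. It confirms that Precision and AUCPR are highest
at 𝑑 = 0 while Recall peaks at 𝑑 = 0.5, and that only AUCPR
decreases at every step (Precision rises very slightly between
𝑑 = 0.75 and 𝑑 = 1).

```python
# Rows of Table 6 for the parameter d:
# (d, avg. precision, avg. recall, avg. AUCPR)
d_rows = [
    (0.00, 0.4220, 0.5465, 0.5447),
    (0.25, 0.4043, 0.5727, 0.5385),
    (0.50, 0.4021, 0.5763, 0.5330),
    (0.75, 0.3770, 0.5730, 0.5055),
    (1.00, 0.3790, 0.4755, 0.4813),
]
COLUMNS = {"precision": 1, "recall": 2, "aucpr": 3}

def best_d(rows, metric):
    """Return the value of d with the highest score for `metric`."""
    column = COLUMNS[metric]
    return max(rows, key=lambda row: row[column])[0]

def is_non_increasing(rows, metric):
    """Check whether the score never rises as d grows."""
    column = COLUMNS[metric]
    values = [row[column] for row in sorted(rows)]
    return all(a >= b for a, b in zip(values, values[1:]))

for metric in COLUMNS:
    print(metric, best_d(d_rows, metric), is_non_increasing(d_rows, metric))
```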
6.2 The order of publication types and the parameter 𝜽
The three versions of the tie persistence predictor (LPP)
that correspond to the three orders B-C-J, C-B-J, and
C-J-B respectively (Figure 3, Table 6) show comparable
results in Precision, Recall, and AUCPR. The B-J-C order
(the default order of publication types) yields a lower
Recall, a comparable AUCPR, and a slightly higher
Precision. Two observations
Table 6: LPP performance results with different values of the
parameters 𝑑 and 𝜃, and different settings of the publication
types order.
Setting             Avg. PREC  Avg. REC  Avg. AUCPR
Parameter 𝑑
𝑑 = 0               0.4220     0.5465    0.5447
𝑑 = 0.25            0.4043     0.5727    0.5385
𝑑 = 0.5             0.4021     0.5763    0.5330
𝑑 = 0.75            0.3770     0.5730    0.5055
𝑑 = 1               0.3790     0.4755    0.4813
Order of Pub Types
B-C-J               0.4008     0.5814    0.5374
B-J-C               0.4021     0.5763    0.5330
C-B-J               0.4012     0.5825    0.5384
C-J-B               0.3990     0.5809    0.5347
Theta (𝜃)
𝜃 = 0.1             0.3947     0.5761    0.5273
𝜃 = 0.25            0.4021     0.5763    0.5330
𝜃 = 0.3             0.3889     0.5714    0.5191
𝜃 = 0.4             0.3784     0.5638    0.5043
 
are worth noting here. First, the results reveal that
“books” do not play a relevant role in determining the
social-academic closeness between computer scientists.
We think that this is due not to the unimportance of
books but to the fact that this type of publication is
infrequent in the computer science field [18]. Second,
if we ignore the “book” publication type, we observe
that while the default order (B-J-C) seems to yield
slightly higher precision, the overall performance
remains nearly comparable to that of the other order
settings, where conference papers are considered more
relevant than journal papers (B-C-J, C-B-J, and C-J-B).
In the DBLP bibliographic database, journal papers are
less frequent than conference papers even though they
have much higher impact [17]. Consequently, the
conclusion we can draw from these conflicting findings
is that the high frequency of conference papers in
computer science collaboration networks reinforces the
role of this type of publication in defining the
social-academic closeness factor between computer
scientists.
The parameter 𝜃 sets the difference in relevance
between the three publication types: book, journal paper,
and conference paper. We tested 𝜃 with the default order
B-J-C. The results (Figure 4, Table 6) show that accuracy
is highest at 𝜃 = 0.25 and that, beyond this value, the
greater the difference between the relevance values of
publication types, the lower the prediction accuracy.
This means that 𝜃 should take a value that does not
underestimate the role of publications, whatever their
types, in maintaining the academic closeness between
collaborators. 𝜃 = 0.25 seems to be a suitable value,
since it gives social-academic closeness values that
contribute to the good performance of the tie
persistence predictor.
 
Figure 2: Performance of LPP with different values of parameter 𝑑. 
 
Figure 3: Performance of LPP with different settings of publication types order. 
 
Figure 4: Performance of LPP with different values of parameter 𝜃. 
7 Conclusion 
Studying the dynamics of a tie is a crucial step to
comprehend the structure and the evolution over time of
social networks. In an academic social network, these
dynamics are related to a number of factors that must be
examined in order to better determine their actual
effects on maintaining the connectivity of the network.
We modeled a tie persistence prediction approach based on
estimating the tie strength using three factors: the
frequency of collaborations, the social-academic
closeness, and the scientific contributions of the
scientists, while taking into account two other
academic-related attributes of the scientists: the
collaborative productivity and the professional rank.
Experimenting with the model, we found a significant
impact of the aforementioned factors on the persistence
or the dissolution of collaboration relations between
scientists in academic social networks. Our findings
also showed that a strong collaboration relation does not
always persist, owing to other academic considerations
that are not easy to predict, mostly for experienced
scientists. It would be interesting then to develop
techniques able to capture such unexpected collaboration
choices. There is much room for improvement, particularly
in designing better metrics to estimate the contribution
of the author to the relation and the social-academic
closeness between the co-authors. Likewise, using the
academic age to infer approximately the author’s
professional rank is a limited method. It would be better
to develop efficient schemes that may provide realistic
information about scientists’ status in academic social
networks. Further research might also investigate the
impact of other academic-related attributes and other
tie strength indicators that have not been investigated
in this paper, such as trust, reciprocity, and breadth of
topics. Finally, applying the proposed model to larger
academic networks and to academic networks from other
academic fields may improve the performance of the tie
persistence predictor and provide a better understanding
of collaboration trends in these disciplines.
References 
[1]  Adamic, L. A., & Adar, E. (2003). Friends and 
neighbors on the Web. Social Networks, 25(3), 
211–230.  
[2]  Akoglu, L., & Dalvi, B. (2010). Structure, tie 
persistence and event detection in large phone and 
SMS networks. In Proceedings of the Eighth 
Workshop on Mining and Learning with Graphs - 
MLG ’10 (pp. 10–17).  
[3]  Aron, A., Aron, E., & Smollan, D. (1992). Inclusion 
of other in the self scale and the structure of 
interpersonal closeness. Journal of Personality and 
Social Psychology, 63, 596–612. 
[4]  Backstrom, L., & Leskovec, J. (2010). Supervised 
random walks: Predicting and recommending links 
in social networks. Proceedings of the Fourth ACM 
International Conference on Web Search and Data 
Mining, 635–644.  
[5]  Bahr, A. H., & Zemon, M. (2000). Collaborative 
authorship in the journal literature: Perspectives for 
academic librarians who wish to publish. College & 
Research Libraries, 61(5), 410–419.  
[6]  Barabási, A. L. et al. (2002). Evolution of the social
network of scientific collaborations. Physica A, 311,
590–614.
[7]  Berscheid, E., Snyder, M., & Omoto, A. M. (1989). 
The relationship closeness inventory: Assessing the 
closeness of interpersonal relationships. Journal of 
Personality and Social Psychology, 57, 792–807.  
[8]  Blumstein, P., & Kollock, P. (1988). Personal 
Relationships. Annual Review of Sociology, 14, 
467–490.  
[9]  Bozeman, B., & Corley, E. (2004). Scientists’ 
collaboration strategies: Implications for scientific 
and technical human capital. Research Policy, 
33(4), 599–616.  
[10]  Brin, S., & Page, L. (1998). The anatomy of a
large-scale hypertextual Web search engine. Computer
Networks and ISDN Systems, 30(1–7), 107–117.
[11]  Burt, R. S. (2000). Decay functions. Social 
Networks, 22, 1–28.  
[12]  Costas, R., & Bordons, M. (2011). Do age and 
professional rank influence the order of authorship 
in scientific publications? Some evidence from a 
micro-level perspective. Scientometrics, 88(1), 
145–161.  
[13]  Dibble, J. L., Levine, T. R., & Park, H. S. (2012). 
The Unidimensional Relationship Closeness Scale 
(URCS): Reliability and validity evidence for a new 
measure of relationship closeness. Psychological 
Assessment, 24(3), 565–572.  
[14]  Feld, S. L. (1997). Structural embeddedness and
stability of interpersonal relations. Social Networks,
19, 91–95.
[15]  Fonseca, L., Velloso, S., Wofchuk, S., & De Meis, 
L. (1998). The relationship between advisors and 
students. Scientometrics, 41(3), 299–312. 
[16]  Fouss, F., Pirotte, A., Renders, J. M., & Saerens, M. 
(2007). Random-walk computation of similarities 
between nodes of a graph with application to 
collaborative recommendation. IEEE Transactions 
on Knowledge and Data Engineering, 19(3), 355–
369.  
[17]  Franceschet, M. (2010). The role of conference
publications in computer science: A bibliometric
view. Communications of the ACM, 53(12), 129–132.
[18]  Franceschet, M. (2011). Collaboration in computer 
science: A network science approach. Journal of the 
American Society for Information Science and 
Technology, 62(10), 1992–2012.  
[19]  Girvan, M. & Newman, M. E. J. (2002). 
Community structure in social and biological 
networks. Proceedings of the National Academy of 
Sciences of the United States of America, 99(12), 
7821–7826.  
[20]  Hasan, M. Al, Chaoji, V., Salem, S., & Zaki, M. 
(2006). Link prediction using supervised learning. 
SDM’06: Workshop on Link analysis, Counter 
terrorism and Security.  
[21]  Hasan, M. Al, & Zaki, M. J. (2011). A survey of
link prediction in social networks. In Social
Network Data Analytics (pp. 243–275).
[22]  Hidalgo, C. A., & Rodriguez-Sickert, C. (2008). 
The dynamics of a mobile phone network. Physica 
A: Statistical Mechanics and Its Applications, 
387(12), 3017–3024.  
[23]  Katz, J. S., & Martin, B. R. (1997). What is 
research collaboration?. Research Policy, 26(1), 1–
18.  
[24]  Kim, J., & Diesner, J. (2014). A network-based 
approach to coauthorship credit allocation. 
Scientometrics, 1–16.  
[25]  Kim, J., & Diesner, J. (2015). The effect of data 
pre-processing on understanding the evolution of 
collaboration networks. Journal of Informetrics, 
9(1), 226–236.
[26]  Kivran-Swaine, F., Govindan, P., & Naaman, M. 
(2011). The impact of network structure on 
breaking ties in online social networks: 
Unfollowing on Twitter. Proceedings of the 
SIGCHI Conference on Human Factors in 
Computing Systems, 1101–1104.  
[27]  Ley, M. (2009). DBLP: Some lessons learned.
Proceedings of the VLDB Endowment, 2(2), 1493–1500.
[28]  Liben-Nowell, D., & Kleinberg, J. (2003). The link 
prediction problem for social networks. 
Proceedings of the Twelfth Annual ACM 
International Conference on Information and 
Knowledge Management (CIKM), 556–559.  
[29]  Lichtenwalter, R., & Chawla, N. V. (2012). Link 
prediction: Fair and effective evaluation. 
Proceedings of the 2012 IEEE/ACM International 
Conference on Advances in Social Networks 
Analysis and Mining, ASONAM 2012, 376–383.  
[30]  Long, J. S., & McGinnis, R. (1985). The effects of 
the mentor on the academic career. Scientometrics, 
7(3-6), 255–280.  
[31]  Mali, F., Kronegger, L., Doreian, P., & Ferligoj, A. 
(2012). Dynamic scientific co-authorship networks. 
Understanding Complex Systems, 195–232.  
[32]  Martin, J. L., & Yeung, K. T. (2006). Persistence of 
close personal ties over a 12-year period. Social 
Networks, 28(4), 331–362.  
[33]  Newman, M. E. J. (2001). Scientific collaboration 
networks: I. Network construction and fundamental 
results. Physical Review E, 64(1), 1–8.  
[34]  Perlman, D., & Fehr, B. (1987). The development 
of intimate relationships. In Intimate Relationships 
Development and Deterioration (pp. 13–42). 
[35]  Raeder, T., Lizardo, O., Hachen, D., & Chawla, N. 
V. (2011). Predictors of short-term decay of cell 
phone contacts in a large-scale communication 
network. Social Networks, 33(4), 245–257.  
[36]  Salton, G., & McGill, M. J. (1983). Introduction to
Modern Information Retrieval. McGraw-Hill.
[37]  Velden, T., Haque, A. U., & Lagoze, C. (2010). A 
new approach to analyzing patterns of collaboration 
in co-authorship networks: Mesoscopic analysis and 
interpretation. Scientometrics, 85(1), 219–242.  
[38]  Wellman, B., Wong, R. Y., Tindall, D., & Nazer, N. 
(1997). A decade of network change: Turnover, 
persistence and stability in personal communities. 
Social Networks, 19, 27–50.